
TinyBERT: Distilling BERT for Natural Language Understanding

X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, and Q. Liu. (Sep 23, 2019)
DOI: 10.18653/v1/2020.findings-emnlp.372

Abstract

Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT. Then, we introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture the general-domain as well as the task-specific knowledge in BERT. TinyBERT4 with 4 layers is empirically effective and achieves more than 96.8% the performance of its teacher BERT-Base on GLUE benchmark, while being 7.5x smaller and 9.4x faster on inference. TinyBERT4 is also significantly better than 4-layer state-of-the-art baselines on BERT distillation, with only ~28% parameters and ~31% inference time of them. Moreover, TinyBERT6 with 6 layers performs on-par with its teacher BERT-Base.

Description

This paper introduces TinyBERT, a compact model distilled from BERT for natural language understanding. It reports that the 4-layer TinyBERT4 retains more than 96.8% of BERT-Base's performance on GLUE while being 7.5x smaller and 9.4x faster at inference.

Links and resources

BibTeX key: Xiaoqi2019
entry type: JournalArticle
year: 2019
month: 9
day: 23
pages: 4163-4174
DOI: 10.18653/v1/2020.findings-emnlp.372
url: https://www.semanticscholar.org/paper/0cbf97173391b0430140117027edcaf1a37968c7

Cite this publication

@JournalArticle{Xiaoqi2019,
  abstract = {Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to efficiently execute them on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models. By leveraging this new KD method, the plenty of knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT. Then, we introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture the general-domain as well as the task-specific knowledge in BERT. TinyBERT4 with 4 layers is empirically effective and achieves more than 96.8\% the performance of its teacher BERT-Base on GLUE benchmark, while being 7.5x smaller and 9.4x faster on inference. TinyBERT4 is also significantly better than 4-layer state-of-the-art baselines on BERT distillation, with only \textasciitilde{}28\% parameters and \textasciitilde{}31\% inference time of them. Moreover, TinyBERT6 with 6 layers performs on-par with its teacher BERT-Base.},
  added-at = {2024-01-05T23:13:52.000+0100},
  author = {Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin and Chen, Xiao and Li, Linlin and Wang, F. and Liu, Qun},
  biburl = {https://www.bibsonomy.org/bibtex/29ff3da0fbd99c4322ca71c932d2df118/tomvoelker},
  day = 23,
  description = {This paper introduces TinyBERT, a compact model distilled from BERT for natural language understanding. It reports that the 4-layer TinyBERT4 retains more than 96.8\% of BERT-Base's performance on GLUE while being 7.5x smaller and 9.4x faster at inference.},
  doi = {10.18653/v1/2020.findings-emnlp.372},
  interhash = {b66bc8077eb34a01a4e40790ac8c0fa3},
  intrahash = {9ff3da0fbd99c4322ca71c932d2df118},
  keywords = {TinyBERT NLP BERT LanguageModel AI posted_with_chatgpt},
  month = {9},
  pages = {4163--4174},
  timestamp = {2024-01-05T23:13:52.000+0100},
  title = {TinyBERT: Distilling BERT for Natural Language Understanding},
  url = {https://www.semanticscholar.org/paper/0cbf97173391b0430140117027edcaf1a37968c7},
  year = 2019
}
