How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

B. van Aken, B. Winter, A. Löser, and F. Gers. (2019). arXiv:1909.04925. Comment: Accepted at CIKM 2019.
DOI: 10.1145/3357384.3358028

Abstract

Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation layer. Our qualitative analysis of hidden state visualizations provides additional insights into BERT's reasoning process. Our results show that the transformations within BERT go through phases that are related to traditional pipeline tasks. The system can therefore implicitly incorporate task-specific information into its token representations. Furthermore, our analysis reveals that fine-tuning has little impact on the models' semantic abilities and that prediction errors can be recognized in the vector representations of even early layers.

Links and resources

BibTeX key: vanaken2019answer
entry type: misc
year: 2019
DOI: 10.1145/3357384.3358028
url: http://arxiv.org/abs/1909.04925
note: arXiv:1909.04925. Comment: Accepted at CIKM 2019

Cite this publication

@misc{vanaken2019answer,
  abstract = {Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation layer. Our qualitative analysis of hidden state visualizations provides additional insights into BERT's reasoning process. Our results show that the transformations within BERT go through phases that are related to traditional pipeline tasks. The system can therefore implicitly incorporate task-specific information into its token representations. Furthermore, our analysis reveals that fine-tuning has little impact on the models' semantic abilities and that prediction errors can be recognized in the vector representations of even early layers.},
  added-at = {2020-02-12T19:47:42.000+0100},
  author = {van Aken, Betty and Winter, Benjamin and Löser, Alexander and Gers, Felix A.},
  biburl = {https://www.bibsonomy.org/bibtex/269df150134bbaf291dcf99af82d43f99/albinzehe},
  description = {How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations},
  doi = {10.1145/3357384.3358028},
  interhash = {7cfa7bb6d7d86536345b6b38012bb8b9},
  intrahash = {69df150134bbaf291dcf99af82d43f99},
  keywords = {analysis bert dmir-readinggroup nlp},
  note = {arXiv:1909.04925. Comment: Accepted at CIKM 2019},
  timestamp = {2020-02-12T19:47:42.000+0100},
  title = {How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations},
  url = {http://arxiv.org/abs/1909.04925},
  year = 2019
}
