BERTnesia: Investigating the capture and forgetting of knowledge in BERT.
J. Wallat, J. Singh, and A. Anand. Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, abs/2010.09313, pages 174--183. Online, Association for Computational Linguistics, (November 2020)
DOI: 10.18653/v1/2020.blackboxnlp-1.17
Abstract
Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective but not the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer.
%0 Conference Paper
%1 wallat-etal-2020-bertnesia
%A Wallat, Jonas
%A Singh, Jaspreet
%A Anand, Avishek
%B Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
%C Online
%D 2020
%I Association for Computational Linguistics
%K l3s leibnizailab
%P 174--183
%R 10.18653/v1/2020.blackboxnlp-1.17
%T BERTnesia: Investigating the capture and forgetting of knowledge in BERT
%U http://dblp.uni-trier.de/db/journals/corr/corr2010.html#abs-2010-09313
%V abs/2010.09313
%X Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective but not the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer.
@inproceedings{wallat-etal-2020-bertnesia,
abstract = {Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT{'}s final layers. Intermediate layers contribute a significant amount (17-60{\%}) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten but the extent of forgetting is impacted by the fine-tuning objective but not the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer.},
added-at = {2021-07-19T16:56:53.000+0200},
address = {Online},
author = {Wallat, Jonas and Singh, Jaspreet and Anand, Avishek},
biburl = {https://www.bibsonomy.org/bibtex/2a412273717ad70f6d915cdc840ff5e36/sophieschr},
booktitle = {Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP},
description = {BERTnesia: Investigating the capture and forgetting of knowledge in BERT - ACL Anthology},
doi = {10.18653/v1/2020.blackboxnlp-1.17},
interhash = {ffd3cc507b49956f5a570d97f52ef9c1},
intrahash = {a412273717ad70f6d915cdc840ff5e36},
keywords = {l3s leibnizailab},
month = nov,
pages = {174--183},
publisher = {Association for Computational Linguistics},
timestamp = {2021-07-19T16:56:53.000+0200},
title = {{BERT}nesia: Investigating the capture and forgetting of knowledge in {BERT}},
url = {http://dblp.uni-trier.de/db/journals/corr/corr2010.html#abs-2010-09313},
volume = {abs/2010.09313},
year = 2020
}