SpanBERT: Improving Pre-training by Representing and Predicting Spans

M. Joshi, D. Chen, Y. Liu, D. Weld, L. Zettlemoyer, and O. Levy.
(Jul 24, 2019)
DOI: 10.1162/tacl_a_00300

Abstract

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE.

BibTeX key: Mandar2019
entry type: JournalArticle
year: 2019
month: 7
day: 24
journal: Transactions of the Association for Computational Linguistics
pages: 64-77
volume: 8
DOI: 10.1162/tacl_a_00300
url: https://www.semanticscholar.org/paper/81f5810fbbab9b7203b9556f4ce3c741875407bc

Cite this publication

@JournalArticle{Mandar2019,
  abstract    = {We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE.},
  added-at    = {2024-01-10T16:20:03.000+0100},
  author      = {Joshi, Mandar and Chen, Danqi and Liu, Yinhan and Weld, Daniel S. and Zettlemoyer, Luke and Levy, Omer},
  biburl      = {https://www.bibsonomy.org/bibtex/24ef6999cafc1562517208350bd7a7753/chiir_demo},
  day         = 24,
  description = {SpanBERT, an enhancement of BERT, focuses on representing and predicting spans of text. This novel approach involves masking contiguous random spans and training span boundary representations. It shows substantial gains in tasks like question answering and coreference resolution, outperforming BERT in many aspects.},
  doi         = {10.1162/tacl_a_00300},
  interhash   = {57364a324f93fc65e3cbaf6bbfd52566},
  intrahash   = {4ef6999cafc1562517208350bd7a7753},
  journal     = {Transactions of the Association for Computational Linguistics},
  keywords    = {DeepLearning LanguageModel NLP PreTraining SpanBERT edited_with_chatgpt posted_with_chatgpt sys:related_work:06be18119e083067ac69e8d9c02769f1},
  month       = {7},
  pages       = {64--77},
  timestamp   = {2024-01-10T17:59:30.000+0100},
  title       = {SpanBERT: Improving Pre-training by Representing and Predicting Spans},
  url         = {https://www.semanticscholar.org/paper/81f5810fbbab9b7203b9556f4ce3c741875407bc},
  volume      = 8,
  year        = 2019
}
