Learning multilingual named entity recognition from Wikipedia

J. Nothman, N. Ringland, W. Radford, T. Murphy, and J. Curran. Artificial Intelligence, 194 (0): 151-175 (2013). Artificial Intelligence, Wikipedia and Semi-Structured Resources.
DOI: 10.1016/j.artint.2012.03.006

Abstract

We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work overcomes. We first classify each Wikipedia article into named entity (NE) types, training and evaluating on 7200 manually-labelled Wikipedia articles across nine languages. Our cross-lingual approach achieves up to 95% accuracy. We transform the links between articles into NE annotations by projecting the target article's classifications onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, we better align our automatic annotations to gold standards. We annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against CoNLL shared task data and other gold-standard corpora. Our approach outperforms other approaches to automatic NE annotation (Richman and Schone, 2008 [61]; Mika et al., 2008 [46]); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.
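The link-projection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ARTICLE_TYPES` table, the `project_links` function, the whitespace tokenisation and the IOB tag scheme are all assumptions made for illustration, and the article classifications are hard-coded rather than produced by a trained classifier.

```python
import re

# Hypothetical article-level NE classifications. In the paper these come
# from a trained article classifier; here they are hard-coded assumptions.
ARTICLE_TYPES = {
    "Sydney": "LOC",
    "University of Sydney": "ORG",
    "James Cook": "PER",
}

# Matches wikitext links of the form [[Target]] or [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def project_links(wikitext, article_types):
    """Return (token, tag) pairs, projecting each link target's NE type
    onto the tokens of its anchor text (IOB tags, an assumed scheme)."""
    tagged = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Unlinked text before this link is tagged as outside any entity.
        for tok in wikitext[pos:match.start()].split():
            tagged.append((tok, "O"))
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        ne_type = article_types.get(target)  # None if the target is unclassified
        for i, tok in enumerate(anchor.split()):
            if ne_type is None:
                tagged.append((tok, "O"))
            else:
                prefix = "B" if i == 0 else "I"
                tagged.append((tok, f"{prefix}-{ne_type}"))
        pos = match.end()
    # Remaining unlinked text after the last link.
    for tok in wikitext[pos:].split():
        tagged.append((tok, "O"))
    return tagged


sentence = "[[James Cook|Cook]] never saw the [[University of Sydney]] in [[Sydney]] ."
for token, tag in project_links(sentence, ARTICLE_TYPES):
    print(token, tag)
```

In this sketch every unlinked token is tagged `O`, so names that appear without a link go unannotated. That gap is one reason the abstract mentions inferring additional links before the annotations can compete with gold-standard data.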

Description

ScienceDirect.com - Artificial Intelligence - Learning multilingual named entity recognition from Wikipedia

Links and resources

BibTeX key: nothman2013learning
entry type: article
year: 2013
journal: Artificial Intelligence
number: 0
pages: 151 - 175
volume: 194
issn: 0004-3702
DOI: 10.1016/j.artint.2012.03.006
url: http://www.sciencedirect.com/science/article/pii/S0004370212000276
note: Artificial Intelligence, Wikipedia and Semi-Structured Resources

Cite this publication

@article{nothman2013learning,
  abstract = {We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work overcomes. We first classify each Wikipedia article into named entity (NE) types, training and evaluating on 7200 manually-labelled Wikipedia articles across nine languages. Our cross-lingual approach achieves up to 95% accuracy. We transform the links between articles into NE annotations by projecting the target article's classifications onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, we better align our automatic annotations to gold standards. We annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against CoNLL shared task data and other gold-standard corpora. Our approach outperforms other approaches to automatic NE annotation (Richman and Schone, 2008 [61]; Mika et al., 2008 [46]); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.},
  added-at = {2013-02-25T23:14:33.000+0100},
  author = {Nothman, Joel and Ringland, Nicky and Radford, Will and Murphy, Tara and Curran, James R.},
  biburl = {https://www.bibsonomy.org/bibtex/29947c3fc9996ea608aaf806197832e5e/folke},
  description = {ScienceDirect.com - Artificial Intelligence - Learning multilingual named entity recognition from Wikipedia},
  doi = {10.1016/j.artint.2012.03.006},
  interhash = {025dedfdf81c5a3052b40d94cb35606b},
  intrahash = {9947c3fc9996ea608aaf806197832e5e},
  issn = {0004-3702},
  journal = {Artificial Intelligence},
  keywords = {entity named ner onomastics wikipedia},
  note = {Artificial Intelligence, Wikipedia and Semi-Structured Resources},
  number = 0,
  pages = {151 - 175},
  timestamp = {2013-02-25T23:14:33.000+0100},
  title = {Learning multilingual named entity recognition from Wikipedia},
  url = {http://www.sciencedirect.com/science/article/pii/S0004370212000276},
  volume = 194,
  year = 2013
}

BibSonomy
