
Cross-lingual Named Entity Recognition

R. Steinberger and B. Pouliquen.
Lingvisticae Investigationes, 30 (1): 135-162 (2007)
DOI: 10.1075/li.30.1.09ste

Abstract

Named Entity Recognition and Classification (NERC) is a known and well-explored text analysis application that has been applied to various languages. We are presenting an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The proposed application goes beyond the state-of-the-art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to automatically link related news documents across languages for all 45 language pair combinations. While state-of-the-art approaches for cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require a minimal amount of monolingual language-specific effort. The development of resources for additional languages is therefore kept to a minimum and new languages can be plugged into the system effortlessly. The presented online news analysis application is fully functional and has, at the end of the year 2006, reached average usage statistics of 600,000 hits per day.
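
The figure of 45 language pairs quoted in the abstract follows from counting the unordered pairs among ten languages, n(n-1)/2 with n = 10. The short sketch below only checks that arithmetic. The abstract does not list the ten languages, so the labels `lang1` to `lang10` and the helper `unordered_pairs` are hypothetical placeholders and not part of the authors' system.

```python
from itertools import combinations
from math import comb


def unordered_pairs(items):
    """Return every unordered pair of distinct items."""
    return list(combinations(items, 2))


# Placeholder labels (assumption): the abstract does not name the languages.
languages = [f"lang{i}" for i in range(1, 11)]
pairs = unordered_pairs(languages)

print(len(pairs))   # 45 pairs enumerated explicitly
print(comb(10, 2))  # 45 from the closed form n * (n - 1) / 2
```

Because the count grows quadratically, an eleventh language would add ten further pairs, for 55 in total.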

BibTeX key: steinberger2007crosslingual
entry type: article
year: 2007
journal: Lingvisticae Investigationes
number: 1
pages: 135-162
volume: 30
DOI: 10.1075/li.30.1.09ste
url: http://www.ingentaconnect.com/content/jbp/li/2007/00000030/00000001/art00008

Cite this publication

@article{steinberger2007crosslingual,
  abstract = {Named Entity Recognition and Classification (NERC) is a known and well-explored text analysis application that has been applied to various languages. We are presenting an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The proposed application goes beyond the state-of-the-art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to automatically link related news documents across languages for all 45 language pair combinations. While state-of-the-art approaches for cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require a minimal amount of monolingual language-specific effort. The development of resources for additional languages is therefore kept to a minimum and new languages can be plugged into the system effortlessly. The presented online news analysis application is fully functional and has, at the end of the year 2006, reached average usage statistics of 600,000 hits per day.},
  added-at = {2013-02-26T09:55:22.000+0100},
  author = {Steinberger, Ralf and Pouliquen, Bruno},
  biburl = {https://www.bibsonomy.org/bibtex/2827c11a56b4a0e43c43d202757df945b/folke},
  description = {ingentaconnect Cross-lingual Named Entity Recognition},
  doi = {10.1075/li.30.1.09ste},
  interhash = {7f91316b520e025f189ddcfde542c21f},
  intrahash = {827c11a56b4a0e43c43d202757df945b},
  journal = {Lingvisticae Investigationes},
  keywords = {entity name named onomastics recognition variants},
  number = 1,
  pages = {135-162},
  timestamp = {2013-02-26T09:55:22.000+0100},
  title = {Cross-lingual Named Entity Recognition},
  url = {http://www.ingentaconnect.com/content/jbp/li/2007/00000030/00000001/art00008},
  volume = 30,
  year = 2007
}
