Named entity recognition in tweets: an experimental study
A. Ritter, S. Clark, Mausam, and O. Etzioni. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1524--1534. Stroudsburg, PA, USA, Association for Computational Linguistics, (2011)
Abstract
People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
%0 Conference Paper
%1 Ritter:2011:NER:2145432.2145595
%A Ritter, Alan
%A Clark, Sam
%A Mausam
%A Etzioni, Oren
%B Proceedings of the Conference on Empirical Methods in Natural Language Processing
%C Stroudsburg, PA, USA
%D 2011
%I Association for Computational Linguistics
%K l3s_twitter
%P 1524--1534
%T Named entity recognition in tweets: an experimental study
%U http://dl.acm.org/citation.cfm?id=2145432.2145595
%X People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
%@ 978-1-937284-11-4
@inproceedings{Ritter:2011:NER:2145432.2145595,
abstract = {People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp},
acmid = {2145595},
added-at = {2013-08-27T05:42:48.000+0200},
address = {Stroudsburg, PA, USA},
author = {Ritter, Alan and Clark, Sam and Mausam and Etzioni, Oren},
biburl = {https://www.bibsonomy.org/bibtex/2a231967a78f178402ad13171fc3672d2/antoine-tran},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing},
description = {Named entity recognition in tweets},
interhash = {3e3fe3a49491fcec718e78757730d6b7},
intrahash = {a231967a78f178402ad13171fc3672d2},
isbn = {978-1-937284-11-4},
keywords = {l3s_twitter},
location = {Edinburgh, United Kingdom},
numpages = {11},
pages = {1524--1534},
publisher = {Association for Computational Linguistics},
series = {EMNLP '11},
timestamp = {2013-08-27T05:42:48.000+0200},
title = {Named entity recognition in tweets: an experimental study},
url = {http://dl.acm.org/citation.cfm?id=2145432.2145595},
year = 2011
}