Named Entity Recognition in Tweets: An Experimental Study
A. Ritter, S. Clark, Mausam, and O. Etzioni. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11), pages 1524–1534. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011.
Abstract
People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
%0 Conference Paper
%1 Ritter:2011:NER:2145432.2145595
%A Ritter, Alan
%A Clark, Sam
%A Mausam,
%A Etzioni, Oren
%B Proceedings of the Conference on Empirical Methods in Natural Language Processing
%C Stroudsburg, PA, USA
%D 2011
%I Association for Computational Linguistics
%K ner phdproposal twitter
%P 1524-1534
%T Named Entity Recognition in Tweets: An Experimental Study
%U http://dl.acm.org/citation.cfm?id=2145432.2145595
%X People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
%@ 978-1-937284-11-4
@inproceedings{Ritter:2011:NER:2145432.2145595,
abstract = {People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25\% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter\_nlp},
acmid = {2145595},
added-at = {2015-01-23T10:54:17.000+0100},
address = {Stroudsburg, PA, USA},
author = {Ritter, Alan and Clark, Sam and Mausam and Etzioni, Oren},
biburl = {https://www.bibsonomy.org/bibtex/2bbc287fe4da02036bd2ef971a5d3aaaa/asmelash},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing},
description = {Named entity recognition in tweets},
interhash = {3e3fe3a49491fcec718e78757730d6b7},
intrahash = {bbc287fe4da02036bd2ef971a5d3aaaa},
isbn = {978-1-937284-11-4},
keywords = {ner phdproposal twitter},
location = {Edinburgh, United Kingdom},
numpages = {11},
pages = {1524--1534},
publisher = {Association for Computational Linguistics},
series = {EMNLP '11},
timestamp = {2015-01-23T10:54:17.000+0100},
title = {Named Entity Recognition in Tweets: An Experimental Study},
url = {http://dl.acm.org/citation.cfm?id=2145432.2145595},
year = 2011
}