Named entity recognition in tweets: an experimental study

, , , and . Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1524–1534. Stroudsburg, PA, USA, Association for Computational Linguistics, (2011)

Abstract

People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebuilding the NLP pipeline, from part-of-speech tagging through chunking to named-entity recognition. Our novel T-NER system doubles the F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types.

Our NLP tools are available at: http://github.com/aritter/twitter_nlp
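
The LabeledLDA step described in the abstract can be illustrated with a minimal sketch, under stated assumptions. Each entity string is treated as a "document" whose words are the context words pooled over all of its mentions (one way to exploit redundancy across tweets), and its admissible types are restricted to the dictionaries that list it, falling back to every type when no dictionary does. The function name `labeled_lda`, the toy dictionaries standing in for Freebase, the type names, and the hyperparameters are all hypothetical; collapsed Gibbs sampling is a standard inference choice for this kind of model and is not presented as the authors' exact procedure.

```python
import random
from collections import defaultdict


def labeled_lda(entities, type_dicts, all_types, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for a LabeledLDA-style entity typing model (sketch).

    entities:   entity string -> context words pooled over all its mentions
    type_dicts: type name -> set of entity strings (hypothetical Freebase stand-in)
    all_types:  list of every type name
    Returns entity -> {allowed type: probability}, estimated from the final sample.
    """
    rng = random.Random(seed)
    V = len({w for words in entities.values() for w in words})

    # Distant supervision: restrict each entity to the types whose dictionary
    # contains it; an entity found in no dictionary may take any type (assumption).
    allowed = {}
    for e in entities:
        labels = [t for t in all_types if e in type_dicts.get(t, set())]
        allowed[e] = labels or list(all_types)

    n_et = defaultdict(int)  # tokens of entity e assigned to type t
    n_tw = defaultdict(int)  # times word w is assigned to type t
    n_t = defaultdict(int)   # total tokens assigned to type t
    z = {}                   # (entity, token index) -> current type

    # Random initialisation, drawing only from each entity's allowed types.
    for e, words in entities.items():
        for i, w in enumerate(words):
            t = rng.choice(allowed[e])
            z[(e, i)] = t
            n_et[(e, t)] += 1
            n_tw[(t, w)] += 1
            n_t[t] += 1

    for _ in range(iters):
        for e, words in entities.items():
            for i, w in enumerate(words):
                # Remove the current assignment from the counts.
                t = z[(e, i)]
                n_et[(e, t)] -= 1
                n_tw[(t, w)] -= 1
                n_t[t] -= 1
                # Resample among allowed types only: p(entity's type) * p(word | type).
                ks = allowed[e]
                weights = [
                    (n_et[(e, k)] + alpha) * (n_tw[(k, w)] + beta) / (n_t[k] + V * beta)
                    for k in ks
                ]
                t = rng.choices(ks, weights=weights)[0]
                z[(e, i)] = t
                n_et[(e, t)] += 1
                n_tw[(t, w)] += 1
                n_t[t] += 1

    # Smoothed per-entity type distribution over the allowed types.
    result = {}
    for e, words in entities.items():
        ks = allowed[e]
        denom = len(words) + alpha * len(ks)
        result[e] = {k: (n_et[(e, k)] + alpha) / denom for k in ks}
    return result


if __name__ == "__main__":
    # Toy, made-up data: "Amazon" appears in two dictionaries and is ambiguous.
    dicts = {"COMPANY": {"Amazon", "Walmart"}, "LOCATION": {"Amazon", "Nile"}}
    mentions = {
        "Amazon": ["shipping", "order", "shipping", "order"],
        "Walmart": ["shipping", "order"] * 10,
        "Nile": ["river", "egypt"] * 10,
    }
    print(labeled_lda(mentions, dicts, ["COMPANY", "LOCATION", "PERSON"], alpha=1.0))
```

In this toy setting, the dictionaries never force a single type on "Amazon"; instead, its pooled context words ("shipping", "order") pull it toward the type whose unambiguous members share those words. This is the intuition behind using dictionaries as distant rather than direct supervision.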

Description

Named entity recognition in tweets