
TwiNER: named entity recognition in targeted twitter stream

Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 721-730. New York, NY, USA, ACM, (2012)
DOI: 10.1145/2348283.2348380

Abstract

Many private and/or public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria, e.g., tweets published by users from a selected region, or tweets that match one or more predefined keywords. The targeted Twitter stream is then monitored to collect and understand users' opinions about the organizations. There is an emerging need for early crisis detection and response with such targeted streams. Such applications require a good named entity recognition (NER) system for Twitter, one that can automatically discover emerging named entities that are potentially linked to the crisis. In this paper, we present a novel 2-step unsupervised NER system for targeted Twitter streams, called TwiNER. In the first step, it leverages the global context obtained from Wikipedia and the Web N-Gram corpus to partition tweets into valid segments (phrases) using a dynamic programming algorithm. Each such tweet segment is a candidate named entity. It is observed that the named entities in the targeted stream usually exhibit a gregarious property, due to the way the targeted stream is constructed. In the second step, TwiNER constructs a random walk model to exploit the gregarious property in the local context derived from the Twitter stream. Highly ranked segments have a higher chance of being true named entities. We evaluated TwiNER on two sets of real-life tweets simulating two targeted streams. Evaluated against labeled ground truth, TwiNER achieves performance comparable to that of conventional approaches in both streams. Various settings of TwiNER have also been examined to verify our idea of combining global context and local context.
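
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the segment scoring function or the graph construction, so the names `segment`, `toy_score`, `TOY_SCORES`, and `random_walk_rank`, the hand-picked scores, the co-occurrence edge weights, and the PageRank-style update with a `damping` parameter are all assumptions made for illustration. A real system would derive segment scores from Wikipedia and Web N-Gram statistics.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical phrase scores standing in for the global context
# (Wikipedia / Web N-Gram statistics in the paper). Values are made up.
TOY_SCORES = {("new", "york"): 2.0, ("new",): 0.3, ("york",): 0.3}


def toy_score(seg):
    """Assumed scoring rule: known phrases use the table, unknown
    single words get a small score, unknown multi-word spans are penalized."""
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    return 0.1 if len(seg) == 1 else -1.0


def segment(tokens, score, max_len=3):
    """Step 1 (sketch): dynamic programming over split points.
    best[i] is the best total score of any segmentation of tokens[:i]."""
    n = len(tokens)
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)  # back[i]: start index of the last segment ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            cand = best[j] + score(tuple(tokens[j:i]))
            if cand > best[i]:
                best[i], back[i] = cand, j
    segments, i = [], n
    while i > 0:  # follow back-pointers to recover the segmentation
        segments.append(tuple(tokens[back[i]:i]))
        i = back[i]
    return segments[::-1]


def random_walk_rank(tweet_segments, damping=0.85, iters=50):
    """Step 2 (sketch): PageRank-style random walk on a graph whose nodes
    are segments and whose edges count co-occurrence within a tweet."""
    weight = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for segs in tweet_segments:
        uniq = set(segs)
        nodes |= uniq
        for a, b in combinations(sorted(uniq), 2):
            weight[a][b] += 1.0
            weight[b][a] += 1.0
    nodes = sorted(nodes)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {s: 1.0 / n for s in nodes}
    for _ in range(iters):
        new = {s: (1.0 - damping) / n for s in nodes}
        for s in nodes:
            out = sum(weight[s].values())
            if out == 0.0:
                # Isolated segment: spread its mass uniformly.
                for t in nodes:
                    new[t] += damping * rank[s] / n
            else:
                for t, w in weight[s].items():
                    new[t] += damping * rank[s] * w / out
        rank = new
    return rank
```

In this sketch, segments that co-occur with many other segments across the stream accumulate more rank, which is one simple way to model the gregarious property mentioned above.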
