
Semantic Smoothing for Twitter Sentiment Analysis

, , and . The 10th International Semantic Web Conference (ISWC), Bonn, Germany (2011)

Abstract

Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces the data sparsity problem, partly due to the large variety of short forms introduced into tweets because of the 140-character limit. In this work we propose using semantic smoothing to alleviate the data sparsity problem. Our approach extracts semantically hidden concepts from the training documents and then incorporates these concepts as additional features for classifier training. We tested our approach using two different methods. One is shallow semantic smoothing, where words are replaced with their corresponding semantic concepts; the other is to interpolate the original unigram language model in the Naive Bayes (NB) classifier with the generative model of words given semantic concepts. Preliminary results show that shallow semantic smoothing reduces the vocabulary size by 20%. Moreover, the interpolation method improves upon shallow semantic smoothing by over 5% in sentiment classification and slightly outperforms NB trained on unigrams only without semantic smoothing.
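The two methods in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not specify. The word-to-concept dictionary `concepts` is hypothetical and hand-written, whereas the paper extracts concepts from the training tweets. The interpolated likelihood is assumed to take the form P(w|c) = (1 - alpha) * P_b(w|c) + alpha * sum_t P(w|t) * P(t|c), where P_b(w|c) is an add-one smoothed unigram model, P(t|c) is add-one smoothed over the observed concepts, and P(w|t) is the relative frequency of w among the training tokens mapped to concept t. The names `shallow_smooth`, `SemanticNB` and the weight `alpha` are illustrative only and are not the authors' code.

```python
import math
from collections import Counter


def shallow_smooth(tokens, concepts):
    """Shallow semantic smoothing: replace every word that has a known
    concept with that concept; words without a concept are kept as-is."""
    return [concepts.get(t, t) for t in tokens]


class SemanticNB:
    """Multinomial Naive Bayes whose word likelihood is interpolated with a
    word-given-concept model (assumed form, not taken from the paper):

        P(w|c) = (1 - alpha) * P_b(w|c) + alpha * sum_t P(w|t) * P(t|c)

    alpha must lie in [0, 1) so that every in-vocabulary word keeps a
    non-zero probability through the P_b term.
    """

    def __init__(self, concepts, alpha=0.5):
        self.concepts = concepts  # hypothetical word -> concept mapping
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.concept_list = sorted(
            {self.concepts[w] for w in self.vocab if w in self.concepts})
        all_words = Counter(w for doc in docs for w in doc)
        # P(w|t): relative frequency of w among all tokens mapped to concept t.
        concept_totals = Counter()
        for w, n in all_words.items():
            if w in self.concepts:
                concept_totals[self.concepts[w]] += n
        self.p_w_t = {w: n / concept_totals[self.concepts[w]]
                      for w, n in all_words.items() if w in self.concepts}
        self.log_prior, self.word_counts = {}, {}
        self.word_totals, self.p_t_c = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            self.word_counts[c] = counts
            self.word_totals[c] = sum(counts.values())
            # P(t|c): add-one smoothed over the concepts seen in training.
            t_counts = Counter(self.concepts[w] for d in class_docs
                               for w in d if w in self.concepts)
            t_total = sum(t_counts.values())
            self.p_t_c[c] = {t: (t_counts[t] + 1) / (t_total + len(self.concept_list))
                             for t in self.concept_list}
        return self

    def word_prob(self, w, c):
        # Background model P_b(w|c): add-one (Laplace) smoothed unigram estimate.
        p_b = (self.word_counts[c][w] + 1) / (self.word_totals[c] + len(self.vocab))
        # Semantic part: each word maps to at most one concept in this sketch,
        # so the sum over concepts has a single term (or none).
        p_s = 0.0
        if w in self.p_w_t:
            p_s = self.p_w_t[w] * self.p_t_c[c][self.concepts[w]]
        return (1 - self.alpha) * p_b + self.alpha * p_s

    def predict(self, doc):
        # Standard NB decision rule in log space; out-of-vocabulary words are skipped.
        def score(c):
            return self.log_prior[c] + sum(
                math.log(self.word_prob(w, c)) for w in doc if w in self.vocab)
        return max(self.classes, key=score)
```

On a toy training set where "iphone" and "ipad" map to a hypothetical concept "product" and appear only in positive tweets, `SemanticNB(concepts, alpha=0.5).fit(docs, labels).predict(["love", "ipad"])` returns `"pos"`, and applying `shallow_smooth` to the same tweets shrinks the vocabulary because distinct words collapse onto shared concepts. Setting `alpha=0` recovers plain add-one smoothed NB on unigrams. A soft mapping, where a word can belong to several concepts with weights, would need the full sum over concepts rather than the single term used here.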
