H. Saif, Y. He, and H. Alani. The 10th International Semantic Web Conference (ISWC), Bonn, Germany, (2011)
Abstract
Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces the data sparsity problem, partly due to the large variety of short forms introduced to tweets because of the 140-character limit. In this work we propose using semantic smoothing to alleviate the data sparseness problem. Our approach extracts semantically hidden concepts from the training documents and then incorporates these concepts as additional features for classifier training. We tested our approach using two different methods. One is shallow semantic smoothing, where words are replaced with their corresponding semantic concepts; the other interpolates the original unigram language model in the Naive Bayes (NB) classifier with the generative model of words given semantic concepts. Preliminary results show that shallow semantic smoothing reduces the vocabulary size by 20%. Moreover, the interpolation method improves upon shallow semantic smoothing by over 5% in sentiment classification and slightly outperforms NB trained on unigrams alone without semantic smoothing.
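The interpolation method described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy tweets, the word-to-concept mapping, the interpolation weight `LAMBDA`, and the simple estimates of p(w | concept) and p(concept | class) are all assumptions made for the example. The smoothed word likelihood mixes the Laplace-smoothed unigram model with a concept-generated model, p(w | class) = (1 − λ)·p_unigram(w | class) + λ·p(w | concept)·p(concept | class).

```python
import math
from collections import Counter

# Hypothetical toy training data: tokenized tweets with sentiment labels.
train = [
    (["great", "iphone", "battery"], "pos"),
    (["love", "ipad", "screen"], "pos"),
    (["awful", "android", "lag"], "neg"),
    (["hate", "nokia", "battery"], "neg"),
]

# Hypothetical word -> semantic concept mapping (e.g. from an entity extractor).
concept_of = {"iphone": "APPLE/PRODUCT", "ipad": "APPLE/PRODUCT",
              "android": "GOOGLE/PRODUCT", "nokia": "NOKIA/PRODUCT"}

LAMBDA = 0.5   # illustrative interpolation weight between the two models
ALPHA = 1.0    # Laplace smoothing constant for the unigram model

vocab = {w for toks, _ in train for w in toks}

def p_unigram(word, label):
    """Laplace-smoothed unigram probability p(word | label)."""
    counts = Counter(w for toks, l in train if l == label for w in toks)
    total = sum(counts.values())
    return (counts[word] + ALPHA) / (total + ALPHA * len(vocab))

def p_smoothed(word, label):
    """Interpolate the unigram model with the concept-generated model."""
    if word not in concept_of:
        return p_unigram(word, label)
    concept = concept_of[word]
    # p(word | concept): spread mass uniformly over words mapped to the concept.
    siblings = [w for w, c in concept_of.items() if c == concept]
    p_w_given_c = 1.0 / len(siblings)
    # p(concept | label): fraction of the label's tokens mapped to this concept.
    toks = [w for ts, l in train if l == label for w in ts]
    p_c_given_label = sum(1 for w in toks if concept_of.get(w) == concept) / len(toks)
    return (1 - LAMBDA) * p_unigram(word, label) + LAMBDA * p_w_given_c * p_c_given_label

def classify(tokens):
    """Naive Bayes with semantically smoothed word likelihoods."""
    scores = {}
    for label in ("pos", "neg"):
        prior = sum(1 for _, l in train if l == label) / len(train)
        scores[label] = math.log(prior) + sum(
            math.log(p_smoothed(w, label)) for w in tokens)
    return max(scores, key=scores.get)

print(classify(["great", "ipad"]))   # the shared APPLE/PRODUCT concept helps here
```

Shallow semantic smoothing, the first method in the abstract, would instead replace each mapped word with its concept label before training, which is what shrinks the vocabulary.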
%0 Conference Paper
%1 iswcTwitter2011
%A Saif, Hassan
%A He, Yulan
%A Alani, Harith
%B The 10th International Semantic Web Conference (ISWC)
%C Bonn, Germany
%D 2011
%K myown robust-project semantic twitter
%T Semantic Smoothing for Twitter Sentiment Analysis
%U http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/PostersDemos/iswc11pd_submission_55.pdf
%X Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces the data sparsity problem, partly due to the large variety of short forms introduced to tweets because of the 140-character limit. In this work we propose using semantic smoothing to alleviate the data sparseness problem. Our approach extracts semantically hidden concepts from the training documents and then incorporates these concepts as additional features for classifier training. We tested our approach using two different methods. One is shallow semantic smoothing, where words are replaced with their corresponding semantic concepts; the other interpolates the original unigram language model in the Naive Bayes (NB) classifier with the generative model of words given semantic concepts. Preliminary results show that shallow semantic smoothing reduces the vocabulary size by 20%. Moreover, the interpolation method improves upon shallow semantic smoothing by over 5% in sentiment classification and slightly outperforms NB trained on unigrams alone without semantic smoothing.
@inproceedings{iswcTwitter2011,
abstract = {Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces the data sparsity problem, partly due to the large variety of short forms introduced to tweets because of the 140-character limit. In this work we propose using semantic smoothing to alleviate the data sparseness problem. Our approach extracts semantically hidden concepts from the training documents and then incorporates these concepts as additional features for classifier training. We tested our approach using two different methods. One is shallow semantic smoothing, where words are replaced with their corresponding semantic concepts; the other interpolates the original unigram language model in the Naive Bayes (NB) classifier with the generative model of words given semantic concepts. Preliminary results show that shallow semantic smoothing reduces the vocabulary size by 20%. Moreover, the interpolation method improves upon shallow semantic smoothing by over 5% in sentiment classification and slightly outperforms NB trained on unigrams alone without semantic smoothing.},
added-at = {2011-09-23T13:02:25.000+0200},
address = {Bonn, Germany},
author = {Saif, Hassan and He, Yulan and Alani, Harith},
biburl = {https://www.bibsonomy.org/bibtex/2073f558b682ff264de2af731da8a3a3a/yulanhe},
booktitle = {The 10th International Semantic Web Conference (ISWC)},
interhash = {4cb80cbd6980b69e6b85b403cd548b91},
intrahash = {073f558b682ff264de2af731da8a3a3a},
keywords = {myown robust-project semantic twitter},
timestamp = {2012-11-09T12:00:36.000+0100},
title = {Semantic Smoothing for Twitter Sentiment Analysis},
url = {http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/PostersDemos/iswc11pd_submission_55.pdf},
year = 2011
}