Detecting Spammers on Twitter

F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida. Proceedings of the Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS), (July 2010)

Abstract

With millions of users tweeting around the world, real-time search systems and different types of mining tools are emerging that allow people to track the repercussions of events and news on Twitter. However, although these services are appealing as mechanisms that ease the spread of news and let users discuss events and post their status, they also open opportunities for new forms of spam. Trending topics, the most talked-about items on Twitter at a given point in time, have been seen as an opportunity to generate traffic and revenue. Spammers post tweets containing typical words of a trending topic together with URLs, usually obfuscated by URL shorteners, that lead users to completely unrelated websites. This kind of spam can devalue real-time search services unless mechanisms to fight and stop spammers are found. In this paper we consider the problem of detecting spammers on Twitter. We first collected a large Twitter dataset that includes more than 54 million users, 1.9 billion links, and almost 1.8 billion tweets. Using tweets related to three famous trending topics from 2009, we construct a large labeled collection of users, manually classified into spammers and non-spammers. We then identify a number of characteristics related to tweet content and user social behavior that could potentially be used to detect spammers. We use these characteristics as attributes in a machine learning process that classifies users as either spammers or non-spammers. Our strategy detects many of the spammers while misclassifying only a small percentage of non-spammers: approximately 70% of spammers and 96% of non-spammers were correctly classified. Our results also highlight the most important attributes for spam detection on Twitter.
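The abstract reports accuracy separately for each class (spammers and non-spammers) rather than as a single overall figure. As a minimal sketch, assuming labels are stored as plain strings, the per-class rate is the fraction of users of each true class whose predicted label matches. The function name `per_class_accuracy` and the toy labels below are hypothetical illustrations; the counts were picked only so the output mirrors the reported percentages and are not the paper's data or method.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Return, for each true class, the fraction of its members predicted correctly
    (i.e. per-class recall). Hypothetical helper, not from the paper."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical toy labels for illustration only; NOT the paper's dataset.
y_true = ["spammer"] * 10 + ["non-spammer"] * 25
y_pred = (["spammer"] * 7 + ["non-spammer"] * 3        # 7 of 10 spammers caught
          + ["non-spammer"] * 24 + ["spammer"] * 1)    # 1 of 25 legitimate users flagged

rates = per_class_accuracy(y_true, y_pred)
print(rates)  # {'spammer': 0.7, 'non-spammer': 0.96}
```

Reporting the two rates separately shows both how many spammers slip through and how many legitimate users are wrongly flagged, which a single overall accuracy figure would blur together.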

Links and resources

BibTeX key: twitter-spam-benevenuto
entry type: inproceedings
booktitle: Proceedings of the Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS)
year: 2010
month: jul
posted-at: 2011-09-09 18:58:57
location: Washington, DC, USA
priority: 3
citeulike-article-id: 8510242
citeulike-linkout-0: http://ceas.cc/2010/papers/Paper%2021.pdf

Cite this publication

%0 Conference Paper
%1 twitter-spam-benevenuto
%A Benevenuto, Fabricio
%A Magno, Gabriel
%A Rodrigues, Tiago
%A Almeida, Virgilio
%B Proceedings of the Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS)
%D 2010
%K 2011 selected seminar spam twitter winter
%T Detecting Spammers on Twitter
%X With millions of users tweeting around the world, real time search systems and different types of mining tools are emerging to allow people tracking the repercussion of events and news on Twitter. However, although appealing as mechanisms to ease the spread of news and allow users to discuss events and post their status, these services open opportunities for new forms of spam. Trending topics, the most talked about items on Twitter at a given point in time, have been seen as an opportunity to generate traffic and revenue. Spammers post tweets containing typical words of a trending topic and URLs, usually obfuscated by URL shorteners, that lead users to completely unrelated websites. This kind of spam can contribute to de-value real time search services unless mechanisms to fight and stop spammers can be found. In this paper we consider the problem of detecting spammers on Twitter. We first collected a large dataset of Twitter that includes more than 54 million users, 1.9 billion links, and almost 1.8 billion tweets. Using tweets related to three famous trending topics from 2009, we construct a large labeled collection of users, manually classified into spammers and non-spammers. We then identify a number of characteristics related to tweet content and user social behavior, which could potentially be used to detect spammers. We used these characteristics as attributes of machine learning process for classifying users as either spammers or nonspammers. Our strategy succeeds at detecting much of the spammers while only a small percentage of non-spammers are misclassified. Approximately 70% of spammers and 96% of non-spammers were correctly classified. Our results also highlight the most important attributes for spam detection on Twitter.

BibSonomy