F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida. Detecting Spammers on Twitter. In Proceedings of the Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS), Washington, DC, USA, July 2010.
Abstract
With millions of users tweeting around the world, real-time search systems and different types of mining tools are emerging that allow people to track the repercussions of events and news on Twitter. However, although these services are appealing as mechanisms that ease the spread of news and allow users to discuss events and post their status, they also open opportunities for new forms of spam. Trending topics, the most talked-about items on Twitter at a given point in time, have been seen as an opportunity to generate traffic and revenue. Spammers post tweets that contain typical words of a trending topic together with URLs, usually obfuscated by URL shorteners, that lead users to completely unrelated websites. This kind of spam can devalue real-time search services unless mechanisms to fight and stop spammers are found.
In this paper we consider the problem of detecting spammers on Twitter. We first collected a large Twitter dataset that includes more than 54 million users, 1.9 billion links, and almost 1.8 billion tweets. Using tweets related to three famous trending topics from 2009, we constructed a large labeled collection of users, manually classified into spammers and non-spammers. We then identified a number of characteristics related to tweet content and user social behavior that could potentially be used to detect spammers, and used these characteristics as attributes in a machine learning process that classifies users as either spammers or non-spammers. Our strategy detects many of the spammers while misclassifying only a small percentage of non-spammers: approximately 70% of spammers and 96% of non-spammers were correctly classified. Our results also highlight the most important attributes for spam detection on Twitter.
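The abstract names two attribute groups (tweet content and user social behavior) without listing the individual attributes. Purely as an illustration, the Python sketch below computes a few content-side attributes for a single user, including use of URL shorteners as described above. The `Tweet` type, the attribute names, and the shortener host list are hypothetical assumptions, not the paper's definitions, and the social-behavior side is not covered. A vector like this would then be passed to a classifier; this sketch does not reproduce the authors' choice of learner.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical minimal tweet record; only the text is used here.
@dataclass
class Tweet:
    text: str

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Assumed shortener hosts for illustration; not an exhaustive or authoritative list.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "ow.ly"}

def is_shortened(url: str) -> bool:
    """True if the URL's host is one of the assumed shortener domains."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENER_HOSTS

def content_attributes(tweets: list[Tweet]) -> dict[str, float]:
    """Compute a few hypothetical per-user content attributes."""
    n = len(tweets)
    if n == 0:
        return {"frac_with_url": 0.0, "frac_shortened_url": 0.0,
                "urls_per_tweet": 0.0, "hashtags_per_word": 0.0}
    urls = [URL_RE.findall(t.text) for t in tweets]
    words = sum(len(t.text.split()) for t in tweets)
    hashtags = sum(len(HASHTAG_RE.findall(t.text)) for t in tweets)
    return {
        # Fraction of the user's tweets that contain at least one URL.
        "frac_with_url": sum(1 for u in urls if u) / n,
        # Fraction of tweets containing a URL from an assumed shortener host.
        "frac_shortened_url": sum(1 for u in urls if any(is_shortened(x) for x in u)) / n,
        # Average number of URLs per tweet.
        "urls_per_tweet": sum(len(u) for u in urls) / n,
        # Hashtags per whitespace-separated word, across all tweets.
        "hashtags_per_word": hashtags / words if words else 0.0,
    }
```

For example, `content_attributes([Tweet("Watch #worldcup live http://bit.ly/abc"), Tweet("hello friends")])` gives 0.5 for the three URL attributes and 1/6 hashtags per word (one hashtag across six words).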
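The two figures in the abstract are per-class rates: the share of true spammers labeled as spammers (about 70%) and the share of true non-spammers labeled as non-spammers (about 96%). The Python sketch below computes both from the four cells of a 2×2 confusion matrix. The function name and the counts in the usage line are hypothetical, chosen only to reproduce the rounded rates; they are not the paper's data.

```python
def per_class_rates(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (spammer rate, non-spammer rate) of correct classification.

    tp: spammers classified as spammers
    fn: spammers classified as non-spammers
    tn: non-spammers classified as non-spammers
    fp: non-spammers classified as spammers
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one labeled user")
    spammer_rate = tp / (tp + fn)          # recall on the spammer class
    non_spammer_rate = tn / (tn + fp)      # recall on the non-spammer class
    return spammer_rate, non_spammer_rate

# Hypothetical counts, invented to match the rounded rates in the abstract.
spam_rate, ham_rate = per_class_rates(tp=70, fn=30, tn=960, fp=40)
print(f"{spam_rate:.0%} of spammers, {ham_rate:.0%} of non-spammers correct")
# prints "70% of spammers, 96% of non-spammers correct"
```

Reporting the two rates separately keeps missed spammers from being hidden behind the larger non-spammer class. Overall accuracy depends on the class mix: with these invented counts it would be 1030/1100, about 93.6%, a figure the abstract does not report.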
%0 Conference Paper
%1 twitter-spam-benevenuto
%A Benevenuto, Fabricio
%A Magno, Gabriel
%A Rodrigues, Tiago
%A Almeida, Virgilio
%B Proceedings of the Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS)
%D 2010
%K 2011 selected seminar spam twitter winter
%T Detecting Spammers on Twitter
@inproceedings{twitter-spam-benevenuto,
added-at = {2011-10-10T09:19:57.000+0200},
author = {Benevenuto, Fabricio and Magno, Gabriel and Rodrigues, Tiago and Almeida, Virgilio},
biburl = {https://www.bibsonomy.org/bibtex/2fcdff2de72e8741f79e6a656cae70b87/becker},
booktitle = {Proceedings of the Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS)},
citeulike-article-id = {8510242},
citeulike-linkout-0 = {http://ceas.cc/2010/papers/Paper\%2021.pdf},
interhash = {3e6fbc5dbb1cf49e362670d5e7256800},
intrahash = {fcdff2de72e8741f79e6a656cae70b87},
keywords = {2011 selected seminar spam twitter winter},
location = {Washington, DC, USA},
month = jul,
posted-at = {2011-09-09 18:58:57},
priority = {3},
timestamp = {2011-10-26T09:17:13.000+0200},
title = {Detecting Spammers on {Twitter}},
year = 2010
}