Abstract
The rapidly growing social network Twitter has been infiltrated by
a large amount of spam. In this paper, a spam
detection prototype system is proposed to identify suspicious users
on Twitter. A directed social graph model
is proposed to explore the “follower” and “friend” relationships among
Twitter users. Based on Twitter’s spam
policy, novel content-based features and graph-based features are
also proposed to facilitate spam detection.
A Web crawler is developed relying on API methods provided by Twitter.
Around 25K users, 500K tweets,
and 49M follower/friend relationships in total are collected from
publicly available data on Twitter. A Bayesian
classification algorithm is applied to distinguish suspicious behaviors
from normal ones. I analyze the data
set and evaluate the performance of the detection system. Classic
evaluation metrics are used to compare the
performance of various traditional classification methods. Experimental
results show that the Bayesian classifier
has the best overall performance in terms of F-measure. The trained
classifier is also applied to the entire data
set. The result shows that the spam detection system can achieve 89%
precision.
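
As an illustration of the evaluation metrics mentioned above, the following is a minimal sketch in Python of how precision, recall, and F-measure can be computed from a classifier's confusion counts. The function name and the example counts are hypothetical; they are not taken from the paper's data set or results.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts.

    tp: spam accounts correctly flagged as spam
    fp: legitimate accounts wrongly flagged as spam
    fn: spam accounts the classifier missed
    """
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

Precision measures how many flagged accounts are truly spam, while F-measure balances this against recall, which is why it serves as an overall measure for comparing classifiers.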