Abstract
Twitter is used for a variety of purposes, including information
dissemination, marketing, political organizing, spreading propaganda,
spamming, promotion, conversations, and so on. Characterizing these activities
and categorizing the associated user-generated content is a challenging task. We
present an information-theoretic approach to classifying user activity on
Twitter. We focus on tweets that contain embedded URLs and study their
collective 'retweeting' dynamics. We identify two features, time-interval and
user entropy, which we use to classify retweeting activity. We achieve good
separation of different activities using just these two features and are able
to categorize content based on the collective user response it generates.
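As a rough illustration of the two features named above, the sketch below computes a Shannon entropy over the time intervals between successive retweets of one URL and another over the users who retweeted it. The input format (a list of `(timestamp_in_seconds, user_id)` pairs), the function names, and the fixed-width binning of intervals (`bin_seconds`) are hypothetical choices for illustration; the abstract does not say how the paper discretizes intervals or normalizes the entropies.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def shannon_entropy(items: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p); each term is non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def retweet_features(
    retweets: List[Tuple[float, str]], bin_seconds: float = 60.0
) -> Tuple[float, float]:
    """Return (time-interval entropy, user entropy) for one URL's retweets.

    Hypothetical input: (timestamp_in_seconds, user_id) pairs.
    bin_seconds is an assumed bin width for discretizing intervals.
    """
    times = sorted(t for t, _ in retweets)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    # Discretize continuous intervals so that similar gaps count as one symbol.
    interval_bins = [int(dt // bin_seconds) for dt in intervals]
    h_time = shannon_entropy(interval_bins)
    h_user = shannon_entropy(user for _, user in retweets)
    return h_time, h_user


# A single account retweeting every 60 seconds: one interval bin, one user.
bot = [(0.0, "bot"), (60.0, "bot"), (120.0, "bot"), (180.0, "bot")]
print(retweet_features(bot))
```

Intuitively, a single account retweeting on a fixed schedule scores zero on both axes, while many distinct users retweeting at irregular intervals score high on both, which is why two numbers per URL can already separate very different kinds of activity.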
We have identified five distinct categories of retweeting activity on
Twitter: automatic/robotic activity, newsworthy information dissemination,
advertising and promotion, campaigns, and parasitic advertisement. In the
course of our investigations, we have shown how Twitter can be exploited for
promotional and spam-like activities. The content-independent, entropy-based
activity classification method is computationally efficient, scalable and
robust to sampling and missing data. It has many applications, including
automatic spam-detection, trend identification, trust management,
user-modeling, social search and content classification on online social media.
[1106.0346] Entropy-based Classification of 'Retweeting' Activity on Twitter