Abstract
We present an adversarial information retrieval approach to the
automatic detection of spam content on social bookmarking websites. Our approach is based on the intuitive notion that similar
users and posts use similar language. We detect malicious users
on the basis of a similarity function that adopts language modeling at two different levels of granularity: at the level of individual posts, and at an aggregated user level, where all posts of one
user are merged into a single profile. We evaluate our approach
on two spam-annotated data sets representing snapshots of the social bookmarking websites CiteULike and BibSonomy. We find
that our approach achieves promising results across data sets, with
AUC scores ranging from 0.92 to 0.96.
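The abstract does not specify the exact similarity function or how neighbours are turned into a spam decision. The sketch below therefore assumes one common choice. Each text gets a unigram language model with Jelinek-Mercer smoothing against a collection background model. Texts are compared by KL divergence, and a new user (or post) is scored by the fraction of spam among its k closest labeled neighbours. The names `tokenize`, `background_model`, `kl_divergence`, `spam_score` and `user_profile`, the smoothing weight `lam`, the neighbour count `k`, and the toy data are all hypothetical. They are illustrative only, not the authors' implementation or data from CiteULike or BibSonomy.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def background_model(texts: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram model over the whole collection."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_divergence(query: str, doc: str, bg: dict[str, float], lam: float = 0.5) -> float:
    """KL(Q || D): MLE query model vs. Jelinek-Mercer smoothed document model.

    Every query word must appear in `bg`, and 0 < lam <= 1 keeps P(w|D) > 0.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must be in (0, 1]")
    q_counts = Counter(tokenize(query))
    d_counts = Counter(tokenize(doc))
    q_len = sum(q_counts.values())
    d_len = sum(d_counts.values())
    kl = 0.0
    for w, c in q_counts.items():
        p_q = c / q_len
        p_d_ml = d_counts[w] / d_len if d_len else 0.0
        p_d = (1 - lam) * p_d_ml + lam * bg[w]
        kl += p_q * math.log(p_q / p_d)
    return kl


def user_profile(posts: list[str]) -> str:
    # User level: merge all posts of one user into a single profile text.
    return " ".join(posts)


def spam_score(query: str, labeled: list[tuple[str, bool]], k: int = 3,
               lam: float = 0.5) -> float:
    """Fraction of spam among the k labeled texts closest (lowest KL) to `query`."""
    if not labeled:
        raise ValueError("need at least one labeled text")
    # Background includes the query so every query word has P(w|C) > 0.
    bg = background_model([query] + [text for text, _ in labeled])
    ranked = sorted(labeled, key=lambda item: kl_divergence(query, item[0], bg, lam))
    top = ranked[:k]
    return sum(is_spam for _, is_spam in top) / len(top)


# Toy labeled users (invented data): (profile text, is_spam)
training = [
    (user_profile(["cheap pills buy now", "buy cheap watches now"]), True),
    (user_profile(["casino bonus buy now", "cheap casino online"]), True),
    (user_profile(["language models for information retrieval",
                   "smoothing methods for language models"]), False),
    (user_profile(["graph based ranking of web pages",
                   "link analysis for web retrieval"]), False),
]

new_spammer = user_profile(["buy cheap casino pills now"])
new_researcher = user_profile(["language models for web retrieval"])

print(spam_score(new_spammer, training, k=2))     # → 1.0
print(spam_score(new_researcher, training, k=2))  # → 0.0
```

At the post level the same functions apply with individual posts as the texts instead of merged profiles. The abstract does not say how post-level scores would then be combined into a user-level decision. A continuous score like this one is also what a ranking metric such as AUC is computed over.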