Inproceedings

Using Language Modeling for Spam Detection in Social Reference Manager Websites

(2009)

Abstract

We present an adversarial information retrieval approach to the automatic detection of spam content in social bookmarking websites. Our approach is based on the intuitive notion that similar users and posts use similar language. We detect malicious users on the basis of a similarity function that adopts language modeling at two different levels of granularity: at the level of individual posts, and at an aggregated user level, where all posts of one user are merged into a single profile. We evaluate our approach on two spam-annotated data sets representing snapshots of the social bookmarking websites CiteULike and BibSonomy. We find that our approach achieves promising results across data sets, with AUC scores ranging from 0.92 to 0.96.
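
The two levels of granularity described in the abstract can be illustrated with a minimal sketch. The abstract does not state which smoothing method, similarity measure, or decision rule is used, so everything concrete here is an assumption made for illustration: unigram models, Jelinek-Mercer smoothing with weight `lam`, KL divergence as the similarity function, and a vote over the `k` most similar labeled items. The names `spam_score`, `user_profile`, and `LABELED`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def unigram_counts(text):
    """Whitespace-tokenized, lowercased term counts for one post or profile."""
    return Counter(text.lower().split())


def kl_divergence(query, doc, background, lam=0.9):
    """KL(query || doc), with the doc model smoothed against the background.

    Jelinek-Mercer smoothing (an assumption, not taken from the abstract)
    keeps every probability non-zero as long as the background model
    contains all query terms.
    """
    q_total = sum(query.values())
    d_total = sum(doc.values())
    b_total = sum(background.values())
    score = 0.0
    for term, count in query.items():
        p_q = count / q_total
        p_d = lam * doc[term] / d_total + (1 - lam) * background[term] / b_total
        score += p_q * math.log(p_q / p_d)
    return score


def spam_score(new_text, labeled, k=3):
    """Fraction of spam labels among the k labeled texts most similar to new_text."""
    query = unigram_counts(new_text)
    models = [(unigram_counts(text), is_spam) for text, is_spam in labeled]
    # Background model: all labeled text plus the query, so no term has zero mass.
    background = Counter(query)
    for model, _ in models:
        background.update(model)
    # Lower divergence means more similar language.
    ranked = sorted(models, key=lambda m: kl_divergence(query, m[0], background))
    top = ranked[:k]
    return sum(1 for _, is_spam in top if is_spam) / len(top)


def user_profile(posts):
    """User-level granularity: merge all posts of one user into a single text."""
    return " ".join(posts)


# Hypothetical toy data: (post text, is_spam)
LABELED = [
    ("cheap pills buy now", True),
    ("buy cheap watches online now", True),
    ("casino bonus buy now", True),
    ("language modeling for information retrieval", False),
    ("statistical language models in retrieval", False),
    ("kernel methods for text classification", False),
]

post_level = spam_score("buy cheap casino pills", LABELED)
user_level = spam_score(user_profile(["language models", "for retrieval"]), LABELED)
```

At the post level, each post is scored on its own. At the user level, the same function is applied to the merged profile, which gives the model more text per decision at the cost of hiding which individual post is spam.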
