Using Language Modeling for Spam Detection in Social Reference Manager Websites

Abstract

We present an adversarial information retrieval approach to the automatic detection of spam content in social bookmarking websites. Our approach is based on the intuitive notion that similar users and posts use similar language. We detect malicious users on the basis of a similarity function that adopts language modeling at two different levels of granularity: at the level of individual posts, and at an aggregated user level, where all posts of one user are merged into a single profile. We evaluate our approach on two spam-annotated data sets representing snapshots of the social bookmarking websites CiteULike and BibSonomy. We find that our approach achieves promising results across data sets, with AUC scores ranging from 0.92 to 0.96.
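
The idea of scoring a post or user by the language of its most similar labelled neighbours can be illustrated with a small sketch. The abstract does not specify the model details, so everything below is an assumption made for illustration: unigram models, Jelinek-Mercer smoothing with weight `lam`, KL divergence as the similarity function, a vote among the `k` closest labelled items, and the hypothetical helper names `unigram_counts`, `kl_divergence`, `spam_score`, and `user_profile`. It is not the authors' implementation.

```python
import math
from collections import Counter


def unigram_counts(text):
    """Term counts from lowercase whitespace tokenization (an assumed tokenizer)."""
    return Counter(text.lower().split())


def kl_divergence(query, doc, background, lam=0.9):
    """KL divergence from the query's maximum-likelihood unigram model to the
    document's Jelinek-Mercer smoothed model (assumed smoothing scheme).

    `background` must contain every query term, so each smoothed document
    probability is strictly positive.
    """
    q_total = sum(query.values())
    d_total = sum(doc.values())
    bg_total = sum(background.values())
    kl = 0.0
    for term, count in query.items():
        p_q = count / q_total
        p_doc = doc[term] / d_total if d_total else 0.0
        p_d = lam * p_doc + (1 - lam) * background[term] / bg_total
        kl += p_q * math.log(p_q / p_d)
    return kl


def spam_score(new_text, labelled, k=3, lam=0.9):
    """Fraction of spam labels among the k labelled texts closest to new_text.

    `labelled` is a list of (text, is_spam) pairs.
    """
    if not labelled:
        raise ValueError("need at least one labelled text")
    query = unigram_counts(new_text)
    docs = [(unigram_counts(text), is_spam) for text, is_spam in labelled]
    # The collection model includes the query so that no query term is unseen.
    background = Counter(query)
    for counts, _ in docs:
        background.update(counts)
    ranked = sorted(docs, key=lambda d: kl_divergence(query, d[0], background, lam))
    top = ranked[:k]
    return sum(1 for _, is_spam in top if is_spam) / len(top)


def user_profile(posts):
    """User-level granularity: merge all posts of one user into a single text."""
    return " ".join(posts)


# Toy data invented for this example only.
labelled = [
    ("cheap pills buy online now", True),
    ("buy cheap watches online discount", True),
    ("language modeling for information retrieval", False),
    ("collaborative filtering recommender systems evaluation", False),
]

post_level = spam_score("buy cheap pills online", labelled, k=2)
user_level = spam_score(
    user_profile(["buy cheap pills", "online discount watches"]), labelled, k=2
)
```

Post-level scoring passes one post as `new_text`, while user-level scoring passes the merged profile. In a real setting the resulting scores would be thresholded or ranked, which is what an AUC evaluation measures.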

BibTeX key: spam-language-models-bogers
entry type: inproceedings
year: 2009
posted-at: 2011-09-09 19:08:21
priority: 3
citeulike-article-id: 9755429
