The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems
B. Krause, A. Hotho, and G. Stumme. Proc. of the Fourth International Workshop on Adversarial Information Retrieval on the Web (2008)
Abstract
The annotation of web sites in social bookmarking systems
has become a popular way to manage and find information
on the web. The community structure of such systems attracts
spammers: recent post pages, popular pages or specific
tag pages can be manipulated easily. As a result, searching
or tracking recent posts does not deliver quality results
annotated in the community, but rather unsolicited, often
commercial, web sites. To retain the benefits of sharing
one’s web content, spam-fighting mechanisms that can face
the flexible strategies of spammers need to be developed.
A classical approach in machine learning is to determine
relevant features that describe the system’s users, train different
classifiers with the selected features and choose the
one with the most promising evaluation results. In this paper
we will transfer this approach to a social bookmarking
setting to identify spammers. We will present features considering
the topological, semantic and profile-based information
which people make public when using the system.
The dataset used is a snapshot of the social bookmarking
system BibSonomy, built over the course of several
months while spam was being removed from the system. Based
on our features, we will learn a large set of different classification
models and compare their performance. Our results
represent the groundwork for a first application in BibSonomy
and for the building of more elaborate spam detection
mechanisms.
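
The workflow described in the abstract (pick features that describe users, train several classifiers, keep the one with the best evaluation result) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two user features, the toy data, and both classifiers are hypothetical stand-ins chosen only to keep the example self-contained.

```python
from statistics import mean

# Hypothetical toy data, not taken from the paper.
# Features per user: (tags per post, share of posts pointing to one domain).
# Label 1 = spammer, 0 = legitimate user.
TRAIN = [((12.0, 0.9), 1), ((10.0, 0.8), 1), ((9.0, 0.95), 1),
         ((2.0, 0.1), 0), ((3.0, 0.2), 0), ((1.0, 0.3), 0), ((2.0, 0.2), 0)]
TEST = [((11.0, 0.85), 1), ((2.5, 0.15), 0), ((8.0, 0.7), 1), ((1.5, 0.25), 0)]


def train_majority(data):
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in data]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def train_nearest_centroid(data):
    """Predict the class whose mean feature vector is closest to the input."""
    centroids = {}
    for label in {y for _, y in data}:
        rows = [x for x, y in data if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))

    def predict(x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def accuracy(model, data):
    """Fraction of users whose label the model predicts correctly."""
    return mean(1.0 if model(x) == y else 0.0 for x, y in data)


def select_best(trainers, train, test):
    """Train every candidate and return the best name plus all scores."""
    scores = {name: accuracy(fit(train), test) for name, fit in trainers.items()}
    best = max(scores, key=scores.get)
    return best, scores


TRAINERS = {"majority": train_majority, "nearest_centroid": train_nearest_centroid}
best_name, scores = select_best(TRAINERS, TRAIN, TEST)
print(best_name, scores)
```

Accuracy on a single held-out split is used here only for brevity. With real, imbalanced spam data, a measure such as AUC or F1 combined with cross-validation would likely be a better basis for choosing a model.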
%0 Conference Paper
%1 paper:krause:2008
%A Krause, Beate
%A Hotho, Andreas
%A Stumme, Gerd
%B Proc. of the Fourth International Workshop on Adversarial Information Retrieval on the Web
%D 2008
%K 2008 bibsonomy bookmarking machine-learning social spam tags
%T The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems
%U http://airweb.cse.lehigh.edu/2008/submissions/krause_2008_anti_social_tagger.pdf
@inproceedings{paper:krause:2008,
added-at = {2008-09-08T14:11:25.000+0200},
author = {Krause, Beate and Hotho, Andreas and Stumme, Gerd},
biburl = {https://www.bibsonomy.org/bibtex/203d349d70b578ca9ac3155f661151868/mschuber},
booktitle = {Proc. of the Fourth International Workshop on Adversarial Information Retrieval on the Web},
interhash = {158c905dd077d269c0a65c2b39f63f25},
intrahash = {03d349d70b578ca9ac3155f661151868},
keywords = {2008 bibsonomy bookmarking machine-learning social spam tags},
timestamp = {2008-09-09T12:26:14.000+0200},
title = {The Anti-Social Tagger - Detecting Spam in Social Bookmarking Systems},
url = {http://airweb.cse.lehigh.edu/2008/submissions/krause_2008_anti_social_tagger.pdf},
year = 2008
}