Hyperincident Connected Components of Tagging Networks
N. Neubauer and K. Obermayer. HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia, New York, NY, USA, ACM, (July 2009)
Abstract
Data created by social bookmarking systems can be described as
3-partite 3-uniform hypergraphs connecting documents, users, and
tags (tagging networks),
such that the toolbox of complex network analysis can be applied to
examine their properties. One of the most basic tools, the
analysis of connected components, however, cannot be applied
meaningfully: Tagging networks
tend to be almost entirely connected. We therefore propose a
generalization of connected components, m-hyperincident
connected components.
We show that decomposing tagging networks into 2-hyperincident
connected components yields a characteristic component
distribution with a salient giant component that can be found
across various datasets.
This pattern changes if the underlying formation process
changes, for example, if the hypergraph is constructed from
search logs, or if the tagging data is contaminated by spam: It
turns out that the second- to 129th largest components of the
spam-labeled Bibsonomy dataset are inhabited exclusively by spam
users. Based on these findings, we propose an unsupervised
method for spam detection.
%0 Conference Paper
%1 neubauer2009hyperincident
%A Neubauer, Nicolas
%A Obermayer, Klaus
%B HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia
%C New York, NY, USA
%D 2009
%I ACM
%K analysis ht09 network spam tagging
%T Hyperincident Connected Components of Tagging Networks
%X Data created by social bookmarking systems can be described as
3-partite 3-uniform hypergraphs connecting documents, users, and
tags (tagging networks),
such that the toolbox of complex network analysis can be applied to
examine their properties. One of the most basic tools, the
analysis of connected components, however, cannot be applied
meaningfully: Tagging networks
tend to be almost entirely connected. We therefore propose a
generalization of connected components, m-hyperincident
connected components.
We show that decomposing tagging networks into 2-hyperincident
connected components yields a characteristic component
distribution with a salient giant component that can be found
across various datasets.
This pattern changes if the underlying formation process
changes, for example, if the hypergraph is constructed from
search logs, or if the tagging data is contaminated by spam: It
turns out that the second- to 129th largest components of the
spam-labeled Bibsonomy dataset are inhabited exclusively by spam
users. Based on these findings, we propose an unsupervised
method for spam detection.
@inproceedings{neubauer2009hyperincident,
abstract = {Data created by social bookmarking systems can be described as
3-partite 3-uniform hypergraphs connecting documents, users, and
tags (tagging networks),
such that the toolbox of complex network analysis can be applied to
examine their properties. One of the most basic tools, the
analysis of connected components, however, cannot be applied
meaningfully: Tagging networks
tend to be almost entirely connected. We therefore propose a
generalization of connected components, m-hyperincident
connected components.
We show that decomposing tagging networks into 2-hyperincident
connected components yields a characteristic component
distribution with a salient giant component that can be found
across various datasets.
This pattern changes if the underlying formation process
changes, for example, if the hypergraph is constructed from
search logs, or if the tagging data is contaminated by spam: It
turns out that the second- to 129th largest components of the
spam-labeled Bibsonomy dataset are inhabited exclusively by spam
users. Based on these findings, we propose an unsupervised
method for spam detection. },
added-at = {2009-07-01T13:32:54.000+0200},
address = {New York, NY, USA},
author = {Neubauer, Nicolas and Obermayer, Klaus},
biburl = {https://www.bibsonomy.org/bibtex/2f696989e22dd4c77c8a6352526e13efe/brusilovsky},
booktitle = {HT '09: Proceedings of the Twentieth ACM Conference on Hypertext and Hypermedia},
interhash = {2686bad7ff1f4d07c0b3302dff08368a},
intrahash = {f696989e22dd4c77c8a6352526e13efe},
keywords = {analysis ht09 network spam tagging},
month = {July},
paperid = {fp105},
publisher = {ACM},
session = {Full Paper},
timestamp = {2009-07-01T13:32:54.000+0200},
title = {Hyperincident Connected Components of Tagging Networks},
year = 2009
}