@shabbychef

Sampling online social networks by random walk

, and . In ACM SIGKDD Workshop on Hot Topics in Online Social Networks, page 33--40. ACM, (2012)

Abstract

This paper proposes to use simple random walk, a sampling method supported by most online social networks (OSN), to estimate a variety of properties of large OSNs. We show that due to the scale-free nature of OSNs the estimators derived from random walk sampling scheme are much better than uniform random sampling, even when uniform random samples are available disregarding the notorious high cost of obtaining the random samples. The paper first proposes to use harmonic mean to estimate the average degree of OSNs. The accurate estimation of the average degree leads to the discovery of other properties, such as the population size, the heterogeneity of the degrees, the number of friends of friends, the threshold value for messages to reach a large component, and Gini coefficient of the population. The method is validated in complete Twitter data dated in 2009 that contains 42 million nodes and 1.5 billion edges.

Description

Sampling online social networks by random walk

Links and resources

Tags