Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach
J. Yang, and J. Leskovec. Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, page 587--596. New York, NY, USA, ACM, (2013)
Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap as nodes belong to multiple clusters at once. Due to the difficulties in evaluating the detected communities and the lack of scalable algorithms, the task of overlapping community detection in large networks largely remains an open problem. In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. In this paper, we develop a model-based community detection algorithm that can detect densely overlapping, hierarchically nested as well as non-overlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with ground-truth community information. Experiments show state of the art performance both in terms of the quality of detected communities as well as in speed and scalability of our algorithm.