Consensus clustering has emerged as an important elaboration of the classical clustering problem. Consensus clustering, also called aggregation of clustering (or partitions), refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete.
EM has been shown to have favorable convergence properties, automatical satisfaction of constraints, and fast convergence. The next section explains the traditional approach to deriving the EM algorithm and proving its convergence property. Section 3.3 covers the interpretion the EM algorithm as the maximization of two quantities: the entropy and the expectation of complete-data likelihood. Then, the K-means algorithm and the EM algorithm are compared. The conditions under which the EM algorithm is reduced to the K-means are also explained. The discussion in Section 3.4 generalizes the EM algorithm described in Sections 3.2 and 3.3 to problems with partial-data and hidden-state. We refer to this new type of EM as the doubly stochastic EM. Finally, the chapter is concluded in Section 3.5.
The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs based on simulation of (stochastic) flow in graphs.
K. Avrachenkov, V. Dobrynin, D. Nemirovsky, S. Pham, и E. Smirnova. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, стр. 873--874. New York, NY, USA, ACM, (2008)
T. Hu, H. Xiong, W. Zhou, S. Sung, и H. Luo. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, стр. 871--872. New York, NY, USA, ACM, (2008)
C. Ding, T. Li, D. Luo, и W. Peng. SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, стр. 831--832. New York, NY, USA, ACM, (2008)
D. Ramage, P. Heymann, C. Manning, и H. Garcia-Molina. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, стр. 54--63. New York, NY, USA, ACM, (2009)
N. Slonim, и N. Tishby. SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, стр. 208--215. New York, NY, USA, ACM, (2000)
X. Liu, и W. Croft. SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, стр. 186--193. New York, NY, USA, ACM, (2004)
G. Erkan. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, стр. 479--486. Morristown, NJ, USA, Association for Computational Linguistics, (2006)
M. B\=adoiu, S. Har-Peled, и P. Indyk. STOC '02: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, стр. 250--257. New York, NY, USA, ACM, (2002)
M. Balcan, A. Blum, и A. Gupta. SODA '09: Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms, стр. 1068--1077. Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, (2009)
M. Surdeanu, J. Turmo, и A. Ageno. KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, стр. 685--690. New York, NY, USA, ACM, (2005)
D. Arthur, и S. Vassilvitskii. SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, стр. 1027--1035. Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, (2007)
W. Xu, X. Liu, и Y. Gong. SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, стр. 267--273. New York, NY, USA, ACM, (2003)