@alourenco

Comparison of combination methods using Spectral Clustering Ensembles

4th International Workshop on Pattern Recognition in Information Systems - PRIS 2004, ICEIS04, (2004)

Abstract

We address the problem of combining multiple data partitions, which together form what we call a clustering ensemble. We use a recent clustering approach, known as Spectral Clustering, and the classical K-Means algorithm to produce the partitions that constitute the clustering ensembles. We compare several combination methods by measuring the consistency between the combined data partition and (a) ground truth information, and (b) the clustering ensemble. Two consistency measures are used: (i) an index based on cluster matching between two partitions; and (ii) an information-theoretic index based on the mutual information between data partitions. Results on a variety of synthetic and real data sets show that, while combined partitions are more robust than individual clusterings, no combination method is a clear winner. Furthermore, without a priori information, the mutual-information-based measure cannot systematically select the best combination method for each problem, where the best method is the one most consistent with ground truth information.
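The two kinds of consistency measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `matching_consistency` and `normalized_mutual_information` are hypothetical, the matching index is assumed to be the fraction of samples shared by optimally matched clusters (found here by brute force, which is only practical for a handful of clusters), and the mutual information is assumed to be normalized by the mean of the two partition entropies.

```python
from collections import Counter
from itertools import permutations
from math import log


def matching_consistency(a, b):
    """Fraction of samples shared by the best one-to-one cluster matching.

    Assumption: brute-force search over matchings, feasible only for a
    small number of clusters.
    """
    n = len(a)
    joint = Counter(zip(a, b))  # (label in a, label in b) -> shared samples
    labels_a, labels_b = list(set(a)), list(set(b))
    # Permute the longer label list so every one-to-one matching is tried.
    if len(labels_a) > len(labels_b):
        short, longer = labels_b, labels_a
        key = lambda s, l: (l, s)
    else:
        short, longer = labels_a, labels_b
        key = lambda s, l: (s, l)
    best = 0
    for perm in permutations(longer, len(short)):
        shared = sum(joint[key(s, l)] for s, l in zip(short, perm))
        best = max(best, shared)
    return best / n


def normalized_mutual_information(a, b):
    """Mutual information of two partitions divided by their mean entropy."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual = sum(
        (n_ab / n) * log(n_ab * n / (count_a[x] * count_b[y]))
        for (x, y), n_ab in joint.items()
    )
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both partitions put every sample in a single cluster
    return 2.0 * mutual / (h_a + h_b)


if __name__ == "__main__":
    combined = [0, 0, 1, 1, 2, 2]      # hypothetical combined partition
    ground_truth = [1, 1, 2, 2, 0, 0]  # same grouping, different label names
    print(matching_consistency(combined, ground_truth))
    print(normalized_mutual_information(combined, ground_truth))
```

Both functions take two label sequences of equal length and ignore the label names themselves, so a partition compared with a relabelled copy of itself is fully consistent under either measure. Consistency with a whole clustering ensemble could then be taken as the average score against each partition in the ensemble, which is again an assumption rather than something stated in the abstract.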
