Abstract
We address the problem of combining multiple data partitions, a collection we call a clustering ensemble. We use a recent clustering approach known as Spectral Clustering, together with the classical K-Means algorithm, to produce the partitions that constitute the clustering ensembles. We compare several combination methods by measuring the consistency between the combined data partition and (a) ground-truth information and (b) the clustering ensemble. Two consistency measures are used: (i) an index based on cluster matching between two partitions, and (ii) an information-theoretic index based on the mutual information between data partitions. Results on a variety of synthetic and real data sets show that, although combined partitions are more robust than individual clusterings, no combination method is a clear winner. Furthermore, without a priori information, the mutual-information-based measure cannot systematically select the best combination method for each problem, where the best method is determined from ground-truth information.
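The two consistency measures can be illustrated with a short sketch. This is not the authors' code and rests on several assumptions: partitions are given as integer label lists over the same samples; the cluster-matching index is taken as the fraction of samples shared by matched clusters under the best one-to-one matching, found here by brute force (the paper's exact matching procedure may differ); and mutual information is normalized as 2I/(H_a + H_b), one common choice in the clustering-ensemble literature. The function names `consistency_index`, `nmi`, and `average_nmi` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log
from typing import Sequence

Labels = Sequence[int]


def contingency(a: Labels, b: Labels) -> Counter:
    """Count samples for each (cluster in a, cluster in b) pair."""
    if len(a) != len(b) or not a:
        raise ValueError("partitions must label the same, non-empty set of samples")
    return Counter(zip(a, b))


def consistency_index(a: Labels, b: Labels) -> float:
    """Cluster-matching index (assumed definition): fraction of samples shared
    by matched clusters under the best one-to-one matching. Brute force over
    matchings, so only suitable for a small number of clusters."""
    table = contingency(a, b)
    ca, cb = sorted(set(a)), sorted(set(b))
    # Match every cluster of the smaller partition to a distinct cluster of the larger.
    swap = len(ca) > len(cb)
    small, large = (cb, ca) if swap else (ca, cb)
    best = 0
    for partner in permutations(large, len(small)):
        shared = sum(
            table[(p, s)] if swap else table[(s, p)]
            for s, p in zip(small, partner)
        )
        best = max(best, shared)
    return best / len(a)


def nmi(a: Labels, b: Labels) -> float:
    """Normalized mutual information, 2 * I(a; b) / (H(a) + H(b)).
    All sums below are scaled by n; the scale factor cancels in the ratio."""
    n = len(a)
    table = contingency(a, b)
    na, nb = Counter(a), Counter(b)
    mi = sum(nij * log(nij * n / (na[i] * nb[j])) for (i, j), nij in table.items())
    ha = -sum(c * log(c / n) for c in na.values())
    hb = -sum(c * log(c / n) for c in nb.values())
    if ha + hb == 0:
        return 1.0  # both partitions put every sample in a single cluster
    return 2 * mi / (ha + hb)


def average_nmi(combined: Labels, ensemble: Sequence[Labels]) -> float:
    """Consistency of a combined partition with a whole ensemble: mean pairwise NMI."""
    return sum(nmi(combined, p) for p in ensemble) / len(ensemble)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    relabeled = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
    print(consistency_index(truth, relabeled))  # → 1.0
    print(nmi(truth, [0, 0, 1, 1, 1, 1]))
```

Comparing a combined partition against a reference labeling with either function corresponds to criterion (a) above, while `average_nmi` over the ensemble corresponds to criterion (b). For more than a handful of clusters, the brute-force matching would typically be replaced by a Hungarian-algorithm solver such as SciPy's `linear_sum_assignment` applied to the contingency table.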