Abstract
We address the problem of clustering string patterns from an ensemble methods perspective. In this approach, different partitionings of the data are combined in an attempt to find a better and more robust partition. We cover the successive phases of the approach: generating the partitions (the clustering ensemble), combining them, and validating the combined result. For the generation phase we consider both different clustering algorithms (hierarchical agglomerative and partitional) and different similarity measures (string matching and structural resemblance). The focus of the paper is the validation and selection of the final data partition. To this end, an information-theoretic measure, used in conjunction with a variance analysis based on bootstrapping, quantifies the consistency between the partitions and the combined results and selects the best result without additional information. Experimental results on a real data set (contour images) show that this approach can choose the best partition among alternative solutions in an unsupervised manner, as validated by measuring consistency with the ground-truth information.
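
The validation idea can be illustrated with a minimal sketch. The abstract does not name the exact measure or the resampling scheme, so the following choices are assumptions: normalized mutual information (NMI) with the normalization 2I(A;B)/(H(A)+H(B)) stands in for the information-theoretic measure, and the bootstrap resamples the ensemble members rather than the data. The function names `nmi`, `consistency`, and `bootstrap_consistency` are hypothetical.

```python
import math
import random
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two labelings of the same items.

    Assumed normalization: 2 * I(A;B) / (H(A) + H(B)).
    """
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single cluster
    return 2 * mi / (h_a + h_b)


def consistency(candidate, ensemble):
    """Average NMI between a candidate partition and the ensemble members."""
    return sum(nmi(candidate, p) for p in ensemble) / len(ensemble)


def bootstrap_consistency(candidate, ensemble, rounds=100, seed=0):
    """Mean and sample variance of the consistency over bootstrap resamples.

    Assumption: the ensemble members are resampled with replacement.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        sample = [rng.choice(ensemble) for _ in ensemble]
        scores.append(consistency(candidate, sample))
    mean = sum(scores) / rounds
    var = sum((s - mean) ** 2 for s in scores) / (rounds - 1)
    return mean, var
```

Under these assumptions, one would compute `bootstrap_consistency` for each candidate combined partition and prefer the candidate with high mean consistency and low variance; no ground-truth labels enter the computation.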