Article in conference proceedings

String Patterns: from Single Clustering to Ensemble Methods and Validation

7th International Workshop on Pattern Recognition in Information Systems PRIS-2007, Funchal, Portugal, (June 2007)

Abstract

We address the problem of clustering string patterns from an ensemble methods perspective. In this approach, different partitionings of the data are combined in an attempt to find a better and more robust partition. We cover the successive phases of the approach: generating the partitions (the clustering ensemble), combining them, and validating the combined result. For generation, we consider both different clustering algorithms (hierarchical agglomerative and partitional) and different similarity measures (string matching, structural resemblance). The focus of the paper is the validation and selection of the final data partition. To that end, an information-theoretic measure, together with a variance analysis based on bootstrapping, is used to quantify the consistency between the partitions and the combined results and to choose the best result without additional information. Experimental results on a real data set (contour images) show that this approach can choose the best partition among alternative solutions in an unsupervised manner, as validated by measuring consistency with the ground-truth information.
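
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the information-theoretic measure, so the code below assumes normalized mutual information (NMI) as a stand-in and scores each candidate partition by its mean NMI with the ensemble members. The function names (nmi, mean_consistency, select_partition) are hypothetical and not taken from the paper, and the bootstrap variance analysis is not shown.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def nmi(a, b):
    """NMI normalized by the mean entropy (an assumed choice of normalization)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both labelings are a single cluster, hence identical
    return 2.0 * mutual_information(a, b) / (ha + hb)


def mean_consistency(candidate, ensemble):
    """Average NMI between one candidate partition and every ensemble member."""
    return sum(nmi(candidate, p) for p in ensemble) / len(ensemble)


def select_partition(candidates, ensemble):
    """Return the name of the candidate most consistent with the ensemble."""
    return max(candidates,
               key=lambda name: mean_consistency(candidates[name], ensemble))


# Toy data: three ensemble members and two candidate combined partitions.
ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 2]]
candidates = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
best = select_partition(candidates, ensemble)
```

No ground-truth labels are used here, which mirrors the unsupervised selection the abstract describes. A bootstrap variant would repeat the scoring on resampled data and compare the variance of the scores as well as their mean.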
