
Ensemble Methods in the Clustering of String Patterns

IEEE Workshop on Applications of Computer Vision - WACV/MOTION'05, 1, pages 143-148. Colorado, United States (January 2005)

Abstract

We address the problem of clustering contour images of hardware tools based on string descriptions, in a comparative study of cluster combination techniques. Several clustering algorithms are considered, covering both hierarchical agglomerative and partitional approaches. In the latter class of algorithms, we explore: an adaptation of the K-means algorithm to string patterns using the median string as cluster representative; the error-correcting parsing approach by Fu; and the very recent spectral clustering approach. These algorithms are applied using several dissimilarity measures, namely: minimum code length based measures; dissimilarity based on the concept of reduction in grammatical complexity; and error-correcting parsing. In a first step, the clustering algorithms are applied individually to the image data set, and results are evaluated in terms of the error rate, taking a known labeling of the data as ground truth. In a second step, we combine multiple data partitions, which we call a clustering ensemble, using three state-of-the-art clustering combination techniques. Results show that combination methods lead in general to better data partitioning when compared with the ground truth information.
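
To make the idea of a clustering ensemble concrete, the following is a minimal sketch of one well-known way to combine partitions: counting how often each pair of patterns is placed in the same cluster (a co-association matrix) and linking pairs that co-occur in more than half of the partitions. The abstract does not name the three combination techniques it compares, so this is an illustrative assumption and not the authors' method; the function names `co_association` and `consensus_partition` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of patterns shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Link patterns whose co-association exceeds the threshold (assumed value)
    and return the connected components as the combined partition."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))  # union-find forest over pattern indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical partitions of five patterns, e.g. produced by
# different algorithms or dissimilarity measures.
ensemble = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
combined = consensus_partition(ensemble)
```

Cluster labels are arbitrary within each input partition, which is why the sketch works on pairwise co-occurrence instead of comparing labels directly. Evaluating the combined partition against a ground-truth labeling, as the abstract describes, would additionally require matching cluster labels to classes before computing an error rate.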
