Inproceedings,

An Information-Theoretical Approach to Clustering Categorical Databases using Genetic Algorithms

In 2nd SIAM ICDM, Workshop on Clustering High Dimensional Data, pages 37-46. (2002)

Abstract

Clustering categorical databases is difficult because objects have no natural dissimilarity measure. We present a solution based on an information-theoretical definition of dissimilarity between partitions of finite sets, applied to the partitions of the set of objects that are determined by the categorical attributes, and we use genetic algorithms to find an acceptable approximate clustering. We tested our method on databases for which the clustering of the rows is known in advance, and we show that it finds the natural clustering of the data with a good classification rate, better than that of the classical k-means algorithm.
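
The idea of an entropy-based dissimilarity between partitions can be illustrated with a minimal sketch. The abstract does not give the paper's exact definition, so this example assumes Shannon entropy and the distance d(P, Q) = H(P|Q) + H(Q|P); the function names and the `fitness` criterion (sum of distances from a candidate clustering to each attribute's partition) are hypothetical illustrations, not the authors' implementation. A genetic algorithm would search for the candidate that minimizes such a score; that search is not shown here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) = H(A, B) - H(B) for two labelings of the same objects."""
    return entropy(list(zip(a, b))) - entropy(b)


def partition_distance(a, b):
    """Assumed dissimilarity: d(A, B) = H(A | B) + H(B | A)."""
    return conditional_entropy(a, b) + conditional_entropy(b, a)


def fitness(candidate, attribute_columns):
    """Hypothetical score: total distance from a candidate clustering
    to the partitions determined by each categorical attribute (lower is better)."""
    return sum(partition_distance(candidate, col) for col in attribute_columns)


# Toy data (made up for illustration): four rows, two categorical attributes.
colour = ["red", "red", "blue", "blue"]
shape = ["round", "square", "round", "square"]

by_colour = [0, 0, 1, 1]      # candidate clustering that matches the colour attribute
everything = [0, 0, 0, 0]     # candidate that puts all rows in one cluster
```

With this toy data, `partition_distance(by_colour, colour)` is zero because the two partitions group the rows identically, even though the labels differ; the distance depends only on the grouping, not on the label names.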
