Book,

Knowledge Mining Using Robust Clustering

Jyväskylä Studies in Computing, University of Jyväskylä (2006)

Abstract

This work is devoted to the development of scalable and robust algorithms for data mining and knowledge discovery problems. The main interest lies in so-called prototype-based clustering methods that are implemented using iterative relocation algorithms. Different elements of prototype-based data clustering are discussed and the basic algorithms are described. To support the usability of the new methods and algorithms, a modified knowledge mining process model is also proposed. The refined model is based on the well-known knowledge discovery process, but it places more emphasis on domain analysis and on the "black box" nature of data mining. The significance of knowledge mining is clarified by outlining the existing body of knowledge together with real applications.

As the main outcome of this thesis, a highly automated robust clustering method is presented. The method consists of a number of separately developed and tested elements, such as initialization, prototype estimation, and a missing data strategy. The non-smooth nature of robust statistics is treated rigorously from the point of view of non-smooth optimization. Numerical and statistical properties of the presented methods, such as robustness, scalability, and computational and statistical efficiency, are tested and illustrated through a number of numerical experiments. The results are complemented by some analytic results and illustrative real-world examples. Furthermore, to estimate the correct number of clusters, a new cluster validity index is proposed.
