Optimising k-means clustering results with standard software packages

Computational Statistics & Data Analysis, 49 (4): 969-973 (2005)
DOI: 10.1016/j.csda.2004.06.017

Abstract

The k-means method of clustering is a very popular technique available on most standard statistical software packages. It is an iterative algorithm that requires specification of a starting configuration, and many packages use a random start unless the user declares otherwise. Typically, users are encouraged to run the analysis from a number of random starts and to take the best resultant solution. Some packages, however, base the default starting option on a preliminary analysis such as hierarchical clustering. This does not allow users to produce different "replicate" solutions, so the temptation is to treat the final solution as a global rather than local optimum. The dangers of drawing this conclusion are highlighted, an iterative scheme that generally improves on the default solution is suggested, and this scheme is compared with the "best of 20 random starts" method favoured by many users.
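As a rough illustration of the "best of 20 random starts" approach mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes a basic Lloyd-style k-means with starting centres drawn at random from the data; the function names (`kmeans_once`, `best_of_random_starts`) and the seed handling are hypothetical and not taken from the paper. It does not implement the authors' suggested iterative scheme, which the abstract does not specify.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start (assumed Lloyd-style updates).

    Returns (within-cluster sum of squares, centres, labels).
    """
    centres = rng.sample(points, k)  # random start: k data points as centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:  # no change in assignments: converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return wcss, centres, labels


def best_of_random_starts(points, k, n_starts=20, seed=0):
    """Run k-means from n_starts random starts and keep the lowest-WCSS run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    best = min(runs, key=lambda run: run[0])
    return best, [run[0] for run in runs]
```

Comparing the list of per-start criterion values returned alongside the best run gives a sense of how much the local optima differ between starts, which is the kind of "replicate" information a fixed default start cannot provide.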
