Abstract

Subspace clustering is an extension of traditional cluster- ing that seeks to ¯nd clusters in di®erent subspaces within a dataset. Often in high dimensional data, many dimen- sions are irrelevant and can mask existing clusters in noisy data. Feature selection removes irrelevant and redundant dimensions by analyzing the entire dataset. Subspace clus- tering algorithms localize the search for relevant dimensions allowing them to ¯nd clusters that exist in multiple, possi- bly overlapping subspaces. There are two major branches of subspace clustering based on their search strategy. Top- down algorithms ¯nd an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, it- eratively improving the results. Bottom-up approaches ¯nd dense regions in low dimensional spaces and combine them to form clusters. This paper presents a survey of the various subspace clustering algorithms along with a hierarchy orga- nizing the algorithms by their de¯ning characteristics. We then compare the two main approaches to subspace cluster- ing using empirical scalability and accuracy tests and discuss some potential applications where subspace clustering could be particularly useful.

Links and resources

Tags

community