- Requirements: The main requirements that a clustering algorithm should satisfy are: scalability; the ability to deal with different types of attributes; discovery of clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; the ability to deal with noise and outliers; insensitivity to the order of input records; the ability to handle high dimensionality; and interpretability and usability.
- One of the consequences of fast computers, the Internet, and inexpensive storage is the widespread collection of data from a variety of sources and of a variety of types. Sources of data include web click streams, financial transactions, and observational science data. Data types include categorical vs. numerical, static vs. dynamic, and points in a metric space vs. vertices in a graph. The nagging question often posed about these data sets is: can we find something interesting that we did not already know? The first answer to this question is often: let's try clustering the data! Indeed, clustering is one of the most widely used tools for analyzing data sets. Some modern applications of clustering include clustering the web, clustering search results, clustering click streams, customer segmentation, and community discovery in social networks. Because of this ubiquitous applicability, the field of clustering has undergone a major revolution over the last few decades, characterized by advances in approximation and randomized algorithms, novel formulations of the clustering problem, algorithms for clustering massively large data sets, algorithms for clustering data streams, and dimension reduction techniques.

