- Dias, G. & Alves, E. (2005). Language-Independent Informative Topic Segmentation. In Proceedings of the 9th International Symposium on Social Communication. Santiago de Cuba, Cuba, January 24-28. pp. 588-592. ISBN: 9597174057.
Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, a
Anna Szymkowiak Have, Mark A. Girolami, Jan Larsen
Abstract: Methods for spectral clustering that rely on the eigenvalue decomposition of an affinity matrix have recently been proposed. This work proposes building the affinity matrix from the elements of a non-parametric density estimator. The matrix is then decomposed to obtain posterior probabilities of class membership using an appropriate form of nonnegative matrix factorization. The troublesome selection of hyperparameters, such as the kernel width and the number of clusters, can be carried out using standard cross-validation methods, as is demonstrated on a number of diverse data sets.
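The pipeline this abstract describes (kernel-based affinity matrix, nonnegative factorization, class-membership probabilities) can be illustrated with a minimal sketch. It is not the authors' method: it assumes a Gaussian kernel, swaps in plain Lee-Seung multiplicative updates for the paper's "appropriate form" of nonnegative matrix factorization, and uses a simple rescale-and-normalize heuristic to read memberships off the factors. The function names, the toy data, and the fixed `width` and `n_clusters` values are all hypothetical; the cross-validated hyperparameter selection is not shown.

```python
import numpy as np

def gaussian_affinity(X, width):
    """Affinity matrix of Gaussian kernel values (assumed kernel choice)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * width ** 2))

def nmf_memberships(K, n_clusters, n_iter=500, seed=0, eps=1e-12):
    """Approximate K by W @ H with nonnegative factors, then turn W into
    per-point membership weights that sum to 1 (a heuristic, see lead-in)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    W = rng.random((n, n_clusters)) + eps
    H = rng.random((n_clusters, n)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the squared Frobenius error.
        H *= (W.T @ K) / (W.T @ W @ H + eps)
        W *= (K @ H.T) / (W @ H @ H.T + eps)
    # Move the scale of each row of H into the matching column of W,
    # so the columns of W are comparable before normalizing.
    W = W * H.sum(axis=1)
    return W / (W.sum(axis=1, keepdims=True) + eps)

# Toy data: two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(10.0, 0.5, size=(20, 2))])

K = gaussian_affinity(X, width=2.0)            # width chosen by hand here
memberships = nmf_memberships(K, n_clusters=2)
labels = memberships.argmax(axis=1)            # hard assignment per point
```

Each row of `memberships` holds one point's weights over the clusters; taking the `argmax` gives a hard clustering, while the weights themselves play the role of the class-membership probabilities mentioned in the abstract.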
- Han, J., Kamber, M. & Tung, A. K. H. (2001). Spatial Clustering Methods in Data Mining: A Survey. In Geographic Data Mining and Knowledge Discovery. Taylor and Francis. pp. 1-29.