Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
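As a rough illustration of that workflow (not the post's actual code), the sketch below uses NLTK for tokenization and stemming, then hands the tokens to a Scikit-Learn pipeline that trains a linear SVM with stochastic gradient descent. The toy corpus, the labels, and the helper name `nltk_tokenize` are assumptions made up for this example. `wordpunct_tokenize` and `PorterStemmer` are chosen because they need no extra NLTK data downloads.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus invented for illustration only.
train_docs = [
    "The striker scored two goals in the final match.",
    "The team won the league after a tense game.",
    "The new processor doubles the laptop's battery life.",
    "Developers released a software update for the phone.",
]
train_labels = ["sports", "sports", "tech", "tech"]

# NLTK does the preprocessing; Scikit-Learn does the vectorizing and learning.
# loss="hinge" makes SGDClassifier fit a linear SVM.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(train_docs, train_labels)

predictions = model.predict([
    "The goalkeeper saved a penalty in the match.",
    "The phone received a new software release.",
])
print(list(predictions))
```

Passing the NLTK function through the vectorizer's `tokenizer` argument keeps preprocessing inside the pipeline, so the same tokenization is applied at training and prediction time.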
In this paper we propose a type of Bayesian network that we call hierarchical Bayesian network (HBN) classifiers. We present algorithms for constructing HBN classifiers and test them on the Reuters text categorization test collection
L. Wu, M. Li, Z. Li, W. Ma, and N. Yu. MIR '07: Proceedings of the International Workshop on Multimedia Information Retrieval, pages 115-124. New York, NY, USA, ACM, (2007)
P. Eklund and P. Deer. Proceedings of the 9th Int. Conf. on Information Processing and Management of Uncertainty (IPMU 2002), pages 187-194. ESIA - Universite Savoie, (2002). Presentation slides.
K. Fung and O. Bodenreider. AMIA Annu Symp Proc, (2005)