Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
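A minimal sketch of the workflow this post describes, under stated assumptions: NLTK's `wordpunct_tokenize` and `PorterStemmer` stand in for the preprocessing step, and scikit-learn's `SGDClassifier` with hinge loss stands in for the linear SVM trained by stochastic gradient descent. The helper `nltk_tokenize`, the toy corpus, and the labels are hypothetical and are not taken from the post itself.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, tokenize with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus invented for illustration only.
docs = [
    "The striker scored two goals in the final match",
    "The team won the league after a tense game",
    "The central bank raised interest rates again",
    "Stock markets fell as inflation fears grew",
]
labels = ["sport", "sport", "finance", "finance"]

# TF-IDF features built from the NLTK tokens, followed by a linear model
# trained with stochastic gradient descent. loss="hinge" gives a linear SVM.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(docs, labels)

predictions = model.predict(["The goalkeeper saved the match"])
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps preprocessing and learning in one pipeline, so the same tokenization is applied at training and prediction time.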
In this paper we propose a type of Bayesian network that we call the hierarchical Bayesian network (HBN) classifier. We present algorithms for constructing HBN classifiers and test them on the Reuters text categorization test collection
C. Au Yeung, N. Gibbins, and N. Shadbolt. Proceedings of the Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 2008), co-located with ECIR 2008, Glasgow, United Kingdom, 31 March, 2008, page 48--61. (2008)
L. Bing, R. Guo, W. Lam, Z. Niu, and H. Wang. Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, page 767--776. New York, NY, USA, ACM, (2014)
B. Choi, and Z. Yao. Foundations and Advances in Data Mining, volume 180 of Studies in Fuzziness and Soft Computing, Springer, Berlin / Heidelberg, (2005)
A. Sun, E. Lim, and W. Ng. Proceedings of the 4th international workshop on Web information and data management, page 96--99. New York, NY, USA, ACM, (2002)