Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
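The workflow described in this post can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the toy corpus, the labels, and the helper name `nltk_tokenize` are assumptions made for the example, and `wordpunct_tokenize` with `PorterStemmer` is used only because neither needs extra NLTK data downloads. A linear SVM trained by stochastic gradient descent corresponds to scikit-learn's `SGDClassifier` with hinge loss.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, tokenize with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus and labels, invented purely for illustration.
docs = [
    "The team won the football match last night",
    "The striker scored two goals in the final game",
    "The new processor doubles laptop battery life",
    "Developers released a software update for the phone",
]
labels = ["sports", "sports", "tech", "tech"]

# NLTK handles preprocessing; scikit-learn handles features and learning.
model = make_pipeline(
    # lowercase=False because nltk_tokenize already lowercases;
    # token_pattern=None because a custom tokenizer replaces the default regex.
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    # Hinge loss gives a linear SVM fitted by stochastic gradient descent.
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(docs, labels)

prediction = model.predict(["The goalkeeper saved the game"])
```

Passing the tokenizer into the vectorizer keeps the NLTK step inside the pipeline, so the same preprocessing is applied at training and prediction time.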
In this paper we propose a type of Bayesian network that we call hierarchical Bayesian network (HBN) classifiers. We present algorithms for the construction of HBN classifiers and test them on the Reuters text categorization test collection
B. Rink, and S. Harabagiu. Proceedings of the 5th International Workshop on Semantic Evaluation, page 256--259. Association for Computational Linguistics, (2010)
A. Fuxman, P. Tsaparas, K. Achan, and R. Agrawal. WWW '08: Proceeding of the 17th international conference on World Wide Web, page 61--70. New York, NY, USA, ACM, (2008)
X. Li, and V. Ciesielski. AI 2004: Advances in Artificial Intelligence: Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, volume 3339 of Lecture Notes in Computer Science, page 898--909. Cairns, Australia, Springer, (December 2004)
P. Wang, E. Tsang, T. Weise, K. Tang, and X. Yao. Proceedings of the 9th IEEE International Conference on Cognitive Informatics (ICCI'10), page 722--727. IEEE Computer Society Press: Los Alamitos, CA, USA, (2010)
A. Almal, A. Mitra, R. Datar, P. Lenehan, D. Fry, R. Cote, and W. Worzel. GECCO 2006: Proceedings of the 8th annual conference on Genetic and evolutionary computation, 1, page 239--246. Seattle, Washington, USA, ACM Press, (8-12 July 2006)
W. Smart, and M. Zhang. Proceedings of the 8th European Conference on Genetic Programming, volume 3447 of Lecture Notes in Computer Science, page 227--239. Lausanne, Switzerland, Springer, (30 March - 1 April 2005)