Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then turn to Scikit-Learn for the machine learning itself (e.g., training a linear SVM with stochastic gradient descent).
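The pipeline described above (tokenize, vectorize, then train a linear SVM by stochastic gradient descent) can be sketched without any dependencies. This is a minimal sketch under stated assumptions, not the post's code. A regex tokenizer stands in for NLTK, raw term counts stand in for a Scikit-Learn vectorizer, and a hand-written Pegasos-style SGD on the L2-regularized hinge loss stands in for Scikit-Learn's `SGDClassifier(loss="hinge")`. The function names, the regularization strength `lam`, and the toy documents are illustrative assumptions.

```python
import random
import re
from collections import Counter

# Assumption: a simple lowercase alphanumeric tokenizer stands in for NLTK.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def featurize(text):
    """Bag-of-words representation: token -> count."""
    return Counter(tokenize(text))


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss.

    labels must be +1 or -1; lam is the (hypothetical) regularization strength.
    """
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)      # Pegasos step size 1/(lam * t)
            margin = y * dot(w, x)        # margin under the current weights
            scale = 1.0 - eta * lam       # gradient step on (lam/2)*||w||^2
            for t in w:
                w[t] *= scale
            if margin < 1:                # hinge loss is active: add y * x
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if dot(w, featurize(text)) >= 0 else -1


if __name__ == "__main__":
    docs = [
        "cheap pills buy now",
        "buy cheap watches now",
        "meeting agenda for monday",
        "project meeting notes",
    ]
    labels = [1, 1, -1, -1]
    w = train_linear_svm(docs, labels)
    print(predict(w, "buy cheap pills"))         # → 1
    print(predict(w, "monday project meeting"))  # → -1
```

With the step size 1/(λt), the shrink factor is 1 - 1/t, so the weights stay a rescaled running sum of the examples that violated the margin. The per-step loop over every weight is the simplest correct form but costs O(|w|) per update; production implementations usually keep a single scalar multiplier instead. In Scikit-Learn's `SGDClassifier`, the regularization strength is exposed as `alpha`.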
In this paper we propose a type of Bayesian network that we call hierarchical Bayesian network (HBN) classifiers. We present algorithms for constructing HBN classifiers and test them on the Reuters text categorization test collection.
C. Hoede and L. Zhang. Proceedings of the 9th International Conference on Conceptual Structures (ICCS 2001), volume 2120 of Lecture Notes in Computer Science, page 15--28. Springer, (2001)
J. Hopcroft, T. Lou, and J. Tang. Proceedings of the 20th ACM International Conference on Information and Knowledge Management, page 1137--1146. New York, NY, USA, ACM, (2011)
S. Wu, J. Hofman, W. Mason, and D. Watts. Proceedings of the 20th International Conference on World Wide Web, page 705--714. New York, NY, USA, ACM, (2011)
G. Krempl, D. Bodnar, and A. Hrubos. Advances in Intelligent Data Analysis XIV - 14th Int. Symposium, IDA 2015, St. Etienne, France, volume 9385 of Lecture Notes in Computer Science, page XXII--XXIII. Springer, (2015)
D. Shen, Z. Chen, Q. Yang, H. Zeng, B. Zhang, Y. Lu, and W. Ma. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, page 242--249. New York, NY, USA, ACM, (2004)