In this post, I show how I use NLTK for preprocessing and tokenization and then apply machine learning techniques with scikit-learn (e.g., training a linear SVM with stochastic gradient descent).
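A minimal sketch of that workflow, assuming a tiny hypothetical spam/not-spam corpus: NLTK's `RegexpTokenizer` and `PorterStemmer` (chosen here because they need no corpus downloads) feed a scikit-learn `TfidfVectorizer`, and `SGDClassifier(loss="hinge")` trains a linear SVM by stochastic gradient descent. The helper name `nltk_tokenize`, the toy documents, and the hyperparameters are illustrative assumptions, not the post's own code.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# NLTK components that work without downloading extra corpora.
_tokenizer = RegexpTokenizer(r"[a-z0-9']+")
_stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, tokenize with NLTK, Porter-stem each token."""
    return [_stemmer.stem(tok) for tok in _tokenizer.tokenize(text.lower())]


# Hypothetical toy corpus (illustration only): 1 = spam, 0 = not spam.
docs = [
    "Win a free prize now, click here",
    "Cheap meds, limited offer, buy now",
    "Claim your free vacation today",
    "Meeting moved to 3pm, see agenda attached",
    "Can you review my pull request?",
    "Lunch tomorrow with the project team",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline = Pipeline([
    # NLTK does the tokenization; token_pattern=None marks the default regex as unused.
    ("tfidf", TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None)),
    # Hinge loss + L2 penalty is the linear SVM objective, optimized here with SGD.
    ("svm", SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4,
                          max_iter=1000, random_state=42)),
])
pipeline.fit(docs, labels)

print(pipeline.predict(["free prize, click now", "agenda for the team meeting"]))
```

Passing the NLTK function as `tokenizer=` keeps the linguistic preprocessing in NLTK while scikit-learn handles vectorization and training, so the whole pipeline behaves as a single estimator that can be cross-validated or grid-searched.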
You will also see five recipes that apply supervised classification algorithms to small standard datasets bundled with the scikit-learn library.
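The five recipes are not named at this point, so the following is only a sketch under that gap: it assumes five common classifiers (logistic regression, k-nearest neighbors, Gaussian naive Bayes, a decision tree, and an RBF-kernel SVM) and the Iris dataset that ships with scikit-learn, scoring each with 5-fold cross-validation. The model choices and settings are illustrative assumptions, not necessarily the recipes the post uses.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Iris is bundled with scikit-learn, so no download is required.
X, y = load_iris(return_X_y=True)

# Five common supervised classifiers (an illustrative selection).
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "gaussian naive bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "svm (rbf kernel)": SVC(kernel="rbf", gamma="scale"),
}

results = {}
for name, model in models.items():
    # For classifiers, cv=5 uses stratified 5-fold splits; each score is fold accuracy.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:22s} mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Because every estimator shares the same `fit`/`predict` interface, swapping in another dataset loader (for example `load_wine` or `load_breast_cancer`) or another classifier changes only one line.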