In this post, I show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques with scikit-learn (e.g. training a linear SVM with stochastic gradient descent).
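To make the division of labour concrete, here is a minimal sketch of what such a pipeline might look like. NLTK does the tokenizing and stemming, and scikit-learn does the vectorizing and training. The helper name `nltk_tokenize`, the toy documents, and the labels are all made up for illustration. I also assume `wordpunct_tokenize` and the Porter stemmer here because neither needs an extra NLTK data download; a real setup may well use a different tokenizer.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy data, invented purely for illustration.
train_docs = [
    "The team won the football match",
    "A great goal decided the game",
    "The striker scored twice in the final",
    "The new laptop has a faster processor",
    "This phone ships with a better camera",
    "The software update fixed the bug",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

model = make_pipeline(
    # Hand tokenization over to NLTK; token_pattern=None because the
    # built-in regex tokenizer is not used when a tokenizer is supplied.
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    # loss="hinge" gives a linear SVM objective, optimized with SGD.
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(train_docs, train_labels)

predictions = model.predict(["The goalkeeper saved the match", "A faster phone processor"])
print(list(predictions))
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps all text handling in one place, so the same preprocessing is applied at training and prediction time.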