In this post, I show how I use NLTK for preprocessing and tokenization, and then hand the results to Scikit-Learn for the machine learning itself (e.g. training a linear SVM with stochastic gradient descent).
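To make that division of labor concrete, here is a minimal sketch under a few assumptions. The helper name `nltk_tokenize`, the toy corpus, and its labels are made up for illustration. I also assume `wordpunct_tokenize` and `PorterStemmer` as the NLTK preprocessing steps, mainly because neither needs extra NLTK data to be downloaded. On the Scikit-Learn side, `SGDClassifier` with `loss="hinge"` fits a linear SVM by stochastic gradient descent.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus and labels, invented purely for illustration.
train_texts = [
    "The team scored a late goal to win the match",
    "The striker was injured during the football game",
    "The new laptop ships with a faster processor",
    "This software update fixes several security bugs",
]
train_labels = ["sports", "sports", "tech", "tech"]

# NLTK handles tokenization; Scikit-Learn handles features and the classifier.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict([
    "The goalkeeper saved the match",
    "The processor update improves the laptop",
])
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps all text handling in one place, so the same preprocessing is applied at training and prediction time.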