In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
H. Yu, J. Han, and K. Chang. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 239--248. New York, NY, USA, ACM, (2002)
X. Li, B. Liu, and S. Ng. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, page 218--228. Stroudsburg, PA, USA, Association for Computational Linguistics, (2010)
T. Joachims. Proceedings of ECML-98, 10th European Conference on Machine Learning, 1398, page 137--142. Chemnitz, DE, Springer Verlag, Heidelberg, DE, (1998)
A. Sun, E. Lim, and W. Ng. Proceedings of the 4th international workshop on Web information and data management, page 96--99. New York, NY, USA, ACM, (2002)