In this post, I want to show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g., building a linear SVM trained with stochastic gradient descent) using scikit-learn.
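As a rough illustration of that division of labor, here is a minimal sketch: NLTK handles tokenization and stemming, and scikit-learn handles vectorization and the classifier. The toy corpus, the labels, and the helper name `nltk_tokenize` are all made up for this example. I use `wordpunct_tokenize` and `PorterStemmer` because neither needs an extra NLTK data download; a real pipeline would likely use a richer tokenizer and stop-word filtering. `SGDClassifier` with `loss="hinge"` fits a linear SVM by stochastic gradient descent.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Lowercase, split with NLTK, keep alphabetic tokens, and stem them."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus, invented purely for illustration.
train_texts = [
    "The team won the football match last night",
    "A late goal decided the championship game",
    "The striker scored twice in the final",
    "The new processor doubles laptop battery life",
    "Developers released a software update for the phone",
    "The startup built a faster database engine",
]
train_labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

# scikit-learn takes over from here: TF-IDF features built on the NLTK
# tokens, then a linear SVM (hinge loss) trained with SGD.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict([
    "The goalkeeper saved the match",
    "A software update for the laptop",
])
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps all text handling in one place, so the same preprocessing runs at both training and prediction time.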