In this post, I want to show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g. building a linear SVM trained with stochastic gradient descent) using scikit-learn.
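To make the division of labour concrete, here is a minimal sketch of that workflow. It makes several assumptions that are mine, not part of the post: the toy corpus and its `sport`/`tech` labels are made up, the helper `nltk_tokenize` is a hypothetical name, and I picked `wordpunct_tokenize` and `PorterStemmer` because they work without downloading extra NLTK data. The linear SVM is approximated by `SGDClassifier` with hinge loss.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Lowercase, split with NLTK, keep alphabetic tokens, and stem them."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus, invented purely for illustration.
train_docs = [
    "The striker scored two goals in the final match",
    "The team won the league after a tense game",
    "The goalkeeper saved a penalty in the match",
    "The new phone ships with a faster processor",
    "The laptop has more memory and a better processor",
    "The software update improves the phone battery",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

# NLTK handles tokenization; scikit-learn handles features and learning.
# token_pattern=None because the custom tokenizer replaces the default regex.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    # Hinge loss + SGD gives a linear SVM trained by stochastic gradient descent.
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(train_docs, train_labels)

predictions = model.predict([
    "The match ended with a late goal",
    "A processor upgrade for the laptop",
])
print(list(predictions))
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps all text handling in one place, so swapping in a different tokenizer or stemmer does not touch the learning code. With a corpus this small the predictions are only a smoke test, not evidence of model quality.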