In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
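As a rough illustration of that workflow, here is a minimal sketch of NLTK tokenization feeding a scikit-learn linear SVM trained with stochastic gradient descent. It is not the code from the post: the toy corpus, the labels, and the helper name `nltk_tokenize` are invented for illustration, and `wordpunct_tokenize` with a Porter stemmer is just one preprocessing choice that needs no extra NLTK data downloads.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus and labels, made up for this sketch only.
docs = [
    "the team won the football match",
    "the striker scored two goals",
    "the senate passed the budget bill",
    "voters elected a new president",
]
labels = ["sports", "sports", "politics", "politics"]

model = Pipeline([
    # NLTK does the tokenization; scikit-learn only builds the TF-IDF matrix.
    ("tfidf", TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None)),
    # Hinge loss optimized with SGD gives a linear SVM.
    ("svm", SGDClassifier(loss="hinge", random_state=0)),
])
model.fit(docs, labels)

prediction = model.predict(["the goalkeeper saved the match"])
print(prediction)
```

Keeping the tokenizer as a plain function passed to the vectorizer means the NLTK side can be swapped (a different tokenizer, a lemmatizer, stopword removal) without touching the classifier.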
Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, page 1480--1489. San Diego, California, Association for Computational Linguistics, (June 2016)
S. Bloehdorn, and A. Hotho. Proceedings of the MSW 2004 workshop at the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, page 70--87. (August 2004)
Y. Kim. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, page 1746--1751. (2014)
C. Henning, and R. Ewerth. Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, page 14--22. New York, NY, USA, ACM, (2017)
S. Bloehdorn, and A. Hotho. Proceedings of the Workshop on Text-based Information Retrieval (TIR-04) at the 27th German Conference on Artificial Intelligence, (September 2004)
X. Zhang, and Y. LeCun. arXiv:1502.01710, (2015). Comment: This technical report is superseded by a paper entitled "Character-level Convolutional Networks for Text Classification", arXiv:1509.01626. It has considerably more experimental results and a rewritten introduction.
B. Lauser, and A. Hotho. Proc. of the 7th European Conference in Research and Advanced Technology for Digital Libraries, ECDL 2003, volume 2769 of LNCS, page 140--151. Springer, (2003)
S. Bloehdorn, and A. Hotho. Proceedings of the Fourth IEEE International Conference on Data Mining, page 331--334. IEEE Computer Society Press, (November 2004)
S. Dori-Hacohen, and J. Allan. Proceedings of the 22nd ACM international conference on information & knowledge management, page 1845--1848. New York, NY, USA, ACM, (2013)
E. Loza Mencía, and J. Fürnkranz. Machine Learning and Knowledge Discovery in Databases, volume 5212 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2008)
E. Loza Mencía, and J. Fürnkranz. Semantic Processing of Legal Texts, volume 6036 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2010)
X. Li, B. Liu, and S. Ng. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, page 218--228. Stroudsburg, PA, USA, Association for Computational Linguistics, (2010)
W. Cavnar, and J. Trenkle. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, page 161--175. Las Vegas, US, (1994)
C. Rose, A. Roque, D. Bhembe, and K. VanLehn. Proceedings of the HLT-NAACL 03 workshop on Building educational applications using natural language processing - Volume 2, page 68--75. Stroudsburg, PA, USA, Association for Computational Linguistics, (2003)
S. Feldman, M. Marin, M. Ostendorf, and M. Gupta. Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, page 4781--4784. Washington, DC, USA, IEEE Computer Society, (2009)
X. Phan, L. Nguyen, and S. Horiguchi. WWW '08: Proceeding of the 17th international conference on World Wide Web, page 91--100. New York, NY, USA, ACM, (2008)
P. Schonhofen. WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, page 456--462. Washington, DC, USA, IEEE Computer Society, (2006)
G. Forman, M. Scholz, and S. Rajaram. KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, page 299--308. New York, NY, USA, ACM, (2009)
M. Li, Y. Cheng, and H. Zhao. CGIV '04: Proceedings of the International Conference on Computer Graphics, Imaging and Visualization, page 183--186. Washington, DC, USA, IEEE Computer Society, (2004)
R. Angelova, and G. Weikum. SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, page 485--492. New York, NY, USA, ACM, (2006)
G. Ifrim, M. Theobald, and G. Weikum. Proceedings of the 22nd International Conference on Machine Learning - Learning in Web Search (LWS 2005), page 18--26. Bonn, Germany, (2005)
L. Hirsch, R. Hirsch, and M. Saeedi. GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation, 2, page 1604--1611. London, ACM Press, (7-11 July 2007)
Y. Yang, and X. Liu. SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, page 42--49. New York, NY, USA, ACM Press, (1999)