In this post, I want to show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g., building a linear SVM with stochastic gradient descent) using scikit-learn.
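As a rough sketch of that workflow, the snippet below plugs an NLTK-based tokenizer into a scikit-learn pipeline. The specific choices here are assumptions for illustration, not necessarily the ones used later in the post: `wordpunct_tokenize` and `PorterStemmer` (picked because they need no extra NLTK data downloads), TF-IDF features, the helper name `nltk_tokenize`, and the toy corpus. `SGDClassifier` with `loss="hinge"` fits a linear SVM by stochastic gradient descent.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus invented for this example only.
train_docs = [
    "Win a free prize now, click here",
    "Cheap pills, free shipping, order now",
    "Claim your free cash prize today",
    "Meeting moved to Tuesday afternoon",
    "Please review the attached project report",
    "Lunch with the project team on Friday",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = make_pipeline(
    # NLTK does the tokenization; scikit-learn only builds the TF-IDF matrix.
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    # Hinge loss + SGD updates = a linear SVM trained by stochastic gradient descent.
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(train_docs, train_labels)

predictions = model.predict(["Free prize, click now", "Project meeting on Friday"])
print(list(predictions))
```

Keeping the tokenizer as a plain function makes it easy to swap in a different NLTK tokenizer or stemmer without touching the scikit-learn side of the pipeline.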