In this post, I show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g. training a linear SVM with stochastic gradient descent) using scikit-learn.
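To make the division of labour concrete, here is a minimal sketch of how the two libraries can be combined. It is not the exact setup from this post: the toy corpus, the labels, the helper name `nltk_tokenize`, and the choice of `wordpunct_tokenize` plus `PorterStemmer` (picked because they need no extra NLTK data downloads) are all assumptions made for illustration. NLTK handles tokenization and stemming, and scikit-learn's `SGDClassifier` with hinge loss plays the role of the linear SVM trained by stochastic gradient descent.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, tokenize with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy corpus invented for this sketch; a real experiment would load its own data.
train_docs = [
    "The team won the football match last night",
    "The striker scored two goals in the final game",
    "The new laptop ships with a faster processor",
    "Developers released a software update for the phone",
]
train_labels = ["sports", "sports", "tech", "tech"]

# NLTK does the preprocessing (via the tokenizer callable),
# scikit-learn does the vectorization and the learning.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, token_pattern=None, lowercase=False),
    # loss="hinge" gives a linear SVM objective, optimized with SGD.
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(train_docs, train_labels)

predictions = model.predict(["The goalkeeper saved the game"])
```

Passing the NLTK-based function as `tokenizer` keeps all linguistic preprocessing in one place, so swapping in a different tokenizer or stemmer does not touch the scikit-learn side of the pipeline.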