In this post, I want to show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g. training a linear SVM with stochastic gradient descent) using scikit-learn.
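As a rough sketch of that division of labour, the snippet below plugs an NLTK-based tokenizer into a scikit-learn pipeline and trains a linear SVM with SGD (`SGDClassifier` with hinge loss). The toy corpus, the labels, and the helper name `nltk_tokenize` are made up for illustration. The choice of `wordpunct_tokenize` and `PorterStemmer` is also an assumption: they need no extra NLTK data downloads, which keeps the example self-contained.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy, made-up corpus purely for illustration.
train_texts = [
    "The court issued a ruling on the contract dispute.",
    "The directive regulates cross-border legal proceedings.",
    "The striker scored twice in the second half.",
    "The team won the championship after extra time.",
]
train_labels = ["legal", "legal", "sports", "sports"]

# NLTK handles tokenization; scikit-learn handles features and learning.
# loss="hinge" makes SGDClassifier fit a linear SVM.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["The judge dismissed the appeal."])
```

Passing the tokenizer as a callable keeps the NLTK code independent of the learning step, so the classifier can be swapped without touching the preprocessing.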