In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
This post is meant as a summary of many of the concepts that I learned in Marti Hearst's Natural Language Processing class at the UC Berkeley School of Information.
Labeled LDA (D. Ramage, D. Hall, R. Nallapati and C.D. Manning; EMNLP2009) is a supervised topic model derived from LDA (Blei+ 2003). While LDA's estimated topics don't often equal to human's expectation because it is unsupervised, Labeled LDA is to treat documents with multiple labels. I implemented Labeled LDA in python.
This page provides 32- and 64-bit Windows binaries of many scientific open-source extension packages for the official CPython distribution of the Python programming language.
In this post you will see 5 recipes of supervised classification algorithms applied to small standard datasets that are provided with the scikit-learn library.