There are many different folk tales in the world, but many tales are variations on a limited number of themes. The classification system originally designed by Aarne, and later revised first by Thompson and later by Uther, is intended to bring out the similarities between tales by grouping variants of the same tale under the same ATU category. like hraf
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.