The Cataloging Lab is a place for catalogers and anyone who cares about library metadata to experiment with creating better controlled vocabularies. Suggesting additions and changes to the Library of Congress Subject Headings vocabulary can be an isolating endeavor: it can be difficult to determine whether your heading has already been proposed or whether someone else is working on a proposal at the same time you are. The Cataloging Lab is designed to be a wiki where folks can collaborate on headings to create stronger proposals.
There are many different folk tales in the world, but many of them are variations on a limited number of themes. The classification system originally designed by Aarne, and revised first by Thompson and then by Uther, is intended to bring out the similarities between tales by grouping variants of the same tale under the same ATU category.
In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
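As a rough sketch of that division of labor (not necessarily the exact code used in this post), the snippet below uses NLTK only for tokenizing and stemming, then hands the tokens to a Scikit-Learn pipeline. `SGDClassifier` with `loss="hinge"` fits a linear SVM by stochastic gradient descent. The helper name `tokenize_and_stem`, the choice of `wordpunct_tokenize` and `PorterStemmer` (picked because they need no extra NLTK data downloads), and the toy corpus and labels are all assumptions made for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def tokenize_and_stem(text):
    """NLTK preprocessing: lowercase, tokenize, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


def build_pipeline():
    """TF-IDF features from NLTK tokens, then a linear SVM trained with SGD."""
    vectorizer = TfidfVectorizer(
        tokenizer=tokenize_and_stem,  # hand tokenization over to NLTK
        token_pattern=None,           # unused when a custom tokenizer is given
        lowercase=False,              # the tokenizer already lowercases
    )
    # Hinge loss + L2 penalty is the linear SVM objective.
    classifier = SGDClassifier(loss="hinge", penalty="l2",
                               max_iter=1000, tol=1e-3, random_state=0)
    return make_pipeline(vectorizer, classifier)


# Hypothetical toy corpus, only to show the calls end to end.
train_docs = [
    "The team won the match after scoring two late goals",
    "The striker scored again and the fans cheered",
    "The new laptop ships with a faster processor",
    "Developers released a software update for the phone",
]
train_labels = ["sport", "sport", "tech", "tech"]

model = build_pipeline()
model.fit(train_docs, train_labels)

predictions = model.predict([
    "The goalkeeper saved the match",
    "The processor update made the laptop faster",
])
print(list(predictions))
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps preprocessing and modeling in one pipeline object, so the same tokenization is applied at training and prediction time. On a corpus this small the predictions are not meaningful; a real dataset and a held-out test split would be needed to judge the classifier.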