In this post, I want to show how I use NLTK for preprocessing and tokenization, but then apply machine learning techniques (e.g. building a linear SVM using stochastic gradient descent) using Scikit-Learn.
Mloss is a community effort at producing reproducible research via open source software, open access to data and results, and open standards for interchange.
Mahout currently has
Collaborative Filtering
User and Item based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest decision tree based classifier
High performance java collections (previously colt collections)
A vibrant community
and many more cool stuff to come by this summer thanks to Google summer of code
Platform for sharing and evaluation of intelligent algorithms. Data mining data, experiments, datasets, performance analysis, data repository, challenges. Research and applications, prediction. Data mining and machine learning