You’re no doubt reading this article because there’s a gigantic hiberfil.sys file sitting in the root of your drive, and you want to get rid of it to free up some space… but you can’t!
Mahout currently has:
Collaborative Filtering
User- and item-based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest decision-tree-based classifier
High-performance Java collections (previously Colt collections)
A vibrant community
and much more cool stuff to come this summer, thanks to Google Summer of Code
Tweets2011
As part of the TREC 2011 microblog track, Twitter provided identifiers for approximately 16 million tweets sampled between January 23rd and February 8th, 2011. The corpus is designed to be a reusable, representative sample of the twittersphere, i.e., both important and spam tweets are included.
P. Moreira, Y. Bizzoni, K. Nielbo, I. Lassen, and M. Thomsen. Proceedings of the 5th Workshop on Narrative Understanding, pages 25-35. Toronto, Canada, Association for Computational Linguistics, (July 2023)
J. Droste, H. Deters, J. Puglisi, and J. Klünder. 2023 IEEE 31st International Requirements Engineering Conference Workshops (REW), pages 129-135. (September 2023)
S. Ghodsi and E. Ntoutsi. Proceedings of the 2nd European Workshop on Algorithmic Fairness, Winterthur, Switzerland, June 7th to 9th, 2023, volume 3442 of CEUR Workshop Proceedings, CEUR-WS.org, (June 2023)