- bag of words document data sets
- Elefant (Efficient Learning, Large-scale Inference, and Optimisation Toolkit) is an open source library for machine learning licensed under the Mozilla Public License (MPL). We develop an open source machine learning toolkit which provides
- Due to an explosion of data, there has been an increasing demand for scalable machine learning and data mining algorithms in many applications, such as social network analysis, information retrieval, recommendation systems, biological applications, multimedia, and e-commerce. The objective of this special issue is to connect academia and industry on the methods and experiences of large-scale data analysis. We look for scalable machine learning and data mining algorithms, implementations, frameworks, and case studies that target real and practical scenarios for large datasets. The focus is to identify the real challenges in large-scale data mining and to investigate scalable methods and practical solutions for the core machine learning and data mining problems from both theoretical and experimental perspectives.
- Broadly speaking, there are two no free lunch theorems: one for supervised machine learning and one for search/optimization.
- Compare k-means and PAM (Partitioning Around Medoids). PAM is also known as k-medoids.
- Useful bullet points on different types of clustering.
- Comparison of naive Bayes classifiers, support vector machines, and modular multilayer perceptron neural networks.
- This tool performs spectral clustering using either a sparse similarity matrix (nearest neighbors) or the Nyström method.
- MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, (2010)
- Statistical Science 16(3):199--215 (2001)
- British Journal of Mathematical and Statistical Psychology 59(1):1--34 (2006)
- (2007)
- IEEE Transactions on Knowledge and Data Engineering 19(8):1026--1041 (2007)
- CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management, pages 600--607. New York, NY, USA, ACM, (2002)
- Journal of the American Society for Information Science and Technology 56(13):1448--1462 (2005)
- ICML 2010 (2010)
- ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pages 26--33. Morristown, NJ, USA, Association for Computational Linguistics, (2001)
- Knowledge and Information Systems 14(1):1--37 (2008)
- IEEE Intelligent Systems 24(2):8--12 (2009)
- Advances in Neural Information Processing Systems (2006)
- J. Mach. Learn. Res. (2004)
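The first entry in the list refers to bag-of-words document data sets. As a hedged illustration of what that representation is, here is a minimal sketch that turns raw text into per-document term-count vectors over a shared vocabulary. It assumes a simple lowercase alphanumeric tokenizer; the helpers `tokenize` and `bag_of_words` are hypothetical and not taken from any of the listed data sets.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return a sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    # Word order is thrown away; only how often each vocabulary term occurs survives.
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat on the mat.", "The dog sat."]
vocab, matrix = bag_of_words(docs)
print(vocab)   # → ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(matrix)  # → [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```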
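For the search/optimization no free lunch theorem (Wolpert and Macready), the core claim can be checked by brute force on a tiny problem: averaged over every objective function f: X → Y, any two deterministic search algorithms that never revisit a point observe exactly the same distribution of value sequences. No performance measure computed from those observed values can therefore tell them apart. The search space, value set, and both algorithms below are hypothetical choices made only for this sketch.

```python
from collections import Counter
from itertools import product

X = [0, 1, 2, 3]  # hypothetical tiny search space (points are indices into f)
Y = [0, 1, 2]     # hypothetical set of objective values


def fixed_order(f, m):
    """Algorithm A: evaluate the first m points in a fixed order."""
    return tuple(f[x] for x in X[:m])


def adaptive(f, m):
    """Algorithm B: start at the last point, then pick the next unvisited
    point based on the value just observed (it never revisits a point)."""
    visited, trace = [], []
    x = X[-1]
    for _ in range(m):
        visited.append(x)
        trace.append(f[x])
        remaining = [p for p in X if p not in visited]
        if not remaining:
            break
        # High value -> smallest unvisited point; otherwise the largest one.
        x = remaining[0] if f[x] >= 1 else remaining[-1]
    return tuple(trace)


def trace_histogram(algorithm, m):
    """Count each sequence of observed values over *all* functions f: X -> Y."""
    return Counter(algorithm(f, m) for f in product(Y, repeat=len(X)))


m = 3
hist_a = trace_histogram(fixed_order, m)
hist_b = trace_histogram(adaptive, m)
print(hist_a == hist_b)  # → True

# Consequently any measure of the observed values ties on average, e.g. best value found:
mean_best_a = sum(max(t) * n for t, n in hist_a.items()) / sum(hist_a.values())
mean_best_b = sum(max(t) * n for t, n in hist_b.items()) / sum(hist_b.values())
print(mean_best_a == mean_best_b)  # → True
```

Each of the |Y|^m possible value sequences appears exactly |Y|^(|X| - m) times for either algorithm, because the unvisited points of f are unconstrained.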
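To make the k-means versus PAM comparison concrete, here is a minimal 1-D sketch. k-means (Lloyd's algorithm) moves each center to the mean of its cluster under squared Euclidean distance, so centers need not be data points. PAM's SWAP phase only ever selects actual data points as medoids and accepts any dissimilarity (absolute difference here). For brevity both start from given initial centers, an assumption standing in for random or k-means++ seeding and for PAM's BUILD phase; the data and function names are hypothetical.

```python
from itertools import product


def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's k-means on 1-D data: each center is the *mean* of its cluster."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign to the nearest center under squared Euclidean distance.
            j = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[j].append(p)
        # Keep the old center if a cluster became empty.
        new = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def pam(points, medoids, dist=lambda a, b: abs(a - b)):
    """SWAP phase of PAM: medoids are always data points and `dist` may be any
    dissimilarity. Applies the best cost-reducing medoid/non-medoid swap until none helps."""
    def cost(meds):
        # Each point contributes its dissimilarity to the nearest medoid.
        return sum(min(dist(p, m) for m in meds) for p in points)

    medoids = list(medoids)
    best = cost(medoids)
    while True:
        best_swap = None
        for i, cand in product(range(len(medoids)), points):
            if cand in medoids:
                continue
            trial = medoids[:i] + [cand] + medoids[i + 1:]
            trial_cost = cost(trial)
            if trial_cost < best:
                best, best_swap = trial_cost, trial
        if best_swap is None:
            return sorted(medoids), best
        medoids = best_swap


points = [1.0, 2.0, 4.0, 10.0, 11.0, 13.0]
print(kmeans_1d(points, [1.0, 10.0]))  # centers 7/3 and 34/3, which are not data points
print(pam(points, [1.0, 10.0]))        # → ([2.0, 11.0], 6.0)
```

Because PAM evaluates every medoid/non-medoid swap against the full cost, each pass costs more than a k-means update, which is the usual price paid for its flexibility in the choice of dissimilarity.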
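One of the classifiers in the comparison entry is naive Bayes. A minimal sketch of the multinomial variant commonly used for bag-of-words text follows, assuming add-one (Laplace) smoothing and a toy labelled corpus invented here; it is not the experimental setup of the compared study, and `train_nb`/`predict_nb` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes on token lists with add-alpha smoothing."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for c in word_counts.values() for w in c}
    model = {}
    for y, n_docs in class_docs.items():
        total = sum(word_counts[y].values())
        log_prior = math.log(n_docs / len(docs))
        # Smoothed log P(w | y) for every vocabulary word.
        log_lik = {w: math.log((word_counts[y][w] + alpha) / (total + alpha * len(vocab)))
                   for w in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, tokens):
    # Naive assumption: tokens are conditionally independent given the class,
    # so log-likelihoods add up. Words never seen in training are skipped.
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
    return max(model, key=score)


train = [("cheap pills buy now".split(), "spam"),
         ("buy cheap watches".split(), "spam"),
         ("meeting agenda for monday".split(), "ham"),
         ("monday lunch meeting".split(), "ham")]
model = train_nb([t for t, _ in train], [y for _, y in train])
print(predict_nb(model, "buy pills".split()))               # → spam
print(predict_nb(model, "agenda for the meeting".split()))  # → ham
```

Scoring in log space avoids the numerical underflow that multiplying many small word probabilities would cause.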
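The spectral clustering tool entry mentions a sparse nearest-neighbor similarity matrix as one option (the Nyström method is the other). A minimal pure-Python sketch of the sparse option only follows, assuming an unweighted, symmetrised k-nearest-neighbor graph and a two-way split by the sign of the Fiedler vector (the graph Laplacian's eigenvector for the second-smallest eigenvalue). This is a simplification of full k-way spectral clustering and not necessarily what the tool does. The eigenvector comes from power iteration, so the Laplacian is applied only through neighbor lists and never stored densely; `knn_graph` and `fiedler_bipartition` are hypothetical names.

```python
import math
import random


def knn_graph(points, k):
    """Unweighted k-nearest-neighbor graph, symmetrised, stored sparsely as sets."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)  # keep the edge if either endpoint selected the other
    return adj


def fiedler_bipartition(adj, iters=500, seed=0):
    """Two-way split by the sign of the Laplacian eigenvector with the
    second-smallest eigenvalue, via power iteration on M = c*I - L."""
    n = len(adj)
    c = 2 * max(len(nbrs) for nbrs in adj.values())  # Gershgorin bound on eig(L)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # (L v)_i = deg(i) * v_i - sum of v_j over neighbors j, from neighbor sets only.
        w = [c * v[i] - (len(adj[i]) * v[i] - sum(v[j] for j in adj[i]))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # project out the constant eigenvector of L
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [1 if x >= 0 else 0 for x in v]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = fiedler_bipartition(knn_graph(points, k=2))
print(labels)  # the two squares receive different labels; which one is 0 depends on sign
```

Since every eigenvalue of L lies in [0, c], M is positive semidefinite, so its dominant direction orthogonal to the constant vector corresponds to L's smallest nontrivial eigenvalue.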

