FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
Pattern is a web mining module for the Python programming language.
It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics), clustering and classification (k-means, KNN, SVM), and data visualization (graph networks).
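As a rough illustration of the tf-idf and cosine similarity metrics mentioned above, here is a minimal standard-library sketch. It does not use Pattern's own API; the function names `tfidf_vectors` and `cosine` and the whitespace tokenizer are assumptions made for this example only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn raw strings into sparse tf-idf vectors (dicts of term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply today",
]
vectors = tfidf_vectors(docs)
```

With these three documents, the first two share weighted terms and so score higher against each other than either does against the third, which shares no vocabulary with them.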
HEigen is a spectral analysis tool that computes the top k eigenvalues and corresponding eigenvectors of extremely large graphs (on the order of billions of nodes and edges). HEigen runs on top of the Hadoop platform.
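To make "top eigenvalue of a graph" concrete, the sketch below runs plain power iteration on a tiny adjacency matrix. This is not HEigen's distributed algorithm: it is a simpler single-machine technique that recovers only the leading eigenpair, and the names `mat_vec`, `power_iteration` and the example graph `paw` are assumptions made for illustration.

```python
import math

def mat_vec(adj, v):
    """Multiply a dense adjacency matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in adj]

def power_iteration(adj, iters=500):
    """Approximate the leading eigenvalue and eigenvector of a symmetric matrix."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n  # uniform unit-length starting vector
    for _ in range(iters):
        w = mat_vec(adj, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the matrix maps v to zero; nothing more to learn
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v estimates the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(adj, v)))
    return eigenvalue, v

# A triangle (nodes 0, 1, 2) with one extra node 3 attached to node 2.
paw = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lam, vec = power_iteration(paw)
```

The sketch assumes a symmetric matrix whose largest-magnitude eigenvalue is unique. On bipartite graphs the spectrum is symmetric around zero, so this simple iteration may oscillate instead of converging.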
SUBDUE is a graph-based knowledge discovery system that finds structural, relational patterns in data representing entities and relationships. SUBDUE represents data using a labeled, directed graph in which entities are represented by labeled vertices or subgraphs, and relationships are represented by labeled edges between the entities. SUBDUE uses the minimum description length (MDL) principle to identify patterns that minimize the number of bits needed to describe the input graph after being compressed by the pattern. SUBDUE can perform several learning tasks, including unsupervised learning, supervised learning, clustering and graph grammar learning.
Mloss is a community effort at producing reproducible research via open source software, open access to data and results, and open standards for interchange.
Markov Logic Networks (MLNs) are a powerful framework that combines statistical and logical reasoning; they have been applied to many data-intensive problems, including information extraction, entity resolution, text mining, and natural language processing. Based on principled data management techniques, Tuffy is an MLN inference engine that achieves scalability and orders-of-magnitude speedups over prior implementations. It is written in Java and relies on PostgreSQL. For a brief introduction to MLNs and the technical details of Tuffy, please see our technical report.
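For readers new to MLNs, the following brute-force sketch shows the underlying semantics on a toy example: each possible world gets a probability proportional to the exponential of the summed weights of the ground formulas it satisfies. It says nothing about how Tuffy performs inference; the function `world_probabilities`, the atom names and the weight 1.5 are all assumptions chosen for illustration.

```python
import math
from itertools import product

def world_probabilities(weighted_formulas, atoms):
    """Enumerate every truth assignment and return (world, probability) pairs.

    weighted_formulas: list of (weight, predicate) where predicate(world) -> bool
    atoms: list of ground atom names
    """
    worlds = [
        dict(zip(atoms, values))
        for values in product([False, True], repeat=len(atoms))
    ]
    # Unnormalized score: exp(sum of weights of satisfied ground formulas).
    scores = [
        math.exp(sum(w for w, formula in weighted_formulas if formula(world)))
        for world in worlds
    ]
    z = sum(scores)  # partition function
    return [(world, score / z) for world, score in zip(worlds, scores)]

atoms = ["Smokes(A)", "Cancer(A)"]
# One ground formula, Smokes(A) => Cancer(A), with a hypothetical weight of 1.5.
formulas = [(1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"])]
result = world_probabilities(formulas, atoms)
```

Only the world where A smokes but has no cancer violates the formula, so it ends up less probable than the other three, though not impossible. Exhaustive enumeration grows exponentially with the number of atoms, which is why practical engines rely on far more scalable methods.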
Local Outlier Factor (LOF) is an anomaly detection algorithm introduced in the paper "LOF: Identifying Density-based Local Outliers" by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander[1]. The key idea of LOF is to compare the local density around a point with the local densities around its nearest neighbors: a point whose density is much lower than that of its neighbors is considered an outlier.
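The density comparison can be sketched in a few lines of standard-library Python. This is a simplified version: it takes exactly k neighbors per point (ignoring distance ties) and assumes all points are distinct so that no density is infinite. The names `k_nearest` and `lof_scores` are chosen for this example only. It requires Python 3.8 or later for `math.dist`.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    candidates = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return candidates[:k]

def lof_scores(points, k):
    """Compute a simplified Local Outlier Factor for every point."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Four points on a unit square plus one point far away from them.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Scores close to 1 indicate a point about as dense as its neighbors; the isolated point at (5, 5) receives a score well above 1.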
G. Forman, and E. Kirshenbaum. CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management, page 1221--1230. New York, NY, USA, ACM, (2008)
D. Lin. Proceedings of the 17th international conference on Computational linguistics, page 768--774. Morristown, NJ, USA, Association for Computational Linguistics, (1998)
M. Banko, and E. Brill. ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, page 26--33. Morristown, NJ, USA, Association for Computational Linguistics, (2001)
S. Baluja, D. Ravichandran, and D. Sivakumar. Proceeding of the International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009), INSTICC, (6-8 oct 2009)
P. Teufl, and G. Lackner. 10th International Conference on Knowledge Management and Knowledge Technologies 1–3 September 2010, Messe Congress Graz, Austria, page 18 - 18. (2010)