InfiniteGraph enables large-scale graph processing, data analytics and discovery. InfiniteGraph's uniquely distributed graph database solution enables commercial, enterprise, government and other organizations to discover complex relationships in their vast and highly distributed data, with significant time-to-market advantages and technical cost savings.
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extraction is very fast (on the order of milliseconds), needs only the input document (no global or site-level information), and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
The algorithms used by the library are based on, and extend, concepts from the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, New York City, NY, USA.
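To make the idea of "shallow text features" concrete, here is a minimal sketch in Java. It is not boilerpipe's code or API. The class name ShallowTextFeatures, the Block type, and both thresholds are hypothetical, chosen only for illustration. It assumes the page has already been split into text blocks, and that the number of words inside links is known for each block. A block is kept as content when it has few linked words relative to its length and enough words overall.

```java
import java.util.List;

public class ShallowTextFeatures {

    /** A text block plus the two shallow features used in this sketch. */
    public static final class Block {
        final String text;
        final int numWords;        // total whitespace-separated tokens
        final int numLinkedWords;  // tokens that sit inside anchor text

        Block(String text, int numWords, int numLinkedWords) {
            this.text = text;
            this.numWords = numWords;
            this.numLinkedWords = numLinkedWords;
        }

        /** Fraction of the block's words that are link text (0.0 for an empty block). */
        double linkDensity() {
            return numWords == 0 ? 0.0 : (double) numLinkedWords / numWords;
        }
    }

    // Hypothetical thresholds for illustration only; not taken from the library.
    static final double MAX_LINK_DENSITY = 0.33;
    static final int MIN_WORDS = 10;

    /** Builds a block, counting words by splitting on whitespace. */
    public static Block block(String text, int numLinkedWords) {
        String trimmed = text.trim();
        int numWords = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return new Block(trimmed, numWords, numLinkedWords);
    }

    /** Content blocks are long enough and not dominated by link text. */
    public static boolean isContent(Block b) {
        return b.linkDensity() <= MAX_LINK_DENSITY && b.numWords >= MIN_WORDS;
    }

    /** Joins the text of all blocks classified as content, one per line. */
    public static String extract(List<Block> blocks) {
        StringBuilder sb = new StringBuilder();
        for (Block b : blocks) {
            if (isContent(b)) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(b.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Block> page = List.of(
            block("Home News Sports Weather Contact", 5),   // navigation: all words linked
            block("The city council voted on Tuesday to approve a new budget "
                + "that increases funding for public libraries.", 0),
            block("Copyright 2010 Example Media", 0));      // footer: too short
        System.out.println(extract(page));
    }
}
```

In this sketch the navigation bar is dropped because every word is link text, and the footer is dropped because it is too short. Only the article sentence survives. A real classifier would likely also consider the features of neighbouring blocks.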
The main objective of FLOSSMETRICS is to construct, publish and analyse a large-scale database of information and metrics about libre software development, drawn from several thousand software projects, using existing methodologies and tools.
M. D'Ambros. Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, page 529--530. New York, NY, USA, ACM, (2010)