LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from 650K datasets from a single self-indexed file. This corpus can be queried online with a sustainable Linked Data Fragments interface, or it can be downloaded and consumed locally: LOD-a-lot is easy to deploy and only requires limited resources (524 GB of disk space and 15.7 GB of RAM), enabling web-scale repeatable experimentation and research from a high-end laptop.
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
A. Hotho, R. Jaeschke, and K. Lerman. Semantic Web 8: 623--624 (April 2017). © 2017 IOS Press and the authors. This is an author produced version of a paper subsequently published in Semantic Web. Uploaded in accordance with the publisher's self-archiving policy.
T. Tran, N. Tran, A. Teka Hadgu, and R. Jäschke. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics (September 2015)