LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples, drawn from 650K datasets, through a single self-indexed file. The corpus can be queried online through a sustainable Linked Data Fragments interface, or downloaded and consumed locally. LOD-a-lot is easy to deploy and requires only limited resources (524 GB of disk space and 15.7 GB of RAM), which enables repeatable web-scale experimentation and research on a high-end laptop.
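As a rough illustration of what querying through a Linked Data Fragments interface could look like, the sketch below builds the request URL for a single triple pattern. The endpoint URL is a placeholder, and the query parameter names (`subject`, `predicate`, `object`, `page`) are an assumption based on a common convention among Triple Pattern Fragments servers; they are not confirmed for the LOD-a-lot service. A full client would read the actual parameter names from the hypermedia controls in the server's response.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real fragments URL of the service.
ENDPOINT = "https://example.org/lod-a-lot"


def triple_pattern_url(endpoint, subject=None, predicate=None, obj=None, page=None):
    """Build the URL for one triple pattern request.

    Positions left as None are omitted from the query string, so they act
    as wildcards. The parameter names used here are an assumed convention,
    not something taken from the LOD-a-lot documentation.
    """
    params = {}
    if subject is not None:
        params["subject"] = subject
    if predicate is not None:
        params["predicate"] = predicate
    if obj is not None:
        params["object"] = obj
    if page is not None:
        params["page"] = str(page)
    # urlencode percent-escapes IRIs so they are safe inside a query string.
    return endpoint + ("?" + urlencode(params) if params else "")


# Pattern "?s owl:sameAs ?o": only the predicate is bound.
url = triple_pattern_url(ENDPOINT, predicate="http://www.w3.org/2002/07/owl#sameAs")
print(url)
```

Fetching the resulting URL with an RDF-capable HTTP client would then return one page of matching triples, assuming the server follows this convention.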
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
The German Society for Information Science and Information Practice (Deutsche Gesellschaft für Informationswissenschaft und Informationspraxis e.V., DGI) promotes developments in information science and information practice by monitoring and communicating fundamentals, working methods, and technical tools.
B. Berendt, A. Hotho, and G. Stumme. Web Semantics: Science, Services and Agents on the World Wide Web 8 (2-3): 95-96 (2010). Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0; The Future of Knowledge Dissemination: The Elsevier Grand Challenge for the Life Sciences.
B. Berendt, N. Glance, and A. Hotho (Eds.). Workshop at the 18th European Conference on Machine Learning (ECML'08) / 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'08) (2008).