This page provides two large hyperlink graphs for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the 2012 graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
Brad Fitzpatrick recently wrote an elegant and important post about the Social Graph, a term used by Facebook to describe their social network. In his post, Fitzpatrick defines "social graph" as "the global mapping of everybody and how they're related". He went on to outline the problems with it, as well as a broad set of goals going forward. One problem is that currently you need to have different logins for different social networks. Another issue is portability and ownership of an individual's information, explicitly and implicitly revealed while using social networks. As was recently asserted in the Social...
F. Abel, N. Henze, E. Herder, G. Houben, D. Krause, and E. Leonardi. Proceedings of the International Workshop on Architectures and Building Blocks of Web-Based User-Adaptive Systems (WABBWUAS 2010), 609, CEUR-WS.org, (June 2010)
F. Abel, N. Henze, E. Herder, and D. Krause. Proceedings of the 6th International Conference on Semantic Systems, I-SEMANTICS 2010, Graz, Austria, September 1-3, 2010, ACM, (September 2010)
C. Karande, K. Chellapilla, and R. Andersen. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, page 272--281. New York, NY, USA, ACM, (2009)
G. Xue, H. Zeng, Z. Chen, Y. Yu, W. Ma, W. Xi, and W. Fan. CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, page 118--126. New York, NY, USA, ACM, (2004)
A. Harth, J. Umbrich, A. Hogan, and S. Decker. Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC2007), Busan, South Korea, volume 4825 of LNCS, page 211--224. Berlin, Heidelberg, Springer Verlag, (November 2007)
A. Schenker, H. Bunke, M. Last, and A. Kandel. Document Analysis Systems, volume 3163 of Lecture Notes in Computer Science, page 401--412. Springer, (2004)