This page provides two large hyperlink graphs for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the 2012 graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
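To put the sizes quoted above in perspective, the following minimal sketch derives the average number of hyperlinks per page for each graph. It uses only the rounded page and link counts stated in the text, so the results are rough approximations; the function name `average_links_per_page` is a hypothetical helper introduced for illustration and is not part of any published tooling.

```python
# Page and hyperlink counts as stated in the text (rounded figures).
graphs = {
    "2012": {"pages": 3.5e9, "links": 128e9},
    "2014": {"pages": 1.7e9, "links": 64e9},
}


def average_links_per_page(pages, links):
    """Return the mean number of hyperlinks per page (links divided by pages)."""
    return links / pages


for year, counts in graphs.items():
    avg = average_links_per_page(counts["pages"], counts["links"])
    print(f"{year} graph: about {avg:.1f} hyperlinks per page on average")
```

Under these rounded inputs, both graphs come out at a similar average of roughly 37 hyperlinks per page, even though the 2014 graph covers about half as many pages.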