This page provides two large hyperlink graphs for public download. The graphs were extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
G. Lee, S. Kang, and J. Whang. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, (July 2019)
D. Gibson, J. Kleinberg, and P. Raghavan. Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia: Links, Objects, Time and Space - Structure in Hypermedia Systems (HYPERTEXT '98), ACM Press, (1998)
M. Meiss, B. Goncalves, J. Ramasco, A. Flammini, and F. Menczer. Proc. 7th Workshop on Algorithms and Models for the Web Graph (WAW), volume 6516 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, (2010)
B. Pereira Nunes, R. Kawase, S. Dietze, D. Taibi, M. Casanova, and W. Nejdl. Proceedings of the Web of Linked Entities Workshop in conjunction with the 11th International Semantic Web Conference, volume 906 of CEUR-WS.org, pages 45-57. (November 2012)
A. Cuzzocrea and M. Fisichella. Proc. of the 1st International Workshop on Linked Web Data Management (LWDM 2011), in conjunction with EDBT 2011, Uppsala, Sweden, March 21-25, 2011, (2011)