Netspeak helps you search for words you don't know yet. It is a new kind of dictionary that contains everything that has ever been written on the web.
oEmbed is a format for allowing an embedded representation of a URL on third-party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
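As a rough illustration of the consumer side described above, the following Python sketch builds the GET request a site might send to a provider and turns a parsed JSON response into markup. The endpoint `https://provider.example/oembed`, the function names `build_oembed_url` and `render_embed`, and the sample values are hypothetical; only the query parameters (`url`, `format`, `maxwidth`, `maxheight`) and the response fields (`type`, `url`, `html`, `width`, `height`, `title`) are taken from the oEmbed format. No network call is made.

```python
import html
from urllib.parse import urlencode


def build_oembed_url(endpoint, resource_url, maxwidth=None, maxheight=None, fmt="json"):
    """Build the GET URL a consumer requests from a provider's oEmbed endpoint.

    `endpoint` is provider-specific; the one used in the example is made up.
    """
    params = {"url": resource_url, "format": fmt}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    if maxheight is not None:
        params["maxheight"] = maxheight
    return endpoint + "?" + urlencode(params)


def render_embed(response):
    """Turn an already-parsed oEmbed response (a dict) into an HTML snippet."""
    kind = response.get("type")
    if kind == "photo":
        # Photo responses carry a direct image URL plus its dimensions.
        return '<img src="{}" width="{}" height="{}" alt="{}">'.format(
            html.escape(str(response["url"]), quote=True),
            int(response["width"]),
            int(response["height"]),
            html.escape(response.get("title", ""), quote=True),
        )
    if kind in ("video", "rich"):
        # The provider supplies ready-made markup; a real consumer should
        # sandbox or sanitize it before display.
        return response["html"]
    # "link" or unknown types: fall back to the escaped title text.
    return html.escape(response.get("title", ""))


if __name__ == "__main__":
    print(build_oembed_url("https://provider.example/oembed",
                           "https://provider.example/photos/1", maxwidth=300))
    sample = {"type": "photo", "version": "1.0",
              "url": "https://provider.example/1.jpg",
              "width": 240, "height": 160, "title": "A & B"}
    print(render_embed(sample))
```

The point of the design is that the consumer never parses the linked page itself: it only forwards the posted URL and trusts the structured response.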
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
This project brings together OII research fellows and doctoral students to shed light on the incorporation of new users and information into the Wikipedia community.
J. Abernethy, O. Chapelle, and C. Castillo. Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web, pages 41--44. New York, NY, USA, ACM, (2008)
M. Ageev, Q. Guo, D. Lagun, and E. Agichtein. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 345--354. New York, NY, USA, ACM, (2011)
O. Alonso, J. Strötgen, R. Baeza-Yates, and M. Gertz. Proceedings of the 1st International Temporal Web Analytics Workshop, volume 707 of CEUR Workshop Proceedings, pages 1--8. CEUR-WS.org, (2011)