LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from 650K datasets from a single self-indexed file. This corpus can be queried online with a sustainable Linked Data Fragments interface, or it can be downloaded and consumed locally: LOD-a-lot is easy to deploy and only requires limited resources (524 GB of disk space and 15.7 GB of RAM), enabling web-scale repeatable experimentation and research from a high-end laptop.
Wappalyzer is a cross-platform utility that uncovers the technologies used on websites. It detects content management systems, ecommerce platforms, web frameworks, server software, analytics tools and many more.
Grafana is the leading open source project for visualizing metrics. It supports rich integration with every popular database, such as Graphite, Prometheus and InfluxDB.
The Net Data Directory collects and shares information on different sources of data about the Internet. For more about the project, see our about page. To get started, use the search box below, or check out our quick start guide.
This project brings together OII research fellows and doctoral students to shed light on the incorporation of new users and information into the Wikipedia community.
This page provides two large hyperlink graphs for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
Unicorn is an HTTP server for Rack applications designed to only serve fast clients on low-latency, high-bandwidth connections and take advantage of features in Unix/Unix-like kernels. Slow clients should only be served by placing a reverse proxy capable of fully buffering both the request and response in between Unicorn and slow clients.
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
The German Society for Information Science and Information Practice (Deutsche Gesellschaft für Informationswissenschaft und Informationspraxis e.V., DGI) promotes developments in information science and information practice by monitoring and disseminating fundamentals, working methods, and technical tools.
oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
Enabling collaboration and discovery among scientists across all disciplines.
The network of scientists will facilitate scholarly discovery. Institutions will participate in the network by installing VIVO, or by providing semantic web-compliant data to the network.
Netspeak helps you to search for words you don't know yet. It is a new kind of dictionary that contains everything that has ever been written on the web.
H. Zhang, A. Santos, and J. Freire. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM, (October 2021)
G. Feng, T. Liu, Y. Wang, Y. Bao, Z. Ma, X. Zhang, and W. Ma. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, ACM Press, (2006)