LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from 650K datasets from a single self-indexed file. This corpus can be queried online with a sustainable Linked Data Fragments interface, or it can be downloaded and consumed locally: LOD-a-lot is easy to deploy and only requires limited resources (524 GB of disk space and 15.7 GB of RAM), enabling web-scale repeatable experimentation and research from a high-end laptop.
Wappalyzer is a cross-platform utility that uncovers the technologies used on websites. It detects content management systems, ecommerce platforms, web frameworks, server software, analytics tools and many more.
Grafana is the leading open source project for visualizing metrics. Supporting rich integration for every popular database like Graphite, Prometheus and InfluxDB.
The Net Data Directory collects and shares information on different sources of data about the Internet. For more about the project, see our about page. To get started, use the search box below, or check out our quick start guide.
This project brings together OII research fellows and doctoral students to shed light on the incorporation of new users and information into the Wikipedia community.
This page provides two large hyperlink graph for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpera. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
Die Deutsche Gesellschaft für Informationswissenschaft und Informationspraxis e.V. (DGI) fördert die Entwicklungen der Informationswissenschaft und Informationspraxis durch die Beobachtung und Vermittlung von Grundlagen, Arbeitsmethoden und technischen Hilfsmitteln.
oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
Enabling collaboration and discovery among scientists across all disciplines.
The network of scientists will facilitate scholarly discovery. Institutions will participate in the network by installing VIVO, or by providing semantic web-compliant data to the network.
Netspeak helps you to search for words you don't know, yet. It is a new kind of dictionary that contains everything that has ever been written on the web.
Truthy is a research project that helps you understand how memes spread online. We collect tweets from Twitter and analyze them. With our statistics, images, movies, and interactive data, you can explore these dynamic networks.
Our first application was the study of astroturf campaigns in elections. Currently, we're extending our focus to several themes. Browse the collection on the Memes page. Check out the Movie tool to browse and create animations of meme networks.
Convert your Chrome extension into a Firefox or Safari one!
This service converts Chrome Apps and Extensions to a Firefox and Safari version. This is a beta test and we offer it with no guarantees.
If you are interested in distributing a converted extension, have a problem with a converted extension, if you want to provide feedback or have any question please Contact us!
You can either upload your own package, in crx or zip format; or use the url or ID of an extension on the Chrome WebStore (click to search an extension!)
Short introduction to Vector Space Model (VSM) In information retrieval or text mining, the term frequency - inverse document frequency also called tf-idf, is
Brad Fitzpatrick recently wrote an elegant and important post about the Social Graph, a term used by Facebook to describe their social network. In his post, Fitzpatrick defines "social graph" as "the global mapping of everybody and how they're related". He went on to outline the problems with it, as well as a broad set of goals going forward. One problem is that currently you need to have different logins for different social networks. Another issue is portability and ownership of an individual's information, explicitly and implicitly revealed while using social networks. As was recently asserted in the Social...
As a Google user, you're familiar with the speed and accuracy of a Google search. How exactly does Google manage to find the right results for every query as quickly as it does? The heart of Google's search technology is PigeonRank™, a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University.
Its so fun to write oversimplified posts about such-and-such is dead. Not because its true. At best you can point out something is broken and alternatives are rising fast. But I wonder how the people behind the aging technology and...
H. Zhang, A. Santos, and J. Freire. Proceedings of the 30th ACM International Conference on Information &$\mathsemicolon$ Knowledge Management, ACM, (October 2021)
G. Feng, T. Liu, Y. Wang, Y. Bao, Z. Ma, X. Zhang, and W. Ma. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR \textquotesingle06, ACM Press, (2006)
M. Paris, and R. Jäschke. Proceedings of the 14th International Conference on Knowledge Science, Engineering and Management, volume 12816 of Lecture Notes in Artificial Intelligence, page 1--14. Springer, (2021)