Personal webpages of researchers or faculty members make up a percentage of the academic web. These webpages contain semi-structured or plain text information, and research has shown the importance...
LOD-a-lot democratizes access to the Linked Open Data (LOD) Cloud by serving more than 28 billion unique triples from 650K datasets from a single self-indexed file. This corpus can be queried online with a sustainable Linked Data Fragments interface, or it can be downloaded and consumed locally: LOD-a-lot is easy to deploy and only requires limited resources (524 GB of disk space and 15.7 GB of RAM), enabling web-scale repeatable experimentation and research from a high-end laptop.
Grafana is the leading open source project for visualizing metrics, with rich integrations for popular databases such as Graphite, Prometheus and InfluxDB.
The Net Data Directory collects and shares information on different sources of data about the Internet. For more about the project, see our about page. To get started, use the search box below, or check out our quick start guide.
This project brings together OII research fellows and doctoral students to shed light on the incorporation of new users and information into the Wikipedia community.
This page provides two large hyperlink graphs for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the 2012 graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
The German Society for Information Science and Information Practice (Deutsche Gesellschaft für Informationswissenschaft und Informationspraxis e.V., DGI) promotes developments in information science and information practice by monitoring and communicating fundamentals, working methods, and technical tools.
oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
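The request and response flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the endpoint `https://provider.example/oembed`, the sample response, and the helper names `build_oembed_request` and `embed_markup` are hypothetical. The query parameters (`url`, `format`, `maxwidth`, `maxheight`) and the response fields (`type`, `html`, `url`, `width`, `height`) are the ones defined by the oEmbed format.

```python
import html
import json
from urllib.parse import urlencode


def build_oembed_request(endpoint, resource_url, maxwidth=None, maxheight=None, fmt="json"):
    """Build the consumer's request URL for a provider's oEmbed endpoint."""
    params = {"url": resource_url, "format": fmt}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    if maxheight is not None:
        params["maxheight"] = maxheight
    return endpoint + "?" + urlencode(params)


def embed_markup(response_text):
    """Turn a JSON oEmbed response into markup the consumer site can display."""
    data = json.loads(response_text)
    kind = data.get("type")
    if kind in ("video", "rich"):
        # The provider supplies ready-made HTML for these types.
        return data["html"]
    if kind == "photo":
        # Photo responses carry the image URL and its dimensions.
        return '<img src="{}" width="{}" height="{}">'.format(
            html.escape(str(data["url"]), quote=True),
            int(data["width"]),
            int(data["height"]),
        )
    # The "link" type (and anything unknown) has no embeddable representation.
    return None


# Hypothetical provider endpoint and resource, for illustration only.
request_url = build_oembed_request(
    "https://provider.example/oembed",
    "https://provider.example/photos/42",
    maxwidth=300,
)

# A hand-written sample response standing in for what a provider might return.
sample_response = json.dumps({
    "version": "1.0",
    "type": "photo",
    "url": "https://provider.example/img/42.jpg",
    "width": 300,
    "height": 200,
})
markup = embed_markup(sample_response)
```

The consumer never parses the linked page itself; it only fetches `request_url` and renders whatever the provider describes in the response.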
Enabling collaboration and discovery among scientists across all disciplines.
The network of scientists will facilitate scholarly discovery. Institutions will participate in the network by installing VIVO, or by providing semantic web-compliant data to the network.
Netspeak helps you search for words you don't know yet. It is a new kind of dictionary that contains everything that has ever been written on the web.
Truthy is a research project that helps you understand how memes spread online. We collect tweets from Twitter and analyze them. With our statistics, images, movies, and interactive data, you can explore these dynamic networks.
Our first application was the study of astroturf campaigns in elections. Currently, we're extending our focus to several themes. Browse the collection on the Memes page. Check out the Movie tool to browse and create animations of meme networks.
Convert your Chrome extension into a Firefox or Safari one!
This service converts Chrome Apps and Extensions to Firefox and Safari versions. This is a beta test and we offer it with no guarantees.
If you are interested in distributing a converted extension, have a problem with one, want to provide feedback, or have any questions, please contact us!
You can either upload your own package in crx or zip format, or use the URL or ID of an extension on the Chrome Web Store (click to search for an extension!)
Short introduction to Vector Space Model (VSM). In information retrieval or text mining, the term frequency-inverse document frequency, also called tf-idf, is
H. Zhang, A. Santos, and J. Freire. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM, (October 2021)
G. Feng, T. Liu, Y. Wang, Y. Bao, Z. Ma, X. Zhang, and W. Ma. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, ACM Press, (2006)