A social database about things you know and love, spanning millions of topics in thousands of categories. Explore Freebase, add to it, or build applications with it.
The EPSG Geodetic Parameter Dataset is a structured dataset of Coordinate Reference Systems and Coordinate Transformations, accessible through this data registry.
Berlin wird leiser (Berlin is getting quieter): taking action against traffic noise. Berlin's Senate Department for Urban Development and the Environment (Senatsverwaltung für Stadtentwicklung und Umwelt) wants to involve residents in drawing up the city's noise action plan (Lärmaktionsplan). All residents can report where Berlin is too loud for them and which measures could help. On this basis, the city will work out measures to make Berlin quieter.
This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
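As an illustration of the kind of basic topology statistics mentioned above, here is a minimal sketch that computes node, edge, and maximum-degree counts from an edge list. The input format (one whitespace-separated `source target` pair of integer node IDs per line) and the function names are assumptions made for this example, not the documented format of the published graph. The approach keeps every counter in memory, so it only suits a small sample, not the full 128-billion-edge graph.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def parse_edge_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (source, target) pairs from lines in the assumed format."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst = line.split()[:2]
        yield int(src), int(dst)


def degree_stats(edges: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Count nodes and edges and find the largest in- and out-degree."""
    out_deg: Counter = Counter()
    in_deg: Counter = Counter()
    n_edges = 0
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        n_edges += 1
    nodes = set(out_deg) | set(in_deg)
    return {
        "nodes": len(nodes),
        "edges": n_edges,
        "max_in_degree": max(in_deg.values(), default=0),
        "max_out_degree": max(out_deg.values(), default=0),
    }


# Tiny hand-made sample: three pages, three links.
sample_lines = ["0\t1", "0\t2", "1\t2", ""]
stats = degree_stats(parse_edge_lines(sample_lines))
print(stats)
```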
We have released over a million images onto Flickr Commons for anyone to use, remix and repurpose. These images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft, who then generously gifted the scanned images to us, allowing us to release them back into...
Welcome to the Networked Digital Library of Theses and Dissertations (NDLTD), an international organization dedicated to promoting the adoption, creation, use, dissemination, and preservation of electronic theses and dissertations (ETDs). We support electronic publishing and open access to scholarship in order to enhance the sharing of knowledge worldwide. Our website includes resources for university administrators, librarians, faculty, students, and the general public.
The English Short Title Catalogue (ESTC) lists over 460,000 items published between 1473 and 1800, mainly but not exclusively in English, and mainly in the British Isles and North America, drawn from the collections of the British Library and over 2,000 other libraries.
The dataset genres.json contains (sub)genre classifications for novels published between 1770 and 1915. The genres covered are:
gothic novels
"silver fork" novels
national tale novels
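A minimal sketch of how such classifications could be tallied per genre. The record layout used here (a JSON array of objects with `title`, `year`, and `genre` fields) and the sample entries are hypothetical, invented for illustration. The actual structure of genres.json is not described in this listing and may differ.

```python
import json
from collections import Counter

# Hypothetical sample in an assumed layout; NOT taken from genres.json.
SAMPLE_JSON = """
[
  {"title": "Example Novel A", "year": 1794, "genre": "gothic"},
  {"title": "Example Novel B", "year": 1806, "genre": "national tale"},
  {"title": "Example Novel C", "year": 1828, "genre": "silver fork"},
  {"title": "Example Novel D", "year": 1796, "genre": "gothic"}
]
"""


def count_by_genre(records, genre_key="genre"):
    """Return a Counter of how many records carry each genre label."""
    return Counter(r[genre_key] for r in records if genre_key in r)


records = json.loads(SAMPLE_JSON)
genre_counts = count_by_genre(records)
for genre, n in genre_counts.most_common():
    print(f"{genre}: {n}")
```

To run this against the real file, replace `json.loads(SAMPLE_JSON)` with `json.load(open(path))` and adjust `genre_key` to whatever field name the dataset actually uses.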
The Net Data Directory collects and shares information on different sources of data about the Internet. For more about the project, see our about page. To get started, use the search box below, or check out our quick start guide.
Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can even possibly conceive.
S-Match is an open source Java framework for semantic matching. It contains semantic matching, minimal semantic matching and structure preserving semantic matching algorithm implementations.
Scientext is a new online French and English corpus of scientific texts. The corpus includes 4.8 million running tokens in French, 13 million words of research articles in English (medicine and biology), and an English-language sub-corpus of French undergraduate students’ texts (1.1 million words). The corpus is organized to facilitate the linguistic study of authorial position and reasoning in scientific articles through phraseology and lexico-grammatical markers linked to causality.
Tweets2011
As part of the TREC 2011 microblog track, Twitter provided identifiers for approximately 16 million tweets sampled between January 23rd and February 8th, 2011. The corpus is designed to be a reusable, representative sample of the twittersphere; that is, both important and spam tweets are included.