voiD (from "Vocabulary of Interlinked Datasets") is an RDF-based schema for describing linked datasets. With voiD, the discovery and usage of linked datasets can be performed both effectively and efficiently. A dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.
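To illustrate, a minimal voiD description of a dataset might look like the following Turtle sketch; the dataset URI, title, and endpoint address here are hypothetical, not taken from the source:

```turtle
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# A hypothetical dataset, described with core voiD properties
<http://example.org/dataset/mydata>
    a void:Dataset ;
    dcterms:title        "Example Linked Dataset" ;
    void:sparqlEndpoint  <http://example.org/sparql> ;
    void:exampleResource <http://example.org/resource/item1> .
```

A description like this can be published alongside the dataset so that crawlers and catalogues can discover how to access it.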
The new TRC2 corpus comprises 1,800,370 news stories (2,871,075,221 bytes) covering the period from 2008-01-01 00:00:03 to 2009-02-28 23:54:14. It is initially made available to participants of the Blog Track at the Text Retrieval Conference (TREC), to supplement the BLOG08 corpus (the results of a large blog crawl carried out at the University of Glasgow), which is the main corpus used at the TREC Blog Track.
by: John Erickson. January 19, 2010. DataCite and linked data — or, more to the point, the DOI and linked data — are in essence made for each other. A longer answer is that the DOI infrastructure provides conveniences, such as multiple resolution, and also certain advantages, such as security, as they pertain to referencing and accessing scientific and other datasets. The bottom line is that while the DOI infrastructure does depend upon the non-HTTP protocols of the Handle System “under the hood,” from the consumer’s perspective DOI-based name resolution can (and usually does) operate completely within the “web space.” For linking to articles or datasets, the more familiar URI form of DOIs which combines a given DOI with the URL of a Handle System proxy (e.g. http://dx.doi.org/10.1109/MIC.2009.93) may be used instead of the “native” DOI form.
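To make the last point concrete, forming the URI form of a DOI is simply a matter of prefixing the native DOI with the address of a Handle System proxy. A minimal sketch (the helper name is my own, not part of any DOI library):

```python
def doi_to_url(doi, proxy="http://dx.doi.org/"):
    """Combine a native DOI with the URL of a Handle System proxy,
    yielding a dereferenceable HTTP URI for the identified object."""
    return proxy + doi

# The native DOI "10.1109/MIC.2009.93" becomes:
print(doi_to_url("10.1109/MIC.2009.93"))
# → http://dx.doi.org/10.1109/MIC.2009.93
```

Dereferencing the resulting URI delegates resolution to the proxy, so the consumer never needs to speak the Handle protocol directly.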
NSF, Division of Science Resources Statistics (SRS). The Survey of Earned Doctorates (SED) began in 1957–58 to collect data continuously on the number and characteristics of individuals receiving research doctoral degrees from all accredited U.S. institutions. The results of this annual survey are used to assess characteristics and trends in doctorate education and degrees.
This memo provides information for the Internet community interested in distributing data or databases under an “open access” structure. There are several definitions of “open” and “open access” on the Internet, including the Open Knowledge Definition and the Budapest Declaration on Open Access; the protocol laid out herein is intended to conform to the Open Knowledge Definition and extend the ideas of the Budapest Declaration to data and databases.
A variety of tools for working with datasets: data visualization, mapping, blogs, data catalogs, an R tutorial, web crawlers, data mashers, code, lists of projects, APIs, etc.
A list of digital libraries, data archives, and data repositories that invite Digging into Data researchers to use their collections. For each repository, you'll find a description of its contents, contact information, and other details.
In our experience, one of the most effective tactics for eliciting datasets for the collection is a simple librarian-researcher interview. In this poster, we share a set of ten questions that a librarian can use as a starting point for such a “data interview”. It is a practical tool to draw out information that needs to be considered in order to evaluate the suitability of a dataset for the collection and the requirements for the infrastructure and services that will be needed for data curation.
The Panton Principles are a set of recommendations that address how best to make published data from scientific studies available for re-use. In this context, “published” means “made public” and is not restricted to formal publication in the scholarly literature.
A table summarises the coverage of the main UK research funders' policies and the support infrastructure they provide. Clarifications and links to the policies and guidance are available in the sections that follow.