voiD (from "Vocabulary of Interlinked Datasets") is an RDF-based schema for describing linked datasets. With voiD, linked datasets can be discovered and used both effectively and efficiently.
Infochimps.org
Free Redistributable Rich Data Sets
There are many sources to find out something about everything. Until now, there’s been no good place for you to find out everything about something.
The infochimps.org community is assembling and interconnecting the world's best repository for raw data -- a sort of giant free almanac, with tables on everything you can put in a table. Built by data nerds, used by data nerds, it's a central source for the information you need to power the projects the world needs.
Datasets like Wikipedia Data Dumps, 2000 Movie Reviews, and UPC Database are difficult to recreate, have high levels of accuracy, and are valuable... As such data becomes easier to access, the value of these datasets decreases over time.