The English Wikipedia (November '07 dump)
about 3,000,000 documents, with category information
The German Wikipedia (June '06 dump)
about 500,000 documents
DBLP computer science articles (full text + meta data)
full text + meta data from about 20,000 conference articles listed at DBLP
DBLP computer science articles (meta data only)
meta data of all the 800,000+ articles from DBLP
The local libraries: MPII + CS department
about 35,000 items
Linux man pages (beta)
about 15,000 man pages from a Debian Linux installation (sarge)
PHP documentation (beta)
the official PHP reference from www.php.net
Encyclopedias and a mailing list on homeopathic medicine
about 50,000 items
TREC Robust collection (with synonyms, experimental alpha-version)
about 500,000 news articles, with synonyms
Semantic Wikipedia (demo for SIGIR'07 paper, experimental alpha-version)
the whole English Wikipedia with semantic tags and relations