This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
description of the data
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
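
One way to sanity-check a downloaded archive is to count the articles per newsgroup without unpacking it. The sketch below is a minimal illustration, not part of the data set's own tooling. It assumes the archive stores each article as a regular file under a path of the form <top_dir>/<newsgroup>/<article_id>; that layout and the helper name count_articles are assumptions made for this example.

```python
import os
import tarfile
from collections import Counter


def count_articles(archive_path):
    """Count regular files per newsgroup directory in a .tar.gz archive.

    Assumed layout (not confirmed by the page above):
    <top_dir>/<newsgroup>/<article_id>
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            parts = member.name.strip("/").split("/")
            if len(parts) >= 2:
                # The parent directory of each article is taken as the newsgroup.
                counts[parts[-2]] += 1
    return counts


if __name__ == "__main__":
    path = "mini_newsgroups.tar.gz"  # assumed to be in the working directory
    if os.path.exists(path):
        for newsgroup, n in sorted(count_articles(path).items()):
            print(f"{newsgroup}\t{n}")
```

If the layout assumption holds, the mini archive should report 100 articles for each of the 20 newsgroups, matching the description above.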
C. Luo, Y. Li, and S. Chung. Data & Knowledge Engineering 68 (11): 1271-1288 (2009). Including Special Section: Conference on Privacy in Statistical Databases (PSD 2008) - Six selected and extended papers on Database Privacy.