20 Newsgroups
Abstract
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
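As a quick sanity check after downloading either archive, the articles per newsgroup can be counted without unpacking anything to disk. The sketch below is only illustrative: the function name count_articles_per_group is hypothetical, and it assumes the archive stores each article as root/newsgroup/article, a layout this listing does not confirm.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_articles_per_group(archive_path):
    """Count regular files per newsgroup directory inside a .tar.gz archive.

    Assumption: members are laid out as <root>/<newsgroup>/<article>.
    Members at any other depth are ignored.
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            # PurePosixPath collapses a leading "./" if the archive uses one.
            parts = PurePosixPath(member.name).parts
            if member.isfile() and len(parts) == 3:
                counts[parts[1]] += 1
    return counts
```

Calling count_articles_per_group("mini_newsgroups.tar.gz") should, if the layout assumption holds, report 100 articles for each of the 20 newsgroups.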
This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
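Because the records are already sorted by user ID, each user's queries can be collected in a single streaming pass, without loading the whole log into memory. The following is a minimal sketch under assumptions not stated in this listing: rows are tab-separated, the first column is the anonymous user ID, the second is the query text, and any header line has already been skipped. The name queries_by_user is hypothetical.

```python
import csv
from itertools import groupby


def queries_by_user(lines, delimiter="\t"):
    """Yield (user_id, [query, ...]) pairs from rows sorted by user ID.

    Assumption: column 0 is the user ID and column 1 is the query text.
    groupby only merges adjacent rows, so this relies on the sort order.
    """
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    # Skip blank or malformed rows that lack a query column.
    rows = (row for row in reader if len(row) >= 2)
    for user_id, group in groupby(rows, key=lambda row: row[0]):
        yield user_id, [row[1] for row in group]
```

If the file were not sorted by user ID, the same user would appear in several separate groups, so an unsorted log would need a dictionary keyed by user ID instead.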
A. Dulny, A. Hotho, and A. Krause. Machine Learning and Knowledge Discovery in Databases: Research Track, pages 438--455. Cham, Springer Nature Switzerland, (2023)
Y. Song, L. Zhang, and C. Giles. CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management, pages 93--102. New York, NY, USA, ACM, (2008)