20 Newsgroups
Abstract
This data set consists of approximately 20,000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
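
A quick way to sanity-check a downloaded archive is to count the articles in each newsgroup. The sketch below is a minimal example, not part of the data set's distribution. It assumes that each archive unpacks to one directory per newsgroup with one file per article; this page does not state the layout, so verify it against the description of the data. The function name count_articles_per_group is hypothetical.

```python
import tarfile
from collections import Counter


def count_articles_per_group(archive_path):
    """Count regular files per immediate parent directory in a .tar.gz archive.

    Assumption: each newsgroup is a directory and each article is one file
    directly inside it (layout not confirmed by this page).
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories, links, and other non-file entries.
            if not member.isfile():
                continue
            parts = member.name.strip("/").split("/")
            # Files at the archive root have no parent directory; ignore them.
            if len(parts) >= 2:
                counts[parts[-2]] += 1
    return counts
```

If the layout assumption holds, calling count_articles_per_group("mini_newsgroups.tar.gz") on the downloaded subset should return 20 keys, each with a count of 100.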