20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
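As a quick sanity check after downloading either archive, the sketch below counts articles per newsgroup without extracting anything to disk. It assumes a layout that is not stated on this page: one directory per newsgroup, with one file per message inside it. The function name `count_articles` is hypothetical and chosen only for this illustration.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_articles(archive_path):
    """Count regular files per parent directory inside a .tar.gz archive.

    Assumption: each newsgroup is a directory and each message is one
    file inside it, so the parent directory name is the newsgroup name.
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories, links, and other non-file entries.
            if member.isfile():
                group = PurePosixPath(member.name).parent.name
                counts[group] += 1
    return counts
```

Calling `count_articles("mini_newsgroups.tar.gz")` on a locally downloaded copy should, if the layout assumption holds, return 20 keys with a count of 100 each, matching the description of the subset above.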