20 Newsgroups
Abstract
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
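As a rough illustration of working with these archives, the Python sketch below unpacks one and counts the messages per newsgroup. It assumes, without confirmation from this page, that each archive unpacks to one directory per newsgroup with one plain-text file per message. The function names (extract_archive, count_messages, read_message) are hypothetical helpers, not part of any distributed tooling.

```python
import os
import tarfile
from collections import Counter


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir (only use on trusted archives)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def count_messages(root):
    """Count files per subdirectory of root.

    Assumption: each subdirectory is one newsgroup and each file in it
    is one message.
    """
    counts = Counter()
    for group in sorted(os.listdir(root)):
        group_dir = os.path.join(root, group)
        if not os.path.isdir(group_dir):
            continue  # skip stray files at the top level
        counts[group] = sum(
            1
            for name in os.listdir(group_dir)
            if os.path.isfile(os.path.join(group_dir, name))
        )
    return counts


def read_message(path):
    """Read one message as text.

    latin-1 maps every byte to a character, so decoding never fails even
    if the original character set of a message is unknown.
    """
    with open(path, encoding="latin-1") as f:
        return f.read()
```

If the layout assumption holds, calling count_messages on the directory unpacked from mini_newsgroups.tar.gz should report 100 messages for each of the 20 groups, which gives a quick sanity check on a download.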