TIR 2010
7th International Workshop on Text-based Information Retrieval
in conjunction with DEXA 2010
University of Deusto
Bilbao, Spain
30 August - 3 September 2010
20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz A subset composed of 100 articles from each newsgroup. (1.9M; 6.2M uncompressed)
CiteXplore combines literature search with text mining tools for biology.
Search results are cross referenced to EBI applications based on publication identifiers.
Links to full text versions are provided where available.
G. Barbieri, F. Pachet, P. Roy, and M. Esposti. Proceedings of the 20th European Conference on Artificial Intelligence, page 115--120. Amsterdam, The Netherlands, The Netherlands, IOS Press, (2012)
D. Nguyen, N. Smith, and C. Rosé. Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, page 115--123. Stroudsburg, PA, USA, Association for Computational Linguistics, (2011)
X. Zhang, and Y. LeCun. (2015)cite arxiv:1502.01710Comment: This technical report is superseded by a paper entitled "Character-level Convolutional Networks for Text Classification", arXiv:1509.01626. It has considerably more experimental results and a rewritten introduction.
C. Kohlschütter, P. Fankhauser, and W. Nejdl. Proc. of 3rd ACM International Conference on Web Search and Data Mining New York City, NY USA (WSDM 2010)., (2010)