20 Newsgroups
Abstract
This data set consists of 20000 messages taken from 20 Usenet newsgroups.
Information files:
description of the data
Data files:
20_newsgroups.tar.gz (17.3M; 61.6M uncompressed)
mini_newsgroups.tar.gz (1.9M; 6.2M uncompressed): a subset composed of 100 articles from each newsgroup.
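A quick way to sanity-check either archive after download is to count the messages stored under each newsgroup. The sketch below uses Python's standard tarfile module and assumes, without confirmation from the description above, that the archive is laid out as <top-level directory>/<newsgroup>/<message file>; the function name count_messages_per_group is illustrative only.

```python
import tarfile
from collections import Counter


def count_messages_per_group(archive_path):
    """Count regular files per newsgroup directory inside a .tar.gz archive.

    Assumption: members are stored as <top-level dir>/<newsgroup>/<message>.
    Members with fewer than three path components are ignored.
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            parts = member.name.split("/")
            if len(parts) >= 3:
                # The parent directory name is taken as the newsgroup.
                counts[parts[-2]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_groups.py mini_newsgroups.tar.gz
    for group, n in sorted(count_messages_per_group(sys.argv[1]).items()):
        print(f"{group}\t{n}")
```

If the layout assumption holds, the mini archive should report 100 articles for each of the 20 newsgroups, matching the description above.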
Handwritten annotations in books are an important key to understanding how historical readers used their books. ABO aims to bring these books together. It is a digital library that reveals the variety of traces that readers left in their books. These examples were previously dispersed across many different libraries around the world. ABO is also a digital laboratory where visitors can work together: it provides tools to enrich the early modern annotations with transcriptions and translations. ABO seeks to encourage collaboration.
This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.
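Because the records are already sorted by user ID, each user's queries can be collected in a single streaming pass without loading the whole log into memory. The sketch below illustrates this with itertools.groupby. The file layout is an assumption, not something stated above: it assumes tab-separated lines with the anonymous user ID in the first column and the query text in the second. The function name queries_by_user is hypothetical.

```python
import csv
from itertools import groupby


def queries_by_user(lines):
    """Yield (user_id, [queries...]) from lines already sorted by user ID.

    Assumption: each line is tab-separated, with the anonymous user ID in
    column 0 and the query text in column 1. Shorter rows are skipped.
    """
    rows = (row for row in csv.reader(lines, delimiter="\t") if len(row) >= 2)
    # groupby only merges adjacent rows, which is sufficient for sorted input.
    for user_id, user_rows in groupby(rows, key=lambda row: row[0]):
        yield user_id, [row[1] for row in user_rows]


if __name__ == "__main__":
    # Made-up sample records for illustration; not taken from the collection.
    sample = ["u1\tweather today\n", "u1\tweather tomorrow\n", "u2\tnews\n"]
    for user_id, queries in queries_by_user(sample):
        print(user_id, queries)
```

Note that groupby relies on the sort order: if the input were not grouped by user ID, the same user would appear in several separate groups.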
Y. Song, L. Zhang, and C. Giles. CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management, page 93--102. New York, NY, USA, ACM, (2008)
A. Dulny, A. Hotho, and A. Krause. Machine Learning and Knowledge Discovery in Databases: Research Track, page 438--455. Cham, Springer Nature Switzerland, (2023)