jaj > corpus datasets

bookmarks (7)

UCI Knowledge Discovery in Databases (KDD) Archive
Online repository of large data sets for researchers in knowledge discovery and data mining. Includes discrete sequence data, image data, multivariate data, relational data, spatio-temporal data, text (corpora), time series, and web data (web pages and log files).
12 years ago by @jaj
tags: data_archive, datasets, datamining, big_data, corpus

Home Page for 20 Newsgroups Data Set
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. The collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.
12 years ago by @jaj
tags: data, corpus, datasets, socialnetworking

UCI Machine Learning Repository
Data sets maintained as a service to the machine learning community.
12 years ago by @jaj
tags: reference, data, corpus, datasets, datamining, machine-learning

The Institute for Language, Speech and Hearing
The Moby lexicon project is complete and has been placed into the public domain. Use, sell, rework, excerpt and use in any way on any platform. 610,000+ words and phrases. The largest word list in the world and more.
12 years ago by @jaj
tags: linguistics, datasets, wordlists, corpus

The ClueWeb09 Dataset
The dataset consists of 1 billion web pages, in ten languages, collected in January and February 2009.
12 years ago by @jaj
tags: corpus, datasets, web

Access to Web Research Collections WT2G/WT10G/GOV/GOV2/Blog06/Blog08
http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html
12 years ago by @jaj
tags: corpus, datasets

WebBase Project
The Stanford WebBase project has been collecting topic-focused snapshots of Web sites. All the resulting archives are available to the public via fast download streams. For example, we collected pages from 350 sites every day for several weeks after the Hurricane Katrina disaster. We also collect pages from government Web sites on a regular basis.
12 years ago by @jaj
tags: dlib, datasets, corpus, archive, web, harvest, govdocs


publications

No matching posts.

BibSonomy