group :: uw_ss18_web20

bookmarks (44)

Jstor - Data for research
http://about.jstor.org/service/data-for-research
10 years ago by @schwemmlein
tags: literature, dataset, jstor

L3S Twitter Crawler
https://github.com/L3S/twitter-crawler
7 years ago by @dallmann
tags: dataset, data_collection, twitter, crawler, l3s

Leipzig Corpora Collection (LCC) - the Datahub
Deutscher Wortschatz contains data generated from newspapers and web resources that are publicly available. The data were collected per language and encompass statistics about co-occurrences of words in randomly selected sentences.
10 years ago by @schwemmlein
tags: german, dataset, text, lcc, digital_humanities

Leipzig Corpora Collection - Wortschatz
http://corpora.informatik.uni-leipzig.de/
12 years ago by @schwemmlein
tags: wortschatz, leipzig, dataset, collection, corpora

Networked Digital Library of Theses and Dissertations (NDLTD)
Welcome to the Networked Digital Library of Theses and Dissertations (NDLTD), an international organization dedicated to promoting the adoption, creation, use, dissemination, and preservation of electronic theses and dissertations (ETDs). We support electronic publishing and open access to scholarship in order to enhance the sharing of knowledge worldwide. Our website includes resources for university administrators, librarians, faculty, students, and the general public.
10 years ago by @schwemmlein
tags: thesis, dataset, dissertation, text

NLP datasets
Data sets I have developed and used in my research.
7 years ago by @schwemmlein
tags: sentiment, hyponym, semantic, dataset, twitter, data, relation, analysis, extraction

NLTK Data
http://www.nltk.org/nltk_data/
10 years ago by @schwemmlein
tags: dataset, nltk, text, digital_humanities

novels-project · GitHub
The dataset genres.json contains (sub)genre classifications for novels published between 1770 and 1915. The genres covered are gothic novels, "silver fork" novels, national tale novels
9 years ago by @schwemmlein
tags: literature, dataset, english, genre, digital_humanities, novels

Open Research Corpus: Public Datasets of Scholarly Research Papers
http://labs.semanticscholar.org/corpus/
6 years ago by @schwemmlein
tags: set, dataset, corpus, data, publications, scholar, research

Opinosis Dataset - Topic related review sentences | Kavita Ganesan
http://kavita-ganesan.com/opinosis-opinion-dataset
9 years ago by @schwemmlein
tags: summarization, code, dataset, text_mining, text

Original data of NYT10
NYT10 was originally released with the paper by Sebastian Riedel, Limin Yao, and Andrew McCallum, "Modeling relations and their mentions without labeled text."
7 years ago by @schwemmlein
tags: dataset, relation, extraction

PAN
PAN is a series of scientific events and shared tasks on digital text forensics and stylometry
6 years ago by @schwemmlein
tags: task, dataset, text, stylometry, classification

Perseus Digital Library
Our flagship collection, under development since 1987, covers the history, literature and culture of the Greco-Roman world. We are applying what we have learned from Classics to other subjects within the humanities and beyond.
10 years ago by @schwemmlein
tags: dataset, perseus, text, digital_humanities

Resourceful Reading datasets | Katherine Bode
These datasets represent months of work collecting and collating the information in AustLit on Australian novels
9 years ago by @schwemmlein
tags: dataset, australian, novels

Resources for Abstractive Summarization of Long Documents
Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents" - armancohan/long-summarization
3 years ago by @schwemmlein
tags: summarization, nlp, code, dataset, document, data, resources, naacl, discourse

SentiMerge/data at master · guyemerson/SentiMerge · GitHub
SentiMerge - Merging Sentiment Lexicons in a Bayesian Framework. Produced in the TrendMiner Project.
9 years ago by @schwemmlein
tags: sentiment, romane, dataset, sentimerge, analysis, novels

Snorkel
Snorkel is a system for programmatically building and managing training datasets without manual labeling. In Snorkel, users can develop large training datasets in hours or days rather than hand-labeling them over weeks or months.
3 years ago by @schwemmlein
tags: system, dataset, labeling, automatically, data, training

StereoSet measures racism, sexism, and other forms of bias in AI language models | VentureBeat
https://venturebeat.com/2020/04/22/stereoset-measures-racism-sexism-and-other-forms-of-bias-in-ai-language-models/
4 years ago by @schwemmlein
tags: lm, nlp, models, dataset, language, bias

The Blog Authorship Corpus
The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts and over 140 million words - or approximately 35 posts and 7250 words per person.
9 years ago by @schwemmlein
tags: dataset, text_mining, author, authorship, blog

The Last.fm Dataset | Million Song Dataset
http://labrosa.ee.columbia.edu/millionsong/lastfm#numbers
12 years ago by @schwemmlein
tags: dataset, 2011, last.fm

publications (3)

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
N. Nangia, C. Vania, R. Bhalerao, and S. Bowman. EMNLP (2020). arXiv:2010.00133.
4 years ago by @schwemmlein
tags: lm, models, dataset, language, bias

Evaluation of post-hoc XAI approaches through synthetic tabular data
J. Tritscher, M. Ring, D. Schlör, L. Hettinger, and A. Hotho. (2020). Accepted but not published.
5 years ago by @schwemmlein
tags: myown, dataset, evaluation, explainable, XAI

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, and T. Robinson. (2013)
6 years ago by @schwemmlein
tags: nlp, news, dataset, billion, language, model, benchmark


BibSonomy
