The Penn Treebank: An Overview

Abstract

The Penn Treebank, in its eight years of operation (1989--1996), produced approximately 7 million words of part-of-speech tagged text, 3 million words of skeletally parsed text, over 2 million words of text parsed for predicate-argument structure, and 1.6 million words of transcribed spoken text annotated for speech disfluencies. This paper describes the design of the three annotation schemes used by the Treebank (POS tagging, syntactic bracketing, and disfluency annotation) and the methodology employed in production. All available Penn Treebank materials are distributed by the Linguistic Data Consortium (http://www.ldc.upenn.edu).

BibTeX key: Taylor2003
Entry type: inbook
Address: Dordrecht
Book title: Treebanks: Building and Using Parsed Corpora
Year: 2003
Pages: 5--22
Publisher: Springer Netherlands
ISBN: 978-94-010-0201-1
DOI: 10.1007/978-94-010-0201-1_1
URL: https://doi.org/10.1007/978-94-010-0201-1_1
