Part of a book,

The Penn Treebank: An Overview

, , and .
Pages 5--22. Springer Netherlands, Dordrecht, (2003)
DOI: 10.1007/978-94-010-0201-1_1


The Penn Treebank, in its eight years of operation (1989--1996), produced approximately 7 million words of part-of-speech tagged text, 3 million words of skeletally parsed text, over 2 million words of text parsed for predicate-argument structure, and 1.6 million words of transcribed spoken text annotated for speech disfluencies. This paper describes the design of the three annotation schemes used by the Treebank (POS tagging, syntactic bracketing, and disfluency annotation) and the methodology employed in production. All available Penn Treebank materials are distributed by the Linguistic Data Consortium (http://www.ldc.upenn.edu).


