Article

Building a large annotated corpus of English: the Penn Treebank.

M. Marcus, B. Santorini, and M. Marcinkiewicz.
Computational Linguistics, 19 (2): 313--330 (1993)

Abstract

There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information about language from very large corpora. Such corpora are beginning to serve as important research tools for investigators in natural language processing, speech recognition, and integrated spoken language systems, as well as in theoretical linguistics. Annotated corpora promise to be valuable for enterprises as diverse as the automatic construction of statistical models for the grammar of the written and the colloquial spoken language, the development of explicit formal theories of the differing grammars of writing and speech, the investigation of prosodic phenomena in speech, and the evaluation and comparison of the adequacy of parsing models.

BibTeX key: marcus_building_1993
entry type: article
year: 1993
journal: Computational Linguistics
number: 2
pages: 313--330
volume: 19
language: English
url: http://dblp.uni-trier.de/db/journals/coling/coling19.html#MarcusSM94
