Article

Building a large annotated corpus of English: the Penn Treebank.

, , and .
Computational Linguistics, 19 (2): 313–330 (1993)

Abstract

There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information about language from very large corpora. Such corpora are beginning to serve as important research tools for investigators in natural language processing, speech recognition, and integrated spoken language systems, as well as in theoretical linguistics. Annotated corpora promise to be valuable for enterprises as diverse as the automatic construction of statistical models for the grammar of the written and the colloquial spoken language, the development of explicit formal theories of the differing grammars of writing and speech, the investigation of prosodic phenomena in speech, and the evaluation and comparison of the adequacy of parsing models.
