Simple unsupervised grammar induction from raw text with cascaded finite state models

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 1077--1086. Stroudsburg, PA, USA, Association for Computational Linguistics, (2011)

Abstract

We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing---the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.
