Corpus-Driven Splitting of Compound Words

Abstract

This paper presents a method for splitting compound words into their constituents based on cognate words in the other language of a parallel corpus. A minor extension to the method allows the decompounding of words which do not have cognates in the other language. By decompounding the training corpus for an Example-Based MT system, the incidence of word alignment failure can be substantially reduced, yielding a modest improvement in performance.
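
To make the idea of cognate-guided splitting concrete, the following is a minimal sketch, not the paper's actual algorithm, which the abstract does not spell out. It assumes that string similarity (here Python's `difflib.SequenceMatcher`) can stand in for a cognate test, and that the words from the other side of an aligned sentence pair are already available. The function names `similarity` and `split_compound`, the `min_part_len` and `threshold` parameters, and the example words are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for a cognate test: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def split_compound(word, other_side_words, min_part_len=3, threshold=0.6):
    """Try every split point of `word` and keep the one whose two parts
    best resemble some word from the other language of the sentence pair.

    Returns a (left, right) tuple, or None if no split point is plausible.
    This is an illustrative heuristic, not the method from the paper.
    """
    best_score = 0.0
    best_split = None
    # Both parts must have at least `min_part_len` characters.
    for i in range(min_part_len, len(word) - min_part_len + 1):
        left, right = word[:i], word[i:]
        left_score = max((similarity(left, w) for w in other_side_words), default=0.0)
        right_score = max((similarity(right, w) for w in other_side_words), default=0.0)
        # Reject splits where either part has no cognate-like counterpart.
        if left_score < threshold or right_score < threshold:
            continue
        score = left_score + right_score
        if score > best_score:
            best_score = score
            best_split = (left, right)
    return best_split
```

Called as `split_compound("datenbank", ["data", "bank"])`, the sketch scores each candidate split against the English words and keeps the best-scoring one. A real system would need a better cognate measure and a way to handle linking elements between constituents, neither of which is attempted here.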

BibTeX key: Brown:02
entry type: inproceedings
booktitle: Proceedings of the Ninth International Conference on Theoretical and Methodological Issues in Machine Translation
year: 2002
Document: http://www-2.cs.cmu.edu/~ralf/papers/tmi02.pdf
