Abstract
This paper presents a method for splitting compound words into their constituents based on cognate words in the other language of a parallel corpus. A minor extension to the method allows the decompounding of words which do not have cognates in the other language. By decompounding the training corpus for an Example-Based MT system, the incidence of word alignment failure can be substantially reduced, yielding a modest improvement in performance.