Abstract
Context-Based Machine Translation™ (CBMT) is a new paradigm for corpus-based translation that requires no parallel
text. Instead, CBMT relies on a lightweight translation model utilizing a full-form bilingual dictionary and a sophisticated decoder using long-range context
via long n-grams and cascaded overlapping. The translation process is enhanced
via in-language substitution of tokens and
phrases, both for source and target, when
top candidates cannot be confirmed or resolved in decoding. Substitution utilizes a
synonym and near-synonym generator implemented as a corpus-based unsupervised
learning process. Decoding requires a very
large target-language-only corpus, and
while target-side substitution can reuse that same corpus, source-side substitution requires a separate (and
smaller) source-language monolingual corpus.
Spanish-to-English CBMT was tested on
Spanish newswire text, achieving a BLEU
score of 0.6462 in June 2006, the highest
BLEU reported for any language pair.
Further testing also shows that quality increases above the reported score as the
target corpus size increases and as dictionary coverage of source words and phrases
becomes more complete.