Abstract

This paper reports an experience on producing manual word alignments over six different language pairs (all combinations between Portuguese, English, French and Spanish) (Graça et al., 2008). Word alignment of each language pair is made over the first 100 sentences of the common test set from the Europarl corpora (Koehn, 2005), corresponding to 600 new annotated sentences. This collection is publicly available at

Description

CiteSeerX — Building a golden collection of parallel Multi-Language Word Alignments

Links and resources

Tags