German Compounds in Factored Statistical Machine Translation

Proceedings of the 6th International Conference on Natural Language Processing (GoTAL-08), Gothenburg, Sweden (2008)

Abstract

An empirical method for splitting German compounds is explored, and varied in a number of ways, to investigate the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation as a preprocessing step, performed on the training data and on German translation input. For translation into German, compounds are merged in a postprocessing step based on part-of-speech. Compound parts are marked to distinguish them from ordinary words. Translation quality improves in both translation directions, and the number of untranslated words in the English output is reduced. Different versions of the splitting algorithm perform best in the two translation directions.
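
The abstract does not spell out the splitting algorithm, so the following is only a minimal sketch of one common empirical approach, under stated assumptions: candidate splits are scored by the geometric mean of the corpus frequencies of their parts, non-final parts receive a marker tag, and merging joins each marked part to the word that follows it. The frequency table `FREQ`, the tag name `PART`, the minimum part length, and all function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from math import prod

# Toy corpus frequencies, invented for illustration only.
FREQ = {
    "aktiengesellschaft": 2,
    "aktie": 40,
    "aktien": 30,
    "gesellschaft": 100,
    "haus": 200,
}

MIN_PART_LEN = 3   # assumed minimum length of a compound part
PART_TAG = "PART"  # hypothetical POS tag marking non-final compound parts


def candidate_splits(word):
    """Yield every segmentation of `word` into parts found in FREQ."""
    if word in FREQ:
        yield [word]
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head = word[:i]
        if head in FREQ:
            for rest in candidate_splits(word[i:]):
                yield [head] + rest


def score(parts):
    """Geometric mean of the part frequencies."""
    return prod(FREQ[p] for p in parts) ** (1.0 / len(parts))


def split_compound(word):
    """Return the best-scoring segmentation, or the word itself if none exists."""
    candidates = list(candidate_splits(word))
    if not candidates:
        return [word]
    return max(candidates, key=score)


def split_and_tag(tokens):
    """Preprocessing: split each (word, pos) token and mark non-final parts."""
    out = []
    for word, pos in tokens:
        parts = split_compound(word.lower())
        out.extend((p, PART_TAG) for p in parts[:-1])
        out.append((parts[-1], pos))
    return out


def merge(tagged):
    """Postprocessing: join each run of marked parts to the next word."""
    out, buffer = [], ""
    for word, pos in tagged:
        if pos == PART_TAG:
            buffer += word
        else:
            out.append((buffer + word, pos))
            buffer = ""
    if buffer:  # dangling part with no following word: keep it as is
        out.append((buffer, PART_TAG))
    return out
```

In this sketch, `split_and_tag` would run on the training data and on German input, and `merge` on German output. Words are lower-cased for lookup, so restoring German noun capitalization after merging is left out.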
