Abstract
Large language models (LLMs) have demonstrated remarkable potential in
handling multilingual machine translation (MMT). In this paper, we
systematically investigate the advantages and challenges of LLMs for MMT by
answering two questions: 1) How well do LLMs perform in translating a massive
number of languages? 2) Which factors affect LLMs' performance in translation?
We evaluate popular LLMs, including XGLM, OPT, BLOOMZ, and ChatGPT, on 102
languages. Our empirical results show that even the best-performing model, ChatGPT, still
lags behind the strong supervised baseline NLLB in 83.33% of translation directions.
Through further analysis, we discover that LLMs exhibit new working patterns
when used for MMT. First, LLMs can surprisingly ignore prompt semantics when
given in-context exemplars: they still translate well even when the prompt
template is semantically unreasonable. Second, cross-lingual exemplars can
provide better task instruction for low-resource translation than exemplars in
the same language pair. Third, we observe overestimated performance of BLOOMZ
on the Flores-101 dataset, indicating a potential risk of using public datasets
for evaluation.
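
To make the prompting setup concrete, the following is a minimal, purely illustrative sketch (not taken from the paper) of how few-shot translation prompts with in-context exemplars might be assembled, including a cross-lingual variant in which the demonstrations come from a different language pair than the test sentence. All function names, prompt wording, and example sentences are our own assumptions.

```python
# Illustrative sketch only: builds few-shot MT prompts with in-context exemplars.
# The exact prompt templates used in the paper may differ.

def build_prompt(exemplars, src_lang, tgt_lang, src_sentence, template="{src} = {tgt}"):
    """Concatenate in-context exemplars, then append the test source sentence.

    exemplars: list of (source_text, target_text) pairs used as demonstrations.
    template:  per-exemplar format; the abstract notes that even semantically
               "unreasonable" templates can still work once exemplars are given.
    """
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in exemplars:
        lines.append(template.format(src=src, tgt=tgt))
    # Leave the target side empty so the model completes the translation.
    lines.append(template.format(src=src_sentence, tgt="").rstrip())
    return "\n".join(lines)


# Same-pair exemplars: demonstrations share the language pair of the test sentence.
de_en_exemplars = [("Guten Morgen.", "Good morning."),
                   ("Wie geht es dir?", "How are you?")]
prompt_same_pair = build_prompt(de_en_exemplars, "German", "English", "Danke schön.")

# Cross-lingual exemplars: demonstrations come from a high-resource pair (German-English)
# while the test direction is low-resource (hypothetical Xhosa-English example).
prompt_cross_lingual = build_prompt(de_en_exemplars, "Xhosa", "English", "Enkosi kakhulu.")

print(prompt_same_pair)
print(prompt_cross_lingual)
```
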
Description
Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis