The Future of Multilingual Models: WMT21 in Review

Most of us have heard of machine translation: humans have harnessed the power of artificial intelligence to move a text from one language to another. But not many stay up to date on the latest developments in machine translation. What does the frontier of machine translation look like today? What kinds of issues concern researchers, and how do they relate to our lives? In today’s blog post, we hope to cover some of these questions by exploring the results of the 2021 Conference on Machine Translation. 

 

About the Conference

Last year, the Sixth Conference on Machine Translation (WMT21) was held in Punta Cana, both in person and online, on November 10-11. An annual gathering of data scientists, linguists, students, and other professionals in related fields, WMT21 continues the legacy of 15 earlier WMT workshops and conferences and is perhaps the most important event in the field of machine translation. 

The main function of the conference is to host shared tasks in machine translation, in which participating teams build systems for a common problem and are evaluated on the same data; many of the tasks are repeated year after year to chart the field’s progress. That is a dry description of the kind of wonders that take place at WMT. 

While machine translation (and academic conferences in general) can be daunting for non-professionals, it’s important to keep in mind that events such as WMT reveal much about the state and direction of the field and how it can affect our daily lives. For example, WMT21 hosted a shared task on machine translation of news, a critical technology capable of saving lives in war zones, and one on multilingual low-resource translation, which will one day allow previously marginalized linguistic groups to communicate with the rest of the world with great fluency and depth. 

Of the numerous shared tasks featured in WMT21, four stand out in the 88-page research paper that documents the findings of the conference. They are as follows: a news translation task, a Triangular MT task, a multilingual low-resource translation task, and an automatic post-editing (APE) task. For each task, participants were asked to build systems that perform translation or editing. 

 

News Task

News translation is what pops into many people’s minds when they think of translation: it’s visceral, immediate, and relevant to current issues, whether it’s live captions on breaking news reports or more carefully localized coverage of events elsewhere in the world. The Associated Press correspondent Philip Crowther made headlines with his six-language coverage of the situation in Ukraine, dazzling viewers worldwide. Switching in and out of English, Spanish, French, German, Luxembourgish, and Portuguese, Crowther embodies the spirit of translation and its power to facilitate communication across borders and languages. 

For the news task, participants were asked to “build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.” Machine translations into English were evaluated “against human reference translation”; translations out of English and between non-English pairs were evaluated against the source text. 

In all, the researchers found that document context was “extremely important for evaluation of high-quality MT systems… but tends to be surpassed by top systems when sentences are evaluated in isolation.” Such findings contribute to more effective machine translation by revealing how factors such as document context affect judgments of translation quality. With more research in this field, we might see more accurate machine translation of real-time news (an automated Crowther, if you will) covering important issues in the world and disseminating critical information to populations that need it.

 

Triangular MT Task

For the longest time, English has dominated the realm of translation. Aside from having the most source text to work with, online and offline, English is the lingua franca of our times, which means that translation research mainly focuses on English and, to a certain extent, on other high-resource languages with plenty of source text. But there are many major languages whose speakers have no need to communicate through English. Russian-to-Chinese translation (and vice versa) is the best example of this; Russia and China are geographic neighbors and economic powerhouses in Asia, and they don’t need, and perhaps would prefer not to use, English as an intermediary between their languages. 

The goal of the Triangular MT shared task is to promote “translation between non-English languages” by “optimally mixing direct and indirect parallel resources.” Triangular translation between Russian and Chinese was the main focus of this task; system translations were produced for a “mixed-genre test set” and scored with BLEU, an automatic metric that measures how closely a system’s output overlaps with a human reference translation. 
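
For readers curious about what a BLEU score actually measures, here is a minimal sketch in Python. It is a simplified, single-reference version written for illustration, not the scoring tool used at WMT21: the function names (`ngram_counts`, `simple_bleu`) are our own, and splitting the Chinese example into characters is an assumption made to keep the example short.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in the sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified BLEU for one hypothesis and one reference (token lists).

    Illustrative only: no smoothing, a single reference, sentence level.
    """
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: an n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        total = sum(hyp.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: outputs shorter than the reference are penalized.
    if len(hypothesis) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Assumption: character-level tokens for a toy Chinese sentence pair.
reference = list("我喜欢机器翻译系统")
hypothesis = list("我喜欢机器翻译")
score = simple_bleu(hypothesis, reference)
```

In this toy pair every character sequence in the system output also appears in the reference, so the score is lowered only by the brevity penalty for the missing final word. Real evaluations score an entire test set at once and rely on standardized tokenization, which is why published BLEU scores should come from an established tool rather than a sketch like this one.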

While no definitive conclusions were drawn, the task served as a space for researchers to explore modeling choices and data augmentation strategies for translating between non-English language pairs. Moving away from English-centric models is a prerequisite for comprehensive multilingual machine translation and, possibly, a universal translator. After all, there is much in the world that goes on outside of the immediate Anglophone realm. 

 

Multilingual Low-Resource Translation Task

The star of WMT21, however, is the multilingual low-resource translation task, specifically for Indo-European languages. The task is aimed at utilizing data garnered from translations from and into English and exploring opportunities to transfer models to low-resource pairs. Research revealed that “the best performing systems used multilingual supervised machine translation models enriched with backtranslated data and additional sentences from higher-resourced languages in the same family.” 
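
To make “backtranslated data” a little more concrete, here is a minimal sketch in Python of the general idea: sentences that exist only in the target language are machine-translated back into the source language, and the resulting synthetic pairs are added to the genuine training data. Everything in it is a stand-in for illustration. The `back_translate` helper, the word-for-word `toy_reverse_model`, and the tiny Catalan-English examples are our own assumptions, not the systems or data used by WMT21 participants.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_sentences: List[str],
    reverse_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-only text.

    reverse_model translates target -> source. The target side of each
    pair is genuine human text; only the source side is machine-made.
    """
    pairs = []
    for target in target_sentences:
        synthetic_source = reverse_model(target)
        pairs.append((synthetic_source, target))
    return pairs


# Hypothetical stand-in for a trained target-to-source model:
# a word-for-word lookup, far cruder than any real system.
TOY_LEXICON = {"bon": "good", "dia": "day"}


def toy_reverse_model(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())


# Genuine parallel data is scarce for low-resource pairs...
genuine_pairs = [("good night", "bona nit")]
# ...but monolingual text in the target language is easier to find.
monolingual_target = ["bon dia"]

synthetic_pairs = back_translate(monolingual_target, toy_reverse_model)
training_data = genuine_pairs + synthetic_pairs
```

The appeal of the approach is that the model being trained always learns to produce genuine human-written text on the target side, even when the source side is imperfect machine output.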

Research into low-resource translation is particularly meaningful, as minority languages have often been ignored in translation research, not only due to their invisibility, but also due to the sheer lack of source texts to work with. However, translation between low-resource languages is just as necessary as translation between major languages; a majority of the world’s 7,000+ languages are spoken by small groups of people. They deserve to have their voices heard, too.

 

Automatic Post-Editing Task

Post-editing is a relatively new translation paradigm in which a text is initially translated by a machine, then edited by a human translator. In WMT21’s automatic post-editing (APE) task, participants were asked to develop systems that are capable of correcting errors made by unknown machine translation systems. In other words, researchers are working to come up with better correction models to lessen human intervention in the process of translation. 

This is a procedure that, if developed further, would drastically change the face of translation. Although the task sadly suffered from a drop in participation, human evaluations still revealed “significant gains by all runs, attesting… the effectiveness of the proposed methods” suggested by participating teams. It doesn’t look like machines will completely take over the translation process anytime soon, but developments in post-editing will help translators work smarter and faster. 

 

WMT21 carries out this research not only to make progress in machine translation, but also to provide a publicly accessible database for other parties to use, learn from, and integrate into their own research. Among the participants were representatives of Meta (formerly known as Facebook), Microsoft, and Apple; the findings of WMT21 prefigure upcoming developments in these companies’ widely used applications and systems. In that sense, the work done by inquisitive minds at WMT21 has far-reaching ramifications in the fields of news translation, multilingual (non-English) translation, low-resource language translation, and post-editing. 

If you’re curious about the current state of translation, ask for a quote and start your journey with SDTS. We utilize the latest advancements in technology to make sure our clients receive the best, most accurate translations possible in all 72 of the languages we offer. Visit our website today and revel in the wonders of modern translation. 

 

References
https://www.statmt.org/wmt21/
https://aclanthology.org/2021.wmt-1.1.pdf
https://www.huffpost.com/entry/philip-crowther-six-languages-ukraine_n_621471a0e4b0ef74d725483f