Rehashing Massively Multilingual Machine Translation in 2022

The year is 2022; the average tourist in Granada can whip out their smartphone, dictate “where is the Alhambra palace?” and have an automated voice respond: “¿Dónde está el palacio de la Alhambra?” But this only scratches the surface of an impressive, rather miraculous area of research commonly known as machine translation, which allows for near-instantaneous cross-lingual translation of complicated, contextual sentences such as “The river shrinks and black crows gorge on bright mangoes in still, dustgreen trees” (Arundhati Roy, The God of Small Things).

DeepL, a neural machine translation service, renders that sentence in Spanish as “El río se encoge y los cuervos negros se atiborran de mangos brillantes en árboles inmóviles y polvorientos.” Despite the poetic, nearly transcreative nature of the original English text (“gorge on”, “dustgreen”), machine translation has somehow managed to capture the nature and complexity of the text without human guidance. It’s a feat of human genius; nothing seems impossible, even universal translation: the dream of not only linguistics experts but also the large share of the world’s population who don’t speak English, today’s lingua franca.

“Achieving universal translation between all human language pairs is the holy-grail of machine translation (MT) research”, write Aditya Siddhant et al. in their latest, fresh-off-the-griddle research paper “Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning.” In the paper, the authors identify a serious problem with previous models of neural machine translation: namely, that they rely heavily on bilingual parallel data to train translation models, a costly, “unscalable” process for low-resource, minority languages without much linguistic data. To remedy this problem, the authors propose a “massively multilingual” machine translation model in which a “mixture of supervised and self-supervised objectives” is used to translate hundreds of languages at a fraction of the horsepower and cost.

Bilingual mapping from Conneau et al. (2017) visualizing how unsupervised neural MT functions.

The researchers carried out a preliminary study using 15 multilingual models (to and from English), purposely leaving out parallel data for one language in each model to simulate a setting where parallel data is not available for every language. Each model was then compared with fully supervised multilingual baselines, with encouraging results. The findings promise robust translation models that overcome the weaknesses of traditional bilingual unsupervised MT models: the massively multilingual model improves the quality of low-resource language translation without using parallel data for the language in question.
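To make the leave-one-out idea concrete, here is a minimal sketch in Python of how such a set of training setups could be described. It is an illustration only, not the authors' code: the function name, the dictionary keys and the four language codes are all assumptions made for this example.

```python
from typing import Dict, List


def build_leave_one_out_setups(languages: List[str]) -> List[Dict[str, object]]:
    """Build one training setup per language, withholding that language's parallel data.

    Hypothetical helper for illustration; the names and structure are not from the paper.
    """
    setups = []
    for held_out in languages:
        setups.append({
            "held_out": held_out,
            # English-paired parallel data is kept for every other language.
            "parallel": [lang for lang in languages if lang != held_out],
            # Monolingual data is kept for all languages, including the held-out one.
            "monolingual": list(languages),
        })
    return setups


# Hypothetical language codes, chosen only to keep the example short.
languages = ["es", "hi", "sd", "haw"]
setups = build_leave_one_out_setups(languages)

for setup in setups:
    print(setup["held_out"], "-> parallel data kept for:", setup["parallel"])
```

Each setup corresponds to one model: the held-out language has to be translated using only its monolingual text plus whatever the model learns from the other languages' parallel data.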

Afterwards, the research was scaled up to cover 200 languages, and the results were more ambiguous than those of the 15-language study. “For xx→en translation, translation quality is not well correlated with the amount of monolingual data available for the language… On the other hand, en→xx translation BLEU is high only for languages which have high xx→en translation quality,” write the authors.

To sum up, languages with less bilingual data (Sindhi, Hawaiian, etc.) no longer have to rely on parallel data of their own for translation; rather, they can draw on existing bilingual data from other languages to inform their own translation, an approach researchers call “unsupervised” machine translation. A combination of self-supervised objectives and supervised training can not only drastically improve the quality of translation between less-spoken languages, but also help massively multilingual MT make better use of monolingual data. “One could think of this as monolingual data and self supervised objectives… helping the model learn the language and the supervised translation in other language pairs teaching the model how to translate by transfer learning.”
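For readers who like to see the idea in code, the sketch below shows one way supervised and self-supervised training examples might be mixed into a single batch. It is a toy illustration under stated assumptions, not the paper's implementation: the masked-span denoising task stands in for whichever self-supervised objective the authors use, and the `<2xx>` language tag, the `<mask>` token, the function names and the sample sentences are all hypothetical.

```python
import random
from typing import List, Tuple

MASK = "<mask>"  # assumed placeholder token for hidden words


def supervised_example(src: str, tgt: str, tgt_lang: str) -> Tuple[str, str]:
    """Translation example: tag the source with the desired target language."""
    return f"<2{tgt_lang}> {src}", tgt


def self_supervised_example(sentence: str, lang: str, rng: random.Random,
                            span_ratio: float = 0.5) -> Tuple[str, str]:
    """Denoising example: hide a contiguous span and ask the model to restore it.

    Assumes a non-empty, whitespace-tokenized sentence.
    """
    tokens = sentence.split()
    span_len = max(1, int(len(tokens) * span_ratio))
    start = rng.randrange(len(tokens) - span_len + 1)
    masked = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    hidden = tokens[start:start + span_len]
    return f"<2{lang}> " + " ".join(masked), " ".join(hidden)


def mixed_batch(parallel: List[Tuple[str, str, str]],
                monolingual: List[Tuple[str, str]],
                rng: random.Random,
                supervised_share: float = 0.5,
                size: int = 4) -> List[Tuple[str, str]]:
    """Draw a batch that mixes translation pairs with denoising examples."""
    batch = []
    for _ in range(size):
        if rng.random() < supervised_share:
            src, tgt, tgt_lang = rng.choice(parallel)
            batch.append(supervised_example(src, tgt, tgt_lang))
        else:
            sentence, lang = rng.choice(monolingual)
            batch.append(self_supervised_example(sentence, lang, rng))
    return batch


# Toy data: one high-resource pair with parallel text, one language with monolingual text only.
parallel = [("where is the Alhambra palace?", "¿Dónde está el palacio de la Alhambra?", "es")]
monolingual = [("aloha kakahiaka e ke hoa", "haw")]

for model_input, model_target in mixed_batch(parallel, monolingual, random.Random(0)):
    print(model_input, "=>", model_target)
```

The point of the mix is that the same model sees both kinds of examples: the denoising examples teach it the low-resource language itself, while the translation pairs from other languages teach it what translating looks like.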

While the findings are original and interesting, the concept is not entirely new, according to ModelFront CEO Adam Bittlingmayer, who tells Slator’s Seyma Albarino that “almost all competitive systems” now “utilize some target-side monolingual data, even for major language pairs.” After all, the research is only a continuation of an already-growing discourse on massively multilingual machine translation as a feasible method of large-scale machine translation.

A diagram from the MT group at the Universitat Politècnica de Catalunya charting recent developments in the field of unsupervised neural machine translation.


As a professional language service provider, Sprok DTS depends on the improvement of machine translation, which helps our translators take on increasingly demanding tasks with less effort and more precision. We stay up to date on the latest AI technology and machine translation in an effort to best cater to the needs of our clients and customers.
And while the common dream of a universal language seems far off, this research on massively multilingual machine translation brings us ever closer to a world of unadulterated communication and understanding, something all of us here at Sprok DTS strive for.