An Insider’s Guide to the History of Gender Bias in Google Translate

If you’ve ever used a translation engine online, you’ve most likely come across an error in translation. Some errors are minor: grammatical slips, poor word choices. Others are graver: accidental swear words, culturally insensitive mistranslations. But no problem has sparked as much outrage and inspired as much innovation as gender bias in machine translation.

Google Translate caused quite a stir on Twitter around this time last year, when history professor Dora Vargha posted this screenshot:

Image Credits: Dora Vargha

The input is a series of sentences in Hungarian—a gender-neutral language—translated via Google into English. What’s striking is the blatant yet familiar gender bias, the way Google attributes certain words and phrases (beautiful, washes the dishes, sews, cooks, etc.) to feminine pronouns, and others (clever, reads, teaches, makes a lot of money, etc.) to masculine pronouns. 

The picture, now retweeted 12.9K times, has drawn considerable attention and debate, and it highlights a critical problem in machine translation: the transfer of human bias into translation language models. Soon after the photo went viral, Google made further improvements to the interface to allow for more nuanced translations of gendered sentences, but the company has been wrestling with this issue for quite a while.

On December 6, 2018, Google Translate product manager James Kuczmarski admitted in a public announcement that Google Translate “inadvertently replicated gender biases that already existed” and promised users “both a feminine and masculine translation for a single word.” But the initial update only provided multiple translations for four major languages (French, Italian, Portuguese, and Spanish). Eliminating gender bias from the model remains an ongoing battle, as evidenced by Vargha’s Hungarian translations, posted more than two years after Kuczmarski’s announcement.

 

The process of addressing gender bias is conceptually straightforward, as senior software engineer Melvin Johnson explains in a follow-up article, “Providing Gender-Specific Translations in Google Translate.” There are three steps: detect gender-neutral queries, generate gender-specific translations, and check for accuracy. Johnson’s team uses Turkish as an example of a morphologically complex language, one where no simple list of gender-neutral pronouns will suffice and a machine-learned system is required.

Hence, the first step is to determine whether an input query is masculine, feminine, or gender-neutral. For this, Johnson’s team used “state-of-the-art text classification algorithms” and trained the model on “thousands of human-rated Turkish examples.” The result is a “convolutional neural network that can accurately detect queries which require gender-specific translations.”
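
To make this concrete, here is a minimal sketch of what such a convolutional text classifier might look like, written in PyTorch. Google has not published its implementation, so the architecture, layer sizes, and names below are illustrative assumptions, not Google’s actual model:

```python
import torch
import torch.nn as nn

class GenderQueryClassifier(nn.Module):
    """Sketch of a CNN that labels a query as masculine, feminine,
    or gender-neutral. Assumes a tokenizer and a human-rated corpus
    exist elsewhere; both are out of scope here."""

    def __init__(self, vocab_size: int, embed_dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolutions over token embeddings pick up short n-gram cues
        # (e.g., gendered morphemes) regardless of where they occur.
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, num_classes)  # masculine / feminine / neutral

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)            # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))     # (batch, 64, seq_len)
        x = x.max(dim=2).values          # global max pool over the sequence
        return self.fc(x)                # class logits
```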

Once a query is classified into one of the three gender categories, the next step is to produce a matching translation. Johnson’s team improves on their “underlying Neural Machine Translation (NMT) system,” which produces gendered translations when a gender is requested and default translations when none is. For a gender-neutral query, the system adds a gender prefix to the translation request, telling the enhanced NMT model which gender the output should carry.
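
One common way to implement such a request, and plausibly what “adding a gender prefix” amounts to, is to prepend a special token to the source sentence, a trick also used for multilingual NMT models. The token spellings and helper function below are hypothetical, not Google’s actual format:

```python
from typing import Optional

# Hypothetical gender-prefix tokens; the real system's spelling may differ.
GENDER_TOKENS = {"feminine": "<2F>", "masculine": "<2M>"}

def build_request(source_sentence: str, gender: Optional[str]) -> str:
    """Prepend a gender token when a specific gender is requested;
    otherwise leave the sentence unchanged for a default translation."""
    if gender is None:
        return source_sentence
    return f"{GENDER_TOKENS[gender]} {source_sentence}"

# e.g. build_request("o bir doktor", "feminine") -> "<2F> o bir doktor",
# which a model trained on such prefixes renders as "she is a doctor".
```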

The final step is to check for accuracy. Johnson sums up the process like so:

Putting it all together, input sentences first go through the classifier, which detects whether they’re eligible for gender-specific translations. If the classifier says “yes”, we send three requests to our enhanced NMT model—a feminine request, a masculine request and an ungendered request. Our final step takes into account all three responses and decides whether to display gender-specific translations or a single default translation.
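
In code, the fan-out Johnson describes might look like the sketch below. The classify and nmt_translate callables are stand-ins for the components described above, and the final consistency check is simplified; Google’s actual decision step compares the three responses in more detail:

```python
def translate_query(sentence, classify, nmt_translate):
    """Sketch of the three-request pipeline: classify, fan out, decide."""
    if classify(sentence) != "gender-neutral":
        # Gendered queries get a single translation as usual.
        return {"default": nmt_translate(sentence, gender=None)}

    feminine = nmt_translate(sentence, gender="feminine")
    masculine = nmt_translate(sentence, gender="masculine")
    default = nmt_translate(sentence, gender=None)

    # Show both variants only when they actually differ; otherwise
    # fall back to the single default translation.
    if feminine != masculine:
        return {"feminine": feminine, "masculine": masculine}
    return {"default": default}
```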

Johnson notes that this is only the beginning of addressing gender bias in machine-translation systems. Google has a long way to go, especially for genderless languages such as Hungarian, Malay, Finnish, and Swahili.

 

A year and a half later, Johnson returns to report that this approach has trouble scaling: the system showed “low recall, failing to show gender-specific translations for up to 40% of eligible queries.” To solve this, Johnson presents a completely new paradigm for reducing bias: rewriting-based gender-specific translation. Rather than classifying the query up front, the system first produces a default translation and then, if that translation happens to be gendered, rewrites it into the alternate gender.
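
A minimal sketch of that translate-then-rewrite flow follows, with is_gendered() and rewrite_gender() standing in for the trained components Google describes; the names and return shapes are hypothetical:

```python
def translate_with_rewriting(sentence, nmt_translate, is_gendered, rewrite_gender):
    """Sketch of rewriting-based gender-specific translation: translate
    once, then derive the alternate-gender variant from the output."""
    default = nmt_translate(sentence)
    if not is_gendered(default):
        # Nothing to vary: a single translation suffices.
        return {"default": default}
    # e.g. "he is a doctor" -> "she is a doctor"
    alternate = rewrite_gender(default)
    return {"variants": (default, alternate)}
```

Because the variant is derived from a single translation rather than requested three times over, this design avoids depending on a query classifier, which is what lets it recover the eligible queries the earlier pipeline missed.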

 

Fast forward a year. In mid-2021, Google Translate product manager Romina Stella introduces yet another line of attack on bias: using surrounding context to resolve gender. Stella’s team mines Wikipedia biographies to build a dataset of English-to-Spanish translations in which context helps identify the correct gender of the subject. The results are shown below:

Image Credits: Romina Stella, “A Dataset for Studying Gender Bias in Translation.” Above: Translation result with the previous NMT model. Below: Translation result with the new contextual model.

There is already a noticeable difference in the accuracy of gender classification. While Stella admits that this dataset “doesn’t aim to cover the whole problem,” the development is noteworthy in that it “aims to foster progress on this challenge across the global research community.”
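
To illustrate the kind of signal such a dataset captures, here is a hypothetical entry (not the dataset’s real schema or field names) showing how surrounding context pins down the gender a translation should carry:

```python
# Hypothetical illustration of context-dependent gender in translation;
# the actual dataset's structure may differ.
example = {
    "context": "Marie Curie was a physicist and chemist.",
    "source": "At the time, the scientist worked in Paris.",
    "entity_gender": "feminine",
    # A context-aware model should use feminine agreement in Spanish:
    # "la científica", not the masculine default "el científico".
    "reference_translation": "En ese momento, la científica trabajaba en París.",
}
```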

Coming back to Vargha’s experiment on Twitter, it’s uncanny to see how far machine translation has come in the past few decades, yet how it still stumbles over a linguistic concept as essential as gendered pronouns. Much of this can be attributed to how systematically biased languages are in their daily use, and to how male-dominated the machine translation field remains. The issue also goes to show how complex and counterintuitive language is: no system in widespread use has yet perfected the classification of queries into just three gender categories (male, female, non-binary).

But it’s people like Vargha, Stella, and Johnson who shed light on these shortcomings of machine translation and point us in the right direction. Until machine translation catches up, human translators do the noble work of correctly identifying gendered subjects and providing nuanced, unbiased translations, steering away from sexist modes of language.

What has your experience been like with Google Translate and other online translation engines? Have you spotted any instances of gender bias, or other kinds of bias, in the translations you were working with? How has it impacted your work and your thoughts about machine translation?

Here at SDTS, our translators and localization experts are attentive to the ways in which machine translation replicates bias; we make sure our solutions are bias-free and inclusive. If you’re looking for a translation and localization service, give SDTS a try: our team ensures that your translations are of the utmost integrity and accuracy.

 

References
Dora Vargha on Twitter (March 2021): https://twitter.com/DoraVargha/status/1373211762108076034
Melvin Johnson, “Providing Gender-Specific Translations in Google Translate,” Google AI Blog (December 2018): https://ai.googleblog.com/2018/12/providing-gender-specific-translations.html
Melvin Johnson, “A Scalable Approach to Reducing Gender Bias in Google Translate,” Google AI Blog (April 2020): https://ai.googleblog.com/2020/04/a-scalable-approach-to-reducing-gender.html
Romina Stella, “A Dataset for Studying Gender Bias in Translation,” Google AI Blog (June 2021): https://ai.googleblog.com/2021/06/a-dataset-for-studying-gender-bias-in.html