Google Area 120 Introduces Aloud, an Instantaneous Dubbing Program

By now, most of us are accustomed to auto-generated subtitles on YouTube videos. Some of us might miss the good old days of community captions and wonder why that feature ever left the platform (this post on Data Horde is required reading for anyone interested in the issue).

But it’s 2022, and instead of subtitles, we now have dubs. From Google’s Area 120 incubator, a testing ground for up-and-coming applications, comes Aloud, a program that lets content creators quickly and easily dub their videos into multiple languages. The product has yet to be released, but it already performs exceptionally well dubbing English videos into Spanish, Portuguese, Hindi, and Indonesian.

 

Why Aloud?

The founders of Aloud, Buddhika Kottahachchi and Sasakthi Abeysinghe, hope to make dubbing a more affordable practice and to bridge communication across languages: people on the other side of the globe can enjoy a content creator’s videos without having to learn the language they were made in. It’s important to note, however, that dubbing, as an aural format, doesn’t cater to deaf people watching YouTube videos (an important issue in its own right), but it will open up international content to other previously underserved viewers, such as blind people.

Kottahachchi and Abeysinghe mention being inspired by their childhoods in Sri Lanka, where learning to read and understand English opened up the wider world to them; friends who never picked up English had a much harder time making that connection. Add to this the gap between the share of global viewers who use video to learn (46%) and the share of people who use English (24%). The founders also point out major differences between subtitles and dubbing: subtitles are not ideal on mobile devices, require constant attention to the screen, and can be hard to follow for people with visual or reading impairments.

 

How does it work?

Aloud uses “advances in audio separation, machine translation and speech synthesis” to reduce the time spent on dubbing, translation, video editing, and audio production, allowing content creators to dub their videos with little effort or expense. Creators can either upload their own transcript to be used for the dub or start from a transcription generated by the tool. Aloud’s founders plan to release the program free of charge.
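To make that workflow concrete, here is a minimal sketch of the transcribe, translate, synthesize, and remux pipeline that a tool like Aloud automates. This is not Aloud’s actual implementation or API: the library choices (openai-whisper, gTTS, ffmpeg), the hypothetical translate_text() helper, and the file names are all assumptions for illustration.

```python
# A rough sketch of the transcribe -> translate -> synthesize -> remux
# workflow that a tool like Aloud automates. The libraries below (whisper,
# gTTS, ffmpeg) are illustrative stand-ins, not what Aloud actually uses.
import subprocess

import whisper          # openai-whisper: speech-to-text
from gtts import gTTS   # simple text-to-speech synthesis


def translate_text(text: str, source: str, target: str) -> str:
    """Hypothetical placeholder: plug in whatever machine translation
    service you prefer (or hand the text to a human translator)."""
    raise NotImplementedError


def dub_video(video_path: str, target_lang: str, out_path: str) -> None:
    # 1. Transcribe the original audio (a creator could instead supply
    #    their own transcript and skip this step).
    transcript = whisper.load_model("base").transcribe(video_path)["text"]

    # 2. Machine-translate the transcript into the target language.
    translated = translate_text(transcript, source="en", target=target_lang)

    # 3. Synthesize the translated script as speech.
    dub_audio = "dub.mp3"
    gTTS(text=translated, lang=target_lang).save(dub_audio)

    # 4. Swap the synthetic dub in as the video's audio track,
    #    copying the video stream untouched.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-i", dub_audio,
         "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest", out_path],
        check=True,
    )


# e.g. dub_video("lecture.mp4", "es", "lecture_es.mp4")
```

Each of those steps exists today as a separate tool; Aloud’s pitch is that it folds them into one place so a creator never has to touch them individually.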

YouTube does not yet support multiple audio tracks on a single video, but the company is currently testing the feature; once it arrives, viewers will be able to switch to an Aloud-generated dub in their language of choice. For now, it seems that creators using Aloud will have to post dubbed videos separately, as is the case with this English video translated into Spanish and Portuguese.
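For context on what “multiple audio tracks” means at the file level, a single video container can already carry several audio streams tagged by language; the platform just has to expose a switcher. Below is a hedged sketch, with hypothetical file names and assuming ffmpeg is installed, of how a dub could be attached as a second, language-tagged track. It illustrates the container feature, not YouTube’s actual upload workflow.

```python
# Sketch: attach a separately produced Spanish dub as a second,
# language-tagged audio track. File names are hypothetical; requires
# ffmpeg on the PATH.
import subprocess

subprocess.run(
    ["ffmpeg", "-y",
     "-i", "lecture.mp4",      # original video with English audio
     "-i", "dub_es.mp3",       # Spanish dub produced separately
     "-map", "0",              # keep every stream from the original
     "-map", "1:a",            # add the dub as an extra audio stream
     "-c:v", "copy",           # leave the video stream untouched
     "-metadata:s:a:0", "language=eng",
     "-metadata:s:a:1", "language=spa",
     "lecture_multitrack.mp4"],
    check=True,
)
```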

 

How does it feel?

Aloud’s website offers a few sample videos that showcase the product on actual YouTube videos, and, as is to be expected, the voice is that familiar artificial monotone we hear everywhere. It sounds more fluent than some others we’ve heard before, but one can’t shake the feeling that this computerized intonation will, at some point, tire us out. There’s a certain humanness it lacks; halfway into a sample video, we find ourselves yearning for a human dub.

Human voice actors, for one, can convey the emotions and verbal nuances that accompany a language. If a video explains why graves are dug six feet into the earth, the dub should reflect its somber, horrific nature; if it’s a biology lesson aimed at children, the dub should keep a factual tone and match the measured pace of the original audio. For audiences sensitive to these nuances, Aloud isn’t the most effective tool.

Creators who have tried Aloud seem positive about the new audiences they can reach with so little effort. Kings and Generals, a channel that creates animated historical documentaries and boasts 2.4 million subscribers, says that their “audience loved it” and that they are “looking forward to trying this many more times.” The Amoeba Sisters, a channel with 1.3 million subscribers and a catalogue of fun science videos, say that “this tool was easy to use and so convenient” and that they “are so grateful to have a way to reach more audiences through dubbed videos.”

Creators, small- or big-time, will gladly accept Aloud as a new way of diversifying their audience; the untapped, non-English-speaking market for video content is far bigger than the English-speaking market most creators currently appeal to.

 

The verdict

Without tone, intonation, and voice taken into consideration, it’s hard to say how effectively Aloud will convey the fullness of the original video in translation. As of now, Aloud sounds like your average, run-of-the-mill AI narrator. Theoretically, any content creator can transcribe their video, run the transcript through a machine translator, record a reading of the output, and use that as the new audio for a translated version.

But that workflow will take hours, if not days, to complete. What’s important about Aloud is that it streamlines the work so that individual creators can take control of dubbing themselves and make their content more available and accessible to non-English speakers, even with the loss of tonal and aural quality. In that respect, Aloud’s mission is a valuable one, capable of affecting the lives of hundreds of millions across the world who don’t have access to English-language videos.

 

How does this relate to me?

If you’re a translator or voice actor constantly on the lookout for the next big development in AI that could topple your (or should we say, our) career in the language industry, Aloud might not be the most welcome news. With up-and-coming companies like DeepDub threatening the livelihoods of many a voice actor, Aloud is just one more product to worry about.

However, as we’ve mentioned before on this blog, the limitations of automated dubbing and translation are manifold. Even as automated dubs become feasible, translators will still be needed, for example, to check or carry out the translation before the text is fed into the dubbing program. And given how long it will take before automated dubbing voices reach human parity, voice actors shouldn’t have to worry about being replaced either; these products aren’t going to be dubbing movies or TV shows anytime soon.

If you’re a creator, it’s worth taking a moment to reflect on what programs like Aloud really mean for the language industry and the video content ecosystem. How will viewers react to videos made with automated dubs? Will they help or hinder your channel’s reach? Could they affect translators, voice actors, and other professionals working in the industry? How do those professionals’ translations and dubs differ from the ones provided by products like Aloud?

It will also be helpful to think about what programs like Aloud leave out. For example, those who are deaf or hard of hearing will not benefit from these products. If we are truly aiming for an accessible ecosystem, then subtitling products will have to accompany dubbing products. 

These are some things to think about as we move forward into the uncharted realm of automated video augmentation; no matter the benefits and shortcomings of artificial intelligence and machine translation, we must make sure no one is left behind in the wake of development. 

 

References
https://datahorde.org/a-history-of-youtubes-closed-captions-part-iv-downfall/
https://blog.google/technology/area-120/aloud/
https://aloud.area120.google.com