“Languages & The Media” is an international conference focused on audiovisual language transfer in the media. It is hosted biennially in Berlin, Germany, and (in its own words) it:
“...brings together content creators and distributors, broadcasters, streaming services, language services providers, software and hardware developers, researchers, trainers, practitioners, and all those involved in the production, marketing and distribution of audiovisual content for information, entertainment or educational purposes through localisation and accessibility.”
This year, the event took place from November 7-9, 2022, and Nimdzi’s analysts joined the conversation to bring you the latest and greatest information and trends from the vibrant field of media localization.
Below is a summary of the main themes that stood out from what was discussed inside and outside the conference rooms.
Audio description (AD) was a major topic at this year’s conference. AD is a service that aims to provide equal access to visual media (movies, TV shows, video games) for people who are blind or partially sighted. In addition to the regular on-screen dialogue, a narrator describes visual elements that are relevant to the story (e.g. actions, facial expressions, and surroundings).
AD appears to be a service of growing importance. Yet many challenges remain to be overcome. Below is a brief summary of those that were discussed:
Considering these challenges, here is a short summary of some of the solutions that speakers put forward:
While describing what happens on screen might seem fairly straightforward to the untrained eye, this task is anything but — especially when it comes to sensitive content, a challenge that deserves a closer look.
One major question in AD is how characters on screen should be described. Should the narrator point out someone’s skin color and ethnicity, their gender, or their portrayed sexuality? On the one hand, we may argue that we do not need to know whether a character is white or black. In a truly equal world it should not matter, and perhaps becoming a more inclusive world means no longer mentioning it. On the other hand, the general rule is that if a sighted person can see or identify it, then it should be part of the audio description as well, so that blind and partially sighted people have access to the same information. In addition, more often than not, movies and TV shows contain scenes that highlight the unequal treatment of marginalized groups, in which case this information is integral to the plot. One potential solution discussed was to offer audio introductions in which people can choose whether or not they want these descriptions.
This is just one example of many complexities that were discussed to illustrate the kinds of challenges currently being faced and debated in this field.
There are many speech-to-text solutions in the field of media. In fact, taking spoken words and converting them into written text can be described as the bread and butter of media localization. In particular, closed captions and subtitles are well-established services. Let’s briefly differentiate between the two: closed captions are a same-language transcription of the audio that also conveys non-speech information (such as speaker labels and sound effects), primarily serving deaf and hard-of-hearing audiences, whereas subtitles translate the dialogue into another language for viewers who can hear the audio but do not understand the source language.
Aside from these media localization “classics,” today we can find new applications of speech-to-text solutions, and two stood out from the discussions at this year’s conference.
Speech-to-text interpreting is still a relatively new field (established around 2015, depending on who you ask) and one that shows that these days it is no longer quite as simple as translation being written and interpreting being spoken. While live subtitling (see next section) contains an element of interpreting, speech-to-text interpreting comes close to audiovisual translation (AVT).
Before delving deeper into the topic, one important point to make upfront is that speech-to-text interpreting typically involves respeaking, which requires the respeaker to add punctuation and other formatting to their verbal output. In addition, respeakers may also use (or be required to use) predefined voice commands for special formatting or proper names.
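To make the respeaking step more tangible, below is a minimal, hypothetical sketch of how dictated punctuation commands might be converted into written output. The command set and function names are illustrative assumptions, not any vendor’s actual implementation, and real respeaking software is far more sophisticated.

```python
# Hypothetical example: converting a respeaker's verbal output, in which
# punctuation is dictated as spoken commands, into formatted text.

# Spoken commands mapped to their written equivalents (assumed set).
COMMANDS = {
    "comma": ",",
    "full stop": ".",
    "question mark": "?",
    "new line": "\n",
}

def format_respoken_text(verbal_output: str) -> str:
    """Replace dictated commands with punctuation marks."""
    text = verbal_output
    for spoken, written in COMMANDS.items():
        text = text.replace(f" {spoken}", written)
    return text

# The respeaker dictates punctuation along with the content:
print(format_respoken_text(
    "good evening comma and welcome to the show full stop"
))
# -> "good evening, and welcome to the show."
```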
The below list provides an overview of the five main workflows in which speech-to-text interpreting can be performed:
Editing might be added to any of the five workflows described above.
To date, the main use cases for speech-to-text interpreting are live broadcasts as well as meetings and events.
Interestingly, interpreting associations are still unwilling to accept respeaking and speech-to-text interpreting as professions, even though both services exist and there are professionals making a living from them.
Live subtitling is a service that has picked up immensely since the pandemic-induced spike in video conferencing, and the technology in this space has been making tremendous strides. It is therefore no surprise that several talks at this year’s conference homed in on the latest from within this particular niche.
In essence, the service of live subtitling involves taking spoken content and converting it to written content in multiple languages with minimal delay. Live subtitles can be created in a few different ways: they can be generated by a machine without human input, the machine output can be edited live by a person, or the live subtitles can involve linguists from the get-go.
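As an illustration, the fully automatic variant can be thought of as a speech recognition (ASR) step feeding a machine translation (MT) step. The sketch below is a simplified, hypothetical pipeline; the function names are placeholders rather than any specific provider’s API.

```python
# Hypothetical sketch of a fully automatic live-subtitling pipeline:
# speech recognition (ASR) -> machine translation (MT) -> display.
from typing import Iterator

def asr_transcribe(audio_chunks: Iterator[bytes]) -> Iterator[str]:
    """Placeholder: stream audio to an ASR engine, yield text segments."""
    ...

def mt_translate(segment: str, target_lang: str) -> str:
    """Placeholder: translate one subtitle segment."""
    ...

def live_subtitles(
    audio_chunks: Iterator[bytes], target_langs: list[str]
) -> Iterator[dict]:
    """Yield subtitle cues in all target languages as the audio arrives."""
    for segment in asr_transcribe(audio_chunks):
        # In the human-in-the-loop variants described above, an editor
        # would correct the segment here before or after translation.
        yield {lang: mt_translate(segment, lang) for lang in target_langs}
```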
Since the Zoom boom, live subtitles have become popular for online meetings but are also being used in live broadcasts, at onsite events, and to make radio content accessible online.
What is interesting to note about this trend is that the providers of live subtitling are coming into this space from different sides of the industry:
Dubbing is the other bread and butter service in the media localization industry and, to date, one that is (almost) exclusively performed by voice actors. That this might change going forward became evident in presentations that showcased the latest developments in machine dubbing.
The quality of synthetic voices has come a long way. Not only do some voices sound so remarkably human that it can be hard to tell whether the voice is synthetic, but the latest developments also involve technology that can mimic the original speaker's voice in the translated, synthetic version.
Although not fit for entertainment purposes (yet), current use cases for AI dubbing range from international broadcasts to voiceover for documentaries and corporate videos.
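Conceptually, such a machine dubbing workflow chains transcription, translation, and voice-cloned speech synthesis. The sketch below is a simplified illustration under that assumption; every function is a placeholder, not a real product’s API.

```python
# Hypothetical sketch of a machine dubbing pipeline with voice cloning:
# transcribe -> translate -> synthesize in a voice modeled on the speaker.

def transcribe(audio: bytes) -> str:
    """Placeholder: ASR transcription of the original dialogue."""
    ...

def translate(text: str, target_lang: str) -> str:
    """Placeholder: machine translation of the script."""
    ...

def clone_voice(reference_audio: bytes) -> "VoiceProfile":
    """Placeholder: build a profile that mimics the original speaker."""
    ...

def synthesize(text: str, voice: "VoiceProfile") -> bytes:
    """Placeholder: text-to-speech in the cloned voice."""
    ...

def machine_dub(audio: bytes, target_lang: str) -> bytes:
    script = transcribe(audio)
    translated = translate(script, target_lang)
    voice = clone_voice(audio)  # the step that mimics the original speaker
    return synthesize(translated, voice)
```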
The (alleged) talent crunch within the media localization space is certainly nothing new, but it appears to have reached new heights due to ever-increasing volumes, new markets and language pairs, and the economic pressure imposed by inflation. Unsurprisingly, the topic was discussed numerous times and in different formats at this year's conference, including as part of a panel discussion about the working conditions of audiovisual translators (AVTs).
What has changed? The topic appears to be taken seriously and is being discussed openly from all sides (AVTs, language service providers, and buyers). Below are a few key takeaways from the discussions:
The media localization industry is tremendously busy, which is certainly not a new trend but rather an ongoing one. The term "explosion of content" was used several times, as it has been in industry circles over the past few years. We can get a sense of what dimensions we’re talking about if we look at Netflix’s 2021 localization volumes:
These are huge figures, which our industry delivered on in months — and this is, of course, just one streaming platform.
At the same time, it’s not just the number of hours of content, but also the number of languages. In the past, the standard was 12 languages; now it’s more than double that (in the case of Netflix, 37 languages for subtitling and 35 for dubbing, for productions in 50 countries). In addition, content these days is coming from and going into any number of languages; it is no longer limited to flowing from English into other languages. Direct Asia-to-Asia content in particular, without English as a pivot language, continues to grow in popularity.
With this change in direction and with more content being produced in languages other than English, we are also witnessing the emergence of English dubbing, which is a relatively new phenomenon.
For the providers, the challenge is not purely about handling rising volumes but also establishing new workflows and finding new talent in the right markets, at the right time. This all adds to pre-production and post-production times for dubbing and subtitling.
The end-user perspective confirms the trend. In a panel discussion on content globalization, Simon Constable from Visual Data shared data from a survey that asked end-users whether they watch productions in other languages. The results showed that 25% do it all the time, 26% mostly do so, and 28% sometimes do. In addition, 63% stated that poor localization has an impact on what they watch and whether or not they switch it off.
As one of this year's themes, access and inclusion were part of most discussions. This is no surprise considering that many services we group under the umbrella of media localization are also well-established accessibility services (as mentioned above). Several speakers encouraged the audience to "look around" at the lack of diversity in the industry and others remarked that you “can’t be what you can’t see” to remind us of the importance of representation of all groups in the media and entertainment space.
Let’s take a look at some of the challenges that were discussed regarding accessibility, inclusion, and sensitive content.
Keynote speaker Änne Troester from the German dubbing association Synchronverband e.V. - die Gilde pointed out that for a long time it was believed that the dubbing industry was colorblind because it is audio only, but that this assumption could not be more wrong. For one thing, Änne noted that white actors have always been allowed to voice black characters but not the other way around. Efforts to change this, and to cast people who can accurately represent marginalized and underrepresented groups of society, come with their own challenges. Take the example of a character on screen transitioning from male to female or the other way around. Änne rightfully pointed out that we as an industry should not assume how this can best be represented in the dubbing script and the voice of the actor. To do it right, we need to ask representatives from these groups for their input. But then the challenge becomes one of how to do that. You cannot and should not ask people about their sexuality, so how do you recruit for this in a respectful manner that allows for accurate representation on screen?
While there was no final answer to this challenging question, it is still a good thing that people are actually talking about it, that they are questioning current practices, and are actively looking for ways to make the media industry more inclusive.
How should offensive content be translated? For one thing, translators need to consider the intensity of the word in the specific language and culture. Something that is relatively harmless in one language may be completely inappropriate in another.
Another layer of complexity is added when transferring spoken words into written ones, whether in the form of captions or subtitles. As many speakers reminded the audience throughout the conference, seeing something in writing can often come across much more strongly than hearing the verbal equivalent. Some even argue that there are certain words that should never be written (e.g. the n-word).
When we look at the overall picture, not much has changed. For years now, volumes have been rising and continue to do so. Hand in hand goes the notion of a perceived talent crunch, which mostly comes down to price pressure as well as finding talent in new markets, but also to the changing roles of linguists as language service providers continue to leverage technology in an effort to increase efficiencies.
Most notably, AI is increasingly entering the media space both in the form of machine dubbing and live subtitling. However, it isn’t necessarily aimed at the entertainment industry but rather the meeting and events space as well as the broadcasting industry. At least for now.
For media localization, accessibility is not just another buzzword. This was evident both from the discussions inside and outside the conference rooms and from a “practice what you preach” standpoint. Indeed, two conference rooms were equipped with a large screen (positioned next to the stage) with a running machine-generated transcript of the conversations happening on stage. That being said, peers recognized that to achieve true accessibility and inclusion, there is a lot more work that needs to be done in the coming years. And they aim to do it.
This article was prepared by Sarah Hickey, Nimdzi's VP of Research. If you have any questions about interpreting technology, reach out to Sarah at [email protected].