Article by Rosemary Hynes.
Language technology providers are scrambling to jump on the speech-to-text bandwagon, which means users can view machine-generated live subtitles (translated from the original) as well as multilingual captions (monolingual transcripts available in different languages) of speeches in their preferred language. While this sounds great, some providers are taking it a step further. How so? By offering speech-to-speech translation (S2ST), otherwise known as machine interpreting (MI).
Today, on January 24, 2023, the first remote simultaneous interpreting (RSI) platform is set to release its very own MI feature. The household name KUDO is best known for providing the technology (and, if needed, the interpreters) that facilitates the use of RSI in video conferences and at large events.
The release of a proprietary MI feature is a smart move by KUDO. Video conferencing platforms with an RSI feature, like Zoom and Microsoft Teams, dominate the multilingual online meeting sphere. To stay relevant in this space, RSI platforms need to innovate and constantly improve the services they provide.
That being said, although KUDO might be the first one in the RSI space to add their own MI solution, KUDO AI is certainly not the first MI solution out there. In fact, there are as many as 23 MI technologies listed in Nimdzi’s 2022 Language Technology Atlas, but only a handful provide MI for online events and conferences. The majority of MI solutions in the image below are handheld devices used for two-way communication. In the conference and events space, Wordly is currently the most specialized and well-known solution on the market.
So what does KUDO AI entail? KUDO’s MI solution is available on the KUDO platform and on its partner events platforms, like ON24 and Hopin. In the coming months, the company will also integrate its MI solution with video conferencing platforms, such as Microsoft Teams and Zoom. KUDO AI will be publicly available by the end of the first quarter of 2023. The solution uses a cascade model and has been built on a combination of open-source technology and in-house technological building blocks. It has been tested to the point where KUDO was satisfied with the quality of its output, both for accuracy and fluency. Already, speakers can choose the gender of their synthetic voice before entering the meeting. Moreover, a voice cloning feature will be included in the coming months, meaning the original speaker’s voice will be retained in the synthetic, translated output.
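For readers unfamiliar with the term, a "cascade model" chains three separate stages: speech recognition (ASR), machine translation (MT), and speech synthesis (TTS). The sketch below illustrates that general architecture only; it is not KUDO's implementation, and all three stage functions are hypothetical stubs standing in for real models.

```python
# Illustrative sketch of a cascade S2ST pipeline: ASR -> MT -> TTS.
# All three stages are hypothetical stubs; a real system would plug in
# actual speech recognition, translation, and synthesis models.

def transcribe(audio: bytes, source_lang: str) -> str:
    """ASR stage: convert source-language audio to text (stubbed)."""
    # Stub: pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")

def translate(text: str, source_lang: str, target_lang: str) -> str:
    """MT stage: translate the transcript into the target language (stubbed)."""
    toy_lexicon = {("en", "fr", "hello"): "bonjour"}  # stand-in for an MT model
    return toy_lexicon.get((source_lang, target_lang, text.lower()), text)

def synthesize(text: str, target_lang: str, voice: str = "female") -> bytes:
    """TTS stage: render the translated text as synthetic speech (stubbed)."""
    # Stub: tag the text instead of producing real audio.
    return f"[{voice}:{target_lang}] {text}".encode("utf-8")

def speech_to_speech(audio: bytes, source_lang: str, target_lang: str,
                     voice: str = "female") -> bytes:
    """Chain the three stages, as a cascade model does."""
    transcript = transcribe(audio, source_lang)
    translated = translate(transcript, source_lang, target_lang)
    return synthesize(translated, target_lang, voice)

print(speech_to_speech(b"hello", "en", "fr").decode("utf-8"))
```

Because each stage is a separate module, a cascade system can swap in a different voice (or, eventually, a cloned one) at the synthesis stage without touching recognition or translation, which is consistent with the gender-selection and voice-cloning features described above.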
As it stands, there are five languages available in KUDO’s MI feature: French, Spanish, German, Portuguese, and English. KUDO intends to add a new language every two weeks so as to cover the majority of European languages by the end of the summer. During a meeting with KUDO AI, listeners can select their preferred language from a drop-down menu or choose to listen to the original. Meeting participants will then hear the live translation of the original speech via a synthetic voice in the language they selected. In addition to the synthetic audio output, participants can also choose to see the written translation in the form of machine-generated live subtitles (or turn them off).
Currently, one disadvantage of the subtitles is that they do not tag the speaker, so it can be difficult to follow speaker changes. The same applies to the MI output: if two male participants are speaking interchangeably, the MI does not indicate a change of speaker. This, combined with the considerable lag, can make it difficult to follow a conversation between two or more speakers. Hence, in its current state, the feature is better suited to one-to-many webinars or online training courses, where the exchange is limited.
By providing MI to its clients, KUDO can cater for the increased demand for multilingualism in virtual meetings. The mass shift to online meetings during the COVID-19 pandemic has stabilized somewhat with the return of in-person meetings; however, virtual events have not gone away, and neither has the need for virtual solutions. Event organizers are increasingly looking to add languages to their meetings, but human interpreters are not always a viable option for short events (such as one-hour meetings) or smaller budgets.
That is where machine-generated live subtitles and MI come into play. Users can select the language they want to read the subtitles in or listen to the synthetic voice. MI solutions are useful if event attendees want to listen to the meeting while doing something else. They also serve as an accessibility feature for people who are blind or partially sighted and want to take part in a multilingual event.
So, although this software release by KUDO is not the first of its kind, it does set a precedent for other RSI platforms to potentially follow in its footsteps. KUDO prides itself on being at the cutting edge of language technology and this product demonstrates the company’s willingness to innovate.
As for human interpreters, the situation is not likely to change considerably. Long, more complex meetings will continue to require human interpreters, particularly when the stakes are high or the situation requires a degree of emotional intelligence that machines cannot (yet) provide. MI will most likely fill the gap where interpreters were reluctant to provide their services in the first place: short, relatively simple, and low-budget online meetings.