Report written by Rosemary Hynes.
Machine interpreting (MI) is a hot topic right now, with technology providers touting their latest advances in the field. The advent of MI is likely to transform the interpreting industry as we know it, much as machine translation (MT) upended the translation industry and ushered in a new era for everyone involved. So, now is the perfect opportunity to take a deep dive into the world of machine interpreting.
First things first, what is MI? It is the conversion of a spoken message in one language into a spoken message in another language using artificial intelligence (AI), without the involvement of a human interpreter. With MI, technology bridges the language barrier between people who do not share a language. For example, an English-speaking patient could talk to a Mandarin-speaking doctor using nothing more than a smartphone.

What we might deem the first attempt to build a speech-to-speech translation (S2ST) system was carried out in 2016 by a group of French researchers in a proof-of-concept paper. The team obtained quite promising results from their MI engine despite training it on only a small corpus, and they concluded by recommending that future engines be trained on larger, more varied datasets, such as TED Talks or audiobooks in the public domain.
How, then, does MI work? Current S2ST technology typically chains three components: automatic speech recognition (ASR) transcribes the source-language speech into text, machine translation converts that text into the target language, and speech synthesis renders the result in a synthetic voice. This is known as a cascade model. Like a waterfall of information flowing from one step to the next, it passes the message through a series of stages to reach the final product: a synthetic voice delivering the speaker's message in a language different from the original.
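To make the cascade concrete, here is a minimal sketch in Python using the open-source Hugging Face transformers library, wired up for the English-to-Mandarin doctor's-office scenario above. The pipeline tasks are real transformers APIs, but the specific model choices (openai/whisper-small, Helsinki-NLP/opus-mt-en-zh, suno/bark-small) and the input file name are illustrative assumptions, not what any commercial MI product actually runs.

```python
# A minimal cascade S2ST sketch: ASR -> MT -> TTS.
# Model choices and file names are illustrative assumptions.
from transformers import pipeline
import scipy.io.wavfile

# Step 1: automatic speech recognition transcribes the
# English-language audio into English text.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
english_text = asr("patient_question.wav")["text"]  # hypothetical recording

# Step 2: machine translation converts the English transcript
# into Mandarin text.
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
mandarin_text = mt(english_text)[0]["translation_text"]

# Step 3: speech synthesis renders the Mandarin text
# in a synthetic voice.
tts = pipeline("text-to-speech", model="suno/bark-small")
speech = tts(mandarin_text)

# Save the synthesized audio so it can be played back.
scipy.io.wavfile.write("doctor_hears_this.wav",
                       rate=speech["sampling_rate"],
                       data=speech["audio"])
```

The waterfall structure is also the cascade's chief weakness: errors propagate downstream, so a word misrecognized in step 1 is faithfully translated and spoken aloud in steps 2 and 3.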