Report by Yulia Akhulkova, Sarah Hickey, and Rosemary Hynes.
The Nimdzi Language Technology Atlas offers a unified view of the current language technology landscape. This freely accessible annual report provides readers with insights into major tech advancements and demonstrates the value and potential of technology solutions designed for the language industry.
The Atlas is a useful tool that helps with language technology-related decision-making. Technology providers use the Atlas both to benchmark their competition and to find partners. Investors refer to it to gain a better understanding of the leading market players. Linguists and buyers of language services turn to it to see what tools are out there to help them in their everyday jobs. It allows students of language programs around the world to discover just how many tools may be just a click away from being leveraged in their future careers.
That being said, market transparency is not enough to keep pace with the changing technology environment of today. That’s why the Language Technology Atlas serves as a starting point, a map of sorts. Only you can make your language technology journey really work for you. So, let’s open our newly updated guide to the world of language tools, and walk this road together. Happy travels!
In this year’s edition of the Nimdzi Language Technology Atlas, we collected data from providers of more than 800 technology solutions.
The data gathering behind the Atlas is based on four main sources:
These sources have given us a comprehensive understanding of the state of technology development in the industry. Before we continue, let’s review the definitions of key language technology.
Translation management systems (TMS) are systems that feature both translation (and editing) environments and project management modules. Core components of a typical TMS include:
You can check out more features of a modern TMS using Nimdzi’s free TMS Feature Explorer.
Inside the TMS category, there are four subcategories:
Unlike TMS, translation business management systems are systems that do not have a bilingual translation environment. They only have management options for translation project enablement. We call such technology a BMS or (T)BMS, since that’s exactly what it does: it helps manage business operations around translation.
Here, we feature various tools and platforms for audiovisual translation enablement: from project and asset management tools to AI-enhanced dubbing tools.
This section presents major MT engine brands, subdivided into four subcategories depending on the MT providers’ specialization.
Here we list systems that integrate other systems with each other. The middleware subsection presents major companies that specialize in integrating various language technologies. The products in the MT Integrators subsection not only provide smart access to MT engines, but also support certain procedures around MT, so that users can leverage MT in the best way possible.
In this section, we feature platforms and marketplaces focused specifically on translation and localization talent. In a marketplace, you can post a job and accept responses from linguists or other professionals who are interested in doing the work for you. Then you book this talent or directly assign the job to the chosen talent within the platform. If you’re a linguist, you sign up and set up your profile in the system, get vetted and/or tested (on some marketplaces), and then start receiving job offers.
There is also the platform LSP option where you not only get access to a library of linguistic resources and agencies, but also to the workflows for the projects and PMs who support you. You can upload your files to the platform, get an instant quote, and after quote approval and project completion, receive the result.
At Nimdzi, we coined the umbrella term ‘virtual interpreting technology’ (VIT) to describe any kind of technology that is used to deliver or facilitate interpreting services in the virtual realm. There are three ways in which virtual interpreting can be performed or delivered: via over-the-phone interpreting (OPI), video remote interpreting (VRI), or remote simultaneous interpreting (RSI).
As the name OPI suggests, two or more speakers and an interpreter use a phone to communicate. This is an audio-only solution and the interpretation is performed consecutively. VRI is also performed consecutively. However, in this case, there is both an audio and a video feed. Depending on the VRI solution, users and interpreters either connect via an online platform with video calling capability or via a mobile app. As for RSI, it directly evolved out of the field of conference interpreting and is intended for large online meetings and events with participants of many different language backgrounds. As the name suggests, the interpretation is performed simultaneously — that is to say, at the same time as the speakers give their speeches.
Artificial Intelligence (AI) is increasingly entering more areas of our lives and the interpreting market is no exception. We have included machine interpreting solutions in our definition of VIT and are subsequently listing them in our Language Technology Atlas as well.
Interpreter management and scheduling (IMS) systems are also included in our definition of VIT because, even though they do not focus on delivering interpreting services, they facilitate them. An IMS is a useful tool that allows for efficient management of interpreter bookings for both onsite and virtual interpreting assignments.
You can check out more features of VIT systems using Nimdzi’s free VIT Feature Explorer.
This section is devoted to quality management in translation. It features three separate subcategories which correspond to three main product types in this area: QA tools, review and evaluation tools, and terminology management tools.
Also known as automatic speech recognition (ASR). The section features solutions that focus on automatic transcription and automatic captions.
Writing about language technology year after year, we often can’t help but wonder whether we’ll ever see a new idea so innovative that it could transform and reshape our industry as a whole.
"There have only ever been three disruptive innovations in the language industry: E-mail (enabled the notion of in-country native speakers as translators), translation memory software (TMs and the concept of CAT-tools, which then transformed into TMS), and machine translation (and Google Translate in particular)."
Renato Beninatto, Co-founder, Nimdzi Insights
Diving deeper into the question of whether there’s already anything else, in July 2022 Nimdzi put together a list of the top 25 most innovative companies in the language industry, which goes beyond the language technology companies we normally write about in the Atlas.
As the technology landscape evolves, so does our thinking about the areas of focus for the Atlas. We have already started to map technology systems and innovative initiatives that use language data and that have at their core big data, the simulation of natural language, and the specific innovative applications the latter offers. There are also new companies in our industry that use blockchain technology, peer-to-peer (P2P), payment tokens, and other concepts from the world of big IT.
Let’s take a look at some of those solutions in further detail.
When thinking about the natural language processing (NLP) landscape, we aim to consider the applied NLP solutions that both relate to and extend localization enablement, which are essential to the content and language strategies of digital enterprises. We focus on multilingual learning and applications that catalyze growth in new markets.
For now, we’re not yet adding a special category to the Atlas infographic, but still would like to mention a couple of solutions that have caught our attention.
The boundaries of localization and associated technologies are expanding as the use of AI becomes more widespread and more features are added.
"Both global enterprises with a strong localized presence and localization service providers alike are focusing on new NLP usage opportunities to enable better business decisions and reduce operational costs."
Roman Civin, VP of Consulting, Nimdzi Insights
Not only do automated messaging solutions and entity sentiment analysis models create distinct types of value in specific verticals, but so do content creation and classification systems, search intelligence and language assets optimization. In businesses where localization drives content strategy, localized experiences are part of the game, not an afterthought.
Even though the current applications of NLP in localization are not large in number, we will continue tracking NLP platforms and technologies, how they help, and how they are relevant for our industry.
In addition to Translateme, who we included in the 2021 edition of the Atlas and mentioned in the data-related section of the Nimdzi 100 in 2021, it’s important to also mention Exfluency when discussing the blockchain arena. The latter is a relatively new system built on the concept of privacy by design with an idea of creating a space for a secure multilingual asset store. They now have a community of more than 1,000 users.
Exfluency offers two levels of anonymization (one for GDPR compliance, plus another for anonymizing certain data according to customers’ requirements). Its anonymization is hybrid (about 85 percent AI and 15 percent human), and it has three levels of blockchain use:
Exfluency was created with the idea of developing a concept that would empower language as a natural human asset and exclude the layers of middlemen on which the regular localization supply chain is based (with actual linguists at the very end). They succeeded, and it already works for certain use cases. For this concept to really disrupt the industry, though, a lot of further development is needed, for example, the implementation of customized quality management systems (for when the P2P judgment happens).
It will certainly be interesting to monitor how Translateme and Exfluency develop, and to see more companies and use cases in the translation and localization industry with blockchain-powered solutions.
TMS can be considered a prerequisite of professional localization today and a TMS is oftentimes central to a client’s efforts to localize their content. In fact, 92 percent of translation and localization managers that Nimdzi interviewed as part of its Lessons in Localization series between July 2020 and December 2021 stated that their companies either use a commercial TMS or one that had been developed in-house.
There is a great variety of TMS solutions available on the market, each striving to address the specific needs of its clients. In this year’s Atlas, we reference over 160 different TMS solutions, 10 more than last year. This means that despite the wide variety of options already out there, new solutions continue to emerge.
Some language technology companies turn to the TMS arena from related areas — for example, (T)BMS. Within the last year, a couple of BMS tools added translation editor functionality and joined the TMS category. For example, Taia was moved from the BMS category to the TMS for enterprise category.
"The TMS market is experiencing growth that is, at least for some players, outpacing the growth of the language services industry as a whole. This growth, in turn, attracts both investment opportunities and opportunities to consolidate market position, as evidenced by a slew of mergers and acquisitions in this segment of the technology market."
Gabriel Karandyšovský, COO, Nimdzi Insights
While it is not exactly within the scope of this research to evaluate the performance of specific companies, a few points are worth mentioning:
If we take a closer look at some of the most regularly occurring challenges associated with the process of selecting a TMS in 2022, we’ll most likely see well-known issues: connectivity (a modern TMS should be able to connect to a diverse set of systems, from web content repositories to home-grown software solutions); full compliance with GDPR as well as HIPAA, ISO/IEC 27001:2005, PCI, and other standards and protocols; and security. How do you go about ensuring that a file does not leave the TMS environment on the translator side, for example? Or that the document is not compromised when transmitted via unsecured email servers? Another issue frequently reported by enterprise customers is calculating the ROI of a TMS.
| No. | TMS | Response count | Percentage of total respondents |
The top-5 companies in the TMS arena — memoQ, Memsource, RWS, XTM, and Smartling, as the data from the Nimdzi survey suggests — have all increased their individual brand awareness over the past 10 years. The conclusion could be drawn that, for newcomers in the TMS developer world, top-tier mindshare status might be hard to obtain. However, the trending data over time tells quite a different story, showing that it is indeed possible for companies to improve their mindshare status with the right marketing strategy (see Memsource example).
With all that vivid TMS development, the terminology management question remains an area yet to be tackled. Developments on this front are few in number.
Worth mentioning, however, are new features from lexiQA. The company launched the first version of a morphological terminology engine, is working on morphology support for more languages, and is planning a pilot with one client at the end of summer. The engine yields a significant decrease in QA false positives, with no reported false negatives. On top of that, lexiQA also introduced a QA-as-you-type feature as well as review capabilities. An LQA mechanism available via API takes most of the manual work out of the quality assessment process. Having mapped all error classes onto the MQM model, lexiQA can assign a severity weighting to each error type, while automatic QA checks produce scorecards based on specified requirements.
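To make the idea of severity weighting and scorecards more concrete, here is a minimal Python sketch of how annotated errors might be turned into a single score. The severity weights (1, 5, 25), the per-100-words normalization, and all names are illustrative assumptions on our part; this is not lexiQA’s implementation or API.

```python
from collections import Counter

# Hypothetical severity weights; real scorecards are configured per project.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 25}


def quality_score(errors, word_count):
    """Return (penalty_points, score) for a list of (error_type, severity) pairs.

    The score is 100 minus the penalty points per 100 words, floored at 0.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[severity] for _type, severity in errors)
    score = max(0.0, 100.0 - 100.0 * penalty / word_count)
    return penalty, score


def errors_by_type(errors):
    """Count errors per error type, e.g. for a per-category breakdown."""
    return Counter(error_type for error_type, _severity in errors)


# Made-up annotations for a 350-word sample.
sample_errors = [
    ("terminology", "major"),
    ("punctuation", "minor"),
    ("punctuation", "minor"),
]
penalty, score = quality_score(sample_errors, word_count=350)
print(penalty, round(score, 2))  # 7 98.0
```

A pass/fail threshold on the resulting score would then be one of the “specified requirements” a scorecard could be checked against.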
Another name that popped up a lot in conversations with localization professionals keen on quality was Sketch Engine. For example, an online term extractor with monolingual and bilingual term extraction capabilities, OneClick Terms, is powered by Sketch Engine’s term extraction technology. It is both a powerful service and a platform, covering many languages and offering huge corpora. The value lies in the intelligent use of language corpora, and one of the use cases is advanced terminology extraction (we haven’t added it to the infographic below, because we do not yet have a subcategory for terminology extraction).
Another important concept to keep in mind when talking about quality is the translation error correction (TEC) model, which was introduced in a recent paper published by LILT and the University of California at Berkeley in June 2022. TEC automates correction of human translations. The idea is that it could help with speeding up the review process, for example, by suggesting corrections. Even though some of the said suggestions may still be incorrect, professional reviewers accepted 79 percent of the TEC suggestions for correction overall. It has also been noted that it could be potentially helpful with tackling client-specific requirements.
A new addition to our Review and Evaluation subsection is LocalyzerQA by Lingoport. LocalyzerQA automates linguistic review of web, mobile, and desktop applications in context without requiring developer assistance. In their June release, new features were added to LocalyzerQA such as global replace of selected strings across files and projects during localization review as well as new statistics tracking for translations and reviewers.
Note that the well-known Multiterm was removed based on updated information by RWS, and Trados Terminology has been added instead.
Nowadays, the requirement of having enough data is usually brought up when discussing MT customization. However, it may also be a blocker for major IT brands who develop generic engines for low-resource languages.
In the past, MT researchers often wondered how to get around the problem of having adequate technology without having adequate data. This question is still relevant today, yet it is no longer just a low-resource-language issue. Fortunately, an increasing number of tools and technologies have been developed to deal specifically with language data.
Last year, we discussed the most prominent trends in data for AI and AI localization, highlighting the newly introduced solutions by SYSTRAN, Omniscien, and Pactera EDGE, among others. Another major player in the AI data space worth mentioning here is Appen.
Founded in 1996, Appen was historically the first LSP to recognize the importance of collecting and producing quality language data at the scale required to train multilingual AI. To support its global resourcing and production models, Appen developed an internal proprietary platform, ADAP (Appen's Data Annotation Platform), that helps procure and annotate training data, enabling MT, ASR, and NLU solutions relied upon by many of the language technology suppliers listed in our Atlas.
Appen has several proprietary systems including Ontology Studio, which helps with the creation of custom multilingual ontologies across multiple locales for market-specific search relevance and recommendation engines, Ampersand for automated and human-in-the-loop speech recognition data processing and transcription editing, and ADAP (available internally and via subscription), which is a resource management and data curation/collection platform.
Crowdsourcing language data is one option, but there’s also the option of data synthesis. As the name suggests, data is synthesized, or created artificially, in order to overcome the limitations of real-world data. It’s cheaper, doesn’t contain personal information (as human-generated data can), and has numerous other benefits. Nonetheless, synthetic data currently accounts for only 1 percent of all market data. Gartner, however, forecasts that by 2027 the data market segment will grow to USD 1.15 billion (48 percent CAGR).
Speaking of synthetic data, let’s not forget to discuss the rapidly developing arena of synthetic voices and ‘AI dubbing’. As we noted last year, AI-enhanced technology for video localization extended its scalability outside the entertainment industry. Synthetic voices are also used in e-learning, educational materials, broadcasting, and advertising.
Voiseed, which was already on our radar last year in the “AI-enhanced dubbing tools” subcategory of the Atlas, just won a PIC challenge at LocWorld Berlin with “A New AI-based Technology to Synthesize Controllable, Expressive Speech in Multiple Languages, with Multiple Voices.”
New additions to this category include Aloud by Google and Dubverse. The former is a part of Area 120, Google’s in-house incubator for new products and services, while the latter offers AI-powered video dubbing using text-to-speech (TTS), advanced MT, and AI. The platform features human-like AI voices from a range of more than 100 speakers of various gender, age, and style to match a particular content type.
Interestingly, everyone that enters this arena seemingly dubs their solution a “new way to dub” despite the fact that this category of the Atlas already includes 25 such companies and solutions.
As early as 2020, South African news provider Media24 built its own synthetic voice generator. While we’re on the subject of Africa, let’s also mention Abena AI, a voice assistant fluent in Twi, the most widely spoken language in Ghana. The Abena AI app for Android is called “Africa's first hands-free offline voice assistant” and opens up voice AI to those who don’t speak English or other common voice assistant languages.
In 2021, African demand fueled the USD 3.4 million investment by Mozilla Common Voice to create linguistic databases as a precursor for a voice assistant that can speak Swahili. The Common Voice project itself was launched five years ago to support voice technology developers who do not necessarily have access to proprietary data. It has become a platform where anyone can donate their voice to an open-source data bank. It now includes more than 9,000 hours of audio in 60 different languages.
It is expected that there will be further developments in the area of building native voice assistants, including adding both more African languages and additional features that are not yet common in voice assistants like Alexa or Siri.
As noted by Waverly Labs earlier this year, the world has been slowly embracing a “not-only-English approach.” Waverly Labs has built technology for near-instantaneous communication among multiple languages and dialects, including their latest solutions, Subtitles, Audience, and Ambassador Interpreter. Subtitles supports transactions and service exchanges between different language groups. Designed as a two-sided screen, the device’s mic array picks up conversation between two parties, sends it to the cloud for translation, and flashes the processed text to the screen. “It’s almost like watching a subtitled movie, but at the bank, hotel, airline, grocery store, or other such organization or business.”
And for English-speaking countries where people want to understand what is said on the screen without using subtitles, there are also new tools appearing on the market. A piece of technology to keep an eye on is the auto-dubbing solution Klling, by the AI startup KLleon from Singapore. Klling is an app that dubs media content in English, Korean, Chinese, and Japanese. So if you want to watch Korean content in English, you can use this dubbing solution. They also have an interesting product called Klone.
Klone enables the creation of an AI virtual assistant/chatbot using a virtual avatar. AI virtual chatbots can be used as a user interface for people to interact with. Klone’s underlying deep learning technology, called Deep Human, requires a single photo and 30 seconds of voice data to create a digital human. The resulting auto-multilingual dubbing solution is able to dub the video into five languages while maintaining the person's voice, with automatic lip sync enabled. Samsung uses Klone for their AI virtual assistant for Samsung Display.
Digital humans for use in the metaverse and virtual influencers in general are nothing new, but they are becoming ever more popular and widespread. In June 2022, Lava, the “first Armenian” virtual influencer, appeared on the scene, created with the help of Unreal Engine. Lava publishes content in English but does not yet talk, unlike South Korean AI anchor Kim Ju-ha.
In her case, the cable TV network MBN even has a page for their AI anchor news labeled "Virtual Reporter News Pick," where the AI anchor explains, “I was created through deep learning 10 hours of video of Kim Ju-ha, learning the details of her voice, the way she talks, facial expressions, the way her lips move, and the way she moves her body.” As MBN officials stated in 2020, “News reporting using an AI anchor enables quick news delivery in times of emergency for 24 hours non-stop.”
Will we see more international examples of news delivered with the help of such AI? Probably so. And not only in news and broadcasting. For example, KLleon's own virtual human trended on TikTok with over 2 million views.
In October 2021, Microsoft Translator reached a major milestone: it was capable of translating more than 100 languages. In July 2022, Meta announced that it had achieved another milestone related to the No Language Left Behind (NLLB) initiative: NLLB-200, an AI model that can translate content to and from 200 different languages. This is almost twice the number of languages covered by current state-of-the-art models.
In the company’s first announcement of NLLB, it was noted that they were particularly interested in making MT more accessible for speakers of low-resource languages. The model has already been put to use. For example, Meta partnered with the Wikimedia Foundation to give Wikipedia editors access to the technology to quickly translate Wikipedia articles into low-resource languages that do not have a particularly prominent presence on the site. Meta AI is also providing up to USD 200,000 of grants to nonprofit organizations for other real world applications for NLLB-200. The model itself is open-sourced.
In addition to Meta, in July 2022 Amazon also announced another way to “break through language barriers.” They proposed combining three of their services (Amazon Transcribe, Amazon Translate, and Amazon Polly) to produce a near-real-time speech-to-speech solution that quickly translates a source speaker’s live voice input into a spoken target language, with zero ML experience required. The fully managed AWS services work together in a Python script that uses the AWS SDK for the text translation and text-to-speech portions, and an asynchronous streaming SDK for audio input transcription.
As suggested by Amazon, the workflow is as follows:
The real-life use cases for this may include the medical and business domains, as well as events. Amazon developers also encouraged users to think about how they could use these services in other ways to deliver multilingual support for services or media.
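For readers who want a feel for the text translation and text-to-speech portions of such a pipeline, here is a minimal Python sketch using the AWS SDK for Python (boto3). It is our own illustration under stated assumptions, not Amazon’s published script: the function name and the injectable client parameters are hypothetical, the streaming transcription step is left out entirely, and running it against AWS requires configured credentials and a region.

```python
def translate_and_speak(text, source_lang, target_lang, voice_id,
                        translate_client=None, polly_client=None):
    """Translate `text`, then synthesize the translation as MP3 bytes.

    The clients are injectable (an assumption of this sketch) so that the
    function can be exercised without AWS credentials; by default they
    are created with boto3.
    """
    if translate_client is None or polly_client is None:
        import boto3  # AWS SDK for Python; needs credentials and a region
        translate_client = translate_client or boto3.client("translate")
        polly_client = polly_client or boto3.client("polly")

    # Amazon Translate: text in the source language -> text in the target language.
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]

    # Amazon Polly: translated text -> audio stream in the chosen voice.
    speech = polly_client.synthesize_speech(
        Text=translated,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return translated, speech["AudioStream"].read()
```

A call might look like translate_and_speak("Where is the pharmacy?", "en", "es", voice_id), where voice_id is a Polly voice that matches the target language. In a full speech-to-speech setup, the input text would come from the transcription step rather than being typed in.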
Even though the language industry largely agrees on the benefits of MT in its current state, the billions of words processed by MT engines each and every day may still result in translations that are, overall, not quite good enough. For this reason, a number of companies have taken on the mission of changing this via incremental improvements of their technologies aimed at bettering the MT output quality.
This is one of the reasons the MT market is so complex and dynamic. But with so many options available, the good news is that, unlike early adopters of MT, modern users need not necessarily study the market themselves or evaluate engines in-house. One good example of speeding things up with customization is adaptive MT. Adaptive MT engines are capable of learning from corrections in real time. Known examples of such engines are LILT and ModernMT.
However, even though adaptive MT improves engines fast, which is in itself a clear benefit, it may not be the best option for every use case, simply because the people who introduce the corrections from which the engines learn on the fly can and do make mistakes. Then there’s also the issue of who has the final say when multiple “correctors” improve the same engine and introduce contradictory changes. As confirmed by Globalese, whose specialty is precisely MT customization, engines should be improved using quality-approved data. Asynchronous retraining of a custom engine (e.g., on a daily basis) with qualified data can be more efficient than real-time learning based on unconfirmed content.
Moreover, there are now technology companies that can help with preparing data, MT engine assessment, and overall implementation of an MT program at an enterprise, be it MTPE training for linguists or setting up effective MT customization workflows. As we’ve already covered examples of such providers in last year’s report, this time we’re going to focus on a couple of more recent examples of “better MT.”
Most glossaries available on the market still have search-and-replace functionality. But we’re already seeing changes to this approach. For example, DeepL launched a glossary feature in May 2020 that allows users to define and enforce custom terminology, but they didn’t stop there. They also launched new language models that more accurately convey the meaning of translated sentences, while overcoming the challenge of industry-specific professional jargon.
With the continuous improvement in MT technology, engines are expected to get even better, enabling everyone to use glossary terms with morphologically correct inflections.
In the beginning of 2022, a brand new TAUS service was launched: TAUS Data-Enhanced Machine Translation (DEMT). TAUS DEMT was set to deliver affordable, customizable, high-quality MT output with a single click using the training datasets most relevant to the user’s source files. This is a type of real-time MT where training and customization happen based on selected datasets.
Evaluations performed by third-party MT experts showed that the available TAUS datasets used in the customization of Amazon Translate improved the BLEU score measured on the test sets by more than 6 BLEU points on average and by at least 2 BLEU points in the medical, ecommerce, and finance domains. This corresponds to an average increase of 15.3 percent.
Source: Evaluating the Output of Machine Translation Systems
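Since BLEU gains are quoted both in points and in percent, a short Python sketch may help show how the two relate. The scores below are hypothetical and chosen only for illustration; they are not figures from the TAUS evaluation, and the function name is our own.

```python
def relative_gain(baseline_bleu, customized_bleu):
    """Relative BLEU improvement as a percentage of the baseline score."""
    if baseline_bleu <= 0:
        raise ValueError("baseline_bleu must be positive")
    return 100.0 * (customized_bleu - baseline_bleu) / baseline_bleu


# Hypothetical scores: a 6-point gain on a baseline of 40 BLEU.
print(round(relative_gain(40.0, 46.0), 1))  # 15.0
```

The same absolute gain weighs more on a weaker baseline: 6 points on top of 30 BLEU is a 20 percent increase, while on top of 50 BLEU it is 12 percent.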
As a result, TAUS DEMT delivers quality similar to post-edited MT. But delivery is virtually in real time and prices are 50 percent to 80 percent lower than the “human-in-the-loop” service.
Speaking of TAUS, according to their research, anything below an 85 percent fuzzy match in Romance languages is potentially better handled by MT than by translation memory (TM). That’s most likely why, for instance, the MT suggestions in MateCat are allocated an 85 percent match by default. In the absence of higher percentage matches, the MT will be displayed.
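The ranking logic described above can be sketched in a few lines of Python. This is a simplified illustration, not MateCat’s code: difflib’s similarity ratio stands in for a real fuzzy-match algorithm, and the function names and the tie-breaking rule are our assumptions.

```python
from difflib import SequenceMatcher

MT_MATCH_SCORE = 85  # percent assigned to MT suggestions by default


def fuzzy_score(a, b):
    """Similarity in percent; difflib stands in for a real fuzzy-match algorithm."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def pick_suggestion(source, tm_entries, mt_translation):
    """Return (origin, score, target) for the top-ranked suggestion.

    tm_entries is a list of (tm_source, tm_target) pairs.
    """
    candidates = [("MT", MT_MATCH_SCORE, mt_translation)]
    for tm_source, tm_target in tm_entries:
        candidates.append(("TM", fuzzy_score(source, tm_source), tm_target))
    # Highest score wins. On a tie, max() keeps the earliest candidate,
    # which is MT here (an assumption of this sketch).
    return max(candidates, key=lambda c: c[1])


tm = [("Click the Save button.", "Cliquez sur le bouton Enregistrer.")]
print(pick_suggestion("Click the Save button.", tm, "<MT output>")[0])            # TM
print(pick_suggestion("Completely unrelated sentence here", tm, "<MT output>")[0])  # MT
```

Treating MT as an 85 percent match means it only surfaces first when the TM has nothing better to offer, which mirrors the behavior described above.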
The representation of MT as a fuzzy match, like in a TM, can give users an idea of the extent to which this MT may be used. And while we’re on this subject, it’s worth mentioning what Memsource engineers are trying to do. They have addressed the overall feeling of uncertainty inherent in the MT evaluation process with their machine translation quality estimation (MTQE) feature, which helps users automatically estimate the quality of MT, especially if there’s no match from the TM. Based on MTQE results, users can decide how to treat the output. Knowing that linguists save time with quality scores for TM matches, the decision to provide a similar option for MT was made.
Looking at the question of fuzzy matches in MT from another perspective, there is ongoing research into combining TM+MT in a single segment. This actually might become another MT innovation: part of a given segment may be found in a TM, and therefore presented to the linguist, but another part (missing from the TM) comes from MT.
"MT as a technology has reached a certain level of maturity in terms of baseline quality and customization, which implies some serious setbacks for traditional translation memory technology."
Jourik Ciesielski, MT Specialist, Nimdzi Insights
A fuzzy TM match entails a predefined error threshold: if you populate a 90 percent match, you know for sure that 10 percent of the segment needs to be corrected. The edit distance in a machine-translated segment can be smaller, especially when the MT is produced by a well-trained model that includes company-specific or domain-specific terminology. As a consequence, some organizations already prefer MT over fuzzy TM matches below 95 percent.
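To compare the effort behind a fuzzy TM match and an MT suggestion, one can measure the edit distance between each suggestion and the final, approved segment. The Python sketch below uses a word-level Levenshtein distance normalized by the length of the final segment; the normalization and the function names are assumptions made for illustration, and production tools may measure post-editing effort differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_rate(suggestion, final):
    """Word-level edits needed, as a percentage of the final segment's length."""
    suggestion_tokens, final_tokens = suggestion.split(), final.split()
    if not final_tokens:
        raise ValueError("final segment must not be empty")
    return 100.0 * edit_distance(suggestion_tokens, final_tokens) / len(final_tokens)


# One word out of six had to change in this made-up example.
print(round(edit_rate("the cat sat on a mat", "the cat sat on the mat"), 1))  # 16.7
```

Running the same measurement on TM and MT suggestions for a sample of segments is one way to check whether a given match threshold makes sense for a particular engine and content type.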
Paying attention to the tone of voice helps a text appear more aligned and sound more human. In writing, signals like emotion, body language, gestures, voice, and so on, have to be represented by the tone of voice, and MT is learning to reflect this significant part of modern communication.
In translation, this is especially useful for conversational scenarios in languages where the level of formality matters, like German or French. DeepL already has a trigger for formal/informal settings (i.e., way of speech) and features native tone-of-voice control. Formal/informal options are also available in Amazon Translate. Intento is leveraging existing technology for tone of voice as well. Their focus represents an effort to control involuntary bias when using MT. This involves taking practical steps toward avoiding these biases by piloting a tone-of-voice control that works independently of the MT providers. To that end, they added MT-agnostic NLP, which enabled tone-of-voice control and provided a wider choice of MT engines for such cases.
Remote interpreting solutions have been both in development and in use for a long time now. However, prior to the COVID-19 pandemic, uptake was slow. The onset of the pandemic changed this drastically, and, ever since, it seems that the growth, innovation, and investment in this field has been unstoppable. Once considered an afterthought or sub-par alternative to onsite services, remote interpreting has stepped out of the shadows to become the key to continuity of business and care in many industries.
Because so much has happened and is still happening in VIT, this year’s Language Technology Atlas is dedicating a special section to this thriving field within the larger market for language technology.
An interesting side-effect of the boom in remote interpreting is that interpreting has gone more mainstream. This trend can be observed across different segments of the interpreting market — from vaccine centers across the US being equipped with portable, on-demand video remote interpreting (VRI) devices (e.g., from AMN Language Services), to Walgreens pharmacies partnering with VRI platform VOYCE to enable efficient communication between customers and employees (including language access for the Deaf and hard-of-hearing through sign language interpreting).
However, this trend is particularly noticeable in the remote simultaneous interpreting (RSI) space. Ever since the onset of the pandemic, it appears to have opened the door to new clients, so that these days RSI is no longer limited to conference interpreting (its field of origin). Instead, RSI has started to branch out into other areas of the market. For instance, LSPs suddenly received requests for RSI for parent-teacher conferences and other school events. Local governments reached out wanting to add RSI to their town hall meetings and COVID-19 announcements. Educators from various fields, including healthcare education, have added RSI to their classes, and at least one large esports company is looking to add RSI to its virtual live events. So not only did the forced move to the virtual realm remove fears and concerns surrounding remote interpreting on the side of existing clients, it also created a whole new set of opportunities and brought interpreting to clients who previously never even considered it.
A likely explanation for this development is the popularity of and increased exposure to (monolingual) video conferencing platforms like Zoom, MS Teams, and Google Meet, that have been booming ever since in-person meetings became restricted. Although these platforms were already being used prior to March 2020, the pandemic took things to a whole new level as video conferencing became the norm in, more or less, every area of society. From businesses to governments to schools to the average person — no matter the age group, no matter the setting (weddings, birthday parties, and funerals included).
But sooner or later the now well-known Zoom fatigue set in, so people looked for ways to make their virtual meetings more engaging and started exploring new features and meeting formats, including multilingual meetings facilitated by RSI.
These days, it really seems like RSI is everywhere: from US President Joe Biden holding a virtual Leaders Summit on Climate in April 2021, facilitated by Zoom with RSI from Interprefy, to virtual wine tours in multiple languages in 2022.
Because RSI is in such high demand right now and there is so much innovation happening in this field, a large portion of our VIT special will focus on RSI this year.
Before delving deeper into RSI and the different ways it can be performed, it is important to briefly distinguish it from VRI because both are remote interpreting solutions that use audio and video.
From a service standpoint, the main difference is that VRI is performed consecutively (speakers and interpreters taking turns), while RSI is performed simultaneously (interpreting at the same time as the original speaker). And while VRI is predominantly used for smaller meetings or in healthcare and public sector settings where only two different languages need to be supported, RSI is meant for larger events and conferences with people from many different language backgrounds.
From a business standpoint, it is also worth highlighting that interpreters are paid either a day rate or half-day rate for RSI for larger events or conferences. In addition, since the beginning of the RSI boom, hourly rates are becoming increasingly common for shorter assignments of only one or two hours. In comparison, VRI assignments are charged by the minute.
Over the past two and a half years, we have learned about a whole host of different solutions for RSI. There are so many that it can be easy to get them all mixed up. So we have broken them down into four categories.
Let’s take a look at each category in more detail.
An example of a video conferencing platform which doesn’t have its own RSI capabilities is Skype. However, that doesn’t mean that it is impossible to do simultaneous interpreting on Skype; it just means that a workaround is needed. This could be:
Zoom is the biggest de facto RSI platform, judging by the number of meetings. Platforms like Zoom, Webex, Google Meet, and Microsoft Teams (MS Teams) fall under this category. MS Teams is the most recent of these to add an RSI feature, which it did in August 2022. Its RSI tool is discussed in more detail in the section on video conferencing platforms. Video conferencing platforms like these were not designed with multilingualism and RSI in mind, but added an RSI feature to their interface when demand for remote multilingual meetings peaked during the pandemic. Consequently, the RSI features on these platforms are relatively limited. For example, Zoom only added relay (i.e., when an interpreter who doesn’t work with the current speaker’s language interprets from a colleague’s output rather than from the original speaker) in Spring 2022, and the Google Meet interpreting extension doesn’t allow for multiple booths. Features like a mute button, individual interpreter chats, a handover feature/button, a timer, and an audio volume-control button are often missing from these platforms as well.
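Relay is easier to grasp as a routing decision: each booth either listens to the floor or, if it doesn't work from the speaker's language, to another booth whose output it understands. The sketch below models only that decision. The `Booth` class and the `audio_source` function are hypothetical names made up for this illustration and do not reflect how Zoom or any other platform implements relay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booth:
    name: str
    sources: frozenset  # languages the booth's interpreters work from
    target: str         # language the booth interprets into


def audio_source(booth, floor_language, booths):
    """Return 'floor', the name of a booth to take relay from, or None."""
    # Direct interpretation: the booth understands the current speaker.
    if floor_language in booth.sources:
        return "floor"
    # Relay: find a colleague who understands the floor and whose output
    # language this booth can work from.
    for other in booths:
        if (other is not booth
                and floor_language in other.sources
                and other.target in booth.sources):
            return other.name
    return None  # nobody can cover this language combination


english = Booth("JA>EN", frozenset({"ja"}), "en")
french = Booth("EN>FR", frozenset({"en"}), "fr")
booths = [english, french]

print(audio_source(english, "ja", booths))  # listens to the floor directly
print(audio_source(french, "ja", booths))   # takes relay from the JA>EN booth
```

In this toy setup, a Japanese speaker is interpreted into English directly, and the French booth works from the English booth's output.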
This type of platform can host its own meetings but RSI is its raison d’être. Many examples of such platforms can be seen in our Tech Atlas. The interpreter control panel is quite complete and often aims to resemble that of an in-person booth as much as possible. There are two typical scenarios for the use of designated RSI platforms:
In both cases, whether the meeting happens on the RSI platform or not, the interpreters are typically not visible to the speakers and attendees (although some platforms can enable this upon request). They act in the background, just as they would in an onsite meeting when they interpret from a physical booth.
Just like with a physical soundproof booth at an onsite meeting, these platforms function alongside the original meeting taking place on a video conferencing platform. The two major distinctions from standalone RSI platforms are that virtual booths do not integrate with video conferencing platforms but, rather, run in parallel and that they don’t function as standalone meeting platforms. When using this technology, interpreters join the original meeting on a video conferencing platform so they can access the audio and video feeds directly. From there, the interpreters listen to the speeches and deliver the interpretation into the original meeting. However, the interpreters’ rendition is also transmitted into the virtual booth tool alongside the original meeting, so that the interpreters can listen to each other’s interpretation and take relay.
Interpreters and clients may prefer virtual booths to standalone RSI platforms for three primary reasons:
The best RSI solution for you or your clients will depend on your requirements and budget. Here, we have created a brief table outlining the different RSI solutions and their advantages and disadvantages.
| Type of RSI solution | Advantages | Disadvantages |
|---|---|---|
| Video conferencing platforms without an RSI feature | Low cost; Use the platform of your choice with a workaround; Interpreters can be present in the original meeting | Quite complicated setups that often require a tech team; Limited interpreting features; The larger and more multilingual the meeting, the more complex these are to set up |
| Video conferencing platforms with an RSI feature | Low cost; Basic RSI features; Interpreters can be present in the original meeting; Good event management features | Often requires tech-savvy interpreters to work around the shortcomings of the interpreting features; No designated RSI technical team; Large, multilingual meetings can be hard to set up |
| Standalone designated RSI platforms | Often have their own interpreter database for booking interpreters; Excellent RSI features for the interpreters; Can hold the meeting on the platform; Designated RSI technical team; Interpreters don’t need to perform beyond their role | Complicated setup if client chooses to use own technicians (vs. the RSI platform’s tech team); Interpreters are typically not part of the original meeting but operate in the background; Event management features aren’t as developed as video conferencing platforms |
| Virtual booth RSI platforms | Mid-range cost; Excellent RSI features; Can work alongside the video conferencing platform of your choice; No injection of original video and sound into the RSI platform; Copyright and privacy issues are minimized; Interpreters can be present in the original meeting | It cannot function as a standalone meeting platform; No event management features; It can be confusing for interpreters to have to mute their mics in both the original meeting and the virtual booth; Often interpreters need at least two devices to be in the original meeting and the virtual booth |
Conference interpreters were in large part forced to adopt RSI during the COVID-19 pandemic as a means to keep their businesses afloat. A couple of years down the line, feelings are still mixed. The audio quality of remote speakers is a major thorn in the side of RSI interpreters and, despite repeated attempts to educate speakers, the problem persists. At the time of writing, in July 2022, European Parliament interpreters are on strike and reducing interpreting for remote participants due to the poor quality of their audio. On the flip side, other interpreters have expressed their willingness to work from home, in comfortable conditions, without needing to travel left, right, and center for conferences.
In terms of the best RSI solution for interpreters, the answers are manifold. Consultant interpreters and interpreters that have their own clients largely tend to prefer being in the original meeting when performing RSI. By being in the original meeting the interpreter can be visible when needed and can communicate directly with the client and participants. They don’t have to give the client over to the RSI platform but can manage everything themselves, including the team of interpreters. Being a participant in the original meeting is also useful in order to highlight poor sound issues, deal with technical glitches, and even set up the interpreter channels. However, it is important to bear in mind that the interpreter is going above and beyond their role of being “just an interpreter” in such cases. They are wearing multiple hats, including that of educator, technician, mediator, and chief interpreter.
The opposite is also true. There are many interpreters who do not want to step outside their role of interpreter. The situation explained in the previous paragraph adds tremendous cognitive load to the interpreter, who is juggling these different roles, and it could have a negative impact on their interpreting performance. In some cases, less tech savvy interpreters prefer to just log onto the RSI platform and do their thing — that is interpreting — without having to worry about secondary devices, sound mixers, setting up the meeting for participants, and booth channels for interpreters. For the interpreters who prefer this scenario, a standalone designated RSI platform is probably the best bet because there is no interaction with the client and no need to be in the original meeting at all.
As already alluded to earlier on, there has been a lot of innovation on both the RSI and overall VIT scene ever since their popularity went through the roof with the onset of the pandemic. The latest, most relevant developments are discussed in this section.
In the 2021 Language Technology Atlas, we included Zoom for the first time. This year’s Atlas also includes Webex by Cisco as well as the Google Meet simultaneous interpreting extension. While these products do not have interpreting as their raison d’être, they are the largest RSI platforms by simple number of meetings.
In many cases, clients prefer to use regular video conferencing platforms for all of their virtual meetings, including multilingual ones. This is for a number of reasons, of which the most common ones have been summarized below:
While the interpreter experience may be better on designated RSI platforms, most clients don’t prioritize this aspect. Their needs tend to focus on the points mentioned above.
"It is the strength of video conferencing platforms in all of the other areas aside from RSI that makes them so strong even on the RSI scene. As long as they have a functioning RSI feature, they are incredibly competitive."
Rosemary Hynes, Interpreting Researcher, Nimdzi Insights
The fact that tech giants from outside the industry have not just taken note of the RSI boom but acted on it, is yet another confirmation of how strong the demand for multilingual meetings has become. What started with a rudimentary feature on Zoom has taken on a life of its own, with new announcements released all the time by different market players.
In Spring 2022, Zoom added relay to its simultaneous interpreting feature, potentially making it more attractive to clients and interpreters alike. The feature isn’t perfect, and it still requires a secondary device or a sound mixer to be able to hear the floor and the booth partner, but it has overcome one of the major difficulties of using Zoom for RSI: the complicated workarounds previously needed to do relay.
Webex has quite a complete RSI feature which includes an interpreter handover, relay, and a volume mixer to listen to both the floor and the booth partner. The Google Meet RSI extension is perhaps the most primitive at the time of writing because it requires two Google Meet meetings to be open at the same time and the interpretation is unidirectional, meaning that the interpreter cannot interpret from the interpreted meeting back into the original meeting if a participant has a question.
MS Teams released its own RSI feature in August 2022 which was both welcomed and frowned upon by the different industry actors. On the one hand, it was hailed as a game changer by companies that have the Microsoft license and use MS Teams for their meetings. This is because multilingual meetings can finally take place on the platform without having to use a workaround or an integration with a separate RSI platform. On the other hand, like all regular video conferencing platforms that added an RSI feature, its RSI capabilities remain limited. For example:
The MS Teams RSI feature is still in its beta phase, so it is too early to tell how it will hold up in real-life scenarios and what impact it will have on the market. The announcement made waves in the interpreting community and industry circles alike and it was even proclaimed that this could be the death knell for designated RSI platforms. However, here at Nimdzi we believe otherwise, for three main reasons:
While it is too early to tell what the impact on the market will be, for the reasons mentioned above, it is most likely that this latest addition to the RSI marketplace will make a rather small splash after all.
We already mentioned the virtual booth in our section about the different types of RSI platforms. Still, it deserves a mention here, as it is also one of the latest innovations on the RSI scene and one that ties in with another development, namely interpreter videos.
Several studies, including ones by the UN, found that remote interpreting is more stressful for interpreters than onsite work. It was also found that this stress can be worsened when the interpreters cannot see each other. This is because without a visual the interpreters cannot know if their booth partner is present, paying attention during tricky speeches, or even ready to take over the microphone. Of course, the interpreters can message each other, but this adds to the already tremendous cognitive load of remote interpreting.
To overcome this issue, some platforms have provided their interpreters with video as well as audio to enable them to see each other. This reassures the interpreters that their partner is present and may also facilitate the handover through hand signals. While the booth partners can see one another, the interpreters remain invisible to the meeting participants. This way the interpreters still have the privacy of the booth, but with the added value of having visual contact with each other.
The latest RSI platforms tend to have incorporated interpreter videos into their interface and we believe that the more established RSI platforms might also include this feature in their future versions. In addition, the aforementioned virtual booths also enable interpreters in a virtual meeting to see and speak to each other without being heard or seen by the meeting attendees, just like in a physical interpreting booth.
Up until this point, the scenarios described all focus on regular, scheduled virtual meetings, events or webinars. However, for a while now, livestreamed events and interviews have been picking up steam (see Nimdzi LIVE for instance). Let’s briefly clarify the difference between the two.
Webinars and regular virtual events or meetings typically focus on a smaller target audience of invited or registered attendees. Livestreamed events, on the other hand, are designed to reach a much larger audience and can be broadcast to potentially hundreds or thousands of viewers. They also do not necessarily need to be scheduled in advance (although this is also possible) but can happen spontaneously. In most cases, livestreams are broadcast to a company’s or person’s social media and YouTube channels.
The majority of RSI solutions on the market today are focused on the first group — scheduled events. However, tech providers in the VIT space have started to recognize the potential of bringing RSI to livestreamed events. The main challenge to overcome is to make sure that original audio and video as well as the interpreter’s audio output are well synchronized so that there is minimal to no delay.
Akkadu, a China-based RSI platform, appears to be ahead of the game in this regard. Akkadu uses RTMP technology to synchronize the different audio and video feeds so that audiences get the full experience without a delay in the video and the interpretation audio.
To further improve its livestreaming capabilities, Akkadu has recently released its Video Player. This is a livestreaming video player that can be simply embedded into a client’s webpage using an iFrame. In this case, the original video and interpretation audio have already been synchronized by Akkadu behind the scenes using RTMP technology and have been put into an easy-to-use video player for the client. The video player on the client’s webpage can livestream the original meeting and the listeners can choose which language to listen to and even access chat in the video player. It therefore has the advantage of being easy to set up while also being customizable and already synchronized.
Both in this year’s Nimdzi 100 and in last year’s Interpreting Index we wrote about the need for a Multilingual Meeting Provider (MMP). Many companies already describe themselves as facilitating multilingual meetings — and they do. However, what we typically see in the market are either companies offering RSI or VRI, live captioning or machine interpreting, or a combination of a few of these services. Interviews with market players show that the needs of clients are shifting and buyers are increasingly looking for a provider that can do it all.
"Clients don’t want to go to one company for their interpreting needs, to another for translation and again to another for captioning — and potentially all for the same event. Especially as new meeting formats are emerging all the time and from clients who never used language services before. These kinds of clients do not want to have to think about the complexity of language services. What they want is one provider that can facilitate all their requirements for multilingual meetings and events in the virtual and the physical world."
Sarah Hickey, VP of Research, on behalf of Nimdzi Insights
When we wrote about this before, we asked who was going to fill this gap and in what way (through partnerships, acquisitions, building or buying new tech, adding services, etc.). Since then, Nimdzi has identified two solutions which may provide an answer to this question and which have started filling the MMP gap in two different ways.
Bridge GCS describes itself as a virtual events platform that enables immersive and interactive experiences. The platform was designed with multilingualism in mind and offers a myriad of different language solutions, such as RSI, multilingual closed captioning, real-time translation of chats (using MT), as well as AI-powered subtitling. It also offers a localized interface for ten different languages, including localized waiting room messages, automatic emails, and data analytics.
However, Bridge GCS does not just focus on multilingualism but is also very strong on the event management front, providing features, such as real-time analytics, RTMP streaming, backstage communication, breakout rooms, integrations with social media as well as CRMs, direct uploading of videos into meetings, and Q&A moderation.
When the platform was built, it was designed with different roles in mind — interpreter, technician, moderator/presenter, event planner, host, and participant — and provides corresponding features that cater to each one.
This mix of features that enable a multilingual experience (RSI, captioning, subtitling, chats with MT), paired with its superior event management capabilities, is the reason we can consider Bridge GCS as being on the road to becoming a true MMP.
vSpeeq is another company that focuses on facilitating multilingual events. However, the company is neither an RSI platform nor an event management platform. Instead, vSpeeq describes itself as an ecommerce platform for language services. The main purpose of vSpeeq is for clients to be able to quickly select and purchase the language services they need for an upcoming event and in this aspect the user experience really does resemble that of ecommerce platforms, such as Amazon and the like. Clients can simply select the language service they want from the website, add it to their cart, and pay for it online. vSpeeq then handles the rest in the background and facilitates the client’s language needs on the day of the event and on the platform the client chooses.
For now, vSpeeq predominantly offers translation services and RSI and has its own pool of translators and interpreters. However, vSpeeq is planning on expanding its offerings to create a one-stop shop for language services.
It is important to stress again that vSpeeq is not an RSI platform — the company provides the service, but not the technology. For the RSI technology, vSpeeq has partnered with a third party. This is why we do not list vSpeeq in the RSI section of our Atlas but in the “Platform LSP” category instead.
In a first-of-its-kind acquisition, Boostlingo announced on March 23, 2022, that it had acquired Interpreter Intelligence (an interpreter management and scheduling platform) and VoiceBoxer (an RSI platform). This shows us two things:
"In the past, RSI moved in its own circles far removed from VRI, OPI, and other areas of interpreting. This is because RSI typically comes with a very different client base, born out of the conferencing sector. Now, however, we can see those circles starting to overlap and new frontiers are on the horizon. Tech providers in the interpreting world are asking themselves ‘Should we invest in developing new software ourselves or buy a competitor who is already an expert in this area?’"
Rosemary Hynes, Interpreting Researcher, Nimdzi Insights
The acquisition of VoiceBoxer by Boostlingo, therefore, is a smart move and will help the company stay competitive at a time when RSI is branching out from the niche into the mainstream. The acquisition is also a first example of how RSI is gradually being considered a standard interpreting service, needed to complete the full package on offer.
In other Nimdzi publications, we have already reported that demand for remote interpreting in the healthcare sector increased by more than 50 percent after patients were asked to call before making an in-person visit due to COVID-19-related safety measures.
Remote interpreting in healthcare is certainly nothing new. However, the lockdown restrictions and spike in requests created new challenges. For example, before March 2020, VRI typically only required two channels — one for the interpreter and one for the doctor and patient, who usually were in the same room. However, once lockdowns hit, the situation shifted to all three parties typically being in different locations. This created a need for VRI with three-way call capabilities, which presented a technical challenge for established providers.
In addition, companies offering VRI or OPI reported a spike in requests from telehealth vendors, looking for ways to integrate interpreting into their own platforms. So, also in this segment of the interpreting industry, the race for integrations began.
This trend is confirmed by investment and mergers and acquisitions in this field. For instance, AMN Healthcare — a multi-billion dollar company in the healthcare staffing industry — acquired Stratus Video in February 2020 (rebranded to AMN Language Services). The acquisition and subsequent integration of remote interpreting services into AMN’s telehealth platform happened just in time for the pandemic, which gave the company a competitive advantage once lockdowns hit.
In a similar deal, UpHealth — a telehealth service provider — acquired Cloudbreak Health and its remote interpreting solution Martti in June 2021. The acquisition of Cloudbreak allowed UpHealth to integrate remote interpreting services into their platform, thereby expanding their reach to include people of all language backgrounds and thus increasing their value proposition to existing and new clients. Prior to the acquisition by UpHealth, in February 2020, Cloudbreak had already received USD 10 million in funding.
Last but not least, Jeenie, a US-based VRI platform, raised USD 9.3 million in a Series A funding round on March 31, 2022. The company specializes in healthcare interpreting.
"What all of this shows us is that the demand for interpreting services in the telehealth field continues to grow. The pandemic created the framework for people who resisted remote interpreting to embrace it. And now that the genie is out of the bottle, and providers, clients, and interpreters alike have embraced remote interpreting and are well set up for it, it is hard to go back."
Sarah Hickey, VP of Research, on behalf of Nimdzi Insights
Not much has changed on the machine interpreting front since our 2021 edition of the Language Technology Atlas. However, there are a few things worth mentioning and a few others worth repeating.
Some people will insist that machine interpreting does not exist but that it should exclusively be called speech-to-speech translation because the output is not the same as if an interpreter were to assess a speech and give their rendition in another language. Those people are not wrong. However, couldn’t we say the same about machine translation? It’s not the same as human translation and yet we have come to accept the term. The distinction is already being made by adding the term “machine.”
Whatever you decide to call it, machine interpreting or speech-to-speech translation has come a long way. At this point in time, machine interpreting solutions are already ready for use — just not for all use cases. What the machines are good at is processing pure information, so assignments that are more technical in nature or require less nuance are optimal. What the machines are not (yet) good at is conveying emotion, irony, or tone, as well as transferring gender from one language to another. This is where the expertise of human interpreters is required, at least for the foreseeable future.
Two years ago we wrote that the majority of the machine interpreting solutions currently on the market target individual consumers, e.g., in the form of handheld devices for tourists. While this is still true, it appears that the tide may be (slowly) turning. The B2B market for machine interpreting solutions is growing. This is, for example, evidenced by the fact that Wordly has added a Zoom integration for multilingual captions.
US-based company Wordly predominantly provides speech-to-text and speech-to-speech translation for conferences. The latest development now shows that RSI providers are not the only ones catching a ride on the Zoom boom. It also brings us back to the heightened interest from buyers of all industries to make virtual meetings more accessible — in many different ways and on different budgets. And all of this is moving the language industry closer into the focus of the mainstream consumer.
This section wouldn’t be complete without the mention of CAI tools, which continue to be one of the “hottest” new developments on the RSI scene.
The purpose of CAI tools is to be a form of AI-booth mate for interpreters performing (remote) simultaneous interpreting. CAI tools allow interpreters to extract terminology and build their glossary within seconds. During an assignment, CAI tools can also call out figures and names, and instantly convert units (e.g., for measurements and currencies). The goal of CAI tools is to make the interpreter's preparation time more efficient and to ease the cognitive load during the assignment.
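As a rough illustration of the "call out figures and convert units" behavior, the sketch below scans a transcript snippet for numbers followed by a unit and prints a converted hint. It is a toy example, not a description of any actual CAI product: the function name `call_out_figures`, the small unit table, and the assumption that commas are thousands separators are all ours. Currency conversion is left out because it would need live exchange rates.

```python
import re

# Conversion factors for a few imperial units (exact by definition).
CONVERSIONS = {
    "miles": ("km", 1.609344),
    "feet": ("m", 0.3048),
    "pounds": ("kg", 0.45359237),
}

# A number (optionally with , or . groups) followed by a known unit.
PATTERN = re.compile(r"(\d+(?:[.,]\d+)*)\s*(miles|feet|pounds)\b", re.IGNORECASE)


def call_out_figures(transcript: str):
    """Return a converted hint for every figure+unit found in the transcript."""
    hints = []
    for match in PATTERN.finditer(transcript):
        raw, unit = match.group(1), match.group(2).lower()
        # Assumption: commas are thousands separators (English convention).
        value = float(raw.replace(",", ""))
        target_unit, factor = CONVERSIONS[unit]
        hints.append(f"{raw} {unit} = {value * factor:.1f} {target_unit}")
    return hints


for hint in call_out_figures(
    "The pipeline runs 1,200 miles and sits 30 feet underground."
):
    print(hint)
```

A production tool would work on a live speech-recognition stream rather than a finished sentence and would need locale-aware number parsing, but the core lookup-and-convert step is the same.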
Despite turbulent world events, the market forecasts for language technology are generally quite promising. There is a lot of valuable interest in this sphere from an audience broader than localization professionals alone, ranging from everyday users to major investors.
The big IT narrative around LT is well captured in Meta’s July statement: “Translation is one of the most exciting areas in AI because of its impact on people’s everyday lives.”
With the rise of AI, NLP, and MT, language technology is no longer perceived as a peripheral area to the language industry. That is one of the reasons for many other language technology matrices, catalogs, platforms, and lists emerging here and there — in addition to Nimdzi’s Atlas. This trend itself can be considered indicative of the ever-increasing interest in the language technology arena. As visibility is essential to informed decision-making, we are glad to see this increasing popularity of the subject as well as of the Nimdzi Language Technology Atlas itself.
Even though we publish our free report annually, similarly to the behavior of this market, the data contained in the Atlas infographics is subject to change. Therefore, we update the visuals more often than once a year. So let’s stay in touch: don’t hesitate to reach out to us at [email protected] and tell us about your favorite localization tools and new language technology solutions that should feature on everyone’s radar. Let’s join forces to properly track how the language technology landscape evolves in the years to come.