Report by Yulia Akhulkova.
The Nimdzi Language Technology Atlas offers a unified view of the modern language technology landscape. This freely accessible report provides readers with insights into major technological advancements and demonstrates the value and potential of technology solutions designed for the language industry.
The Atlas serves as a tool that brings transparency to the language technology market and helps with related decision-making. Technology providers use the Atlas both to benchmark their competition and to find potential partners. Investors refer to it to gain a better understanding of the leading market players. Linguists and buyers of language services turn to it to see what tools are out there that can help them in their day-to-day jobs. It allows students of language programs around the world to discover just how many tools may be a mere click away for use in their future careers.
At the same time, the Nimdzi Language Technology Atlas should be seen as a starting point, a map of sorts. So, let’s open our newly updated guide to the world of language tools and walk this road together.
In this year’s edition of the Nimdzi Language Technology Atlas, we collected data from providers of over a thousand technology solutions. However, not all of them made it to the final infographic, which still contains more than 920 tools.
With the boost of Language AI, we had to make a decision to stick to listing more established players in this area. Otherwise, with low barriers to entry and hundreds of new products emerging weekly, the list would be quite hard to put together as a single graphic, let alone to keep updated.
As usual, the data gathering behind the Atlas is based on four main sources:
These sources have given us a comprehensive understanding of the state of technology development in the language industry. So let’s start by reviewing the definitions of key language technologies.
Translation management systems (TMS) are systems that feature both translation (and editing) environments and project management modules. Core components of a typical TMS include:
You can check out more features of a modern TMS using Nimdzi’s free TMS Feature Explorer.
Within the TMS category there are four subcategories:
Unlike TMS, translation business management systems are systems that do not have a bilingual translation environment. They only have management options for translation project enablement. We call such technology a BMS or (T)BMS, since that’s exactly what it does: it helps manage business operations around translation.
At Nimdzi, we coined the umbrella term ‘virtual interpreting technology’ (VIT) to describe any kind of technology that is used to deliver or facilitate interpreting services in the virtual realm. There are three ways in which virtual interpreting can be performed or delivered: via over-the-phone interpreting (OPI), video remote interpreting (VRI), or remote simultaneous interpreting (RSI).
As the name OPI suggests, two or more speakers and an interpreter use a phone to communicate. This is an audio-only solution and the interpretation is performed consecutively. VRI is also performed consecutively. However, in this case, there is both an audio and a video feed. Depending on the VRI solution, users and interpreters either connect via an online platform with video calling capability or via a mobile app. As for RSI, it directly evolved out of the field of conference interpreting and is intended for large online meetings and events with participants from many different language backgrounds. As the name suggests, the interpretation is performed simultaneously — at the same time as the speakers give their speeches.
We have included machine interpreting (MI) solutions in our definition of VIT and are subsequently listing them in our Language Technology Atlas as well.
Interpreter management and scheduling (IMS) systems are also included in our definition of VIT because, even though they do not focus on delivering interpreting services, they facilitate them. An IMS is a useful tool that allows for efficient management of interpreter bookings for both onsite and virtual interpreting assignments.
You can check out more features of VIT systems using Nimdzi’s free VIT Feature Explorer.
Also known as automatic speech recognition (ASR). The section features solutions that focus on automatic transcription and automatic captions and subtitles. Many of the solutions listed here provide both options. However, as these two groups do not fully overlap, we subdivide this category into two subcategories.
Here, we feature various tools and platforms for audiovisual translation enablement: from project and asset management tools to AI-enhanced dubbing tools.
The section discusses major MT engine brands subdivided into four subcategories based on the MT providers’ specialization. Historically, we also feature MT toolkits here separately, in case you’re adventurous enough to experiment with this area yourself.
Here we list systems that integrate other systems with each other. The middleware subsection discusses major companies that specialize in integrating various language technologies.
The products in the MT Integrators subsection not only provide smart access to MT engines, but support certain procedures around MT so that users can leverage MT in the best way possible.
This section is devoted to quality management in translation. It features three separate subcategories which correspond to three main product types in this area: QA tools, review and evaluation tools, and terminology management tools.
In this section, we feature platforms and marketplaces focused specifically on translation, interpretation, voice, and localization talent. In a marketplace, you can post a job and accept responses from linguists and other professionals who are interested in doing the work for you. Then you book this talent or directly assign the job to the chosen talent within the platform. If you’re a linguist, you sign up and set up your profile in the system, get vetted and/or tested (on some marketplaces), and then start receiving job offers.
There is also the platform language service provider (LSP) option where you not only get access to a library of linguistic resources and agencies, but also to the workflows for the projects along with PMs who support you. You can upload your files to the platform, get an instant quote, and after quote approval and project completion, receive the localized files.
In the 2023 edition of the Nimdzi LT Atlas we added a new, 10th, category: Multilingual Content Generators (MLCGs) put together by Nimdzi’s Lead Consultant Laszlo Varga. Why did we include this category in this year’s release?
The release of ChatGPT and the subsequent proliferation of similar large language models (LLMs) brought new, broader attention to language technologies. With the rise of AI language technology, specifically generative AI tools, multilingual content creation is now at everyone’s fingertips. This means social media content, case studies, product descriptions, blog posts, sales copy, marketing messages, and more. Generative AI tools can help content creators draft a first version, eliminate writer’s block, tailor drafted content to tone and style — or, as Canva’s Magic Write puts it: “Your first draft, fast.” As the latest LLMs support more than 25 languages, content can now be created simultaneously in multiple languages based on a single content brief.
While AI-supported content creation has been around for a few years, it was focused mainly on English content. Players such as Jasper, Copy.ai or WriteSonic had been providing such services before the public hype around OpenAI, and the release of GPT-3 by OpenAI in June 2020 marked the beginning of a boom in this segment. Around that time, multiple content generation startups were founded and released their first products, with various points of focus — and success.
The next big jump in the proliferation of these tools was (again, by OpenAI) triggered by the release of GPT-4 and the plugin ecosystem behind it. Suddenly the path opened for many technology companies to create tools and products supporting the generation of multilingual content. Nimdzi has been keeping an eye on the rapid progress in this field, which is buzzing with activity both by language industry experts and non-language players.
An interesting side-effect is the coming of single-language content creation tools. There are various products focusing on their local market only. For instance, you can find a custom-built tool for Brazilian Portuguese or Spanish that offers no additional language support, even though (since there’s an LLM under the hood) the technology used to build them would be capable of it.
During our research, we observed a “platformization” of LLMs. Microsoft partnered with OpenAI and its GPT-family models, Google released its own models (Bard, PaLM2) as well as an integration and customization toolkit on its cloud platform, while Amazon not only supports the open-source AI hub HuggingFace, but also hosts its own machine learning platform on AWS. In addition, other API-enabled platforms such as Cohere, AI21, and Anthropic also offer the promise of integrating LLMs into a wide variety of use cases, including multilingual content creation.
"The MLCG market is both converging and diversifying at the same time. As the underlying foundational tech (LLMs) is in many ways the same, most players offer a plethora of features: many languages supported, (M)SEO support, templating, etc. This makes it hard for customers to differentiate between the various tools. However, on top of this convergence, there is some visible diversification, especially when it comes to the ability to support enterprise customers."
Laszlo Varga, Nimdzi Insights
The availability of an API, integration with CMS, and/or a set of solid security compliance features typically signal that the tool is more oriented at large multinational corporations than small and medium business customers.
There is some consensus in the industry that multilingual content generation has the potential to replace the traditional create-translate-publish content cycle. In Nimdzi’s view, such tools ultimately create new opportunities in the language industry for service providers, technology companies, and language professionals alike.
Language technology providers (LTPs) are already integrating and customizing LLMs for various use cases, and content generation is certainly one of these. Language service buyers turn to their LangTech providers for both expert advice on, and the potential implementation of LLMs, or may opt to simply buy into a productized version that LTPs can also create.
Beyond fearing the tools, LSPs also see them as a door opener to content creation: a way to go upstream for revenues from marketing and product teams at their clients. Ultimately, human-in-the-loop is still the standing paradigm when it comes to AI.
Finally, very importantly, as these tools are based on foundational LLMs, they need to be checked for accuracy, bias, and consistency. Similarly to MT engines, the output of MLCGs is “raw”, and, at least for business-critical content, likely requires an expert editing hand. This is where the talent of language professionals will keep being critical in the new AI world of multilingual content creation.
With the introduction of the MLCG category, we are entering somewhat dangerous ground, as it will not be easy to keep the list of such tools up to date. There is a realistic prognosis that by the time you read this report, this section will already need major updates. This is indeed one of the most dynamic categories tracked on our radar.
However, it’s not only content generation that’s booming: everything, even slightly touching AI, is booming. In 2023, the market is flooded with technology solutions coming from outside the industry. Basically, startups just need a mix of ChatGPT and a bit of creativity to release a new product.
"Able to combine Whisper.AI with ChatGPT? Congratulations, you’re a language tech startup! We’ll have to wait and see, though, how many of those fast-born tools — which are in fact nothing but a thin layer on top of ChatGPT — will stay on the market over the long term."
Yulia Akhulkova, Nimdzi Insights
This question becomes even more relevant if we take into account reported degradation of ChatGPT performance. In Summer 2023, users started complaining about ChatGPT slowing down. Further on, researchers from Stanford University and UC Berkeley made a big splash in LLM circles (and beyond) with their paper on “How Is ChatGPT’s Behavior Changing over Time.” They evaluated the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks and came to the conclusion that the performance and behavior of both models can vary greatly over time.
There were also several follow-up articles published after this viral paper that aimed to highlight the fact that none of it suggests a degradation in GPT capability. In other words, everything in that research is consistent with the behavior of the models changing over time. But daily users continue complaining, and the ultimate existential questions still stand: when we witness degradation of ChatGPT performance, is it because of human stupidity overriding machine intelligence? Can business rely on fluctuating capabilities of AI-powered applications? And how sustainable, then, are those hundreds of newly developed solutions based on GPT models?
Speaking of sustainability, as we further explore in the Nimdzi special “Focus on AI” section, another rapidly developing category is audiovisual technology. Our “Audiovisual Translation Tools” category has 142 solutions inside it, with the "AI dubbing" subcategory being one of the most dynamic areas on the LT market (quite expectedly). In fact, it grew 40 percent, as compared to last year.
Companies operating in this area are also growing fast. XL8 (founded in 2019) started off as a machine translation company, but in just a couple of years expanded its product suite to include AI-enhanced dubbing, machine interpreting, and speech recognition tools, and has already raised a total of USD 13.7M in funding over three rounds.
Still, there are companies who weren’t able to ride the innovation wave efficiently enough and left the market, Dotsub being one such name.
Speaking of losses, while the TMS subsegment is still the largest in the language technology arena, we had to remove 18 percent of the companies from the “Generic TMS for every customer profile” subcategory (as compared to 2022). Those removed companies are no longer operating, for various reasons (including such unfortunate events as the death of a TMS creator). Several new names were added, though, even to multiple subcategories within the TMS category, so the total number of tools featured is still greater than last year.
"The dynamic nature of technology in these categories spells a future of rapid sequential improvements and what are standalone products today will certainly become mere features in the future."
Renato Beninatto, Nimdzi Insights
In the TMS space, the key players stayed the same as in recent years. We looked at the top 13 brands in this arena in an attempt to evaluate how a product’s age correlates with its innovative capabilities.
Source: Nimdzi TMS Compass
New additions are seen in the TBMS category, as LSPs are still searching for cheaper alternatives to market leaders (Plunet and XTRF). Some refer to companies like Lingo Systems for custom TBMS development.
Others, like Unicorn A-M-P, develop their own TBMS from scratch. A-M-P in the tool name stands for advanced management platform. It is integrated with memoQ, as this was the TMS of choice for the solution creators. At the same time, the tool already boasts a vast feature set and ISO/ASTM adherence. The platform has a market appeal beyond the language industry (with the focus on broader business management and not just translation management), featuring the ability for an asset inventory to be integrated with the platform and an interconnected knowledge base.
Another featured TBMS platform is coming from the years of experience of Mondia Technologies, a group of LSPs. They are publicly launching their Traduno system in September 2023, thereby commercializing a product developed in-house.
The trend of first building something inside an LSP for that LSP’s own purposes and then bringing it to the market is still quite relevant for the industry. The same scenario is also true for growing technology providers that do not fall into any of the LT Atlas categories, yet still deserve a mention. One such example is Skrapling, a web crawling tool. Nimdzi’s partners ask us quite often what tools there are for website localization, and specifically, what are the easy ways to calculate a website’s word count. Skrapling is a new addition to the suite of such tools. Originally developed in a small LSP, it is now available for broader use. It helps users quickly understand how much it costs to translate a website — without the need for time-consuming copying and pasting the content from the website or referring to larger LT companies whose core service is actual localization, and not calculation/quote preparation (even though they are able to provide this service). There are still things that Skrapling would like to refine and improve, but it's already working for LSPs and translators, and not only for word count calculation. Other Skrapling functionalities are helpful for scenarios a bit similar to LLM implementation: Skrapling can be used for extracting content for the purpose of creating glossaries or TMs, as well as for quick reviews of localized text via extraction and spellchecking, cleaning the whole content for terminology inconsistencies, and so on.
As for machine translation, which was the primary topic of conversation before the rise of ChatGPT, few-shot learning is worth noting here. Google Research is successfully experimenting with it, demonstrating the potential of few-shot translation systems trained with unpaired language data, for both high- and low-resource language pairs. They pretrain language models on monolingual data of two or three languages, do in-context learning with five translation pairs and, as a result, (are said to) outperform the best performing system in the task of English-Chinese news translation. The resultant models are two orders of magnitude smaller than state-of-the-art language models. Furthermore, the few-shot paradigm provides a way to control certain attributes of the translation, such as regional varieties and formality, using only five examples at inference, “paving the way towards controllable machine translation systems.”
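To make the in-context setup described above more concrete, here is a minimal sketch of how five demonstration pairs might be assembled into a few-shot translation prompt. It is not Google Research's implementation: the function name `build_few_shot_prompt`, the prompt layout, and the example sentences are our own assumptions, chosen to show how formal-register examples could steer the output.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    source_text: str,
    source_lang: str = "English",
    target_lang: str = "German",
) -> str:
    """Assemble an in-context translation prompt from demonstration pairs."""
    blocks = [
        f"{source_lang}: {src}\n{target_lang}: {tgt}" for src, tgt in examples
    ]
    # The final block leaves the target side empty for the model to complete.
    blocks.append(f"{source_lang}: {source_text}\n{target_lang}:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs: five examples, all in the formal register,
# so that the model is nudged towards formal output at inference time.
FORMAL_EXAMPLES = [
    ("How are you?", "Wie geht es Ihnen?"),
    ("Can you help me?", "Können Sie mir helfen?"),
    ("Where do you live?", "Wo wohnen Sie?"),
    ("Do you have a moment?", "Haben Sie einen Moment Zeit?"),
    ("What is your name?", "Wie heißen Sie?"),
]

prompt = build_few_shot_prompt(FORMAL_EXAMPLES, "Are you coming tomorrow?")
print(prompt)
```

Swapping the five pairs for informal or region-specific examples changes the signal the model receives without any retraining, which is the kind of control the few-shot paradigm is said to offer.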
From research to actual daily application, the “Custom/Trainable” category features most of the new additions to the whole MT space among the four MT categories in the Atlas, growing 30 percent as compared to last year. It most likely has to do with a widely shared notion of “customization is key,” which we will explore in more detail further in the report when we talk about quality. Newly added names here include, among others, Bering Lab, VERTO NLP, MT4client™ KERN, STAR MT, and ULG MT.
In other MT-related news, in July 2023 ModernMT introduced Trust Attention, a novel technique developed by Translated that links the origin of data to its impact on translation accuracy. The idea was inspired by the human brain's ability to prioritize information from trusted sources: ModernMT now uses a weighting system to prioritize learning from high-quality data reviewed by professional translators over unverified content. This approach is different from a “regular” one, where during the training process MT engines are not able to distinguish between trustworthy data and lower-quality material.
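The general idea of weighting training data by its origin can be illustrated with a small sketch. This is not how Trust Attention is implemented (those details are not covered here); it is a generic example-weighting technique, and the origin labels, weight values, and the function `weighted_training_loss` are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical trust weights per data origin (illustrative values only).
TRUST_WEIGHTS: Dict[str, float] = {
    "professionally_reviewed": 3.0,
    "unverified": 1.0,
}


def weighted_training_loss(
    per_example: List[Tuple[float, str]],
    weights: Dict[str, float] = TRUST_WEIGHTS,
) -> float:
    """Return the trust-weighted mean of per-example losses.

    Each item is (loss, origin). Examples from more trusted origins
    contribute more to the value the training step tries to minimize.
    """
    if not per_example:
        raise ValueError("per_example must not be empty")
    total_weight = sum(weights[origin] for _, origin in per_example)
    weighted_sum = sum(loss * weights[origin] for loss, origin in per_example)
    return weighted_sum / total_weight


# A toy batch: one reviewed segment with low loss, one unverified with high loss.
batch = [(1.0, "professionally_reviewed"), (3.0, "unverified")]
loss = weighted_training_loss(batch)
print(loss)
```

With equal weights the mean loss of this batch would be 2.0. With the weights above it is pulled towards the reviewed example, which is the effect a trust-based weighting scheme aims for.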
Based on data from over 700 language technology providers
The advent and subsequent evolution of LLMs marks a significant turning point in the language services industry. These innovative technologies have the potential to revolutionize translation, localization, and content creation, offering many possibilities for language professionals and businesses alike.
The use cases for LLMs in the language services industry are vast and exciting. From content creation and rephrasing to summarization and multilingual support, LLMs have already demonstrated their value in improving efficiency and generating high-quality outputs. However, it is crucial to balance the advantages of LLMs with human expertise and cultural nuances to ensure accurate and contextually appropriate translations.
"Given the rapid proliferation of LLM platforms, it’s difficult to test and vet models for individual use cases, unless there’s an engineer assigned to do just that. This, coupled with the multitude of options available for combining LLMs with other pieces of technology, requires dedicated focus from teams experimenting with GPT-like models."
Laszlo Varga, Nimdzi Insights
As we look to the future, the language services industry must navigate a set of challenges. Quality and accuracy, data privacy and security, fine-tuning and customization, ethical considerations, and the regulatory landscape are among the key concerns that need to be addressed.
A lot of language industry specialists spontaneously think of automated translation (and MT in particular) as the main use case for generative AI and LLMs. Nevertheless, it's important to mention that, while MT has been around for several decades, LLMs haven’t exactly been built with translation in mind. They are not exclusively trained on translation data, and they are trained to always provide an answer, even if it doesn’t make sense. Moreover, LLMs can hallucinate, may produce biased and harmful content, and sometimes lack logical reasoning and factual accuracy. This combination of ingredients isn’t exactly aligned with the typical requirements for translation technology.
Source: ChatGPT & LLMs – Separating Fact from Fiction for Localization
Still, LLMs offer a much broader range of potential applications than translation-first systems. Examples include (but are not limited to):
LLMs bring two innovative assets to the language technology space. On one hand, they have the ability to incorporate context and reference data. On the other hand, they excel at following instructions from natural language descriptions. This combination makes them extremely powerful and opens up possibilities for augmented MT. By harnessing glossaries, style guides, and a myriad of client-specific or project-specific instructions, LLMs can produce, for example, spotless gender-inclusive translations.
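As a rough illustration of how glossaries and style instructions might be fed to an LLM, the sketch below assembles them into a single translation prompt. The function `build_augmented_prompt`, the prompt wording, and the sample glossary and style rules are all hypothetical. Sending the prompt to a model is deliberately left out, since that step depends on the provider.

```python
from typing import Dict, List


def build_augmented_prompt(
    source_text: str,
    target_lang: str,
    glossary: Dict[str, str],
    style_rules: List[str],
) -> str:
    """Combine source text, relevant glossary terms, and style rules."""
    lowered = source_text.lower()
    # Keep only glossary entries whose source term occurs in the text,
    # so that the prompt stays short and focused.
    relevant = {s: t for s, t in glossary.items() if s.lower() in lowered}

    lines = [f"Translate the text below into {target_lang}."]
    if relevant:
        lines.append("Use this terminology:")
        lines += [f"- {s} -> {t}" for s, t in relevant.items()]
    if style_rules:
        lines.append("Follow these style rules:")
        lines += [f"- {rule}" for rule in style_rules]
    lines.append("Text:")
    lines.append(source_text)
    return "\n".join(lines)


# Hypothetical client assets.
glossary = {"translation memory": "Translation Memory", "invoice": "Rechnung"}
style_rules = ["Use gender-inclusive wording.", "Address the reader formally."]

augmented_prompt = build_augmented_prompt(
    "Open the translation memory before you start.",
    "German",
    glossary,
    style_rules,
)
print(augmented_prompt)
```

Filtering the glossary to the terms that actually appear in the segment is a design choice: large termbases would otherwise crowd out the text to be translated.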
Furthermore, LLMs possess the potential to generate translations without solely relying on parallel corpora. This versatility and flexibility become especially valuable in scenarios where sufficient parallel data for a specific language pair or domain is lacking.
However, it is important to note that LLMs are not (yet) a direct replacement for traditional MT systems. They excel at generating high-quality output but may still exhibit errors or lack domain-specific knowledge.
"The future of automated translation lies in discovering synergies between the versatility of large language models and the sustainability of machine translation. By leveraging the strengths of both approaches in conjunction with human expertise, automated translation presents an excellent use case for the language services industry. Furthermore, it has the potential to open doors towards redesigning other traditional processes, from authoring and review to quality assurance."
Jourik Ciesielski, Nimdzi Insights
So, LLMs can assist with content creation, rephrasing, and summarization tasks. They can be trained on specific data to adapt to industry-specific terminology, writing styles, or even brand voice. This versatility extends beyond translation and offers new opportunities for content generation and localization.
As we explain in the Language AI Alphabet article, all generative LLMs are GenAIs, but generated output can be more than just text. Diffusion models such as Midjourney, Stable Diffusion, DALL-E, and Bing Image Creator create images, from text or even image input.
Source: Nimdzi Language AI Alphabet
On top of that, conversational AI capabilities are driving significant growth in the worldwide contact center market in 2023 with multilingual chatbots becoming more popular than ever. However, research shows that, while chatbots have their place, most customers still prefer interacting with human agents, especially for complex issues requiring empathy and understanding. That said, chatbots speaking a customer's native language can increase perceptions of empathy, as many previous studies, like Nimdzi’s own Project Underwear, prove.
More broadly, studies predict that, by 2025, most enterprises will be using GenAI not just for communication assistance, but also for writing and editing. Companies are adopting it to solve specific business problems and gain competitive advantages. The capabilities have matured to the point that business leaders recognize the potential impact on operations.
GenAI is definitely transforming the way work gets done in our industry. Language services industry representatives whom Nimdzi interviewed in 2023 confirm they already use GenAI daily for written communication, from polishing up sales emails to creating marketing presentations and decks in minutes. Others go further than individual use and already benefit from use cases like “Enterprise Search” (when an organization’s whole knowledge base is transformed into an interactive platform where one can ask custom questions relevant to a given organization’s data) with help from companies like sintetic.ai or custom.mt.
"While the localization technology space has crystallized, slowed down, and approaches stagnation, the gold rush in generative AI is on: investors poured more than USD 21 billion into startups, and probably even more investment is being made by tech giants such as Google, Apple, Nvidia, and others. If you're not part of that wave, you're missing one of the biggest transformative opportunities of our generation."
Konstantin Dranch, Custom.mt for Nimdzi Insights
Another working example of leveraging an internal knowledge base has been developed by RWS. In addition to their OpenAI Translator app for Trados Studio, RWS has enabled a smart knowledge access box in their community forums. To deliver high-quality answers, an LLM relies on a comprehensive and carefully organized solution database of Trados documentation. Now, when users go to the RWS Community website and interact with a chatbot, it is actually leveraging the LLM so that users don't have to invest time searching through and reading the documentation. Essentially, one can just query the chatbot to learn how to do any number of things in Trados products. In addition to the community website, RWS is building that functionality into the product platform as well.
Under the hood, LLMs are prediction engines — with extremely large datasets and probability computations. Typing a prompt, we are basically asking the model to predict the next thing we want it to do. When we task LLMs with performing comparative efforts, results are often better than with generative tasks, as there’s less predicting involved. With summarization, extraction, rewriting, and classification type tasks, we are providing the model with the majority of the needed data, hence there’s fundamentally no need to predict anything brand new and the output quality is usually fit for purpose.
In fact, (AI) quality prediction and quality estimation are not new buzzwords in the LangTech world. For example, machine translation quality estimation (MTQE) has been here for a while with known solutions by ModelFront, TAUS or Unbabel. Quality estimation (QE) is an automated method for evaluating the quality of (machine) translation without the need for human review or reference translations. MTQE is driven by models that are trained to assess the accuracy of machine-translated content and is intended to predict how much post-editing a specific MT output requires.
In the MTQE space, the TMS arena already has established solutions by Phrase and Smartling. New additions include MTQE in Honeybee by Centific and AIQE by memoQ. With the introduction of the new memoQ-TAUS integration, memoQ users can now benefit from AI-based quality estimation when working with MT. Quality predictions are displayed in the translation editor and aim to enhance translation workflows. TAUS has also already been integrated with Blackbird, and more integrations are expected with the TAUS DeMT Estimate API. The API integrates with content workflows and provides real-time quality scores that can be customized with the organization's data, ensuring that the results match the required accuracy, tone, and authenticity. Custom scores are provided based on the specific language, terminology, and style used within a given organization.
According to TAUS representatives (who actually started data collection back in 2008), customization is key. In fact, the ability to customize the scoring model is one of the core features of DeMT Estimate API (though a generic QE model is also available off the shelf). This is achieved through an offline process where a team of NLP engineers collaborates with the customer to fine-tune the model based on needs and requirements of their data. Here’s an example of how it can work: based on sample texts from the client, the NLP team generates a large dataset and customizes a quality prediction model built on sentence embeddings. The quality scores produced by the custom model are then compared with samples of human translations to identify the correlation between how the model scores the translation and what a human specialist determines to be good quality.
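The final validation step described above, checking how closely model scores track human judgments, can be sketched with a plain Pearson correlation. This is an illustrative assumption on our part: the actual metric, score scale, and data used by any vendor are not specified here, and the numbers below are invented sample values.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Invented sample data: QE model scores and human ratings for five segments.
model_scores = [0.91, 0.45, 0.78, 0.30, 0.66]
human_scores = [0.95, 0.50, 0.70, 0.20, 0.60]

r = pearson(model_scores, human_scores)
print(f"correlation between model and human scores: {r:.3f}")
```

A value close to 1 would suggest that the custom model ranks segments much as a human specialist does. A low value would signal that further fine-tuning is needed before the scores can be trusted in production.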
Overall, in addition to reducing the time and effort needed for post-editing, QE offers other benefits, including (but not limited to) providing feedback to MT systems to improve their quality, comparing different MT engines to find the best fit for specific purposes, scaling up MT usage across new languages and domains, and risk management of publishing fatal errors in raw MT.
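One way to picture the post-editing and risk management benefits listed above is a simple routing rule driven by QE scores. The thresholds, bucket names, and the function `route_segments` are hypothetical. In practice, such cut-offs would need to be calibrated per language pair and content type.

```python
from typing import Dict, List, Tuple


def route_segments(
    scored_segments: List[Tuple[str, float]],
    publish_threshold: float = 0.85,
    light_edit_threshold: float = 0.60,
) -> Dict[str, List[str]]:
    """Sort machine-translated segments into workflow buckets by QE score."""
    buckets: Dict[str, List[str]] = {
        "publish": [],
        "light_post_edit": [],
        "full_post_edit": [],
    }
    for text, score in scored_segments:
        if score >= publish_threshold:
            buckets["publish"].append(text)
        elif score >= light_edit_threshold:
            buckets["light_post_edit"].append(text)
        else:
            # Low-confidence output is the riskiest to publish raw.
            buckets["full_post_edit"].append(text)
    return buckets


# Invented segments with invented QE scores between 0 and 1.
routed = route_segments(
    [
        ("Segment A", 0.93),
        ("Segment B", 0.71),
        ("Segment C", 0.42),
    ]
)
print(routed)
```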
The data that is used to train MT models can be repurposed to train MTQE models, which adds an additional layer of intelligence on top of regular MT systems. This presents an opportunity for localization managers with lots of MT-ready content such as ecommerce product owners.
As AI is advancing at an accelerated pace, new solutions are entering our everyday lives. Things that seemed impossible are suddenly becoming a reality — like speech-to-speech translation. In the language industry, there are two main fields in which speech-to-speech technology is coming onto the radar: machine interpreting and AI dubbing.
Machine interpreting (MI) is the transmission of a spoken message in one language into a spoken message in a different language using automatic speech recognition (ASR), followed by AI transcription, machine translation, and finally a synthetic voice to speak the message in the target language. This so-called cascade model is what all MI solutions on the market to date are built on.
Current solutions can best be divided into two groups: handheld devices, earbuds, and apps for individuals (e.g. tourists), and software designed for business purposes (e.g. adding more languages to large-scale events or dubbing corporate training videos at a lower price point). Aside from what we reported on in previous publications, the main development worth highlighting when it comes to MI is that providers of remote simultaneous interpreting (RSI) platforms (whose core business model to date was focused on facilitating interpreting by human interpreters) are starting to come into this space.
In January 2023, KUDO was the first RSI platform to release its own MI feature (which was awarded a patent in August 2023). In May 2023, Interprefy followed suit, and we can expect other large RSI platforms to jump on the same bandwagon to stay competitive, both with each other and with the video conferencing giants out there. Platforms such as Zoom, MS Teams, and Webex, which have all added their own RSI features alongside other language access tools (e.g. closed captions and machine-generated live subtitles), are typically superior when it comes to event management capabilities and price point. So offering MI can be a smart move for RSI providers to round out their portfolio and reach clients with smaller budgets.
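The cascade model behind machine interpreting can be sketched as a chain of three stages: speech recognition, machine translation, and speech synthesis. The Python sketch below is only an illustration of that structure. The function `cascade_interpret` and the three stub stages are hypothetical stand-ins, not the API of any vendor mentioned in this report; in a real system each stage would call an actual ASR, MT, or text-to-speech engine.

```python
from typing import Callable


def cascade_interpret(
    audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one utterance through a cascade: ASR -> MT -> synthetic voice."""
    transcript = recognize(audio)        # speech in the source language -> text
    translated = translate(transcript)   # source-language text -> target-language text
    return synthesize(translated)        # target-language text -> speech


# Stub stages for demonstration only. Real audio is not UTF-8 text;
# the stubs treat bytes as text so the example runs without any engine.
def stub_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_translate(text: str) -> str:
    toy_glossary = {"hello": "hola", "thank you": "gracias"}
    return toy_glossary.get(text.lower(), text)


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


output_audio = cascade_interpret(
    b"hello", stub_recognize, stub_translate, stub_synthesize
)
print(output_audio)
```

Because each stage only consumes the output of the previous one, an error made early on (for example, a misrecognized word) is passed down the chain, and the delay of each stage adds to the total latency.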
Dubbing is one of the bread-and-butter services in the media localization industry and, to date, one that is (almost) exclusively performed by voice actors. However, the latest developments in AI dubbing might change the landscape.
AI dubbing has come a long way thanks to significant advancements in synthetic voice technology. Noteworthy AI dubbing tools currently on the market come, for example, from Deepdub, Voiseed, Matedub, and AppTek (among others), and new product releases are popping up all the time. While most AI dubbing solutions to date are not ready to be used for entertainment purposes (yet), AI dubbing is already being used for international broadcasts, voiceovers for documentaries, and dubbing corporate videos. In addition, media and game localization providers, as well as traditional LSPs, are exploring further use cases.
Although the exploration of this technology is ongoing, it is significant that the idea of AI dubbing has transitioned from being rejected to being actively researched and considered by major media localization players.
While we have segmented these tools into MI and AI dubbing because of their intended use cases, both kinds of solutions are based on the same technology, which we can group under the umbrella term of speech-to-speech translation (S2ST). S2ST is something that is being developed in many different fields, ranging from machine translation providers, to media localization companies, to tech companies from outside the language industry.
Aside from improving the quality of the translation, the latest developments in S2ST focus on retaining the original speaker’s voice in the AI output to make the synthetic speech sound more authentic. This is commonly referred to as voice cloning, and it applies to both MI and AI dubbing. A noteworthy solution (and good example of how this field is expanding) comes from Ericsson, a company from outside the language industry that does NOT provide the actual interpretation/translation. Instead, Ericsson takes the audio (and optionally video) files with the interpretation/translation (either human or machine generated) to then provide AI voice conversion and/or AI video dubbing.
In both cases, the conversion can happen either in post-editing or live via an API (e.g. integrated into an interpreting platform). The current latency is between 0.5 and 1.0 seconds.
Another noteworthy development in this field comes from Meta AI. In June 2023, the company introduced Voicebox, a generative AI model for speech. The Voicebox model and code are not being made publicly available at this time. The company writes: “While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI, it’s also necessary to strike the right balance between openness with responsibility.”
Voicebox can perform a number of (synthetic) speech-generation tasks, including:
Although the solutions from Ericsson and Meta AI are similar, there are a few key differences between them. While Ericsson uses audio files for its AI conversion tasks, Voicebox is predominantly a text-to-speech generator, using text files as the main input source. Naturally, both solutions use audio files for the voice cloning aspect, but there is a difference in that Ericsson uses the audio from already interpreted speeches plus the original speaker’s audio, whereas Voicebox uses translated text in combination with the original speaker’s audio. The results are similar, but the difference in input source could ultimately mean that different users will adopt the solutions depending on the use case (e.g. live conference vs generating audio and video files from purely text-based files). This is also true because of the other tasks each solution can perform: where Ericsson has its own lip sync application for dubbed videos, Voicebox can perform a number of text-to-speech tasks in addition to voice cloning.
"With the advances of AI, we are starting to see crossovers between segments of the industry that used to be quite separate for a long time, as well as pure tech players from outside the industry coming onto the scene."
Sarah Hickey, on behalf of Nimdzi Insights
Another important aspect of voice cloning is its malicious use. What was previously seen only in movie scam plots has become a dreadful reality. Alongside the benefits of voice cloning comes a serious downside: people are falling victim to scams carried out with AI voice cloning applications.
CAI stands for computer-assisted interpreting and refers to tools that facilitate the interpreter’s work by providing real-time translations, numbers, names, and unit conversions on screen. There are a few such solutions available, for example smarterp&me, INTERPRETBANK, and Cymo Note. Typically, CAI tools are used by interpreters who are interpreting simultaneously in a remote environment, for example on an RSI platform, but Cymo Note can also be used to facilitate consecutive interpreting. The tools function via a mix of AI solutions (e.g. speech recognition, speech-to-text translation) and manual input from the interpreters in preparation for the assignment (e.g. a glossary).
Effective CAI tools can benefit interpreters through reduced preparation time. For RSI providers (as well as LSPs), having a well-functioning CAI tool means being able to offer clients a wider pool of interpreters at a moment’s notice as even new interpreters on the roster can be brought up to speed quickly thanks to adapted glossaries and the assistance of a CAI tool during the assignment. In addition, CAI tools function as an interface that allows for client-interpreter exchange of documents, removing the need for an intermediary from the provider side, thus saving cost and further increasing efficiencies.
However, CAI tools are currently underdeveloped: they can be costly for interpreters, have limited customization and automation capabilities, and require time to learn. As a result, efficiency benefits remain limited and adoption is low. For CAI tools to realize their potential, further investment and development are needed. While we do not have the answer as to who will invest in CAI tool development, without that investment CAI could easily become the type of technology that just never got off the ground.
"Interpreting technology providers, in particular RSI platforms, are beginning to see the added value of having a more efficient interpreter workflow, and interpreters are increasingly adopting new technologies."
Rosemary Hynes, Nimdzi Insights
More on the (video) conferencing subject, let’s not forget live subtitling. The service involves taking spoken content and converting it to written content in multiple languages with minimal delay. Live subtitles are used for online meetings as well as in live broadcasts, at onsite events, and to make radio content accessible online.
As we mentioned earlier this year in Nimdzi 100, even for live broadcasts, a mix of human and machine is becoming the norm in an effort to increase speed and efficiency. Ever since the Zoom boom, the use of purely machine-generated live subtitles has increased both as a result of higher demand and advances in MT.
Similarly to the wider multilingual meetings space, the providers of live subtitling services are coming from different sides of the industry:
In other words, with the advances of AI, we are starting to see crossovers between segments of the industry that used to be quite separate for a long time.
We concluded one of our previous reports (from pre-ChatGPT years) with a semi-joking remark on “who knows, maybe the next report will be written by a machine.” And here we are in 2023, with all our hands-on LLM expertise, still writing 7.5K words from scratch, using no AI help. We interview and quote our own human experts and colleagues, and we thank everyone who contributed to this year’s edition of the Atlas.
Is this happening because, despite all the AI-driven developments in the language technology arena, we are leaning towards human empathy and human touch? Indeed, “no human-in-the-loop” is already facing some resistance, and according to the above-mentioned research, most people prefer to interact with a human rather than a chatbot, with customer satisfaction scores significantly higher for human agents compared to (even multilingual) chatbots.
However, there’s no doubt that the role of AI, from customer service to practically all customer experiences, is huge. Moreover, attitudes towards AI might change in the future as people become more accustomed to AI interactions, and younger users already have a more positive outlook on AI.
"LLMs are increasingly used to create insights and uncover new perspectives when managing projects, sales, and service delivery. They help businesses and people ask better questions, learn, and solve problems."
Roman Civin, Nimdzi Insights
While it’s sensible to approach AI with both caution and optimism, there’s no time left to just think about it: it is time to act. If you are not yet a prompt engineering master, you should become one. Knowing what tools are being developed in the language industry and how they leverage AI, and getting proficient with them now, is a must for any language professional. Recognizing AI potential in language technology and building further understanding and clarity about its impact on our lives is something everyone needs to be doing. And Nimdzi is here to help you with that.
"It's important to remember that the language industry is focused on transforming pre-existing content, rather than creating it. The technologies discussed in this report serve to support the services provided by language service providers (LSPs). As content continues to evolve, it's expected that new services and technologies will be developed. It will be interesting to see how many more logos the Atlas will feature in the coming year."
Renato Beninatto, Nimdzi Insights
We publish our free report annually, but like the market itself, the data contained in the Atlas infographics is subject to change, so we update the visuals much more often. So let’s stay in touch. Don’t hesitate to reach out to us at [email protected] and tell us about your favorite localization tools and new language technology solutions that should be on everyone’s radar. Let’s join forces to properly track how the language technology landscape evolves in this new human-in-the-loop world.