Report by Yulia Akhulkova, Sarah Hickey and Belén Agulló García.
In this year’s edition of the Nimdzi Language Technology Atlas, we collected data from providers of more than 750 technology solutions.
The data gathering behind the Atlas is based on four main sources:
These sources have given us a comprehensive understanding of the current state of technology development in the industry, which we are now eager to share.
In the 2020 version of the Nimdzi Language Technology Atlas, the number of mapped individual products reached 660; for this year’s version, we studied over 770 tools. That’s why we had to make the difficult decision not to feature smaller technology groupings such as TMX editors, term banks, or internationalization tools on the main infographics. At the same time, our audiovisual translation section grew significantly to include two new subsections: remote recording and AI-enhanced dubbing tools.
As a result, the Atlas 2021 represents the following major groupings of tools:
Let’s have a look at what has changed in language technology over the past year.
We are approaching the golden age of AI-enabled language technology. After playing around with images for quite some time, the titans of machine learning (ML) have turned their full attention to the localization and translation slice of the AI pie.
The language technology landscape is following digital transformation trends and evolving rapidly. As Nimdzi has previously noted, last year saw several major releases of new AI models and projects centered around content creation and summarization. The main goal of these systems is generating, communicating, and understanding language.
Both text summarization startups and larger companies got in on the act. For instance, in December 2020, Facebook announced a project codenamed “TLDR” (too long, didn’t read) with the goal of reducing articles to bullet points and providing narration options. Summarization is used mainly to get the gist of content and obtain insights quickly. It also has applications for speech recognition. Both speech recognition and text processing are seen as ways of improving information retrieval and analysis.
Further, Google’s two newest language models are already capable of holding human-like conversations and tackling complex search queries. In May 2021, Facebook released code aimed at building speech recognition systems using unlabeled speech audio recordings and text. With the goal of reducing the dependence on annotated data, Facebook AI introduced three new models, including wav2vec Unsupervised (wav2vec-U), which learns from recorded speech and unpaired text — eliminating the need for transcriptions. Such research is especially helpful for languages that do not have sufficient collections of labeled training data.
Facebook’s researchers didn’t stop there in their “self-supervised AI” game of not relying on supervised data. Their fresh-from-the-oven TextStyleBrush research project uses a StyleGAN2 model to transfer the source text style to the target. The system can do this using only a single-word example as input. Given a detected text box containing a source style, the technology extracts an opaque latent style representation, which is then optimized to allow photo-realistic rendering of translated content in the initial style.
But even with this “self-supervised learning” trend in mind, data annotation and labeling remains as prominent a service as last year. For example, for companies that want to optimize their end users’ search in different languages, the common method is to crowdsource data annotation and leverage a multilingual community of skilled experts (who help create data that’s used for training the AI model). Text annotation, image annotation, audio annotation, and video annotation are services now offered by many LSPs.
There is interest among customers to ride the AI wave and leverage MT in their workflows. But to bring natural language processing (NLP) to the enterprise forefront, every part of these AI-enabled workflows needs to be integrated — and integrated smoothly, to boot. Integration is a core topic in this area: it not only means connectors to popular support systems, CMS, and TMS, but also switching the collective mindset of the localization industry from human to human-in-the-loop.
While the annual Nimdzi 100 survey shows that machine translation and post-editing rank as the second most popular service provided by LSPs, hesitation about the benefits of MT for particular scenarios, as well as a fair degree of uncertainty about MT implementation, continues to linger.
When asked about the main challenges that have held customers back from using MT in an organization or on particular projects, respondents outlined the following top five concerns:
Further, as the general quality of MT goes up, the cost of post-editing goes down, which leaves linguists questioning and resisting the method itself. However, human specialists are still needed to prepare data, curate MT training, evaluate MT engines, and fine-tune MT processes. To make their lives easier, new toolkits for complex MT curation have emerged. It may seem that, by using clever modern tools, almost anyone can create a valid custom MT engine. For the solution to really work, though, experience and pointed expertise are required. That’s why technology providers in this area now offer various customization options, from a do-it-yourself approach to professionally guided MT customization.
But before even launching MT, it’s important to manage expectations and understand what outcome can be achieved for different content types and language pairs (the MT engine performance varies significantly depending on multiple factors) and how exactly it’s going to be measured — and which tools will be used for the purpose.
As noted above, the inability of buyers to estimate and quickly evaluate MT performance used to present a challenge. But we are seeing improved attempts by different types of language software (Intento, Memsource, and ContentQuo, among others) to help with MT evaluation and MT engine selection.
Just so that we’re on the same page: there are automatic evaluation metrics and then there’s quality estimation (QE), a method for predicting translation quality.
QE remains an active field of research. For instance, there are well-known open-source QE frameworks such as QuEst (one of the first of its kind), deepQuest (which can generalize from word- and sentence-level QE to the document level), and OpenKiwi.
As Nimdzi has noted before, the MT sector has already been using quality estimation features and new ways of leveraging MT output to predict quality. Language I/O is one example of a company applying modern technology to predict translation quality.
One of the best-known technologies in the field is ModelFront — a system that instantly predicts whether or not a translation is correct and provides a risk score for each translation. ModelFront uses binary classification when determining the post-editing effort (approved vs. post-edited) and graphs the correlations across the data to make predictions and find outliers.
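To illustrate the general idea (and only the general idea, as this is not ModelFront’s actual system), here is a hedged Python sketch of binary risk prediction: a classifier is trained on historical segments labeled approved vs. post-edited, and the predicted probability becomes the risk score for new MT output. The features and numbers are invented for illustration.

```python
# Illustrative sketch only, not ModelFront's actual system. It shows the general
# idea of binary translation risk prediction: train a classifier on historical
# segments labeled "approved" (0) vs. "post-edited" (1), then use the predicted
# probability as a risk score for new MT output.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per historical segment.
# The features are assumptions for illustration: source length, target/source
# length ratio, and the MT engine's average log-probability for the segment.
X_train = np.array([
    [12, 1.05, -0.21],   # short segment, high engine confidence
    [48, 0.62, -1.30],   # long segment, low confidence
    [30, 0.98, -0.35],
    [55, 1.40, -1.75],
])
y_train = np.array([0, 1, 0, 1])  # 0 = approved as-is, 1 = post-edited

model = LogisticRegression().fit(X_train, y_train)

# Risk score for a new machine-translated segment (probability of class 1).
new_segment = np.array([[40, 0.75, -1.10]])
risk = model.predict_proba(new_segment)[0, 1]
print(f"Predicted post-editing risk: {risk:.2f}")
```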
TMS providers, such as Memsource, monitor and analyze MT data as well. With support for over 30 different MT engines, Memsource uses this data to both develop its features and provide insights for their customers. By tracking the post-editing effort required for individual MT segments, Memsource is able to automatically provide segment-level predictions of quality through its MT Quality Estimation feature. Its MT Autoselect feature uses similar performance data to recommend the best performing engine for a user’s document based on its domain. Both these features are available through its MT management hub, Memsource Translate.
MT data in Memsource is also made available to customers. Memsource publishes a quarterly report on the general performance of MT in its TMS, with more advanced insights available for individual customers through its integration with Snowflake. This connector allows full access to performance data, which includes metrics like editing time and MT scores like BLEU and TER.
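For readers who want to see what such scores look like in practice, below is a minimal sketch (not Memsource’s internal code) that computes corpus-level BLEU and TER with the open-source sacrebleu library, assuming sacrebleu 2.x is installed. The example segments are invented, with the post-edited versions serving as references.

```python
# A minimal sketch of computing corpus-level BLEU and TER with the open-source
# sacrebleu library (assumes sacrebleu >= 2.0). The post-edited segments serve
# as references against the raw MT output.
from sacrebleu.metrics import BLEU, TER

mt_output = [
    "The contract must be signed before the end of month.",
    "Click the button for save your changes.",
]
post_edited = [
    "The contract must be signed before the end of the month.",
    "Click the button to save your changes.",
]

bleu = BLEU().corpus_score(mt_output, [post_edited])
ter = TER().corpus_score(mt_output, [post_edited])

print(f"BLEU: {bleu.score:.1f}")  # higher is better
print(f"TER:  {ter.score:.1f}")   # edit rate; lower is better
```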
While MT Quality Estimation by Memsource automatically scores MT output quality at the segment level, allowing users to find out what needs to be post-edited and what doesn’t, another technology company focused on quality, ContentQuo, provides “adequacy-fluency for MT engine evaluation.” This functionality helps assess each segment as a whole using simple ratings. They also help automatically get multiple edit distance metrics on any MT post-editing done inside or outside ContentQuo.
In 2020, ContentQuo introduced the concept of Augmented Quality Management named Ada, which, in 2021, made it to the finalist round of the Process Innovation Challenge (PIC) #9. The tool automatically schedules and launches quality evaluations for MT engines and/or translation vendors based on set budgets, strategic priorities, risk tolerance levels, and previous quality metrics. Based on a defined strategy, Ada decides which translations will go through quality evaluation, who will evaluate them, and how much budget to spend. The technology adjusts based on the gathered data.
The NLP and AI trend has influenced many LSPs to shift their focus from language services to multilingual data provision. The data annotation work mentioned above is one example.
Back in the 2020 edition of the Nimdzi 100, we mentioned that large players like TransPerfect, Lionbridge, and Welocalize had identified AI support services, such as data annotation and data labeling, as growth drivers. TransPerfect, for instance, has developed its own platform, Dataforce, which is designed to streamline its rapidly growing AI division. Dataforce provides a cloud-based solution to coordinate the efforts of over 1,000,000 globally distributed experts involved in making AI systems smarter in multiple languages. And there’s also DataForce Contribute, a mobile data collection app for iOS and Android.
Some LSPs are eager to leverage the existing language data assets (bilingual files, termbases, TMs) they’ve collected over time. They can sell datasets on platforms such as TAUS Data Marketplace (released in November 2020) to MT providers, enterprises, etc. And it’s also possible for LSPs dealing with MT training in-house to buy data there.
In January 2021, following the language data trends, SYSTRAN announced their SYSTRAN Model Studio, a multilingual catalog of industry-specific translation models. These models are designed by a community of experts called 'trainers' and are offered in a self-service manner to professional users. The models can be offered to the global community with data/language experts keeping 70 percent of the royalties.
Another development worth mentioning within this space is Neural Translation for the EU (NTEU), which is said to provide “near-human quality NMT for public administrations.” In a nutshell, it’s the data-gathering and neural MT engine-building initiative of three companies that may be considered gurus in the MT space — Pangeanic, KantanMT, and Tilde — who joined forces with the Spanish Agency for Digitalization and Modernization of Public Administrations (SEDIA). The partners have been collecting data and manufacturing synthetic data to train MT engines, which will be made available via the European Language Grid. The datasets are prepared for (and from) 23 official EU languages, and this two-year project aims to build direct engine combinations for all EU language pairs that do not involve English.
With modern solutions for data synthesis, even monolingual data can be leveraged. For example, Language Studio by Omniscien is able to create new data assets out of monolingual content for the purposes of MT customization. Synthesized bilingual sentences help build the data volume needed to customize an MT engine. Omniscien’s examples show how a single original bilingual sentence can be transformed into a wide variety of similar sentences with relevant vocabulary and variations in gender, tense, possessives, positives, and negatives, synthesizing new, relevant bilingual sentences while avoiding nonsensical ones. Ultimately, brand-new sentences that do not include even a single word from the original sentence can be created. According to Dion Wiggins, CTO, it’s often possible to create as many as 10 to 100 times the volume of the original data using this approach.
Omniscien also offers bilingual data mining. As Mr. Wiggins told Nimdzi, this approach mines websites for any files that include text, such as HTML, PDF, Microsoft Word, etc. An AI process then automatically identifies the language of the text, and a secondary AI process analyzes all the source and target language pages to automatically determine which documents are paired translations of one another. A final AI process then matches sentences within the document pairs to create translation memories that are used to train machine translation engines. One existing use case was able to download and match 4,612 EN-FR documents out of a possible 4,614. This produced 2.3 million bilingual sentences.
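The pipeline Omniscien describes (language identification, document pairing, sentence matching) can be illustrated with a much-simplified sketch. The code below is not Omniscien’s technology: it uses the open-source langdetect package, a naive filename heuristic for document pairing, and positional sentence alignment, whereas production systems rely on far more robust AI models.

```python
# Simplified sketch of the general bilingual mining steps described above, not
# Omniscien's actual pipeline: detect each document's language, pair documents
# that look like translations of each other (here by a naive filename heuristic),
# and align sentences 1:1 by position.
from langdetect import detect  # pip install langdetect

documents = {
    "press_release_en.txt": "The new product launches in May. It will be available worldwide.",
    "press_release_fr.txt": "Le nouveau produit sera lancé en mai. Il sera disponible dans le monde entier.",
}

# Step 1: identify the language of each text file.
by_language = {}
for name, text in documents.items():
    by_language.setdefault(detect(text), []).append((name, text))

# Step 2: pair source and target documents (naive heuristic: shared filename stem).
def stem(name):
    return name.rsplit("_", 1)[0]

pairs = [
    (en_doc, fr_doc)
    for en_doc in by_language.get("en", [])
    for fr_doc in by_language.get("fr", [])
    if stem(en_doc[0]) == stem(fr_doc[0])
]

# Step 3: align sentences by position to build translation-memory-style units.
for (en_name, en_text), (fr_name, fr_text) in pairs:
    en_sents = [s.strip() for s in en_text.split(".") if s.strip()]
    fr_sents = [s.strip() for s in fr_text.split(".") if s.strip()]
    for src, tgt in zip(en_sents, fr_sents):
        print(f"{src}  <->  {tgt}")
```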
A different innovative localization services paradigm is offered by Pactera EDGE, who introduced AI localization earlier in 2021. As defined by Pactera EDGE, "AI localization represents a fully managed in-market data collection and data curation service producing clean, fair and thoughtfully balanced AI training datasets at an enterprise scale."
Pactera EDGE offers AI training on hyperlocal content, which, the company says, results in a vastly superior human-centered multilingual digital customer experience. Their custom, built-to-order datasets can contain almost any data type, delivered in the preferred format: from extensive domain- and region-specific multilingual parallel corpora for MT models and hyper-personal virtual assistants or chatbots, to hundreds of thousands of pictures and videos of various objects for vision AI solutions; from many hours of professionally recorded audio containing predefined or random sentences spoken by people of diverse age groups and ethnicities, in different languages with different accents, for NLP enablement, to 3D images and geospatial data captured and annotated for AR wearables and self-driving vehicles.
The company presents AI localization as a means of meeting global AI data needs while keeping ML training data locally relevant. Over the past few years, Pactera EDGE has processed a significant amount of target-market-specific multilingual data to enable globally effective deep learning recommendation systems for eCommerce and multimedia streaming platforms, as well as to train search relevance algorithms.
In both the 2020 edition of the Nimdzi Language Technology Atlas and the 2021 edition of the Nimdzi 100, we wrote about the significant boost that remote interpreting received as a byproduct of the COVID-19 pandemic.
Prior to March 2020, remote interpreting was largely regarded as a solution in search of a problem, but once lockdowns hit and in-person events and gatherings were restricted, remote interpreting suddenly became the solution to the problem. Since then, the market for various types of virtual interpreting technologies (VIT) has been booming, and VIT providers have been busier than ever.
In particular, the demand for video remote interpreting (VRI) in the healthcare sector and remote simultaneous interpreting (RSI) for various types of multilingual online meetings and conferences has gone through the roof. Providers within this space reported record growth over the last year. The sudden increase in demand also came with new challenges:
Prior to the pandemic, remote simultaneous interpreting (RSI) was a real niche within the language services industry. However, once the pandemic hit and all types of meetings, conferences, and events pivoted to the virtual world, RSI was suddenly in high demand. This demand came both from existing clients who held more virtual meetings than ever before and from new clients in a variety of sectors, ranging from corporations to NGOs, international organizations like the United Nations, and government institutions like the European Union and the White House. In April 2021, for example, US President Joe Biden held a virtual Leaders Summit on Climate, facilitated by Zoom with RSI from Interprefy.
The boom in RSI is reflected not only in the growth figures of existing providers but also in the number of new RSI platforms. Since March 2020, new RSI providers have entered the market, trying to cash in on the trend. Looking at our Language Technology Atlas, in 2020 the “RSI and conference interpreting” category featured 20 technologies, whereas this year’s version boasts an impressive 34 different solutions. And it isn’t going to stop there either, as interviews with market players revealed that more companies, especially traditional LSPs in the interpreting space, are thinking about building their own RSI platforms.
While this might be controversial for some in the language industry, we added Zoom to the RSI category in this year’s Language Technology Atlas for the very first time, as we believe the video conferencing giant has earned its place. Although Zoom’s main focus is not on multilingual meetings, the platform has added an RSI feature and, most importantly, has become the de facto largest RSI platform judging by the number of meetings. Since the onset of the pandemic, Zoom’s market valuation has more than tripled and the number of daily meeting participants increased from 10 million (December 2019) to more than 300 million. This sets Zoom miles ahead of its competitors, even though the number of daily meeting participants has increased by 70 percent, on average, for all major video conferencing platforms since COVID-19 (the next closest is Microsoft Teams with 115 million active users per day).
Judging by feedback gathered via interviews with LSPs in the interpreting space, the general consensus seems to be that platforms that were built specifically with RSI and multilingualism in mind, such as KUDO, Interprefy, Interactio, VoiceBoxer, Olyusei, Ablioconference and many others, provide a superior experience for the interpreters (e.g. handover function, relay function, multiple language channel selection, etc.). However, when it comes to the events offering of these RSI platforms, the feedback is that they are lagging behind established platforms like Zoom, Microsoft Teams, and Webex.
Keeping the previous stats in mind, tech providers in the interpreting space had a choice to make: See Zoom as a threat and try to compete with it, or see Zoom as an opportunity to reach a wider client base and cash in on the hype. The majority chose the “If you can’t beat them, join them” approach and decided to bring RSI to where it was needed instead of trying to force people to use their own platform at all costs. Subsequently, this development kicked off the next trend in RSI, the race for integrations with Zoom and other major platforms.
The push for these integrations is buyer-driven — both for VIT providers as well as for clients of major video conferencing platforms. For interpreting technology providers, the common theme is that people want to work with what they know — and what they know is Zoom, Microsoft Teams, Google Meet, Webex, etc. Clients don’t want to invest in getting to know yet another platform, training their team on it, and creating new logins, but they do need interpretation. So, VIT providers were tasked with figuring out how to provide their RSI services at the same level of quality on a different platform. Initially, VIT providers solved this problem via complex workarounds where meeting participants and interpreters needed to be connected both to Zoom (or another video conferencing platform) and the VIT platform. This allowed all participants to be part of the Zoom meeting, but receive the interpretation from the VIT platform. The interpreters took the original audio from Zoom and interpreted on the VIT platform.
While this worked fine for a while, the setup was quite complex for everyone involved, from the interpreters to the participants to the meeting organizers. Eventually, the push for full integrations also came from large clients of Zoom and Microsoft, who started demanding better and less complex solutions. At this stage, significant progress has been made. Lithuanian RSI provider Interactio, for example, figured out an advanced integration for Zoom. The integration takes the audio from Zoom, feeds it into the Interactio platform where the interpreters perform the interpretation and then the audio is fed straight back into Zoom. Meeting participants simply need to select their preferred language when logging into the Zoom call, which they will be prompted to do, but they do not need to log into the Interactio platform, making it a much more comfortable user experience. The key to this integration is that meeting organizers need to have a Zoom license that includes interpretation.
Taking it yet another step further, Switzerland-based RSI platform Interprefy created an official integration for Microsoft Teams in cooperation with the platform. Microsoft Teams clients can install Interprefy for Teams via the official Add-on Store. Once installed, users can request quotes for interpreting services directly in Teams and later simply click a button in the meeting, which opens a sidebar where participants can select their preferred language to listen to the interpretation.
Both Interactio and Interprefy, as well as other RSI providers, are working on similar integrations for other platforms.
Another area for growth in the VIT market comes from the area of telehealth. Remote interpreting in healthcare is nothing new. However, lockdown restrictions and the spike in requests created new challenges. For example, before March 2020, VRI typically only required two channels — one for the interpreter and one for the doctor and patient, who usually were in the same room. However, once lockdowns hit, the situation shifted to all three parties typically being in different physical locations. This created a need for VRI with three-way call capabilities, which presented a technical challenge for established providers.
In addition, companies offering VRI or over-the-phone interpreting (OPI) reported a spike in requests from telehealth vendors looking for ways to integrate interpreting into their own platforms. So, the race for integrations also took off in this segment of the interpreting industry — not only for telehealth vendors but also for the major video conferencing platforms.
In March 2021, VRI provider Boostlingo announced its addition to the Zoom App Marketplace and became the first company in the VIT space to officially integrate with the conferencing giant. Anyone with a Zoom account can now install the Boostlingo app for Zoom and set up an account. Once this has been done, Zoom users can simply invite interpreters to any of their calls. As soon as the request has been processed by Boostlingo (more or less instantaneously), the interpreters will be placed directly into the Zoom call. Existing Boostlingo clients have access to the same feature. The interpreters work in the backend from the Boostlingo platform, which allows for proper tracking of call activity.
All this growth did not go unnoticed and soon attracted investment from outside the industry. In the last 18 months alone (early 2020 to July 2021), the following companies in the VIT market received significant funding:
Investors believe that there is growth potential in the area of remote multilingual meetings. This is not only a validation for VIT providers and LSPs in this space, but also for the interpreters who sit at the heart of these operations.
The flipside that needs to be mentioned here is the increasing influence of AI. In the interpreting market, there are two main AI trends to highlight: machine interpreting and computer-assisted interpreting (CAI) tools.
While machine interpreting is not yet as advanced as machine translation, the technology for speech-to-speech translation has made significant strides over the last few years. What might have seemed next to impossible in the past has become a reality. Skype users can already use the "Translator" functionality to have spoken conversations translated by a synthetic voice in real-time, and handheld devices with the same capabilities, predominantly aimed at tourists, are flooding the market, especially in Asia. These solutions, which mostly target individual consumers, are still imperfect in that the technology struggles with proper names, brand and product names, slang, and so on. However, they get the gist across and the technology is constantly evolving and improving.
In the language industry, a few companies have started offering machine interpreting solutions for businesses. US company Wordly, for example, provides speech-to-text and speech-to-speech translation for conferences. UK company thebigword developed an app called Voyager Translates, specifically designed for use by the armed forces. Finnish interpreting company Youpret has included machine interpreting in its offering for simpler requests. In addition, Google is working on taking speech-to-speech technology to the next level. Currently, machine interpreting solutions rely on a cascading model that uses speech-to-text in the first step, followed by machine translation, and then text-to-speech technology. Google plans to eliminate the need for any form of text representation and instead go straight from audio to audio, using the data found in sound waves, with its “Translatotron” model. In the meantime, MT and interpreting company XL8 has figured out a way to make speech-to-speech translation available faster (almost instantaneously) by working with sound waves as well and by training the technology to translate the speech content in chunks (i.e. units of meaning) rather than waiting for the full sentence to finish, a technique also employed by human interpreters. XL8’s solution is aimed at broadcasters, for example, but can also be used for video conferencing.
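As a structural illustration of the cascading model mentioned above, the sketch below chains the three stages (ASR, MT, TTS) around placeholder functions. The function names and the chunk-based interface are assumptions for illustration; a real deployment would plug actual speech and MT providers into each step.

```python
# A structural sketch of the cascading machine interpreting pipeline described
# above: ASR -> MT -> TTS. The three provider functions are hypothetical
# placeholders; in practice each step would call a real speech or MT API.
from dataclasses import dataclass

@dataclass
class InterpretingResult:
    transcript: str
    translation: str
    audio: bytes

def speech_to_text(audio_chunk: bytes, source_lang: str) -> str:
    # Placeholder: send the audio chunk to an ASR service and return the transcript.
    raise NotImplementedError("plug in an ASR provider here")

def machine_translate(text: str, source_lang: str, target_lang: str) -> str:
    # Placeholder: send the transcript to an MT engine and return the translation.
    raise NotImplementedError("plug in an MT provider here")

def text_to_speech(text: str, target_lang: str) -> bytes:
    # Placeholder: synthesize audio for the translated text.
    raise NotImplementedError("plug in a TTS provider here")

def interpret_chunk(audio_chunk: bytes, source_lang: str, target_lang: str) -> InterpretingResult:
    """Run one chunk (a unit of meaning, not a full sentence) through the cascade."""
    transcript = speech_to_text(audio_chunk, source_lang)
    translation = machine_translate(transcript, source_lang, target_lang)
    audio = text_to_speech(translation, target_lang)
    return InterpretingResult(transcript, translation, audio)
```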
The purpose of CAI tools is to act as a form of AI boothmate for interpreters performing (remote) simultaneous interpreting. CAI tools allow interpreters to extract terminology and build their glossaries within seconds. During an assignment, CAI tools can also call out figures and names and instantly convert units (e.g. for measurements and currencies). The goal of CAI tools is to make the interpreter’s preparation time more efficient and to ease the cognitive load during the assignment. While CAI tools have been in development for a few years now, they are still being fine-tuned, which is why we have not yet added them to our Language Technology Atlas. That these tools are becoming more prominent, however, is evidenced by the fact that RSI provider KUDO, for example, is currently working on developing its own CAI tools.
Earlier in 2021, Nimdzi published “The Ultimate Guide to Remote Recording for the Media and Entertainment Industry,” where we analyzed the remote recording marketplace in depth. We spoke with the major players in this area, namely Deluxe One Dub (developed by Deluxe), ZOO Dubs (developed by ZOO Digital), and iDub (developed by Iyuno Media Group). However, since then, more companies have continued to emerge, such as Connection Open, VoiceQ, and Hollyvooz, meaning that remote recording technology is very likely here to stay and not just a stopgap for the pandemic. Furthermore, TransPerfect launched Media.NEXT in 2019, and the tool saw an increase in usage during 2020 as it allowed multilingual dubbing and subtitling to continue via remote work environments.
Remote recording has been around for years but was reserved for specific types of recordings that didn’t require a cutting-edge studio or a complex workflow where different stakeholders have to be present. Freelance voice-over talents have been recording eLearning content, audiobooks, podcasts, telephony, and even ads this way for a long time. When the pandemic hit, however, remote recording solutions were still seen only as a disaster-recovery alternative as far as high-quality Hollywood productions were concerned.
The challenges that needed to be faced were:
While remote recording was accepted as a necessary evil during the pandemic by the big content creators, now it seems that this technology might just have started a revolution. Although some are still reluctant to embrace the possibilities brought forth by this new technology, many are already taking advantage of the ability to record from home (less commuting, more time to work on different projects, more free time to spend with the family or by themselves).
However, that doesn’t mean that anyone can record from anywhere at any time without standardized guidelines. For example, the Spanish start-up Hollyvooz, which has been developed by renowned voice dubbing actors and artistic directors, requires that all actors working on their platform have a proper home studio with professional equipment. Thanks to this approach, they’ve been able to record remotely for top-quality productions.
Here, we see an analogy with translators and their adoption of TMS and MT. Eventually, everybody understood that technology combined with talent was the way forward. Remote recording is a disruptor for the dubbing industry, especially in countries with a long dubbing tradition. However, this technology brings with it more scalability and efficiency, especially when combined with recording in existing studios. Both options are valid and necessary for this industry to keep growing and thriving.
In 2020, eLearning localization (62.7 percent) and media and entertainment localization (54.7 percent) were among the most popular services offered by the LSPs surveyed for our 2021 Nimdzi 100 report. For the media and entertainment sector, dubbing was the segment most heavily impacted by the pandemic, and, as explained before, remote recording solutions emerged as a result. For eLearning localization, however, there was a major challenge that was recurrent among most of our interviewees: budget limitations. Many companies had to reshuffle their training efforts and quickly create eLearning content to onboard new employees or share product knowledge with partners and customers, as in-person training possibilities were severely limited.
Especially for international companies, localizing eLearning content was paramount to catering to the needs of an international remote workforce, and this is where LSPs came to the rescue. However, language vendors soon found out that many companies had severe budget limitations and were looking for automated solutions for different services — translation, subtitle creation, and voice-over. And here’s where language technology providers stepped into the game.
In last year’s edition of the Atlas, we mentioned the importance of speech-to-text (STT) technology and automated captions, and our list included 29 tools for transcription and 16 for automatic captions. Most companies working with video content are already taking advantage of out-of-the-box solutions such as Trint, Rev, CaptionHub, and Sonix, which are used to caption pre-existing content, increasing productivity and speeding up the localization process.
The tech giants have been offering TTS support for a while now with products such as Amazon Polly, Google Text-to-Speech, Microsoft Azure Text-to-Speech, and IBM Watson Text-to-Speech. However, these are not out-of-the-box solutions and require integration with other tools. For this reason, they are not suitable for a company that doesn’t have the time or interest to build such an infrastructure. Some companies, such as VideoLocalize, Papercup, and Intento, offer a full video localization interface that integrates TTS solutions.
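To give a sense of the integration work these cloud APIs imply, here is a minimal, hedged sketch of a single Amazon Polly call via boto3. It assumes boto3 is installed and AWS credentials are configured, and the voice and text are arbitrary examples; storing, mixing, and synchronizing the resulting audio with video is exactly the part the turnkey platforms above take care of.

```python
# A minimal sketch of calling a cloud TTS API, here Amazon Polly via boto3
# (assumes boto3 is installed and AWS credentials are configured; the voice
# and text are arbitrary examples).
import boto3

polly = boto3.client("polly")

response = polly.synthesize_speech(
    Text="Bienvenido al curso de formación en línea.",
    VoiceId="Lucia",      # a Spanish (es-ES) voice
    OutputFormat="mp3",
)

# The audio comes back as a binary stream that still has to be stored, mixed,
# and synchronized with the video -- the part turnkey platforms handle for you.
with open("voiceover_es.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```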
Intento’s approach to TTS is similar to what they do with MT. They integrate a number of vendors with a set of models for different languages and then offer the possibility of customizing and even cloning voices and emotions. They offer voice synthesis through the major TTS providers, and also provide speech synthesis evaluation, selection of voice models, and voice customization, including the use of custom voice models. Intento works with a range of different input formats, such as plain text, DOCX, XLSX, SRT, etc., and their proprietary workflow is compatible with major formats used to instruct human actors, which helps them seamlessly integrate with existing content authoring workflows, tools, and human services providers.
More and more companies are offering voice cloning, such as Resemble. Some even go a step further, making the same original actor speak in several languages, as claimed by the Israel-based start-up Deepdub.
The next step in TTS is to achieve not only human-like voices but also to replicate emotions, creating expressive dubbing solutions such as those provided by pioneering companies like Voiseed, WellSaid Labs, Replica, and Sonantic. Some of these solutions are uncannily good and poised to disrupt the dubbing industry. Solutions such as Sonantic’s are already being used by some video game studios.
The most recent application of AI is to use deep learning to create perfected lip-sync in localized videos. That’s the last step in multimedia production in order for a video to be realistic and be considered high quality. And, with AI, companies such as Synthesia, Flawlessai and Jali Research are able to replicate the lip movements of speakers in all different languages so that the match is perfect, and hence, the user experience is improved. While Synthesia and Flawlessai are used for video, Jali Research specializes in animation, specifically in the video game industry. Synthesia’s business model goes beyond perfect multilingual lip-sync, though. They are aiming to create a product that any person can use to create synthetic media — think Canva, but for video production.
Yandex, the Russian tech giant, has also joined the game of automatic video localization. Its ambitious prototype aims to simultaneously translate a video and dub it with synthetic voices. The solution also detects the gender of the speaker and chooses a voice accordingly. For now, the software converts videos from Russian into English. Tweaking emotions and other voice traits is on the roadmap of Yandex’s prototype.
One of the main challenges of 2020 was online multilingual meetings. Even though remote interpreting technology was one of the winners of the past year, many international companies were craving solutions that offered real-time subtitles in multiple languages for meetings involving audiences who speak a variety of different languages. This is a big challenge that has yet to be fully streamlined and, unquestionably, there’s room for improvement. Intralingual live captions, aimed at providing access for persons with hearing impairments, non-native speakers, and anybody experiencing bad audio quality, are covered by a number of different automated solutions, including WebCaptioner and Live Captions in Chrome. The results of using these automated tools are not always optimal, however, especially if the speaker has a strong accent or if there’s background noise and/or bad audio quality. But if the audio quality is up to snuff, the results will be acceptable.
At the same time, most video conferencing platforms already offer this service. Webex provides automated closed captions, and they also accommodate third-party captioning services. Zoom also offers closed captions as well as live transcription directly on their platform. And the same goes for Microsoft Teams and Google Meet. These intralingual live caption solutions are not always available for more than one source language (English), which limits their applications for a global audience.
Technology for high-quality multilingual, real-time captions is another story altogether.
Take, for example, SyncWords and their multilingual live meeting caption solution: SyncWords’ technology is integrated with Zoom, Socio, Pheedloop, Adobe Connect, N24, and Webex, and provides translations into more than 60 languages. Touchcast, whose offering for online events includes real-time captions, foreign-language subtitles, and dubbing for online sessions, raised USD 55 million earlier in 2021. Additionally, XL8 offers live subtitles in multiple languages, delivered via an API (SaaS), a web service where people can pop open a web page while they’re in a meeting, and a streaming service that takes a video stream (HLS) and returns another video stream (HLS) that includes subtitles. Integrating their service with conference solutions is on their roadmap. AI-Media offers captioning and translation technology for live meetings, making the most of its proprietary technology with humans in the loop contributing to a better-quality final output. Their solution can be added to streaming platforms such as YouTube, Zoom, Facebook, Vimeo, and Twitch.
As Nimdzi noted last year, the TMS sector has been booming since 2010 with dozens of TMS being released every year. This trend continues in 2021. Within the past year, we have been approached by several new TMS companies who have been developing their own technology as well as mature LSPs who have been upgrading their platforms and translation editors with the goal of offering their enterprise customers full-swing, AI-powered TMS solutions.
The existing “older” TMS providers continue to develop new features, mainly tailored toward solving the challenges of integration between multiple systems, adding more MT options, and offering effortless in-context, in-country review.
For example, among the key changes and latest developments in GlobalLink Enterprise are:
As TransPerfect’s Senior Vice President, Matt Hauser, told Nimdzi, “2020 was a year of continued growth and development for TransPerfect’s technology solutions and service offerings. TransPerfect’s flagship TMS product — GlobalLink — received multiple enhancements to improve both usability and performance for its user base of over 6,000 global organizations.”
To answer the MT demand, LanguageWire launched LanguageWire Translate, an instant MT solution with TM enhancement features similar to DeepL, according to a company representative. They now also support custom-made MT engines for their enterprise customers.
To support more routine translation workflows, memoQ’s latest version, 9.8, released in June 2021, features a Regex Assistant, a kind of fast track for those using regular expressions in daily translation operations. Such users can now insert regex codes from a special “cheat sheet” or select a saved entry from the regex library. The Regex Assistant also helps users create their own regular expressions and indicates when a regex is invalid.
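The snippet below is not memoQ’s implementation; it simply illustrates, in Python, the two things such an assistant automates: checking whether a regular expression is valid and applying a typical localization pattern (here, one that protects placeholders such as {0} or %s).

```python
# Not memoQ's implementation -- an illustration of regex use in localization work:
# validating a pattern and finding placeholders like {0} or %s in a segment.
import re

def is_valid_regex(pattern: str) -> bool:
    """Return True if the pattern compiles, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

placeholder_pattern = r"\{\d+\}|%[sd]"   # matches {0}, {1}, %s, %d

segment = "Press {0} to save %s before closing."
print(is_valid_regex(placeholder_pattern))          # True
print(re.findall(placeholder_pattern, segment))     # ['{0}', '%s']
print(is_valid_regex(r"({0}"))                      # False -- unbalanced parenthesis
```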
Another interesting new feature was added by Smartling. In their Transcreation Workflow, it’s now possible to produce and store several variants of creative translations (with back translations) and select the best or most appropriate translation for a given context.
Speaking of productive ways of handling context, the popularity of in-context review toolkits is on the rise. These features help identify errors in translated content and allow users to fix them on the fly right in the working environment. One of the most popular use cases for this is an online tool that enables in-context translation and editing as well as in-country review of web-based products, desktop apps, and websites.
Last year we noted a few existing solutions in the area (from Lingoport, Rigi.io, translate5, Venga Global) that help with in-context in-country review. This year we saw and tested more solutions that include an in-context feature to help multilingual teams across the globe with the review process. Ideally, the reviewer (e.g., a marketing specialist) should be able to open a link and perform the in-context review with a live document preview directly in-browser and/or in-app, without needing to install and/or actually master any localization software or transfer working files via file shares.
Following the demand, more TMS companies now provide options for generating live previews. They are available in STAR, LanguageWire (Smart Editor), GlobalLink Enterprise, XTM (Visual Editor), BWX.io (by Bureau Works), Wordbee, and other TMS products. Earlier in 2021, Smartcat added an in-context preview for Microsoft Word.
We expect to see increased availability of online TMS portals with in-context preview options that enable collaborative environments for translators and reviewers to work in an easily trackable validation process. The range of supported formats will increase for existing TMS solutions. For example, in GlobalLink Enterprise there are preview options for Adobe InDesign. Moreover, Wordbee is set to deliver an interesting product enabling easier InDesign workflows in September 2021.
As you’ve probably noticed, AI seems to be the leitmotif of this year’s report. In this regard, there’s another concept worth mentioning (which we briefly touched on in November 2020 when talking about new promising terminology-related techniques). The technology is called Inter-language Vector Space, and it’s developed by XTM.
Following the idea that a modern TMS needs to be fully connected to content repositories as well as to downstream MT and other systems, XTM developed an AI framework to fill the gap between NMT and TMS. This framework, introduced in August 2020, enables XTM to offer its clients “greater consistency, cost savings, and speed to market in global communication.”
The term vector space itself is, of course, not new; XTM leveraged an algorithmic approach based on massive neural networks to advance its language technology. By adding AI to modules in XTM, the company has increased user performance not only in bilingual terminology extraction but also in TM alignment and inline placement during translation.
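The underlying vector-space idea can be shown with a toy example. The sketch below is not XTM’s implementation: the embeddings are invented three-dimensional vectors, whereas a real system would use a multilingual embedding model, but the matching step (pairing source and target terms by cosine similarity in a shared space) is the same in spirit.

```python
# A toy illustration of the vector-space idea behind such features: source- and
# target-language terms live in a shared embedding space and candidate term pairs
# are matched by cosine similarity. The vectors below are invented; a real system
# would use a multilingual embedding model.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

source_terms = {"invoice": np.array([0.9, 0.1, 0.3]), "warranty": np.array([0.2, 0.8, 0.4])}
target_terms = {"Rechnung": np.array([0.88, 0.12, 0.28]), "Garantie": np.array([0.25, 0.79, 0.41])}

# Pair each source term with the closest target term in the shared space.
for src, src_vec in source_terms.items():
    best = max(target_terms, key=lambda tgt: cosine(src_vec, target_terms[tgt]))
    print(f"{src} -> {best} (similarity {cosine(src_vec, target_terms[best]):.2f})")
```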
Moving on with the idea of filling the gap between NMT and its actual use cases, let’s have a closer look at the modern MT world.
In order to more closely study the way enterprise-level companies consume MT, Nimdzi and Commit Global partnered in late 2020 to conduct a joint survey of 101 translation buyers.
Source: Nimdzi and Commit Global survey of buyers of MT, 2021
Of those who had already tried MT, the majority of survey respondents confirmed the level of raw MT “could be used with minimum human editing.”
| Replies | Response count | Response % |
| --- | --- | --- |
| Overall satisfying for the intended use | 25 | 38% |
| Could be used with minimum human editing | 35 | 53% |
| Was completely unusable | 4 | 6% |
| I cannot say | 2 | 3% |
Source: Nimdzi and Commit Global Survey of translation buyers, 2021
All in all, the buyers’ choice of NMT and/or post-editing services depends on the customer profile and their previous experience — that is, whether they have purchased NMT services before. If they haven’t (or if they have, but were not happy with the results), it remains important to explain the quality difference between raw MT, light post-editing, and full post-editing, and to consult the client on which type is appropriate for their content.
However, what buyers ideally want is to not have to worry about understanding this quality difference, or to compare and evaluate MT engines, set up API keys, and track the engines’ performance over time. Moreover, if companies invest too much into MT engines and scenarios that aren’t supported by their current tech stack, they can find themselves in a situation where they have to spend extra effort to make the MT work.
Let’s see what the MT language tech market has to offer to respond to this demand.
A new kid on the block, Custom.MT is a specialist service enabling localization directors and LSPs to train and implement language ML models. It’s one of the few companies in the sector that is independent of any translation company. Founded in September 2020 in Prague, Czech Republic, by language industry researcher Konstantin Dranch and business owner Philipp Vein, the company now includes eight people (engineers, project producers, lexicographers, and data scientists). Their team has already trained and evaluated more than 200 MT engines.
Custom.MT has shared some of their latest findings with Nimdzi:
TransPerfect has made significant investments in proprietary NMT solutions to support over 40 language pairs, as well as technology for data cleaning, engine preparation, and deployment. The resulting NMT capabilities are available as a stand-alone service (via API or portal) or as an integrated workflow process across a wide range of its GlobalLink Product Suite, including:
NMT solutions are also available in workflows for any of their 60+ pre-built integrations into leading CMS, PIM, eCOM platforms as well as third-party tools for e-discovery solutions like Relativity.
It’s worth mentioning that some customers want to prevent the use of MT. For that reason, GlobalLink includes an option to block MT use that can be enabled in cases where MT is deemed inappropriate for certain content types (e.g. regulated or “high-risk” content). Other examples of TMS with a “prevent MT use” option can be viewed using this filter on the Nimdzi TMS feature overview page.
Intento has had a banner year so far, and their momentum continues to build. They have seen continued innovation of their product suite, increased recognition of their achievements, and growth across several markets.
Intento describes its offering as an AI Integration Platform, which uses AI to automate common services in the content lifecycle, including content creation (text synthesis), content accessibility (transforming content between text, speech, and images), and content localization (MT). Intento provides the methodology for the evaluation and training of AI engines for all stages of content. Then the Intento AI Hub platform connects the best-fit engines to the required business systems and use cases. Intento also provides an AI dashboard for centralized data review and business reporting.
The good news for our industry as a whole is that, in June 2021, Intento was recognized as a Cool Vendor by Gartner, being included in the global research and advisory firm’s 2021 report, ‘Cool Vendors in Conversational and Natural Language Technologies.’ The report outlines the evolving conditions for enterprise-grade language automation, highlighting a selection of exceptional vendors offering AI methods and integration facilitation. Recognition from Gartner is a testament to the impact of novel language solutions and the promise of Intento’s cognitive AI services beyond MT.
The increasing presence of Intento within the translation industry has led to several strategic relationships being formed over the past year:
Intento supports MT customization features provided by MT vendors, such as domain adaptation and custom terminology. As Intento’s representatives told Nimdzi, in 2021 they found that those features are often not enough to deliver the same level of quality across languages and use cases, so they started developing their own MT-agnostic AI layer on top of third-party models to better fit them to specific enterprise problems.
They launched a set of such features in 2021, providing tone of voice control, profanity filtering, and custom terminology on top of MT systems that do not support them natively.
Further 2021 additions to the Intento product catalog include the MT Studio and Office Productivity toolkits. Intento MT Studio makes discovering and deploying best-fit MT more straightforward by surfacing evaluation insights. The system incorporates nine specific tools covering the entire evaluation and training process, from preparing data, to employing neural models to evaluate the data and determine the best-fit MT, to fine-tuning the MT output for the most effective result. Office Productivity includes tools to facilitate the global digital workplace. Translators for Word, Excel, Outlook, Chrome, and Windows allow instant collaboration with international team members through the immediate translation of documents, spreadsheets, websites, and chats — in short, any kind of content-sharing application. The Intento Translation Portal also enables ad hoc translation of texts and files in a simple web interface.
Moving forward, the innovators are setting their sights on expanding the scope of best-in-class AI solutions. They have widened their offering to include character recognition, speech transcription, text-to-speech, and more.
In October 2021, Intento published a new edition of their annual report on the state of MT.
In June 2021, RWS announced the revival of Language Weaver, which now unifies RWS’s MT technologies in a single platform. Originally founded in 2002, Language Weaver was acquired by SDL in 2010 and today combines SDL and Iconic teams and technologies with RWS’s linguistic expertise. Language Weaver allows real-time feedback on translations and fine-tuning of generic language models. Behind the scenes, a large team of research scientists are constantly looking for ways to improve translation quality. Language Weaver can be deployed in the cloud, on premise, or as a hybrid combination of both. It can be accessed via a secure portal or integrated with various software products using its powerful API and SDK. Connectors are also available for platforms working with multilingual content, such as Microsoft Office, e-discovery platforms, and customer support, to name a few.
Another language technology company, Translated, is riding the MT wave in a desire to bring MT to the forefront of audiovisual and creative localization, thus bridging the gap between technology and the creative side of translation.
For their ModernMT software, they were inspired by the human brain’s capacity to adapt and learn from new experiences as well as to interact with others. As a result, the MT product itself adapts and learns from corrections in real time, supporting the human-in-the-loop approach we introduced above. Actually, continuous improvement by learning from corrections is a premium feature available in another of the company’s products called, unsurprisingly, Human-In-The-Loop.
As Nimdzi noted a year ago, full-document, context-aware MT was expected to become a significant asset for the MT industry. Now, as we can see, ModernMT is one example that supports the “document-level translation.” It's a full-document adaptive MT with hints of transcreation.
As stated above in the section on data, Omniscien successfully uses a wide variety of data synthesis, mining, and manufacturing approaches to create tens or sometimes hundreds of millions of sentences.
While in-domain monolingual data provides an important base for terminology that can be useful in building a bilingual glossary and for data synthesis, bilingual data mining creates a larger set of domain-specific data from external sources that further expands vocabulary choice, writing style, and context. Another way monolingual data is utilized is to match similar sentences against the millions or sometimes billions of sentences in the Omniscien data repository. The combination of these tools working together helps produce millions of quality in-domain and context relevant bilingual sentences.
And so, when it comes to MT customization, Omniscien offers a guided custom MT journey, where their expert helps tailor a customization plan to specific customer needs. User involvement is minimized with most activities handled by the Omniscien team and automated tools.
To get better results from a particular MT exercise, it’s not enough to just upload TMs and termbases to an engine. All the data assets need to undergo a certain cleaning process to get them ready for MT training. Following the well-known “garbage in, garbage out” pattern, unclean data results in inaccuracies, noise, and overall unhealthy MT output.
That’s why companies are developing (and using) TMX/TBX file editors and converters. Due to the smaller size of this language technology grouping, it’s not included on the main Atlas infographic, but the best-known names here include Okapi Olifant, Heartsome TMX Editor, and open-source TMX editors.
One of the popular tools in this area, which is available free of charge, is Goldpan by Logrus Global. Earlier in 2021 they released a new version of this editor and converter. As noted by Goldpan’s creators, “TMX is the only intermediate format suitable to convert the legacy multilingual translations (corpora) stored in different formats into the format that can be used for training MT.”
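As a minimal illustration of the kind of clean-up such editors and converters automate, here is a hedged Python sketch that uses only the standard library to drop translation units with empty segments, source text copied into the target, or exact duplicates. The rules and file names are simplified assumptions; real tools apply far more sophisticated checks.

```python
# A minimal sketch (standard library only) of typical TMX clean-up before MT
# training: drop translation units with empty segments, source copied into the
# target, or exact duplicates.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def clean_tmx(path_in: str, path_out: str) -> None:
    tree = ET.parse(path_in)
    body = tree.getroot().find("body")
    seen = set()
    for tu in list(body.findall("tu")):
        segs = {
            tuv.get(XML_LANG): (tuv.findtext("seg") or "").strip()
            for tuv in tu.findall("tuv")
        }
        texts = list(segs.values())
        key = tuple(sorted((str(lang), text) for lang, text in segs.items()))
        is_empty = any(not t for t in texts)
        is_copy = len(texts) > 1 and len(set(texts)) == 1   # source == target
        is_dupe = key in seen
        if is_empty or is_copy or is_dupe:
            body.remove(tu)
        else:
            seen.add(key)
    tree.write(path_out, encoding="utf-8", xml_declaration=True)

# clean_tmx("legacy_memory.tmx", "training_ready.tmx")  # hypothetical file names
```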
Another company with Russian origins, Egotech (part of the EGO Translating group), also offers Egotech TMX Editor, alongside their other software products around MT.
Demand for website translation has skyrocketed within the last year, due in no small part to businesses being forced online by the global health crisis. Not all of this demand was converted into demand for translation proxies, though. One of the roadblocks in the way of more widespread adoption of proxy technologies was increased attention to security and customer privacy: since proxies are external services, content owners are reluctant to allow them to handle customer information, which, coupled with the interactivity of websites, is becoming a blocker for proxy-based processes. This has acted in favor of client-side translation solutions like Wovn and LocalizeJS. TransPerfect also recently launched OneLinkJS for website localization without integration requirements.
Easyling has rolled out its next-generation JS-based Client-Side Translation Solution to allow more privacy-conscious content owners to leverage its capabilities. In addition:
Not-so-lightweight translation proxy solutions, on the other hand, are handling security and privacy concerns through auditing processes. Most notably, PCI-DSS compliance is a must-have so that credit card data can be “handled” (i.e., passed through) by the proxy infrastructure.
LSPs partnering with proxy technology providers are better equipped to open new markets and establish multiple relationships with new clients. This model creates a symbiotic relationship between the parties and provides end clients with flexibility that “avoids being locked into a relationship that may not work at some point or charges fees where they need not exist,” as noted by Frederick Marx, CEO of Keylingo LLC, an LSP that is partnering with Easyling.
Nimdzi’s conversations with enterprise-level customers across multiple domains show that it can be hard for localization managers to make the case for localization becoming a top-of-mind priority at their company (rather than an afterthought).
There’s a new tool designed exactly to help build such a case, by providing a way to measure the level of translation and language consistency of any chosen website. LocHub, a spin-off from Xillio, won the 10th Process Innovation Challenge at LocWorld 2021 for its newly launched localization testing platform, LocHub Insights.
LocHub provides one-click analytics of multilingual websites to show errors that negatively impact digital customer journeys — all delivered via a centralized dashboard. The data generated by the tool highlights areas for improvement, especially in relation to SEO and user experience (UX). Localization managers can use the numerical data and specific action items suggested by LocHub to open a discussion with their marketing departments and major stakeholders and bring localization up to where it needs to be. The goal is to improve the overall performance of their multilingual websites, enhance visitors' experience and improve lead conversions.
At Nimdzi, we found that many product manufacturers have adopted DITA-based authoring. DITA is an open standard with multiple implementations. Besides independent companies like Ixiasoft, IBM, and PTC, language service providers TransPerfect and RWS (and formerly SDL) also own DITA-compatible CMS. In DITA, disparate topics are authored separately and assembled through a so-called ditamap, which combines the various topics into a publication. DITA files are standard XML, so, after all, what’s the problem?
Working with enterprise customers, we have seen two problems:
One practical problem we noticed with the XSL transformation approach is the shortage of XSLT expertise among localization engineers.
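The transformation step itself is only a few lines of glue code; the scarce skill is authoring and maintaining the stylesheets. Below is a hedged sketch using lxml with hypothetical file names, where a DITA topic is converted by an XSL stylesheet into the exchange format a TMS expects.

```python
# A hedged sketch of the XSL transformation step itself, using lxml with
# hypothetical file names: a DITA topic is transformed by a stylesheet into an
# exchange format for the TMS. Writing dita_to_xliff.xsl is the hard part.
from lxml import etree  # pip install lxml

dita_topic = etree.parse("installing_the_device.dita")   # hypothetical input
stylesheet = etree.parse("dita_to_xliff.xsl")            # hypothetical stylesheet

transform = etree.XSLT(stylesheet)
result = transform(dita_topic)

with open("installing_the_device.xlf", "wb") as f:
    f.write(etree.tostring(result, pretty_print=True, xml_declaration=True, encoding="UTF-8"))
```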
Another important factor is that the terminology used in translation should be the terminology that writers define and use, yet editors of this kind are rarely integrated with any terminology engine. This is an area for integration in the future.
Even though we’ve managed to cover a lot of developments in language technology in this year’s report, there’s much more we weren’t even able to mention (check out the rapidly increasing number of connectors and integrations being added to all major TMS, for instance, or MT for user-generated content), as so much is happening in this sphere right now. Whether or not you share our enthusiasm about language technology, in recent years we’ve seen significant progress and success in AI with NLP systems and various language models. New language systems are a powerful tool for humans to break down the barriers of multilingual communication. For better performance and results, though, humans should stay “in the loop.”