Some machine translation (MT) providers are holding out hope for MT systems that adapt to document context. Could this development eliminate the need for custom MT engines? Will context-enabled MT help MT achieve human parity? Will we still need to customize a few years from now? Let’s discuss further.
The Conference on Machine Translation added a "document-level MT" task in 2019:
“We are particularly interested in approaches which consider the whole document. We invite submissions of such approaches for English to German and Czech, and for Chinese to English. We will perform document-level human evaluation for these pairs.” The task of assessing the effectiveness of document-level approaches will also be a part of the 2020 conference, which will be held online on November 19-20, 2020.
For now, this approach may work well mainly in research settings, but it's likely to become more widely used within the next few years. Some providers of customized MT try to make it easier to select data for customization (e.g., Microsoft Office 365 subscribers can use the documents in their cloud as monolingual customization data). Even so, this new level of context has been raising questions from investors and other interested parties about whether new technology supporting customization still needs to be developed.
Source: Nimdzi Language Technology Atlas, June 2020
There is a major discussion around whether MT, at least for certain language pairs, has reached human parity. “What is clear from research (e.g. Läubli et al. 'Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation') [is that] achieving human parity in MT has to be evaluated in document context, not just in sentence context,” says Achim Ruopp, Adjunct Professor at Georgetown University.
“This implies that the MT systems also have to translate sentences within document context, as human translators do if they have the document context available in their translation environment. Document-context-aware MT is something researchers have been working on for a while (e.g., Google's MacDuff Hughes mentioned it as a priority at AMTA 2016). But where researchers/MT suppliers are with this is not so clear — because of the issue of evaluation, both in methodology and evaluation data,” Ruopp continues.
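To make the idea of translating "within document context" concrete, here is a minimal sketch of one simple way to give a system more than the current sentence: pair each sentence with a few preceding sentences before sending it for translation. This is only an illustration, not how any particular provider works. The function name `build_context_inputs`, the `window` parameter, and the sample document are all hypothetical, and no real MT API is called.

```python
from typing import List, Tuple


def build_context_inputs(sentences: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair each sentence with up to `window` preceding sentences as context.

    Hypothetical helper for illustration only; a real system would pass the
    context and the sentence to an MT model in whatever form that model expects.
    """
    pairs = []
    for i, sentence in enumerate(sentences):
        # Slice out the preceding sentences; the first sentence has no context.
        context = " ".join(sentences[max(0, i - window):i])
        pairs.append((context, sentence))
    return pairs


# Made-up sample document: "it" in the later sentences is only
# resolvable by looking back at the first sentence.
document = [
    "The bank was steep.",
    "We walked along it for an hour.",
    "It was covered in wildflowers.",
]

for context, sentence in build_context_inputs(document, window=2):
    print(f"context={context!r} | translate={sentence!r}")
```

A sentence-level system would see only the second element of each pair. The context string is what lets a translator, human or machine, decide whether "bank" is a riverbank or a financial institution, which is also why evaluation has to look at whole documents.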
Producing high-quality, custom MT models requires some expertise and experimentation. Ruopp believes that this complexity is one reason for MT API providers to replace custom MT with systems adapting dynamically to document context. Another reason is that the MT providers need to provide the API features and underlying infrastructure to create, use, and maintain these custom models. This creates complexity on the provider side. And, although MT providers are not complaining about this, it’s still a significant factor that is reflected in the pricing of custom MT models.