We recently introduced you to the two- (or five-) second rule, which is essentially the reaction or decision-making time a linguist should spend judging whether to post-edit a segment of machine translation (MT) output or to retranslate it.
Another way of looking at the task of increasing productivity is through an MT auto-select feature we described here last year. It’s an approach that’s available in tools such as Memsource Translate, which also includes a Machine Translation Quality Estimation (MTQE) feature. MTQE helps users evaluate the quality of the MT output: scores are automatically calculated before any post-editing (PE) is done and appear at the segment level together with other translation resources (e.g., the translation memory).
These are the four MTQE scoring categories:
| MTQE score | Meaning |
|---|---|
| 100% | Excellent MT match, probably no PE required |
| 99% | Near-perfect MT output, possibly minor PE required, mostly for typographical errors |
| 75% | Good MT match, but likely to require PE |
| No score | The MT output is very likely of low quality; it is recommended that this output be used for reference only |
MTQE could help limit or even eliminate the two-second MTPE rule. With a built-in feature like this, linguists no longer need to decide whether to post-edit each segment: the machine makes that call for them. Whenever MTQE predicts a score of 75% or above, the segment is a candidate for post-editing right away. When no score is predicted, the MT output can be discarded.
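The routing decision described above is simple enough to express in a few lines. The following is a minimal Python sketch, assuming each segment carries one of the four MTQE categories from the table (100, 99, 75, or no score). The `Segment` class, the `route_segment` function, and the action names are hypothetical illustrations, not part of the Memsource Translate API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    source: str
    mt_output: str
    # MTQE score as in the table: 100, 99, 75, or None ("No score").
    mtqe_score: Optional[int]


def route_segment(seg: Segment) -> str:
    """Suggest a next step for a segment from its MTQE score.

    The returned action names are hypothetical labels for this sketch.
    """
    if seg.mtqe_score is None:
        # No score: MT is likely low quality, so keep it for reference
        # only and translate the segment from scratch.
        return "translate_from_scratch"
    if seg.mtqe_score >= 100:
        # Excellent match: probably no PE, just a quick review.
        return "review_only"
    if seg.mtqe_score >= 99:
        # Near-perfect: minor fixes, mostly typographical.
        return "light_post_edit"
    if seg.mtqe_score >= 75:
        # Good match: start post-editing right away.
        return "post_edit"
    # Scores below 75 are not among the listed categories; be conservative.
    return "translate_from_scratch"


if __name__ == "__main__":
    batch = [
        Segment("Save changes?", "Save changes? (MT)", 100),
        Segment("Open the file.", "Open the file. (MT)", 75),
        Segment("Right-click the icon.", "Right-click the icon. (MT)", None),
    ]
    for seg in batch:
        print(f"{seg.source!r}: {route_segment(seg)}")
```

Keeping the thresholds as ordered `>=` checks means a tool that reports intermediate scores would still fall into the nearest lower category rather than being silently dropped.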
So what exactly does all this tell us about how MT quality and post-editor productivity are correlated? Let’s look at the chart below, which shows how MTPE productivity (in words per hour) changes as MT quality increases.
Source: Memsource. Productivity for EN>DE
On the X axis, 100 means “perfect” MT: no post-editing needed. MTPE productivity is plotted against the productivity of translating from scratch (the flat lines). There are two productivity lines, one for each segment-length band (3 to 8 words and 9 to 26 words). The absolute productivity numbers may be somewhat higher than those shown, but what matters here is the observed trend: post-editing low-quality MT is less productive than translating from scratch, and higher-quality MT increases productivity considerably.
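Productivity here is words processed per hour, compared against the from-scratch baseline. A minimal Python sketch of that arithmetic follows, assuming you have a word count and an editing time for each segment. Only the two length bands come from the chart; the `TimedSegment` type and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class TimedSegment(NamedTuple):
    words: int      # number of words in the segment
    seconds: float  # time the linguist spent on the segment


def length_band(words: int) -> Optional[str]:
    """Assign a segment to one of the two length bands used in the chart."""
    if 3 <= words <= 8:
        return "3-8 words"
    if 9 <= words <= 26:
        return "9-26 words"
    return None  # outside both bands: ignored


def words_per_hour(segments: Iterable[TimedSegment]) -> Dict[str, float]:
    """Total words divided by total hours, computed separately per band."""
    words: Dict[str, int] = defaultdict(int)
    seconds: Dict[str, float] = defaultdict(float)
    for seg in segments:
        band = length_band(seg.words)
        if band is None:
            continue
        words[band] += seg.words
        seconds[band] += seg.seconds
    return {
        band: words[band] / (seconds[band] / 3600)
        for band in words
        if seconds[band] > 0
    }


def relative_productivity(mtpe_wph: float, scratch_wph: float) -> float:
    """Above 1.0, post-editing beats translating from scratch; below 1.0, it is slower."""
    return mtpe_wph / scratch_wph
```

Running `relative_productivity` per length band and per MT-quality bucket would give the kind of comparison the chart plots: values below 1.0 where MT quality is low, rising above 1.0 as quality improves.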
As noted in a LocWorld 39 presentation, this is how Memsource measures MTQE performance (conceptually, the method is still the same):
Source: Memsource. LocWorld 39, “Quality Estimation in the AI Era”
At the far right are the perfect MT outputs. MTQE identified most of them as perfect (green bar), some as 75 (good MT, orange bar), and a very small number as bad (blue bar).
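The breakdown in this chart amounts to a cross-tabulation: for each bucket of actual MT quality, count how the MTQE predictions are distributed. A minimal Python sketch follows, assuming each segment already has an actual quality score from 0 to 100 (how that score is obtained, for example from how much the post-editor changed the MT, is an assumption here) and a predicted label mirroring the chart's bars. This illustrates the idea and is not Memsource's evaluation code; all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def quality_bin(score: int) -> int:
    """Bucket an actual MT quality score (0-100); 100 keeps its own 'perfect' bin."""
    return 100 if score >= 100 else (score // 10) * 10


def mtqe_breakdown(
    pairs: Iterable[Tuple[int, str]],
) -> Dict[int, Dict[str, float]]:
    """For each actual-quality bin, the share of segments per MTQE prediction.

    `pairs` holds (actual_quality_score, predicted_label) tuples, where the
    label is a hypothetical name such as "perfect", "75", or "bad".
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for actual, predicted in pairs:
        counts[quality_bin(actual)][predicted] += 1
    breakdown: Dict[int, Dict[str, float]] = {}
    for bin_, counter in counts.items():
        total = sum(counter.values())
        breakdown[bin_] = {label: n / total for label, n in counter.items()}
    return breakdown
```

Drawing each bin's shares as stacked bars yields a chart of the same kind: a good estimator puts most of the 100 bin under "perfect" and very little under "bad".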