We recently introduced you to the two- (or five-) second rule: the maximum reaction or decision-making time a linguist should spend judging whether to post-edit a segment of machine translation (MT) output or to retranslate it from scratch.
Another way of looking at the task of increasing productivity is through an MT auto-select feature we described here last year. It’s an approach that’s available in tools such as Memsource Translate, which also includes a Machine Translation Quality Estimation (MTQE) feature. MTQE helps users evaluate the quality of the MT output: scores are automatically calculated before any post-editing (PE) is done and appear at the segment level together with other translation resources (e.g., the translation memory).
These are the four MTQE scoring categories:
| Score | Description |
| --- | --- |
| 100% | Excellent MT match; probably no PE required |
| 99% | Near-perfect MT output; possibly minor PE required, mostly for typographical errors |
| 75% | Good MT match, but likely to require PE |
| No score | The MT output is very likely of low quality; it is recommended that this output be used for reference only |
MTQE could help limit or even eliminate the two-second MTPE rule. With a built-in feature like this, linguists no longer need to decide whether to post-edit: the machine does it for them. Whenever MTQE predicts a score of 75% or above, the corresponding segment is a candidate for immediate post-editing. When no score is predicted, the MT output can be discarded.
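The routing logic described above can be sketched in a few lines. This is an illustrative sketch, not Memsource's implementation: the `Segment` type, the `route` function, and its action labels are hypothetical; only the score categories (100, 99, 75, no score) and the 75% threshold come from the article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    source: str
    mt_output: str
    mtqe_score: Optional[int]  # 100, 99, 75, or None (no score)

def route(segment: Segment) -> str:
    """Decide what to do with a segment based on its MTQE score."""
    if segment.mtqe_score is None:
        return "discard"    # no score: MT likely low quality, reference only
    if segment.mtqe_score == 100:
        return "accept"     # excellent match, probably no post-editing needed
    if segment.mtqe_score >= 75:
        return "post-edit"  # 99% or 75%: start post-editing right away
    return "discard"

segments = [
    Segment("Hello", "Hallo", 100),
    Segment("Good morning", "Guten Morgen", 99),
    Segment("See you soon", "Bis bald", None),
]
print([route(s) for s in segments])  # → ['accept', 'post-edit', 'discard']
```

The point of the sketch is that the two-second judgment call becomes a simple threshold check that runs before the linguist ever sees the segment.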
So what exactly does all this tell us about how MT quality and post-editor productivity are correlated? Let’s have a look at the chart below, which shows how MTPE productivity (in words/hour) changes as MT quality increases.
Source: Memsource. Productivity for EN>DE
On the X axis, 100 means “perfect” MT: no post-editing needed. Post-editing productivity is plotted against the productivity of translating from scratch (the flat lines). There are two post-editing productivity lines, differing by segment length (3 to 8 words vs. 9 to 26 words). The absolute productivity numbers may be a bit higher in practice, but what matters here is the observed trend: post-editing low-quality MT is less productive than translating from scratch, while higher-quality MT increases productivity considerably.
As noted in a LocWorld39 presentation, this is how Memsource measures performance (conceptually, the method is still the same):
Source: Memsource. LocWorld 39, “Quality Estimation in the AI Era”
At the far right are perfect MT outputs. MTQE identified most of them as perfect (green bar), some as 75% (good MT, orange bar), and a very small number as bad (blue bar).
The language services industry is undergoing a profound transformation with the emergence of cutting-edge technologies such as ChatGPT and large language models (LLMs). These powerful language generation models have captivated the attention of businesses and language professionals alike, offering exciting possibilities for translation, localization, and content creation.
It’s been six years since Google revealed that Google Translate processes 146 billion words a day — three times more than all the professional translators in the world combined can do in a month. That was 2016, and things haven’t slowed down in the machine translation (MT) universe since.
If you’re a driver, you’ve probably heard of the two-second rule. Staying at least two seconds behind any vehicle is considered a rule of thumb for drivers wanting to maintain a safe following distance at any speed. The two seconds don’t represent safe stopping distance but rather safe reaction time.
Do you remember the last time when people were NOT talking about machine translation (MT)? We don't. Wherever you go, there’s someone talking about MT. With few exceptions, it seems like the only major disruptors in our industry over the past few decades have been breakthroughs in language technology.