Here’s a general approach to MT selection which we mentioned before:
It looks clear enough. But let’s dig further into how this scheme can be applied to a machine translation post-editing (MTPE) workflow. Here’s a common four-step approach:
Then, depending on the project size, there may be variations:
| Small PEMT projects | Medium projects / MT as a first step | Major projects / MT only |
|---|---|---|
| | 1. Prepare a set of segments for evaluation (about 1,500-2,000) | 1. Compare the basic stock engines to determine the maximum translation quality obtainable without training any engines |
| 2. Select 4-5 engines to evaluate | 2. Prepare a list of stock engines for quality assessment per language pair | 2. Prepare a glossary of terms/phrases and a corpus for training (~10,000+ segments) |
| 3a. Plug the engines into a CAT tool | 3. Translate the test set of segments | 3. Train suitable domain-adaptive engines |
| 3b. Translate materials through the selected 4-5 engines and load the output into your CAT tool | 4. Calculate automatic reference-based metrics | 4. Evaluate the quality of the trained engines and compare them with the base stock models |
| 4. Choose the best option during translation | 5. Compare the best engines based on the metric scores and quality | 5. Once you have collected translator feedback on quality, re-train your preferred engines and evaluate quality again |
| | 6. Try translating through the winning engine | |
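
To make steps 3 to 5 of the medium-project column more concrete (translate a test set, calculate reference-based scores, compare engines), here is a minimal Python sketch. It is an illustration under assumptions, not a recommended metric: it uses a plain word error rate (word-level edit distance divided by reference length), a simpler relative of TER that ignores shift operations. The engine names and sample sentences are made up. In a real evaluation you would more likely rely on established metrics such as BLEU, chrF, or TER on a test set of the size mentioned above.

```python
from typing import Dict, List, Tuple


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def corpus_wer(hyps: List[str], refs: List[str]) -> float:
    """Total word edits divided by total reference words; lower is better."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must align one-to-one")
    edits = sum(word_edit_distance(h.split(), r.split())
                for h, r in zip(hyps, refs))
    ref_words = sum(len(r.split()) for r in refs)
    return edits / ref_words if ref_words else 0.0


def rank_engines(outputs: Dict[str, List[str]],
                 refs: List[str]) -> List[Tuple[str, float]]:
    """Score every engine against the references and sort best-first."""
    scores = {name: corpus_wer(hyps, refs) for name, hyps in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical test set: human reference translations plus engine outputs.
references = ["the cat sat on the mat", "press the red button"]
engine_outputs = {
    "engine_a": ["the cat sat on the mat", "press the red button"],
    "engine_b": ["a cat sat on mat", "push the red button"],
    "engine_c": ["cat on mat", "press button"],
}

for name, score in rank_engines(engine_outputs, references):
    print(f"{name}: WER = {score:.2f}")
```

A score like this only narrows the field. The table still pairs it with a quality check and a trial translation through the winning engine, since automatic scores do not capture everything a post-editor cares about.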
See also the State of MT report by Intento for an in-depth guide to the solutions available on the market.
Some machine translation providers are holding out hope for MT systems that adapt to document context. Could this development eliminate the need for custom MT engines? Will context-enabled MT help MT achieve human parity? Will we still need to customize engines a few years from now? Let’s discuss further.
5 August 2020