Here’s a general approach to MT selection which we mentioned before:
That looks clear enough. But let’s dig further into how this scheme can be applied to a machine translation post-editing (MTPE) workflow. Here’s a common four-step process:
Then, depending on the project size, there may be variations:
| Small PEMT projects | Medium projects / MT as a first step | Major projects / MT only |
|---|---|---|
| 1. Prepare a set of segments for evaluation (about 1,500-2,000) | 1. Prepare a set of segments for evaluation (about 1,500-2,000) | 1. Compare the basic stock engines to determine the maximum translation quality obtainable without training the engines |
| 2. Select 4-5 engines to evaluate | 2. Prepare a list of stock engines for quality assessment per language pair | 2. Prepare a glossary of terms/phrases and a corpus for training (~10,000+ segments) |
| 3a. Plug the engines into a CAT tool; 3b. Translate the materials through the selected 4-5 engines and load the results into your CAT tool | 3. Translate the test set of segments | 3. Train suitable domain-adaptive engines |
| 4. Choose the best option during translation | 4. Calculate automatic reference-based metrics | 4. Evaluate the quality of the trained engines and compare them with the base stock models |
| | 5. Compare the best engines by their metric scores and translation quality | 5. Once you have collected feedback from translators on quality, re-train your favorite engines and evaluate quality again |
| | 6. Try translating through the winning engine | |
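
To make the automatic scoring and engine comparison steps of the medium-project workflow more concrete, here is a minimal Python sketch. It is not the method of any particular tool: the score is a simplified character n-gram F-score in the spirit of chrF (not the official chrF implementation), and the engine names, sample segments, and the helper names `ngram_fscore` and `rank_engines` are all hypothetical. In practice you would more likely use an established metrics library and the full test set of 1,500-2,000 segments.

```python
from __future__ import annotations

from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def ngram_fscore(hypothesis: str, reference: str,
                 max_n: int = 4, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1].

    Averages the F-beta score over character n-gram orders 1..max_n.
    beta > 1 weights recall more heavily than precision.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # segment too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * precision * recall
                      / (beta**2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


def rank_engines(outputs: dict[str, list[str]],
                 references: list[str]) -> list[tuple[str, float]]:
    """Average the segment-level score per engine and sort best first."""
    if not references:
        raise ValueError("the reference set is empty")
    ranking = []
    for engine, hyps in outputs.items():
        if len(hyps) != len(references):
            raise ValueError(
                f"{engine}: expected {len(references)} segments, got {len(hyps)}")
        total = sum(ngram_fscore(h, r) for h, r in zip(hyps, references))
        ranking.append((engine, total / len(references)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical toy data; a real evaluation would use the full test set.
references = ["The cat sits on the mat.", "Please restart the device."]
outputs = {
    "engine_a": ["The cat sits on the mat.", "Please restart the device."],
    "engine_b": ["A cat is sitting on a rug.", "Reboot the machine please."],
}

for engine, score in rank_engines(outputs, references):
    print(f"{engine}: {score:.3f}")
```

An automatic ranking like this only narrows the field. The workflow above still pairs it with a quality comparison and a trial translation through the winning engine before committing to it.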
See also the State of MT by Intento for an in-depth guide to the solutions available on the market.