New challenges brought about by doing business in our digital world demand new solutions. Some constants still remain, however, without which a text and the quality of its translation would be less than satisfactory. One good example of such a constant is terminology and terminology management.
Terminology management includes a number of different aspects, but it usually starts with terminology extraction. As we wrote in 2018, if there is no glossary, the first task is terminology mining (also called terminology harvesting or gathering).
‘Term extraction’ means compiling a list of terms whose translations should stay consistent throughout a project. The result is a list of terms, each with its context, recorded in a glossary. Some extraction tools use statistical methods to gather a list of terms for which translations do not yet exist. The translations are then created either in the course of the project or as a separate process, for example by delegating the task to a terminologist.
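To make the statistical approach more concrete, here is a minimal sketch of frequency-based term candidate extraction. It is an illustration under simple assumptions, not the method used by any tool named in this article: the stopword list, the `extract_candidates` function, and its `max_len` and `min_freq` parameters are all hypothetical. It counts one- and two-word sequences, skips those that start or end with a stopword, and keeps the first sentence in which each candidate appears as its context.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real extractors use richer filters.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "with"}


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())


def extract_candidates(sentences, max_len=2, min_freq=2):
    """Return (term, frequency, context) tuples for frequent word sequences."""
    counts = Counter()
    contexts = {}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip candidates that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                term = " ".join(gram)
                counts[term] += 1
                # Keep the first sentence seen as the term's context.
                contexts.setdefault(term, sentence)
    return [
        (term, freq, contexts[term])
        for term, freq in counts.most_common()
        if freq >= min_freq
    ]


sample = [
    "Open the translation memory before editing.",
    "The translation memory stores approved segments.",
    "Export the glossary.",
]
for term, freq, context in extract_candidates(sample):
    print(f"{term!r} ({freq}x), context: {context}")
```

A list like this is only a starting point: a terminologist still has to decide which candidates are genuine terms before any translations are added.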
To speed up the process, some tools extract bilingual terms from reference files and previous translations. SynchroTerm (part of the Terminotix Solution by LogiTerm), for example, automatically extracts terms, their equivalents, and contexts from file pairs in any format, as well as from bitexts and SDLXLIFF, XLIFF, or TMX files.
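As a rough illustration of the first step in working with previous translations, the sketch below reads aligned source and target segments from a TMX document using Python's standard library. It does not extract terms by itself and says nothing about how SynchroTerm works internally. The `read_tmx_pairs` function and the sample data are hypothetical, and the sample is stripped down for brevity rather than being a complete, valid TMX file.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the two given language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[lang] = "".join(seg.itertext()).strip()
        # Keep only translation units that contain both languages.
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


# Simplified sample for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>translation memory</seg></tuv>
      <tuv xml:lang="de"><seg>Translation-Memory</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>glossary</seg></tuv>
    </tu>
  </body>
</tmx>"""

for source, target in read_tmx_pairs(SAMPLE_TMX, "en", "de"):
    print(source, "->", target)
```

The second translation unit is skipped because it has no German segment. A bilingual extractor would then look for candidate terms and their likely equivalents inside the pairs that remain.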
Most terminology management systems (TBS) feature term extraction functionality, but some rely on third-party extraction tools like MultiTerm Extract. The same is true of translation management systems (TMS). This means that in a regular translation workflow inside a TMS, a linguist would probably use a third-party statistical terminology extractor. However, some TMS offer built-in options for this process. You can see examples of such TMS by selecting the “terminology extraction” filter on the Nimdzi TMS feature overview page.
Four examples of mainstream TMS with term extraction capabilities. Source: Nimdzi TMS feature overview tool.
Terminology management is an essential step in any successful translation project workflow, and the productivity norms used to measure it have been evolving. Earlier in 2020, we published a post about productivity in terminology management. It drew attention from academic circles, whose representatives pointed out that the productivity metric for translating a term should be lower than the one widely used within the localization industry.
Indeed, in some cases, five seconds for a terminologist to decide on a term candidate may be unrealistic, and an hour may not be enough to translate 50 terms into one target language. In other instances, though, even higher productivity rates for building terminology lists are already being achieved. For instance, Omniscien offers a solution whose productivity is more than three times that rate: it extracts terminology from subtitles, translates it automatically, and presents options to the user, who then votes for the best suggestion. Of course, the machine's suggestions may be wrong, but, according to Omniscien, this scheme helps achieve a translation productivity rate of 180 terms per hour.
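The figures above are easier to compare once they are converted to the same unit. The short calculation below is only an arithmetic illustration using the numbers quoted in this article (50 terms per hour, 180 terms per hour, and five seconds per term candidate); the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def seconds_per_term(terms_per_hour):
    """Convert an hourly rate into the average time available per term."""
    return SECONDS_PER_HOUR / terms_per_hour


baseline_rate = 50   # terms translated per hour, the norm discussed above
reported_rate = 180  # terms per hour, as reported by Omniscien

print(seconds_per_term(baseline_rate))   # 72.0 seconds per term
print(seconds_per_term(reported_rate))   # 20.0 seconds per term
print(reported_rate / baseline_rate)     # 3.6 times the baseline rate

# Five seconds per term candidate corresponds to this many decisions per hour.
print(SECONDS_PER_HOUR / 5)              # 720.0
```

Put this way, the debate is about whether a terminologist needs more than 72 seconds per translated term, while the voting workflow leaves about 20 seconds per term.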
Another milestone in bilingual terminology extraction has been recently set by XTM. Their newly developed feature available in XTM v12.4 and later helps build terminology lists from existing translations with up to 90 percent accuracy.
Source: Process Innovation Challenge, Locworld
“XTM is an innovative company, more so than many other TMS providers. It invests in linguistic intelligence. Innovation is not something you can put amongst TMS requirements, but if you were to do so, then XTM would score very well.”
István Lengyel, Belazy Ltd.
For its automatic extraction of bilingual terminology, XTM uses Big Data, AI, and advances in computational linguistics, including an inter-language vector space. The feature already works for 50 languages, helping XTM customers save up to 80 percent of the time spent on glossary creation.
“The XTM AI team has developed a new technology to take a mundane and tedious process away from the terminologist. The bilingual term extraction performed during the alignment of the parallel source and target texts produces a spreadsheet with the data required to review and add terminology. One implication of this is that XTM users will see 80% productivity improvement over manual methods.”
Sara Basile, XTM International
XTM sells to both enterprises and language service providers (LSPs). This gives many different localization industry players an opportunity to try this promising automated approach to aligning and extracting terminology efficiently.