A History of Machine Translation: From the Typewriter to ChatGPT in One Century


Report co-written by Nadežda Jakubková and Andrew Warner.


Today, machine translation (MT) is so pervasive that — for many young or early-career localization professionals, at least — it’s hard to imagine a time without it. But such a time did exist. Those with a decade or two of language industry experience under their belt have, no doubt, witnessed firsthand MT’s evolution into the nearly omnipresent entity that it is today. 

Even so, it may be surprising to learn that the history of MT dates back to the early 20th century, long before many of us were even born. In 1933, the first patents for machine-assisted translation tools were issued in France and Russia, and we’ve been building on that technology ever since.

Along the way, developers and translators alike have learned some valuable lessons that are worth looking into today — especially as novel technologies like OpenAI’s ChatGPT draw the attention (both positive and negative) of more and more thought leaders in our industry. 

The 1930s-1950s: The early days

Although human beings have long fantasized about machines that could miraculously translate text from one language to another, it wasn’t until the 1930s that such technology actually seemed like it could be a reality.

Georges Artsrouni and Petr Troyanskii, working completely independently of each other in France and Russia respectively, received the first-ever patents for MT-like tools in 1933, just a couple of months apart. These tools were quite rudimentary, especially compared to what we think of when we hear the term “MT” today. They worked by comparing dictionaries in the source and target languages and, as such, could really only account for the root forms of words, not their various declensions and conjugations.

Troyanskii’s mechanical translation device, for example, required a typist to transcribe the target-language words, an operator to annotate their grammatical function, and an editor to turn the result into readable text in the target language. Without computers, the technology was little more than a glorified bilingual dictionary.
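To see why a dictionary-only approach struggles, here is a minimal sketch in Python. The tiny English-French dictionary and the `dictionary_translate` function are hypothetical illustrations, not a reconstruction of Artsrouni’s or Troyanskii’s actual devices: the dictionary stores only root forms, so any inflected word falls through unrecognized.

```python
# Hypothetical toy dictionary: it stores root (dictionary) forms only,
# much like a printed bilingual dictionary.
ROOT_DICTIONARY = {
    "the": "le",
    "cat": "chat",
    "see": "voir",
}


def dictionary_translate(sentence: str) -> list[str]:
    """Translate word by word, flagging any word the dictionary lacks."""
    output = []
    for word in sentence.lower().split():
        # Exact lookup only: inflected forms such as "sees" or "cats" miss.
        output.append(ROOT_DICTIONARY.get(word, f"<{word}?>"))
    return output


print(dictionary_translate("The cat sees the cats"))
# ['le', 'chat', '<sees?>', 'le', '<cats?>']
```

Only "the" and "cat" come through; "sees" and "cats" are flagged because the dictionary holds only "see" and "cat", and even the second "le" should really be the plural "les". Fixing that requires exactly the kind of grammatical annotation that Troyanskii’s human operator had to supply.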

But the first general-purpose electronic computers were not far off. In the mid-1940s, researchers like Warren Weaver began to theorize about ways to use computers to automate the translation process. In 1947, Weaver proposed using statistical methods, information theory, and wartime cryptography techniques to automate translation with electronic computers. Shortly thereafter, academic institutions began devoting resources to MT development: the Massachusetts Institute of Technology, for instance, appointed its first full-time MT researcher in 1951.

These efforts culminated in the infamous Georgetown experiment of 1954, the first public demonstration of computer-powered MT technology. Researchers at Georgetown University partnered with IBM to create a tool that could translate Russian (albeit Russian that had been transliterated into Latin characters) into English. The researchers hand-selected 60 sentences to present to the public, and although their tool translated these sentences adequately, the technology still left a lot to be desired when it came to everyday use.

Although we often hear grandiose, perhaps overly optimistic claims about human parity in MT today, it’s important to note that such claims are not at all new. The researchers on the Georgetown experiment, for example, claimed that they needed just five more years of hard work to perfect their tool, based on only 60 sentences translated from Russian into English (and romanized Russian at that!).

Now, of course, we know that this was not the case. Nearly seven decades later, we still can’t quite say that our MT technology is perfect. The lesson here is to be careful about over-promising when it comes to MT: perfection is not always as close as it might seem.

The 1960s-1980s: The dawn of RBMT

Though their technology was far from perfect, it seemed like the researchers at Georgetown and IBM had generated some nice momentum for MT.

In the United States, that momentum came to a halt in the 1960s. In 1966, the Automatic Language Processing Advisory Committee (ALPAC) published a report concluding that MT was slower, less accurate, and more expensive than human translation, and that further research into it was therefore hard to justify. MT research lost funding in the United States, but elsewhere, developers continued chugging along with MT projects.

During this period, researchers experimented with a handful of different MT methods, with rule-based MT (RBMT) becoming the most popular. RBMT relies on explicit grammatical and lexical information about each language: linguists encode dictionaries and grammar rules by hand, and the system applies those rules to analyze the source text and generate the target text.
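The rule-based idea can be sketched in a few lines of Python. Everything here is a hypothetical toy, assuming a single English noun phrase as input and a hand-written English-Spanish lexicon restricted to masculine nouns so that gender agreement can be ignored; real RBMT systems encoded thousands of such rules. The three steps (analysis, transfer, generation) follow a common RBMT design.

```python
# Hypothetical hand-written lexicon: English root -> (Spanish root, part of speech).
# Only masculine nouns are included, so gender agreement can be ignored here.
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "new": ("nuevo", "ADJ"),
    "car": ("coche", "NOUN"),
    "book": ("libro", "NOUN"),
}


def analyze(word: str) -> tuple[str, bool]:
    """Morphology rule: a trailing 's' on an unknown word marks an English plural."""
    if word in LEXICON:
        return word, False
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], True
    raise KeyError(f"no rule covers {word!r}")


def pluralize(spanish: str) -> str:
    """Generation rule for Spanish plurals, with one irregular determiner."""
    if spanish == "el":
        return "los"
    return spanish + ("s" if spanish[-1] in "aeiou" else "es")


def translate_noun_phrase(phrase: str) -> str:
    # 1. Analysis: find each word's root, part of speech, and number.
    analyzed = [analyze(w) for w in phrase.lower().split()]
    words = [LEXICON[root] for root, _ in analyzed]  # (spanish, pos) pairs
    # Agreement rule: the whole phrase takes the noun's number.
    plural = any(is_pl for (_, is_pl), (_, pos) in zip(analyzed, words) if pos == "NOUN")
    # 2. Transfer: English ADJ + NOUN becomes Spanish NOUN + ADJ.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # 3. Generation: inflect every Spanish word for number.
    return " ".join(pluralize(w) if plural else w for w, _ in words)


print(translate_noun_phrase("the red car"))    # el coche rojo
print(translate_noun_phrase("the new books"))  # los libros nuevos
```

Every word and every rule has to be written by hand: an unseen word such as "bus" makes `analyze` raise a `KeyError`, which hints at why building and extending RBMT systems was so time-consuming.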

Early RBMT systems include the Institut Textile de France’s TITUS and Canada’s METEO system, among others. And while US-based research certainly slowed down after the ALPAC report, it didn’t come to a complete stop: SYSTRAN, founded in 1968, also used RBMT, working closely with the US Air Force on Russian-English translation in the 1970s.

Though it was the most prominent form of MT at the time, RBMT had several limitations. It was no doubt an improvement on the technology of the Georgetown experiment, but RBMT systems were time-consuming to build, since developers needed to manually input the rules of each language. Plus, they often generated inaccurate or awkward-sounding output, especially when the input was ambiguous or idiomatic.

In searching for ways to improve and scale up RBMT, some developers found solutions in other areas. In 1984, for instance, Makoto Nagao proposed example-based MT in Japan, which translates by drawing on a database of previously translated example sentences rather than on hand-written rules. Although this method is not widely used today, it remains an example of the ingenuity of early MT researchers. This data-driven thinking then led other developers to the field of statistics, as we’ll see in the next section. From these pioneering researchers and developers, we’ve learned the importance of constant vigilance, scalability, and maintenance in MT systems.

The 1990s-2010s: More advanced methods

In their quest to improve RBMT, researchers developed another, more efficient method for MT: statistical MT (SMT).

In the 1990s, IBM researchers took a renewed interest in MT technology, publishing research on some of the first SMT systems in 1991. Unlike RBMT, SMT doesn’t require developers to manually input the rules of each language. Instead, SMT engines analyze a bilingual corpus of text to identify patterns in the languages and convert them into statistical data. This analysis allows SMT engines to identify the most likely translation for a given input. These models performed significantly better than RBMT and quickly became all the rage.
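The statistical idea can be sketched with a simplified version of IBM Model 1, one of the word-alignment models that came out of IBM’s SMT research. This is an illustrative toy, not the IBM system itself: the three-sentence English-German corpus and the function names are hypothetical, and the model omits the NULL word and everything beyond word-level translation probabilities. Expectation-maximization (EM) repeatedly guesses which German words align with which English words, then re-estimates the probabilities from those guesses.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus of (English, German) sentence pairs.
CORPUS = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(german | english) with EM."""
    pairs = [(e.split(), g.split()) for e, g in corpus]
    german_vocab = {g for _, gs in pairs for g in gs}
    # Start from a uniform distribution: every German word is equally likely.
    t = defaultdict(lambda: 1.0 / len(german_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (german, english) links
        total = defaultdict(float)  # expected count of each English word
        # E-step: share each German word among the English words it may align to.
        for es, gs in pairs:
            for g in gs:
                norm = sum(t[(g, e)] for e in es)
                for e in es:
                    share = t[(g, e)] / norm
                    count[(g, e)] += share
                    total[e] += share
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(g, e): c / total[e] for (g, e), c in count.items()})
    return t


def most_likely_translation(t, english_word):
    """Pick the German word with the highest estimated t(german | english)."""
    candidates = {g: p for (g, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


t = train_ibm_model1(CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", most_likely_translation(t, word))
# the -> das
# house -> Haus
# book -> Buch
# a -> ein
```

No alignment rules are written by hand: the model infers that "Haus" goes with "house" because "das" is increasingly well explained by "the", which appears in two of the sentence pairs.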

And as electronic computers slowly became household items, so too did MT systems. SYSTRAN launched the first web-based MT tool in 1997, giving laypeople (not just researchers and language service providers) access to MT. Nearly a decade later, in 2006, Google launched Google Translate, which was powered by SMT from 2007 until 2016.

Alongside the development of SMT, we can also find inklings of neural MT (NMT). The same year that SYSTRAN launched its web MT tool, Ramon Neco and Mikel Forcada published the first paper on the “encoder-decoder” structure, paving the way for NMT technology. In 2003, researchers at the University of Montreal developed a language model based on neural networks, but it wasn’t until 2014, with the development of the sequence-to-sequence (Seq2Seq) model, that NMT became a formidable rival to SMT.

After that, NMT quickly became the state of the art, and Google Translate adopted it in 2016. NMT engines use larger corpora than SMT engines and are more reliable when translating long strings of text with complex sentence structures. That said, not all that glitters is gold: NMT engines require a lot of time and computational resources to train, and they may struggle with domains that lie outside their training data.

Though NMT is a far cry from the days of Artsrouni and the Georgetown experiment, it’s important not to discard the older MT methods entirely. Developers still use SMT today to check the relevance of their training data, although RBMT is rarely, if ever, used on its own for practical purposes.

As MT has improved, though, we’ve also learned that combining different MT methods can yield better results: hybrid MT approaches can use RBMT, SMT, and NMT in conjunction to refine a translation. This is a particularly helpful approach for low-resource languages with little training data available: an RBMT engine creates a rough translation that SMT and NMT engines can then improve later in the process.

Neural systems are also harder to debug than SMT systems: because NMT engines are trained on so much data, it’s impossible to know every word and phrase that went into training them, which creates a black-box problem. Perhaps the most critical lesson learned from the early days of NMT, then, is that large quantities of high-quality data are critical to developing good MT engines.


The 2020s: ChatGPT and beyond

Although large language models (LLMs) perform a lot of other functions besides translation, some thought leaders have presented tools like ChatGPT as the future of localization and, by extension, MT.

OpenAI’s GPT series, which includes ChatGPT and the recently launched GPT-4, is built on large-scale neural networks. Though the translation capabilities of these models aren’t quite as strong as those of state-of-the-art NMT, that’s not to say they can’t be improved upon; after all, these tools weren’t designed with translation in mind.

As localization teams incorporate this technology into their workflows (and some have already begun to do so), you can bet that it will become more and more specialized for our field. Plus, combining it with existing MT technology might yield interesting results: ChatGPT is a decent editing tool that could be used alongside MT engines to touch up their output.

Moving forward, it’s important to take a grounded and principled approach to adopting and developing future technologies. By tracing back the history of MT all the way to its earliest incarnations, we can draw the following lessons:

  1. Be careful not to over-promise when it comes to MT — and be skeptical of grandiose claims.
  2. Perseverance is key when it comes to scaling and maintaining MT systems.
  3. Don't discard past technologies — while they might not be as useful on their own anymore, combining technologies can yield even better outputs than the new technologies alone.
  4. Large, high-quality datasets have always been an important aspect of developing MT, and they will matter even more as generative AI becomes a core part of our workflows.
  5. An objective method for evaluating, analyzing, and comparing the quality of your output is important for building and improving upon the existing technology.
  6. Whatever technology comes next will become the new norm in its own time, if it brings value to the organization or the end client.

This report has been researched by Nimdzi's Localization Researcher, Nadežda Jakubková. If you want to learn more about this topic, reach out to Nadežda at [email protected].

This report has been written by Multilingual's Staff Writer, Andrew Warner. If you want to learn more about this topic, reach out to Andrew at [email protected].

22 March 2023
