LLM Solutions and How to Use Them


Since ChatGPT burst onto the scene in November 2022, everyone in the language services industry has tasted generative AI and large language models. Despite the industry hype persisting for over a year, many localization stakeholders are still in the exploration phase. Some may never move beyond this phase or try operating an LLM outside the familiar interface of ChatGPT.

This live event aims to offer an overview of the current LLM solutions market and shed light on the efforts of language technology companies to incorporate LLM-driven functionalities into their products. Additionally, we’ll explore the requirements and options for organizations looking to shift away from ChatGPT. We’ll discuss how they can operationalize their use cases in a sustainable environment, whether commercial or proprietary.

We encourage you to watch the full event to gain deeper insights into the fascinating discussions that unfolded. The live event provided a platform for Jourik to share their expertise and engage with the audience, fostering a rich understanding of LLM solutions and how to use them. Additionally, you can review the full slide deck of the presentation, which will be available for download at the bottom of this page.

The article at hand serves as a written record, capturing the essence of the event's key takeaways.

Separate opinions from facts.


The first session of Nimdzi Insights' GenAI Event Series

Today, you are in the first session of Nimdzi Insights' GenAI Event Series. We've designed a series of events centered around generative AI and large language models in the language services industry. The main focus will be on applications, implementations, and use cases, so we don't want to talk too much about disruption and impact.

We all know what's happening; the hype and the trends are familiar.

You're a company or a business, and you want to get your hands dirty and use this new technology. That's what we want to discuss.

We've designed the series so that every session is rich enough in content to be seen as a standalone event. However, there will be some interconnection between the different sessions, so the next session will be an iteration of this one. We want to provide you with a top experience by offering a full series rather than one standalone event, and then we can move on.

If you are interested in generative AI and LLMs, this is an event that you might find useful. Perhaps you have experimented with ChatGPT and are intrigued to learn more about the technology. You might have some ideas about potential use cases and want to validate them or need guidance on where to start. Alternatively, you might have already started developing an application and need help improving it. Maybe you just want to stay up-to-date with the latest technology trends. Whatever your reason, this event aims to help you.

The event is packed with different sections, each with a key takeaway or thought.

The first section is an introduction to AI, covering essential concepts such as deep learning, attention, self-attention, transformer models, and foundation models. We will also discuss retrieval augmented generation, a powerful but lesser-known concept.

The next section will focus on use cases of LLMs, where we will prove that every idea is a potential use case. We will discuss machine translation and multilingual copywriting as examples and delve into how LLMs can be used in these areas.

The third section is about the Gen AI journey and how engineers can make a real impact. We will provide guidelines for engineers who want to get started with generative AI.

Finally, we will talk about the technology market and its recent developments. We will highlight the growing popularity of LLM catalogs in cloud services and the anticipated impact of LLMs and generative AI in the language technology market by 2024.

Introduction: Defining "AI"

I would like to share a controversial opinion about AI. AI is cool, it's a very nice toy, and it can do very impressive things, but unless you manage to enhance your productivity by deploying AI, it is pretty useless. That's a controversial opinion; I heard it yesterday, liked it, and wanted to drop it in the room today as well.

IBM defines AI as technology that enables machines to simulate human intelligence and problem-solving capabilities. However, it is important to note that AI does not possess human intelligence, but rather mimics it through algorithms. This can be a simple rule-based algorithm such as a search and replace operation using regular expressions. Rule-based machine translation is another example of AI. It is essential to understand that AI encompasses much more than what we see in ChatGPT or other chatbots.

Neural Networks, Attention, Self-Attention

A neural network typically consists of two parts: an encoder and a decoder. The encoder takes an input, which can be a text string or something else, and converts each individual word into a numerical representation known as a vector or embedding. These vectors are then analyzed by the layers between the encoder and decoder, which determine the relationships between them based on the training data. The quality and quantity of this training data are crucial to how well the network performs.

The number of layers between the encoder and decoder determines whether it is a deep-learning neural network or not. For example, a neural machine translation model contains hundreds of layers between the encoder and decoder, making it a form of deep learning neural network.

The decoder takes the embeddings and produces an output by predicting the next best word in the sequence. This continuous prediction of the next best word is the key takeaway of this session, as it is what Gen AI does.

The concept of attention is of utmost importance in neural network technology. By assigning varying levels of significance to different elements in the input, the model can process information differently based on their level of attention. This capability ensures that the model can interpret complex data accurately and efficiently. Take, for example, the sentence "The animal didn't cross the street because it was too tired." The model can correctly identify that "it" refers to the animal. Another example is the sentence "The animal didn't cross the street because it was too wide." In this case, the model can determine that "it" refers to the street and not the animal, thanks to the power of attention.

If you have an interest in data science or neural networks, you might be familiar with the research paper titled "Attention Is All You Need," which Google published in 2017. This paper is exceptionally significant for a few reasons.

  • Firstly, Google introduced the idea of self-attention, repurposing the concept of attention.
  • Secondly, they created a new model architecture called the Transformer based on self-attention. I assume you have heard of this architecture before.
  • Thirdly, Google initially designed the Transformer model for machine translation tasks, and the language services industry played a critical role in the breakthrough of neural networks, transformer models, and large language models.

In 2017, Google introduced Transformers. Self-attention is a concept used by Transformers. It means that the model generates embeddings for every individual word in the input and computes attention scores for each word in the input with respect to every other word, regardless of their position. The model generates new embeddings for each word in the input, taking into account the entire context based on the attention scores. This process is repeated multiple times until it reaches the final representations of embeddings.
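
The mechanics of that computation can be sketched in plain Python. This is a deliberately simplified version of scaled dot-product self-attention in which queries, keys, and values are all just the raw embeddings; a real Transformer learns separate projection matrices for each, and repeats the process across many layers and heads.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """For each word embedding, compute a score against every other
    embedding (dot product, scaled by sqrt of the dimension), normalize
    the scores with softmax, and return a new context-aware embedding
    as the weighted sum of all embeddings."""
    d = len(embeddings[0])
    outputs = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in embeddings]
        weights = softmax(scores)
        new_embedding = [sum(w * value[i] for w, value in zip(weights, embeddings))
                         for i in range(d)]
        outputs.append(new_embedding)
    return outputs

# Three toy 2-dimensional "word" embeddings, invented for illustration.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
contextual = self_attention(toy_embeddings)
```

Every output vector blends information from the whole input, regardless of position, which is exactly the long-range-dependency property described above.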

Self-attention has several advantages over traditional attention.

  • Firstly, it provides high accuracy, as it can capture long-range dependencies and relationships between words in the input sequence. For example, if the input sequence contains 1,000 words, the model can understand the relationship between the first and last words.
  • Secondly, it provides computational performance, allowing the model to digest more training data at a faster pace. Initially, Transformers were intended for machine translation tasks. However, Google and other companies discovered that Transformers were equally capable of understanding an input in one language and generating an output in that same language. This led to the development of large language models, such as GPT.

In summary, neural networks, Transformers, and self-attention have paved the way for large language models that we have today.

A large language model is trained on multimodal data, which can include text as well as other data types such as images, audio, and video. This data is leveraged to run unsupervised pre-training, meaning that the model is trained to be a general-purpose model rather than a task-specific model; it is intended to perform numerous tasks. After pre-training, the model is decoupled from its training data, and we have a foundation model, such as GPT-3.5 Turbo or GPT-4. Further training can adapt the foundation model into an adapted or task-specific model. This adaptation can happen through further training and tailored prompts.

The typical tasks for a large language model include translation, text generation, copywriting, structured writing, question answering, customer support, chatbots, sentiment analysis, text classification, and text summarization. Moreover, large language models can work with images, speech, and many other types of data, covering tasks such as object recognition, image recognition, and image generation. Fine-tuned, adapted large language models can do a lot more than DALL·E.

ChatGPT caused a lot of excitement in the world of AI when OpenAI released it. This is because it allowed people to experiment with a large language model, something that wasn't necessarily new in November 2022 when ChatGPT was released, as such models had been around for years already. However, OpenAI was the first to provide easy access to this technology for everyone.

"GPT" stands for "Generative Pre-trained Transformer". Based on our discussion so far, we know that "generative" means that the AI has the ability to continuously predict the next best word in the output. "Pre-trained" means that the model has been trained for a general purpose rather than a specific task. The "Transformer" refers to the model architecture, as described in Google's paper "Attention Is All You Need" from 2017.

Retrieval-augmented generation, or RAG, involves using a second neural network exclusively for data ingestion and parsing. In contrast, a large language model like GPT or Gemini is used solely for content generation and Q&A based on the data provided by the second model. 

The advantages of working with a RAG-based mechanism are numerous:

  1. It leads to higher factual accuracy, akin to an open-book exam, as the model can access existing data to generate highly accurate answers.
  2. If the model doesn't know the answer to a question, it will simply admit it, reducing the chance of producing incorrect information or hallucinations.
  3. Multilingual data referenced by the second model in one language can be used to generate content in other languages, allowing for versatile deployments. 

RAG is particularly interesting for question-answering systems, such as company-specific chatbots and product-specific question-answering mechanisms. However, it can also be fine-tuned for more task-specific applications beyond Q&A. Furthermore, frameworks like LlamaIndex or LangChain offer advanced prompting techniques, including condensed questions and prompting templates, adding to the versatility of retrieval-augmented generation. Remember the name, as RAG is a powerful and interesting approach.
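
The retrieval step of RAG can be sketched very simply. This toy version scores documents by keyword overlap with the question and instructs the model to answer only from the retrieved context; production systems use vector embeddings for retrieval, and the documents and prompt wording here are invented for illustration.

```python
# Two invented knowledge-base documents standing in for company data.
DOCUMENTS = [
    "Our product supports XLIFF 1.2 and 2.0 import and export.",
    "Refunds are processed within 14 days of the purchase date.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question.
    Real RAG systems compare vector embeddings instead of raw words."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str, context: str) -> str:
    """Constrain the model to the retrieved context, which is what
    reduces hallucinations: the 'open-book exam' effect."""
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

question = "What XLIFF versions does the product support"
context = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, context)
```

The prompt would then be sent to the generation model, which never sees the documents that were not retrieved.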

Use Cases: Every Idea is a Potential Use Case

Let's move on to the next part of today's presentation, where we'll focus on different ways these ideas can be used. I always like to say that every idea has the potential to be a practical application. It's important to understand the difference between machine translation, which is limited to translating languages, and Large Language Models (LLMs), which can be used for a wide variety of purposes beyond just translation.

Before we explore how these models are used in the real world, let's talk about why they are so significant in the field of artificial intelligence. These models excel in two main areas:

  • First, they can understand the context. By giving the model source texts, previous translations, style guides, or glossaries, it can learn and adapt from this information.
  • Additionally, LLMs are good at following human instructions. While they're not perfect, they try their best to follow the guidelines given to them. Whether you need a specific word count or a particular style of translation, the model aims to meet your requirements, provided you give it clear instructions.

I would like to share an example. It's admittedly a silly one, but it's quite representative. My prompt is: "Write a sonnet about machine translation in Snoop Dogg style." In just a fraction of a second, the model can produce a response that captures the essence of machine translation in a poetic form similar to Snoop Dogg's unique style. While poetry isn't my strong suit, this example showcases the technical accuracy and creativity that large language models can achieve. The quick and precise response highlights the vast capabilities of these models.

When we put this concept into practice, we can use a large language model for machine translation. How does it work? When you interact with the model to create a translation, you have to communicate with it in your code. There are three main roles when using OpenAI models: system, assistant, and user.

The system sets the context, prepares the model, and specifies its role. For example, the model could be an English-French medical translator who follows style guides and glossaries.

Then, the assistant gives the model reference data and examples, like a style guide with specific instructions. For example, you can tell the model not to use masculine pronouns for gender-inclusive translations. The assistant may also provide a list of terms to use in the translation.

Lastly, the user interacts with the model by asking for a translation following the established guidelines. The user tells the model to make a translation while considering the translator's style, style guide instructions, and glossary terms. The user specifies that only the translation is needed, without any extra messages or explanations. This method helps the model create a gender-inclusive translation that meets the specified criteria.
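
The three roles described above map directly onto the message structure expected by OpenAI-style chat APIs. The sketch below builds such a message list for the medical-translation scenario; the style rule, glossary entry, and commented-out API call are illustrative assumptions, not the exact prompts used in the presentation.

```python
def build_translation_messages(source_text: str) -> list[dict]:
    """Assemble system/assistant/user messages for a translation request."""
    return [
        # The system message sets the context and the model's role.
        {"role": "system",
         "content": "You are an English-French medical translator who "
                    "follows the provided style guide and glossary."},
        # The assistant message carries reference data: style guide and terms.
        {"role": "assistant",
         "content": "Style guide: avoid masculine pronouns; use "
                    "gender-inclusive phrasing. Glossary: "
                    "'adverse event' -> 'événement indésirable'."},
        # The user message makes the actual request and constrains the output.
        {"role": "user",
         "content": "Translate the following sentence, applying the style "
                    "guide and glossary. Return only the translation, with "
                    f"no extra messages or explanations: {source_text}"},
    ]

messages = build_translation_messages("The patient reported an adverse event.")
# The list would then be passed to a chat completions call, e.g.:
# response = client.chat.completions.create(model="gpt-4", messages=messages)
```

Swapping the user instruction (for example, "Evaluate the quality of this translation") reuses the same scaffolding for the other use cases mentioned below.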

By understanding and using this process, you can explore other possibilities like automated editing, evaluating translation quality, identifying keywords, and extracting terminology. Mastering this approach allows you to unlock many opportunities for using large language models in machine translation applications.

LLMs cover a wide variety of natural language processing (NLP) tasks, such as content creation, multilingual content writing, rewriting content, summarizing content, detecting biased and harmful content, and different types of quality assurance (QA). You can even use them to perform machine translation, quality estimation, and terminology extraction, including multilingual terminology extraction.

LLMs can also be used in software engineering to troubleshoot errors and write small scripts or automations; they can be your best friend for coding. Another use case is linguistic context. If you are a translator or a project manager working on a project, you can ask your model for more context on a given source sentence rather than going through Google search results.

People often forget that large language models can be used for other disciplines related to translation, such as technical writing, converting unstructured data into structured data, text-to-speech conversion, and more. Every idea is a potential use case for LLMs.

In the agenda introduction, I mentioned a use case that has gained popularity lately, which is multilingual content creation. I have some arguments to support this claim. One major argument is that companies that provide pure translation technology are now releasing copywriting and creative assistant apps for content writing. For example, DeepL, a machine translation provider, has an AI writing assistant. Lilt released Lilt Create to generate multilingual content. Smartcat, another translation management system provider, has also released a similar feature. The most significant example, however, is Writer, formerly a translation management system called Qordoba. Recently, Writer raised $100 million and rebranded itself as a multilingual content creation app. This proves that multilingual content creation is gaining momentum, and businesses are investing in it.

Embark on the Generative AI Journey

Why is it important to have localization engineering skills in-house?

  • Firstly, machine translation is very task-specific, and we cannot rely entirely on tech companies or technology partners to build the integrations and use cases we need. Additionally, machine translation is already widely available, whereas localization engineering skills are not yet commoditized.
  • Secondly, some tech companies may fail to adapt to emerging trends, which could cause delays or setbacks for their partners. Having in-house expertise ensures that we can avoid this scenario.
  • Thirdly, large language models offer a wider range of use cases than machine translation, and we cannot expect tech companies to support every possible idea or feature request.
  • Finally, most large language models are API-first solutions, meaning we need engineering skills to build our use cases on top of them.

Here are a few guidelines for companies that have in-house engineering skills. First and foremost, it's important to know your standards. You should be familiar with JSON and XML, understand how APIs work, and be able to read API documentation. XML might not seem like a big deal, but in language technology, nearly every standard is based on it. So, if you've got a great idea, like pulling an XLIFF out of your favorite TMS and sending the source and target to a large language model for quality estimation, you'll need to know your way around XML and XLIFF.
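
As a small illustration of that idea, the sketch below extracts source/target pairs from an XLIFF 1.2 snippet using Python's standard library, which is the step you would run before sending segments to an LLM for quality estimation. The snippet is a minimal invented example; real XLIFF files exported from a TMS are larger, but the namespace handling is the part that usually trips people up.

```python
import xml.etree.ElementTree as ET

# A minimal, invented XLIFF 1.2 document with one translation unit.
XLIFF = """<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file source-language="en" target-language="fr" datatype="plaintext" original="demo.txt">
    <body>
      <trans-unit id="1">
        <source>Hello world</source>
        <target>Bonjour le monde</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

# XLIFF 1.2 elements live in this namespace, so every lookup needs it.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def extract_segments(xliff_text: str) -> list[tuple[str, str]]:
    """Return (source, target) pairs from every trans-unit."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.findall(".//x:trans-unit", NS):
        source = unit.find("x:source", NS).text
        target = unit.find("x:target", NS).text
        pairs.append((source, target))
    return pairs

print(extract_segments(XLIFF))  # [('Hello world', 'Bonjour le monde')]
```

Each extracted pair could then be dropped into a quality-estimation prompt, one segment at a time or in batches.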

The second guideline is to focus on your prompting skills. Some used to say that prompting was overrated, but that has changed. Every dot and every comma in your prompt is important, and advanced prompting techniques like few-shot prompting, one-shot prompting, and chain-of-thought prompting can help you get better results from your large language model.
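
Few-shot prompting, for instance, simply means embedding a couple of worked examples in the prompt so the model can infer the desired pattern. A minimal sketch, with invented example data:

```python
# Invented labeled examples to demonstrate the pattern to the model.
EXAMPLES = [
    ("Great product!", "positive"),
    ("Never buying this again.", "negative"),
]

def few_shot_prompt(text: str) -> str:
    """Prepend worked examples, then leave the final answer blank
    for the model to complete."""
    shots = "\n".join(f"Review: {review}\nSentiment: {label}"
                      for review, label in EXAMPLES)
    return f"{shots}\nReview: {text}\nSentiment:"

prompt = few_shot_prompt("Works exactly as described.")
print(prompt)
```

With one example this becomes one-shot prompting; with none, zero-shot. The examples do the instructing, so their quality matters as much as the wording.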

Guideline number three is to work in small, well-defined iterations and build things up. Don't try to run before you can walk. Similarly, focus on the logic rather than the product. Logic can be easily extended, enhanced, or adapted, and can be launched from a simple Python script or even from the command line. This is much easier than trying to make changes to a bulky product. So, focus on the logic first and you'll be able to scale up much more easily.

The Technology Market

The technology market is a very interesting space where a lot is happening. Although we won't spend too much time on it, I would like to share a few insights about it. Firstly, circling back to my key insight, LLM catalogs are closer than you think because technology giants like Amazon, Google, and Microsoft are in this space, and they have a lot of interesting things to offer.

If you have ever used AWS to host a website or anything else, you may already know that Amazon has its own LLM model studio called Amazon Bedrock. Amazon Bedrock contains many interesting foundation models, including Amazon Titan, Amazon's proprietary LLM range, Command by Cohere, Claude by Anthropic, and Llama 2 by Meta. The Llama 2 models are open-sourced and fully hosted on Amazon Bedrock. In addition to these foundation models, Amazon Bedrock has dedicated APIs and SDKs for different programming languages. So, if you have ever played with AWS, look at Amazon Bedrock.

Google Cloud is emerging as the next tech giant, and although other tech companies offer similar services, Google has a lot to offer in terms of quantity. While not necessarily superior in quality, Google's Vertex AI provides a range of options, including Google's model studio, the Gemini Pro models, and access to the Claude models by Anthropic. There are also task-specific models, hundreds of models you can train yourself using your data, and much more available in Google Cloud. If you are working with Google Cloud or Google Workspace, take a closer look at these interesting features.

The situation in Microsoft Azure is similar. Azure has its own AI studio called Azure AI and provides access to over 1,600 models, which is a staggering amount. Besides Microsoft's own models, it offers full access to model ranges from providers such as Hugging Face, Nvidia, and OpenAI. There are also task-specific models and models that can be trained on your own data.

Engineers should take a look at what their cloud services provider has to offer because there are a lot of nice things to work with.

The language services industry is primarily driven by language technology, particularly translation management systems (TMS). Since the release of ChatGPT in November 2022, the TMS market has been very active, and I have been keeping track of all the feature releases by different TMS providers. I have categorized these features into three groups: automated translation, creative assistance, and quality assurance.

  • The first category, automated translation, includes features such as large language models for machine translation, adaptive machine translation, fuzzy match repair, and content adaptation to style and other style guide instructions.
  • The second category, creative assistance, involves AI assistants embedded in TMS, which can help with reworking, rewriting, shortening, rephrasing, suggesting alternative translations, and even SEO optimization.
  • The last category is quality assurance, and there are several features that fall under this category. One of the most promising is the use of large language models (LLMs) for QA and LQA. This approach replaces traditional rule-based algorithms that generate false positives. Other features include content filtering, detecting biased or harmful content, LLMs for quality estimation, screenshot testing, and internationalization and localization testing.

I have categorized some companies based on their strategies for implementing LLMs.

In the upper left corner are companies like RWS, GlobalLink, Smartling, and Lilt, which already have automated translation features and aim for high impact. This means they can take their time to create new business models and to experiment with and pioneer new pricing models.

In the upper right corner, we have companies like Crowdin, Lokalise, Transifex, Smartcat, and Bureau Works. These companies don't have big dependencies and can freely integrate different models to work on new use cases. They have a head start compared to the competition.

In the lower left corner, we have companies like Phrase, who focus on optimizing their proprietary products rather than pursuing purposeless integrations with OpenAI.

Finally, in the lower right corner, we have task-specific, implementation-focused companies like memoQ, which has released an adaptive machine translation feature built on GPT-3.5 Turbo that takes fuzzy matches and term entries into account.

There are some interesting developments happening in the market of pure machine translation providers. However, these providers tend to be more reserved regarding new features and functionalities. This is because the foundational technology of their core products is currently undergoing a transition phase. Nonetheless, there are some notable companies and features to mention.

One of the biggest names is ModernMT, specializing in adaptive machine translation. They provide machine translation quality estimation through their API, which many people are not aware of. They have also released a trust attention mechanism that evaluates the trustworthiness of the data used to produce a translation before generating it. Additionally, they have developed a profanity filter to identify biased and harmful content in machine translation output.

Another noteworthy provider is Pangeanic, which recently launched its proprietary LLM, built specifically for the localization industry. Similar to ChatGPT, it operates in a private and safe environment for localization purposes.

Lastly, there is Globalese, which combines the power of domain-adapted neural machine translation models with custom prompting, post-editing, and automated post-editing capabilities.

There are other companies, tools, and providers in the market. These are often called "aggregators". You may already be familiar with some of the names on this slide, such as Intento, Custom.MT, and Blackbird.

Intento creates machine translation models using large language models from Azure, Google, and OpenAI. It offers an Enterprise genAI portal that lets you run automated post-editing using LLMs through Intento. Intento also has an API and 10+ TMS integrations.

Custom.MT is similar to Intento. They support translation through GPT-3.5 and allow you to upload style guides and glossaries. They also have an API and connectors for memoQ, Smartling, and Trados Studio. They support Google's adaptive large language model-driven machine translation solution.

Blackbird is a bit different from Intento or Custom.MT. It offers support for a variety of models, including Bedrock, Vertex, Cohere, Hugging Face, and OpenAI. It is also a workflow orchestrator, which means you can integrate these models anywhere in your workflow, regardless of the purpose of the LLM integration.


In conclusion, there are a few key messages that I wanted to convey in my presentation.

Firstly, our industry has been using AI for a long time already, through rule-based machine translation, statistical machine translation, and neural machine translation. It's important to understand what AI is, how it works, and that we have been using it for a long time already. We need to communicate this to our clients, stakeholders, and buyers.

Secondly, every idea is a potential use case. Unlike machine translation, which has only one use case, AI has endless possibilities.

Thirdly, localization engineering is more important than ever. If you have an idea or use case, you might need to get your hands dirty to make it happen.

Fourthly, large language models are closer than you think. AWS, Google, and Microsoft Azure all offer powerful LLMs. Lastly, a lot is happening in the technology space already. The TMS market has been very active, and there's some movement in the machine translation space as well. While the impact is still rather low, we may see more impactful LLM implementations in language technology by 2024.

Thank you for your attention. If you're interested in learning more about Nimdzi Insights or becoming a partner, please send an email to [email protected]. We'll be happy to follow up with you.

Still want more? Download the full presentation deck!

9 April 2024
