Webinar: Continuous Localization


Full Transcript


Hello and thank you everybody very much for joining us today. Welcome! Today we are streaming for you from the USA and Russia. My name is Yulia Akhulkova. I am a researcher at Nimdzi Insights. In case you're here by accident and have no idea what Nimdzi does, please take some time to visit our website or follow us on social media for fun and insights. I'm your host today, here to share some of those insights together with our guest from Evernote. We are conducting a webinar on the subject of continuous localization, and before I pass the mic to our guest, let me guide you through some housekeeping items. First, please feel free to post questions in the comments section of the video. We'll either address them after the presentation or reach out to you later. I also saw that we actually have some questions already, so we'll cover as many as we can today. The hashtag for the webinar is #Nimdzi_Insights and our Twitter page is @Nimdzi_Insights – make sure to subscribe to it, by the way. And now let's get right into it: the approach of continuous localization, brought to you by Igor Afanasyev, the Director of Localization at Evernote. Igor, the floor is yours.


Thank you, Yulia, for inviting me to talk about continuous localization. That's a really interesting topic, at least for me. But I bet it will be interesting for you, our listeners, if you're just starting the localization efforts in your company and want to see what the options are and how to better lay out the processes around localization. Or maybe you already have a manual localization process in place and have some ideas on automating it, or you are shopping for a new TMS and want to see which TMS will suit your needs better. Maybe you already have something that you call an automated localization process, but it doesn't feel right, and you may still experience some issues with quality, or you still have some manual steps here and there. Or maybe you are not on the client side but on the vendor side, and you have clients that ask you about automation, about continuous localization, and the idea of providing that layer of automation for every individual client sounds a little bit daunting to you.


So if you want to stay ahead of the competition and offer your clients really nice, smooth automation, you will learn how to do this in this webinar. We will be talking about three different stages of automation, or three different approaches to localization. First we will briefly talk about manual localization, basically to have a baseline for comparing manual localization and automated localization. Then we will talk about automated localization the way it is typically implemented and offered by TMS solutions, about APIs, and about why not every automation is actually a good one. And then we will talk about continuous localization in depth and explain how it differs from the typical automation that you can currently experience, and what tools you can use to actually achieve continuous localization. Okay, let's start with manual localization first. This is a high-level overview of how manual localization works.


If you are developing a software product, and we'll be mainly discussing software development and localization today, you'll have a version control system. This is where you have all the files. And on the localization vendor side, there's a TMS (translation management system) where translators work. With manual localization, what you do is periodically gather some files and then send them over to the localization agency. That process requires multiple people who are handing files over to the next person. Usually those files are sent by email. And that process is a little bit slow. There's nothing wrong with the process per se; it is suited for specific purposes. First of all, it works pretty well for large, infrequent projects. And if we are talking about software localization a couple of decades ago, it was pretty normal to release new software every year or every two years, maybe.


And with that, it doesn't really matter if the localization step takes an extra day or a week, right? But as the shape of the software industry changes and software releases happen more often, manual localization becomes a bottleneck. It is expensive because it requires the attention of multiple people. It really breaks apart for smaller projects and for faster cycles. There's one good thing about manual localization: if your orders are infrequent, you are usually translating the entire project at once. So you send all the files belonging to a certain project, and with that you provide more context to translators. That is good. But as I said, we are moving into a new era of software development, when releases happen every month, sometimes every week. And I've seen some companies that can publish new changes and new features multiple times a day. Buyers wanted localization providers to offer some sort of automation.


What localization providers did is basically provide a very straightforward way of automating the process of handing files over from the client to the vendor. They implemented APIs that allow you to automatically, programmatically, send files to the translation management system. This implies that on the client's side there is some sort of integration that uses this API and connects it with your software development infrastructure, with your version control. This integration is something that you need to build on your own and that you need to maintain on your own. And usually it requires some level of constant attention from either the engineer that works with that integration or from the localization manager: things like exporting, importing, stuff like that. This is usually done on the side of the integration that you build as a client. In that paradigm, I should mention, the integration usually gets access to the translation management system to allow it to manage jobs.


It actually looks kind of good and probably works for some clients and some processes. But if you dig deeper into the way those integrations work, you will see that they are not very efficient. So let's see what a typical automation API looks like. When you go to a TMS documentation site and learn about its API, this is what you'll typically see in the documentation. There are functions to create a translation project, then to upload files to that project, to formally approve the order and start the translation. Then you can periodically check the translation status, and once translations are done, you download the files back to your system. So let's look at this first step, creating a translation project. Projects in that paradigm can be called jobs or orders or tasks. But what they actually mean is a very small shopping order that says: here are the strings, or here are the certain files, that we need to translate.
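The job lifecycle just described (create a project, upload files, approve, poll the status, download) could be sketched roughly like this. This is a toy, self-contained model: FakeTmsClient and all of its method names are illustrative stand-ins, not any real TMS API.

```python
# Sketch of a typical job-based TMS API lifecycle. Everything here is
# hypothetical and in-memory; a real integration would make HTTP calls.

class FakeTmsClient:
    def __init__(self):
        self.projects = {}
        self._next_id = 1

    def create_project(self, name):
        pid = self._next_id
        self._next_id += 1
        self.projects[pid] = {"name": name, "files": {}, "status": "draft"}
        return pid

    def upload_file(self, pid, path, content):
        self.projects[pid]["files"][path] = {"source": content, "target": None}

    def approve(self, pid):
        # Formally start the translation; here we fake instant completion.
        for f in self.projects[pid]["files"].values():
            f["target"] = f["source"].upper()  # pretend "translation"
        self.projects[pid]["status"] = "completed"

    def status(self, pid):
        return self.projects[pid]["status"]

    def download(self, pid, path):
        return self.projects[pid]["files"][path]["target"]


def run_job(client, files):
    """One shopping-order style job: create, upload, approve, poll, download."""
    pid = client.create_project("ui-strings-order")
    for path, content in files.items():
        client.upload_file(pid, path, content)
    client.approve(pid)
    while client.status(pid) != "completed":  # the polling loop
        pass
    return {path: client.download(pid, path) for path in files}
```

Note that each run of `run_job` is one closed order; that per-order shape is exactly what the rest of this section pokes holes in.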


Once those files are translated, you get them back and your project is done, it's closed. So when you start integrating with that sort of API, you'll suddenly have a question: how often can you create the jobs? If you want to make the automation feel instantaneous, the first idea is to create jobs as soon as new code appears in the version control system. What this means on the TMS side is that you suddenly have multiple jobs per day, probably even hundreds of jobs per day. Can your TMS handle that number of jobs? Will those jobs conflict with each other? How can a localization manager actually manage that volume of work? If you see there are some problems on the TMS side, then the next logical step would be: maybe we shouldn't be creating jobs that frequently; probably we need to create them only once a day, or twice a week, or once a week.


But this introduces artificial delays, so you already have an automation that has those extra artificial pauses in the process. How do you pick strings and files for translation? Can you just send all the files from your project within each order? Or do you have to pick out the files that actually contain some new strings for translation? If you do the latter, then it means that you need to build some custom UI within your integration. You need to have some state management to remember which files have already been ordered for translation, so as not to send them twice. So this really complicates the integration. Then, when you upload the files, some APIs even say that they don't work with files directly, they work with strings. So you need to figure out what to upload. Do you need to send entire files or only the strings that changed?
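The state management mentioned above, remembering which file versions were already ordered so the same content is not sent twice, might look something like this hash-based sketch. The function names and the shape of the state are assumptions for illustration only.

```python
# Sketch of the "don't order the same file twice" bookkeeping a
# client-side integration ends up carrying. Hypothetical, not tied
# to any particular TMS.

import hashlib


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_order(current_files, sent_hashes):
    """Return only files whose content changed since the last order.

    current_files: dict of path -> file content in the repo right now.
    sent_hashes:   dict of path -> hash of the version already ordered;
                   updated in place so the next call skips unchanged files.
    """
    changed = {}
    for path, text in current_files.items():
        h = content_hash(text)
        if sent_hashes.get(path) != h:
            changed[path] = text
            sent_hashes[path] = h
    return changed
```

Even this small amount of state has to live somewhere and survive restarts, which is part of why the integration burden lands on the client.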


And if you send entire files, who is paying for the hundred percent matches? Because some vendors will charge you for that, and you need to be aware of that before you start integrating with them. If you send only the new strings, then it means that the small order that you have contains only those strings and you lose the context: what are the other strings in that project? And again, this requires a more complex integration that you as a client are now in charge of creating. What if you want to change an order that was already sent for translation? Can you resend the same files over and over, or do you have to somehow cancel the previous orders? So this again requires some decision making, probably some manual intervention from the localization manager. When do you start the translation? Should your jobs be submitted for translation automatically, or does there have to be some sort of manual process, where the localization manager reviews the orders and approves every single one of them? Again, this can either mean that you have extra delays or your integration becomes a little bit more complicated. And not all TMSs will support automatically sending jobs for translation.


Finally, when you are downloading files, you also have to deal with merge conflicts. How do you integrate the translations back? If you have an order that takes several days to complete, this means that for those several days your engineers are still working on the software and they may be introducing some new changes. They introduce new strings, introduce new keys or delete keys from resource files, and that may break their builds. So what they usually have to do is not only change the resource files in the source language, but also change the resources for all localized versions of the same resource and make those amendments to the resource files simultaneously: delete or insert keys, et cetera. If they do that, then when they get the translated files back from the localization agency, this is where the merge conflicts can happen.


This can be partially fixed by integrating translations often. So if your TMS allows you to download semi-translated files, then it will at least partially reduce the chance that you will have those merge conflicts. But some TMSs do not offer that, and you have to wait until the entire order is fully complete. And then you'll have a question: how do you change translations once the order is fully complete and closed? Do you have an option to just go back and fix something? For example, if you want to change some terminology, what do you do? Do you have to resubmit the order with all the strings that contain the same term?


So, to summarize what I said: job-based automated localization has some disadvantages. It simplifies the life of vendors, but as a client you have to deal with a pretty elaborate integration. Their API is simple, and every salesperson on the TMS side will literally say that they have a simple API you can integrate with. The problem with those simple APIs is that all the burden of integration is transferred to the client. Each client has to reinvent the wheel and maintain their own integration. And integrations can be really, really complicated. It still requires job management work. If you have jobs, this means that you need to somehow approve those jobs, or cancel those jobs if you want to resubmit the same files. So that's an extra burden on the localization manager's side. Usually job-based automation has some problems with passing context: the smaller the jobs, the less context you provide. Some TMSs can somehow fix that problem by providing ways to do in-context editing, visual context, et cetera. But again, your options become more limited in which TMS to choose. And finally, as I said, TMSs have limits on the number of jobs. They're simply not suited for having too many jobs with the same files in flight. And usually what they do is suggest that you add some artificial delays and not create jobs too frequently.


So the main takeaway, and this is why I dedicated a special slide to it, is this: if you want to create really smooth localization automation, try to avoid job-based tools and APIs.


Now let's talk about continuous localization. What does it mean and how does it differ from the typical automation that we discussed? There is no single owner or official definition of continuous localization, and many companies that talk about continuous localization talk about slightly different things. The way I understand it is that continuous localization is a way to automatically gather all new source material, publish it for translation, then acquire the translations and integrate them back into the product. And the main goal of it is not to be simply automated, but fully autonomous, with an uninterrupted workflow.


You can consider continuous localization as automation that has no jobs. Or you can think about it slightly differently, as one never-ending job that has all the projects, all the files and all the strings at all times. And that job is something that you constantly update.


So, imagine that you have a nonstop two-way synchronization between your version control system and your TMS. All the projects, as compared to the projects that we saw previously in the typical job-based automation workflow, are long-lasting. You create them once and you never close them. In that paradigm, a project could be, say, your website or your iOS client: you create it once and then you never delete that project, because you're continuing to develop the application for that platform.
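That nonstop two-way synchronization could be modeled in miniature like this. It is a toy in-memory sketch of the idea, not how Serge or any real engine is implemented; the dictionary shapes are assumptions for illustration.

```python
# Toy model of a continuous localization cycle: each pass pushes new
# source strings from the "repo" to the "TMS", and pulls completed
# translations back. No jobs, no orders: just one repeating sync.

def sync_cycle(repo, tms):
    """One synchronization pass between version control and the TMS.

    repo: {"source": {key: text}, "translations": {key: text}}
    tms:  {key: {"source": text, "translation": text or None}}
    """
    # Push: any source string not yet on the TMS becomes translatable.
    for key, source in repo["source"].items():
        if key not in tms:
            tms[key] = {"source": source, "translation": None}
    # Pull: any completed translation is written back into the repo.
    for key, unit in tms.items():
        if unit["translation"] is not None:
            repo["translations"][key] = unit["translation"]
```

In a real setup this cycle would run on a schedule (Igor later mentions 10 to 15 minutes), with the "repo" side backed by an actual version control checkout.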


It is very important for continuous localization to have all strings from all your projects exposed for translation and review. Sometimes when we hear feedback about continuous localization, people's reaction is that continuous localization makes the state of things even worse than with automation. So: we have automated localization, we have problems with context transfer to the translators, they have to deal with smaller orders with just a few strings. Now we're talking about continuous localization, which basically means that strings are published immediately. Maybe that's even worse than regular automated localization? But here's the answer to that question. With continuous localization, it's imperative that you have everything available for translation at once. You don't have jobs that limit the scope. And if you have all strings available, this means you have the entire context. The amount of context that translators have at a single time is even better than with manual localization. Because with manual localization you send one entire project for translation, while with continuous localization you see all the projects that you have in your company. So you can really cross-check all the different translations across the projects, make sure that the terminology is right, et cetera.


So when you have all the strings available for translation, this means that the full context is preserved, and you can arbitrarily change strings on your translation server. Remember, I gave an example of wanting to amend some translations after the job is done. What do you do with that? With typical localization automation, this means that you need to gather the files that contain certain terms and create a new order. Here, you don't have to deal with that: there are no jobs or orders, all strings are available for translation. You just go to the CAT tool, find the right translation and change it, and that change will be integrated into the proper product.


So what are the tools that you can use to implement continuous localization? Nimdzi Insights has compiled a language technology atlas. This is a very nice and large picture with the logotypes of different companies and tools that provide different services. And those logotypes are organized into different buckets. One of the buckets is called localization for developers. What localization for developers means is basically: here are the tools that are optimized for synchronization with version control systems. As we said before, continuous localization is about seamlessly syncing between your version control and your TMS. Those are the tools that are good candidates to start researching the tool landscape with. So below you have a link; you can open that atlas and see what tools are out there.


As I said before, many companies, when they talk about continuous localization, mean different things. So you definitely need to do your research and see what this actually means, what APIs they provide. Are those job-based APIs or something else? Today I want to offer you three options that you could research more closely and see if they will fit your intended localization flow. The first option is two tools that were built by us at Evernote and that we use for continuous localization. The first is called Serge and the second one is called Zing. On the Nimdzi atlas they have a logotype for Pootle, the open-source TMS and CAT tool. But unfortunately Pootle is no longer being actively maintained. We forked it, and we have that project called Zing, which has a UI similar to Pootle's. These are the tools that we use for continuous localization. Both tools were built specifically with that continuous, seamless synchronization flow in mind. Serge is that magic synchronization that happens between your version control and the TMS. And Zing is the TMS and CAT tool front end for translation. Both tools are free and open source. Serge is very flexible when it comes to configuring projects for localization. You can create virtual projects from multiple repositories and have all sorts of different file parsers and configurations.


All configuration in Serge is done as code, which means the set of configuration files for each of your localization projects is the source of truth for your entire localization automation. You can always see what happens where, what the projects are, what repositories you work with and how they are processed. Those tools being free and open source means that you host and maintain the localization infrastructure yourself. Some people may prefer to have managed solutions; some prefer to host things themselves. One advantage of this is that your localization infrastructure will be more secure, because it will run on your premises. The next option is to do your research and look at the individual tools that are in this localization-for-developers bucket and see if you can do continuous localization with them directly. Here I highlighted a few tools that I know work with this jobless paradigm.


So they're good candidates for you to start your search with. They may not be as flexible with configuration as compared to option A, but they may have some other advantages. They have different UIs, so you may want to see how they differ and if they fit your intended workflow. Do your research, check their features and supported file formats. The pricing model is important: they all have different sorts of pricing based on the number of strings that are exposed for translation. And consider what the actual effort would be to set up and maintain the configuration, because you still have to write some integration with those tools. Some of those tools provide a command-line interface, which usually means that integration can be simpler. So I would suggest looking at this specific option when you're shopping for your TMS.


And option C is a mix of A and B. Serge, if you use it as that synchronization engine between your version control and the TMS, can actually work with multiple TMSs. It already has a list of TMSs it is compatible with; there are plugins for them. You can use Serge as your hub for continuous localization, and you get all the benefits of Serge, like robustness, flexibility of configuration, configuration as code and a command-line interface, while no longer being locked into a specific TMS solution: you can switch between them pretty easily. Before, we talked about localizing software mostly, right? A version control system means you have a product you're localizing. So what about other things? How do you deal with translating marketing materials? How do you deal with translating help center content, or your marketing website, or your developer documentation? The solution that we use at Evernote, and that I highly recommend, is that you build connectors not to the TMS directly, but to your version control system. When you're shopping for a TMS, one of the things that you're constantly thinking about is whether this TMS has all sorts of connectors to the systems that you need.


And TMS vendors, the only option that they have is to provide connectors directly to their system. The problem with this approach is: what happens if you don't have some connector? You need to write it on your own. And you spend your money and your engineering effort to lock yourself into a specific TMS. That's not good.


If you use your version control as a data hub, this means that the connectors you write, you write to your file system. Say you have an external CMS, or Google Drive, for example. What you need to do is basically create synchronization logic that exports all the content that you want to localize, stores it as files, and then synchronizes those files to your version control. Those connectors are much easier to develop and maintain.
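The export side of such a connector might be sketched like this. Here `fetch_cms_articles` is a stand-in for a real CMS API call, the file layout is an assumption, and the actual version control commit is only indicated in a comment.

```python
# Sketch of a connector that targets the file system / version control
# instead of a TMS: export localizable CMS content as files, then let
# the normal VCS-based localization flow pick them up.

import json
import pathlib


def fetch_cms_articles():
    # Stand-in for e.g. a help-center API; returns article id -> body.
    return {"getting-started": "Welcome to the app.",
            "billing": "Manage your subscription here."}


def export_to_repo(repo_dir):
    """Write each article as a JSON file under the repo working copy."""
    repo = pathlib.Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    for article_id, body in fetch_cms_articles().items():
        path = repo / f"{article_id}.json"
        path.write_text(json.dumps({"body": body}, indent=2))
    # A real connector would now run `git add` / `git commit` / `git push`,
    # and a mirror-image import step would push translated files back.
    return sorted(p.name for p in repo.iterdir())
```

Because the connector only reads from the CMS and writes plain files, swapping the TMS later does not touch this code at all, which is the vendor lock-in point made above.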


And you don't have this vendor lock-in any longer. You have a single localization process for both marketing materials and your software strings. And on top of that, having all your localizable content in a version control system gives you all the benefits of version control: you have a free backup of your content, and you can roll back to any previous content at will. If you accidentally delete something from your CMS and there's no way in that CMS to roll back that change, you still have an option to get the content from your version control system. So you have free logging of all the changes; you have a free audit trail.


Now let's talk about the key takeaways from our webinar today. First and foremost, try to avoid job-based TMSs if you want to do proper continuous localization. Prefer the tools that preserve and show you as much context as possible. Store all the localizable content in version control systems. And when you're looking at external systems, for marketing, CMS tools, always look at them and see if it will be easy to export content from them and import it back. That's the main problem with localization: some department not related to localization signs a contract with some SaaS platform that apparently has no means to automate the localization.


If you use that single approach to localize software and marketing, storing everything in version control, this means you can use the same process, the same TMS and CAT tooling for everything. And that really simplifies the overall localization infrastructure. And besides technology, try to intentionally simplify all the surrounding processes. Do you need a very extensive QA step on the TMS side, or can you do the QA when the translations are already integrated back into the product? Most of the problems with localization, most of the changes, can be seen when you see the actual localized product. You see all the internationalization issues, all the texts being clipped or not rendered properly. This is something that you can mostly never catch in the TMS, even if you do multiple rounds of proofreading and editing there.


And if you simplify your localization processes and your technology, and if your localization infrastructure is autonomous, this means that the time of your localization managers can be used in a better way. With continuous localization, a localization manager's main goal is to connect translators and engineers: provide additional context, upload that context to the TMS, answer questions from translators, or report issues that were found during translation back to engineers. And no matter what process you currently have, I bet there are ways to improve it, and you can start improving your localization process and infrastructure today. Thank you.


Wow. Applause! That was really continuous. Thank you very much for your insights. I think we now have some time left for questions. All right. The first thing I'll mention is not really a question, but the answer to a couple of those: the recording of this webinar will be available, and the slides will be available as well. This information is as open as Serge is. And now to the more technical questions. Igor, I think those are mostly for you. You described a system which works particularly well if you have, like, one step in the actual localization process: the translator. We know that ISO standards specify that the translation process should also feature a reviewer. So, is it possible to customize this system of continuous localization which you shared with us so that there are more steps, with editing and proofreading for instance? Can the flow of the tasks be set up automatically for as many steps as we need?


Yeah. Let me open the slide to demonstrate this. When I talked about manual localization, typical automated localization and continuous localization, all I showed was just one translator that does their thing. This was simply to simplify the visuals, because you can have any process within your TMS. It can be a multi-step TEP process if you want to adhere to ISO standards, that's for sure. But it can be anything else. Specifically with Serge and Zing: Zing has ways to suggest and then approve strings. So you can have a one-step process where translators provide translations directly, or you can have a two-step process where translators suggest strings and then there's an editor or proofreader that approves them. Our idea behind continuous localization is to intentionally simplify the process.


We at Evernote do not need to have this TEP standard implemented. Our idea is to make all the strings available in the product, in the internal builds of the product, as soon as possible, so that we can QA them right in the product. This allows us to catch more translation issues than if we would follow the TEP standards. But I can see companies that want to follow the standard, and what they need to do is find a TMS that allows them to define those workflows. And the same TMS ideally should not be based on the idea of jobs, so that you can implement continuous localization while, at the same time, a translation is integrated into the product only when it goes through all the steps of that workflow and is considered complete.


Okay. I think it sounds quite inspiring and answers the question. So, another question is about MT, machine translation. Is there any way for this approach to feature an MT option? Can we implement machine translation into such a continuous localization setup, and if yes, how?


Yeah, absolutely. You can use machine translation within that paradigm of continuous localization in multiple places. First of all, you can have MT integrated into your TMS. So when translators work on the strings, they have an option to pre-translate each of those strings using machine translation. This is something that we have in Zing: we have it integrated with the Google Translate engine, but for us it's a manual step. We do not force it; we do not pre-translate every single unit. We just give translators another option to speed up their translations. Another option, if we're talking about Serge being that synchronization engine between version control and TMS, is that it's really simple to write a plugin that, when it spots a string that has no translation yet, goes and fetches a translation from a machine translation engine and integrates it back. With that you can do proactive pre-translation, filling the units with machine translation, or for certain projects you can simply provide machine translation and not even expose the strings for translation on the TMS side. For some projects, that might make more sense. But what I want to say is that the system, again, if we're talking about Serge specifically, is very flexible and allows you to do pretty much anything.
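The plugin idea described here, fill any unit that has no translation yet from an MT engine, could be sketched like this. `machine_translate` is a placeholder for a real MT API call, and the unit structure is an illustrative assumption, not Serge's actual plugin interface.

```python
# Sketch of an MT fill-in step for a sync engine: pre-translate only
# the units that have no human translation yet, and flag them so
# reviewers can find machine-translated strings later.

def machine_translate(text, target_lang):
    # Placeholder: a real plugin would call a cloud MT service here.
    return f"[{target_lang}] {text}"


def fill_missing_translations(units, target_lang):
    """Pre-translate units whose translation is still None; return count."""
    filled = 0
    for unit in units:
        if unit.get("translation") is None:
            unit["translation"] = machine_translate(unit["source"], target_lang)
            unit["mt"] = True  # mark as machine-translated for later review
            filled += 1
    return filled
```

Because existing human translations are left untouched, this step is safe to run on every sync cycle.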


Sounds great. Igor, we have one more question. With a job-based automation approach, you said that the typical time of a job is, like, a day, for instance. So how does this compare to the continuous localization approach, and what is the typical turnaround time?


So the turnaround time for continuous localization can be as small as, like, 10 minutes for us. Our typical localization cycle runs between 10 and 15 minutes. This means that after a software engineer submits a new string into the version control system, within 10 to 15 minutes that string will appear for translation on the TMS side. And once a translation is provided by a linguist, that translation is automatically integrated back into the version control system within the same 10 to 15 minutes. This means we never wait for a job to be completed, because there is no job, right? And all the changes are continuously being integrated into internal builds, and we can do multiple rounds of QA a day.


I see. That sounds really impressive. Thank you very much. I think that we don't actually have time to address further questions now, but we can do it later and reach out to the authors of those questions. I would like to, once again, thank you very much for your time and for enlightening us all about this approach. Personally, I find it quite inspiring, and the idea behind it is very interesting and certainly worth trying. Thanks everyone for your attendance, and we hope to see you at our next webinar. Thank you very much.

Igor: Thank you.

