On June 27 and 28, global business leaders met in Salt Lake City, Utah, for this year’s TAUS Global Content Conference. The two-day event was preceded by the TAUS Industry Leaders Forum, held earlier that week.
The conference drew a crowd of around 200 people from different corners of our industry. According to the TAUS event website, attendees included:
Topics included some of the most prominent discussions around global content:
Let’s review some of the many topics that were discussed by panelists and presenters.
Neural machine translation (NMT), AI, and the automation of workflows all require data. But good, applicable data is hard to come by. This scarcity leaves enterprises with a dilemma when making decisions about machine translation: is it more beneficial to develop the technology in-house or to buy it?
The ultimate goal is to have robust NMT systems that produce context-appropriate translations and limit human involvement. These would ideally be models that are dedicated to an enterprise’s product, domain, content type and so on.
Here another question arises: who should be doing the training? The language service providers (LSPs)? The enterprises themselves? And the task goes beyond training, because maintaining the engines is a whole job in itself.
In the opening speech, Alon Lavie (Unbabel) called this new approach to training models “dynamically-adaptive NMT models.” Unlike static models, these “unified” models would learn from enterprise-specific translations and from the resources and features of the enterprise itself.
At Facebook, MT plays a crucial role in advancing the company toward its globalization goal by providing access to all types of content and delivering the same experience across all products for all users. MT also plays a critical role in eliminating hate speech, because it ensures that flagged content is not limited to the prevalent English. Still, as with most endeavors involving low-resource languages, quick improvement is both challenging and expensive. To compensate, Facebook is exploring ways to apply monolingual data to these languages.
Raw MT can be applied in situations where it is difficult to use human talent and where raw MT is better than no MT: in certain locations or for customer support, for example. Sarah Weldon (Google) says that at this point, it would be beneficial to use raw MT where appropriate and then build off of that, even though not everyone has the same level of tolerance for raw MT quality.
Especially when it comes to NMT, use cases in the industry are evolving. MT is mostly used in the legal domain and in life sciences. But generally, each content type presents its own challenges, from formatting to grammatical errors. With legal text, specifically, there is a plethora of scanned PDFs, which adds an extra layer of difficulty for translation, as mentioned by John Tinsley (Iconic Translation Machines). At the same time, this has paved the way for new opportunities in optical character recognition (OCR) and a move to begin considering MT upstream.
So, what do translators think of MT? A linguist herself, Tess Whitty (Swedish Translation Services) shared some common views held by translators. MT was initially met with mockery, but as it evolved, mockery turned to fear. However, it is best to see MT as another tool in the toolbox, she says. And while MT is maturing in the places it is appropriate for, demand for translators and interpreters is at a record high.
One of the challenges facing translators is that while expenses go up, rates are stagnant at best. Linguists can make themselves more valuable to clients by serving as cultural consultants and domain specialists in addition to being translators, Whitty adds.
It is no surprise that data was a focus point of the conversation. It is needed in all aspects of a business, from revealing metrics to engine training. Data is the fuel driving the language industry today. Data is also a key ingredient in removing human activity where it makes sense to do so.
But what is being measured and why? Dashboards are a means of visualizing information for a problem you are trying to solve. Once the problem and what needs to be measured are identified, data can be used to create actionable items.
Questions you may be asking yourself might include: how does productivity change when I change my MT? Should I really be investing in customizing an engine if a generic engine is good enough for what I need?
Julien Didier (Transperfect) shared that it would be useful to track the performance of engines in real time and then use that information to drive the selection of engines. It would also be incredibly useful, he added, to slice quality data by factors such as engine or language.
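As a rough sketch of what such slicing might look like in practice (the record fields, engine names, and scores below are entirely hypothetical, not anything Transperfect described), per-segment quality records could be aggregated by engine or by language:

```python
# Hypothetical sketch: log per-segment MT quality records, then average
# the scores by one dimension (engine or target language).
from collections import defaultdict

def average_quality(records, key):
    """Average the 'score' field of records, grouped by one field."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for r in records:
        totals[r[key]][0] += r["score"]
        totals[r[key]][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}

# Illustrative data only; scores could be any quality metric you track.
records = [
    {"engine": "engine_a", "language": "de", "score": 0.82},
    {"engine": "engine_a", "language": "fr", "score": 0.74},
    {"engine": "engine_b", "language": "de", "score": 0.68},
]

by_engine = average_quality(records, "engine")
by_language = average_quality(records, "language")
print(by_engine)
print(by_language)
```

The same records can be sliced along any other dimension (content type, client, time window) simply by passing a different key.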
Another topic of the conversation on global content was transcreation, something MT engines are not capable of doing at this point. Transcreation is used for advertising, logos, marketing, and other creative content, and it goes beyond translating the source. The challenge with transcreation has been the lack of clear guidelines on exactly what it entails. Earlier this year, TAUS led an initiative to define a practical, repeatable and definable set of guidelines on transcreation, published online and available to the public. The document provides a core definition of what transcreation is, and it serves as a common tool to be used by all stakeholders. An online course on the topic is also currently in the works.
A panel of localization educators shared their thoughts on what can be done to bridge the knowledge gap in localization education. It included Adam Wooten (Middlebury Institute of International Studies), Jeff Beatty (Mozilla/Brigham Young University) and Pete Smith (University of Texas at Arlington).
A localization program, says Adam Wooten, should include more than a class on CAT tools. It should prepare students for jobs that may not yet exist and for the challenges professionals may face in the future. Instructors could bring real problems into the classroom for students to explore. In fact, employers already approach instructors of the few existing localization programs with problems they would like students to tackle, and instructors then modify their curricula based on these conversations and on the constantly changing localization landscape.
Universities and graduate programs can also be interdisciplinary spaces where important discussions on topics such as ethics can take place. Unfortunately, at the moment, there is a lot of inconsistency when it comes to job titles involving localization, which makes recruiting and job matching challenging. One school in Austria, for example, as Jeff Beatty pointed out, calls its localization program “intercultural communications.”
One final piece of advice from Adam: tell your alma maters to introduce an internationalization aspect to their computer science programs, because employers today are seeking internationalization engineers and cultural experts.
By providing more localization education opportunities to students at the undergraduate and graduate levels, professionals entering the field would be better equipped to continue filling the gaps that currently exist in globalization.
The TAUS Global Content Conference is a truly enriching couple of days. It’s another opportunity to discuss current topics in globalization with friends and colleagues. It’s also an opportunity to hear how stakeholders are addressing challenges and reaching solutions. And last but not least, it’s a space to listen to presentations from the bright, innovative minds filling the gaps in the ecosystem.
This year’s TAUS Game Changer Innovation winner was Lakshman Rathnam, founder and CEO of wordly. The company’s technology provides automatic interpretation in real time and is equipped with the capability to continue interpreting even if the speaker switches languages.
We are in the “surprise factor of acceleration,” as Jaap van der Meer, owner of TAUS, put it. The volume of content is exploding, and progress is being made in NMT systems. Next year’s conference will continue looking at how current and new challenges are addressed and at what is being done to fill the gaps.
Pictures courtesy of Nimdzi Insights.
It’s been six years since Google revealed that Google Translate processes 146 billion words a day, three times more than what all the professional translators in the world combined can produce in a month. That was 2016, and things haven’t really slowed down in the machine translation (MT) universe since.
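Taken at face value, those figures imply a striking ratio. A quick back-of-the-envelope check (assuming a 30-day month and reading “three times more” as a plain 3x multiple, both simplifications on our part) goes like this:

```python
# Rough arithmetic on the figures quoted above; the 3x reading and the
# 30-day month are simplifying assumptions, not sourced numbers.
google_words_per_day = 146e9

# If one day of machine output is 3x the profession's monthly output,
# the implied human total is a third of the daily machine volume.
implied_human_words_per_month = google_words_per_day / 3  # ~48.7 billion

# Over a 30-day month, machine volume outpaces the implied human
# total by a factor of 3 * 30 = 90.
monthly_ratio = (google_words_per_day * 30) / implied_human_words_per_month

print(f"{implied_human_words_per_month:.3g} words/month by humans (implied)")
print(f"machine-to-human monthly ratio: {monthly_ratio:.0f}x")
```

In other words, under these assumptions the machine side processes on the order of ninety times the profession’s monthly output every month.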
We recently introduced you to the two- (or five-) second rule, which is essentially the reaction or decision-making time a linguist should spend judging whether to post-edit a segment of machine translation (MT) output or to retranslate it. This rule of thumb aims to help increase the linguist’s productivity when working with MT.
If you’re a driver, you’ve probably heard of the two-second rule. Staying at least two seconds behind any vehicle is considered a rule of thumb for drivers wanting to maintain a safe following distance at any speed. The two seconds don’t represent safe stopping distance but rather safe reaction time.
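The rule itself is a human heuristic, not an algorithm, but the decision it encodes can be sketched in code. Everything below, from the function names to the way the time budget is handled, is an illustrative assumption rather than an actual post-editing tool:

```python
import time

def triage_segment(mt_output, looks_salvageable, budget_seconds=2.0):
    """Sketch of the two-second rule: decide quickly, or retranslate.

    `looks_salvageable` stands in for the linguist's snap judgment of
    the MT output; it is a hypothetical callable, not a quality check
    from any real tool.
    """
    start = time.monotonic()
    salvageable = looks_salvageable(mt_output)
    elapsed = time.monotonic() - start

    # Taking longer than the budget is itself a signal: if the segment
    # is hard to judge, it is probably not a quick post-edit.
    if elapsed > budget_seconds or not salvageable:
        return "retranslate"
    return "post-edit"

# Usage: a trivially fast judgment that only rejects empty output.
decision = triage_segment("Guten Tag, Welt.", lambda seg: bool(seg.strip()))
```

The point of the heuristic, as in driving, is that the budget bounds reaction time, not the work itself: post-editing the segment may still take minutes, but the go/no-go call should be nearly instant.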
Do you remember the last time people were NOT talking about machine translation (MT)? We don’t. Wherever you go, there’s someone talking about MT. With few exceptions, it seems like the only major disruptors in our industry over the past few decades have been breakthroughs in language technology.