Text-to-speech – can you afford to ignore this technology?

86%. That’s the percentage of Americans surveyed who are online daily, according to Google’s 2017 Consumer Barometer survey. Even among Americans aged 55 and older, 77% are online every day.

On a global scale, the numbers are just as impressive. Over 3 billion people are believed to be online, and it is estimated that by 2020 at least half of the world’s population will be using the Internet.

With more and more people using their mobile devices for online interaction, businesses face greater challenges in capturing their audience’s attention and sustaining interest – and that may mean restructuring their marketing strategies. One way in which businesses are attracting more customers is through text-to-speech (TTS) technology. You may think of TTS as a niche technology used primarily in education to assist students with reading or speech difficulties. Or maybe you think of TTS as the automated voice synthesis that, together with voice recognition, allows the deaf community and those with speech-related challenges to communicate over the telephone. And you would be correct, but TTS has now gone well beyond assistive applications.

Although TTS is still widely used in education to enhance students’ experiences with reading and speech, it is now also used in many private and commercial sectors. Whether your business focuses on commerce, education, fitness and nutrition, finance, entertainment, beauty and fashion, or computers and technology, TTS can help increase customer comprehension and enhance the overall message. TTS has the potential to increase your return on investment (ROI) by extraordinary numbers. How? Through improved customer engagement.


Some websites are becoming screen-reader optimized so that visually impaired individuals can actively participate. Screen readers use TTS software to read online content aloud to the customer. Examples of this technology include VoiceOver for OS X and JAWS for Windows. iPhones and Android devices also offer sophisticated TTS functionality, opening the doors to users with visual impairments.

Creating websites that are screen-reader optimized just makes sense. Your products and services become accessible to traditionally underserved demographics – and widening your target audience can translate into an increase in sales. Non-native speakers who feel proficient in verbal communication but struggle with reading can now listen to and interact with your website through a screen reader. Instead of bypassing your site, they can gain all the information they need to make an informed purchase. Visually impaired individuals and consumers with literacy issues suddenly become active participants in the ecommerce experience.

And what about the growing number of consumers worldwide who use their mobile device as their primary research tool, for both personal and professional purposes?

Reading small print on a mobile device can be a challenge for a great many individuals who may immediately abandon your site even though you have exactly what they need. A simple adjustment to ensure your website is screen-reader optimized could make all the difference.
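One small, concrete part of that adjustment is making sure every image carries alternative text for a screen reader to announce. The following is a minimal sketch in Python, not a full accessibility audit: the class name `MissingAltFinder`, the helper `images_missing_alt`, and the sample markup are all hypothetical and introduced here only for illustration. It flags images with no `alt` attribute at all, and it deliberately accepts an empty `alt=""`, which is the usual way to mark a purely decorative image.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # With no alt attribute, a screen reader has nothing useful to announce.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the sources of all images in the markup that lack an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Hypothetical sample page: the second image would be silent in a screen reader.
page = '<img src="logo.png" alt="Acme logo"><img src="banner.jpg">'
print(images_missing_alt(page))  # prints ['banner.jpg']
```

A check like this only covers one item on the list; headings, link text, and form labels matter just as much to someone listening to your page.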

What about virtual assistants? Do they employ TTS technology, and how do they change the ecommerce experience?     

The modern online shopper is multitasking – fixing dinner for instance, or balancing their budget while searching online for that new gadget that everyone is talking about. Taking the time to search, compare, and shop for products and services online using a home computer can be bothersome and time consuming, but searching for the same information using a mobile device is often more convenient. All customers have to do is ask their mobile device a question, and within seconds, they are presented with a list of options. So just how does this work? It’s all about virtual personal assistants (VAs) – and they have taken TTS to a whole new dimension. Using automatic speech recognition (ASR), natural language processing, question and intent analysis, and text-to-speech technology, VAs are changing the face of ecommerce.

Traditionally, online marketers focused on search engine optimization (SEO) and search engine advertising (SEA) campaigns and developed natural phrasing for long-tail searches. That approach worked – and still works – extremely well for typed searches, but it doesn’t necessarily carry over when customers use their voices to shop. Spoken searches are generally longer and more conversational, often beginning with interrogative phrases such as “what is the best…” or “where can I find…”, so marketers are now beginning to add more question-type keyword phrases to their content to match them. If your marketing campaign focuses solely on natural phrasing for long-tail searches, you may be missing out on a significant percentage of consumers who search by voice. Marketers are encouraged to pay close attention to these trends and refine their campaigns so that VAs are more likely to include their websites when responding to specific spoken inquiries.
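To see how much of your existing keyword list already matches this conversational pattern, a rough first pass is to check which phrases open with an interrogative word. The sketch below is one simple way to do that in Python. The word list `QUESTION_OPENERS`, the function `is_question_type`, and the sample queries are assumptions made for illustration, not an established SEO tool or standard.

```python
# Hypothetical list of interrogative openers; extend it to suit your market.
QUESTION_OPENERS = ("what", "where", "which", "who", "when", "why", "how")


def is_question_type(query):
    """Return True if the query starts with an interrogative word."""
    words = query.lower().split()
    if not words:
        return False
    # Treat contractions such as "what's" as their base word "what".
    first = words[0].replace("’", "'").split("'")[0]
    return first in QUESTION_OPENERS


# Hypothetical keyword phrases for illustration only.
queries = [
    "wireless headphones sale",
    "what is the best wireless headphone",
    "where can I find running shoes",
]
conversational = [q for q in queries if is_question_type(q)]
print(conversational)
```

A check this simple will miss spoken queries that are phrased as commands ("find me running shoes"), so treat the result as a starting point rather than a full picture.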

Siri was introduced on the iPhone in 2011, followed by Google Now on Android a year later. Amazon’s Alexa entered the commercial stage in 2014, and Microsoft’s Cortana came along in 2015. Virtual personal assistants have dramatically transformed the ways in which customers use the Internet, which has in turn driven how companies approach marketing and customer service. They can be used for any number of tasks, from placing a call, providing driving directions, and announcing the weather forecast to promoting a company’s upcoming special offers and allowing customers to search for, order, and purchase products and services. If marketers fail to refine their SEO and SEA practices and include more question-type keyword phrases and answers in their content, they will be less likely to catch the virtual personal assistant wave. Virtual personal assistants narrow down customer choices, often leading them directly to a purchasing decision.

The history of TTS technology

TTS has actually been around for a lot longer than you may think. In fact, in 1779, Christian Kratzenstein, a German-born professor, created an apparatus that successfully produced five long vowel sounds, namely /a/, /e/, /i/, /o/, and /u/. Kratzenstein built acoustic resonators that resembled the human vocal tract. The resonators were activated when air passed over reeds, causing them to vibrate. However, true human voice simulation didn’t gain world attention until the 1970s and 1980s. In 1984, DECtalk was created by the Digital Equipment Corporation. DECtalk – a type of speech synthesizer with text-to-speech capabilities – gained world acclaim when scientist Stephen Hawking was able to communicate effectively using the device. Since those pioneer days, TTS has advanced enormously and is now used in any number of sectors, including education and eLearning, the automotive industry, ecommerce, the smart home industry, and more.


Technology titans such as Amazon, Google, Microsoft, and Samsung are making substantial progress with advanced TTS solutions, including Amazon Polly, Google Text-to-Speech, Microsoft text-to-speech voices, and the Samsung text-to-speech engine. TTS has advanced to such a degree that it can work with virtually every personal digital device, from computers to smartphones. In these days of IoT (the Internet of Things), virtually anything capable of being connected to the Internet is getting connected. Light bulbs can be switched on and off, thermostats can be adjusted, and toys, appliances, and more can all be controlled using your smartphone. And with the percentage of online users steadily increasing, digital content can now be consumed across multiple devices – any time, and from anywhere. Online users are becoming increasingly selective about the websites they visit. They not only look for relevant, up-to-date content, but they also wish to experience the Internet in a multitude of ways – and with a multitude of devices. TTS enhances the experience for online shoppers by making it more interactive.

Prime Voices weighs in on TTS

We recently interviewed Prime Voices founder Constantino de Miguel to get his insights on TTS for localization. Prime Voices, a language service provider that first opened its doors in 2000 in Lyon, France, has since grown into the PrimeGroup, consisting of Prime Voices in Lyon, Digita in Ecuador, and Prime Multimedia in Barcelona, where Mr. de Miguel now operates. PrimeGroup specializes in multilingual voiceovers, dubbing, subtitling, and digital multimedia localization, but that’s not all. PrimeGroup also has its finger on the pulse of TTS.

Although the PrimeGroup currently offers TTS in a limited number of languages, Mr. de Miguel is looking forward to the future of this low-cost, fast-delivery technology, which he sees as an inevitable competitor to human voices in the coming years:

“We are observers and active participants of this new technology. We are trying to remain knowledgeable but strive to apply this technology as well, because in a few years’ time, the requirements will be TTS – so we cannot ignore this technology.”

PrimeGroup strives to stay on top of the latest breakthroughs in TTS technology while remaining focused on providing valued clients with choice, quality, and highly competitive prices. And although TTS offers a great many benefits, companies should also consider its limitations. One limitation that Mr. de Miguel points out involves script length. The longer or more complex a company’s message, the lower the quality of the TTS output, since longer, more complex messages require the unique intonation and emotion that only a human voice can offer. While some companies prefer the cost savings and efficiency that TTS offers, others prefer the natural inflection, tone, and authentic delivery of a human voice, which TTS cannot fully replace. Even with advances in TTS technology, Mr. de Miguel stresses that TTS is not a miracle solution, but an option.

Eric Alexandre, an audio engineer at PrimeGroup, has an extensive background in voiceover work as well as audio-visual technology, and he helped us to further understand TTS. “TTS is constantly developing but can be the solution some companies seek if they require only short messages, such as those commonly-heard announcements on the phone, at the airport, or in bus stations.” Even with its current limitations, TTS shows great promise, and Mr. Alexandre, like Mr. de Miguel, is hopeful about its future. Mr. Alexandre does stress, however, that the secret to TTS success, at least for now, lies in how companies choose to use this technology. As the saying goes, short, sweet, and to the point is likely the most cost-effective approach and the one that will give you the best results.

The following is a cost comparison of TTS to voice talent at Prime Voices.


According to PrimeGroup, the magic number is 1,000 words. At that point, the savings obtained by using TTS are offset by the increase in post-production costs needed to improve the audio, because the longer the text, the more frequent issues with intonation, inaccurate pronunciation, and audio glitches become.
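To make the break-even idea concrete, here is a minimal sketch in Python of how such a crossover point could be estimated. Every figure in it is hypothetical: the rates are not Prime Voices’ prices, and the cost curves were chosen purely so that the crossover lands on the 1,000-word figure quoted above. The function names `break_even_words`, `human_cost_cents`, and `tts_cost_cents` are likewise assumptions made for illustration. Costs are kept in whole cents to avoid rounding surprises.

```python
def break_even_words(tts_cost, human_cost, max_words=10_000, step=50):
    """Return the first word count (checked in steps) at which TTS is no longer cheaper."""
    for words in range(step, max_words + 1, step):
        if tts_cost(words) >= human_cost(words):
            return words
    return None  # TTS stayed cheaper over the whole range checked


# Hypothetical rates for illustration only, not real prices.
def human_cost_cents(words):
    """Voice talent billed at a flat 20 cents per word."""
    return 20 * words


def tts_cost_cents(words):
    """Cheap synthesis plus cleanup effort that grows faster than the text length."""
    synthesis = 5 * words
    # Assumed model: post-production grows with the square of the word count,
    # reflecting glitches that become more frequent in longer scripts.
    post_production = 15 * words * words // 1000
    return synthesis + post_production


print(break_even_words(tts_cost_cents, human_cost_cents))  # prints 1000
```

With your own quotes plugged into the two cost functions, the same loop would show where the crossover falls for your project, which may well differ from 1,000 words.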

TTS for localization

For global brands, localization has been a necessary, yet time-consuming and expensive, investment. In the past, companies that wanted to reach a global audience had to hire and schedule multilingual voice actors, book expensive studio time, and spend hours and hours recording, rerecording, and editing. This is still the reality for many companies, but thanks to the latest advancements in TTS technology, more and more of them are considering TTS. However, it is not for everyone, and marketers who are eager to use this technology should carefully weigh the pros and cons:


Pros:
  • Reduced eye strain
  • Convenience
  • Time efficiency
  • Cost-effectiveness

Cons:
  • Lack of prosody
  • Absence of emotion
  • Flat tone
  • Increased likelihood of inaccurate pronunciation
  • Risky to use for some sectors

Whether driving in the car, running on the treadmill, or preparing a meal, more and more individuals are turning to TTS technology. People in general enjoy the option of sitting back and relaxing while having a story read to them. The convenience and time savings for consumers, and the cost savings for companies, are arguably the major benefits of TTS, while not having to stare at a screen for long periods of time is clearly an additional advantage. However, for some companies, the fact that TTS lacks a natural voice and appropriate human emotion is a deal breaker. A friendly human voice that answers all of your customers’ questions might be more important to you – and to your customers – than saving time and money. Some consumers may react more positively to a relatable voice when considering the purchase of your product or service. Although the technology has come a long way, it is still a far cry from sounding authentically human.

If your company operates in the international market, it may be quite a challenge to get TTS to pronounce your brand’s name accurately (especially in different languages), not to mention foreign names and locations. Consider, for instance, how the city “Paris” is pronounced in English and how the same word is pronounced in French. Other areas of concern involve the healthcare and legal sectors. Although both professions are beginning to employ TTS technology, there may be situations in which TTS isn’t the most appropriate or ethical choice. Given the highly sensitive nature of legal issues and the absolute necessity for medical terminology to be as precise as possible, TTS might not be the most suitable way to disseminate this type of information. It may, in fact, end up costing more money and taking more time, since it would likely require careful monitoring of how specific words and expressions are pronounced. Neither the legal nor the medical profession can afford to put the wrong message out there, since their audience’s wellbeing – both figuratively and literally – hangs in the balance.
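One common way to steer pronunciation is Speech Synthesis Markup Language (SSML), a W3C standard whose `<phoneme>` element lets you attach a phonetic spelling to a word. The sketch below, in Python, wraps a term in such a tag. The helper name `with_pronunciation` and the sample sentence are hypothetical, the bare `<speak>` wrapper is a simplification, and support for `<phoneme>` and for specific phonetic alphabets varies from one TTS engine to another, so check your provider’s documentation before relying on it.

```python
from xml.sax.saxutils import escape, quoteattr


def with_pronunciation(text, term, ipa):
    """Wrap each occurrence of term in an SSML <phoneme> tag carrying an IPA hint."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    tagged = "<phoneme alphabet=\"ipa\" ph={}>{}</phoneme>".format(
        quoteattr(ipa), escape(term)
    )
    # Escape the text first so characters like & and < stay valid XML.
    body = escape(text).replace(escape(term), tagged)
    return "<speak>" + body + "</speak>"


# Ask for the French pronunciation of "Paris" inside an English sentence.
print(with_pronunciation("Welcome to Paris.", "Paris", "paʁi"))
```

The same approach can be applied to a brand name, although each target language will usually need its own phonetic spelling and its own round of listening tests.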

And what about commercial organizations that employ TTS to drive down costs and provide options to consumers? What happens if they inadvertently infringe copyright? Amazon got a bitter taste of that back in 2009 with the release of TTS for Kindle. The Authors Guild challenged Amazon, claiming that the right to have a book read out loud is exclusive to the author. As a result, Amazon changed its policies, leaving it up to individual authors to decide whether or not to have their books TTS enabled.

Language service providers (LSPs) and their clients are beginning to ride the TTS wave with its many advantages. However, there is a need to reflect on the sensitivities involved in sharing proprietary information. Transferring company-owned knowledge (as in eLearning, for example) to third-party systems, which may in turn feed that information to other corporations, is a risk worth examining. No matter how you slice it, however, the future seems clear: TTS is here to stay, and its capabilities continue to advance. TTS provides LSPs with new and exciting options, but these options are just beginning to be realized. To date, the impact of TTS on the localization industry has been grossly underreported, but Nimdzi plans to change that. We will be focusing our attention on this area to bring more insight into how the introduction of TTS is transforming the localization industry.

Watch for upcoming reports as we delve deeper into the world of TTS. We will weigh in on the debate about whether TTS will ever fully replace the need for the human voice. We will examine not only the current limitations of TTS technology, but also why its continued rise seems inevitable – especially for LSPs.
