Text to Speech on OutSystems

Not long ago I took part in a hackathon, and it went fantastically well. We got to try a lot of things and give free rein to our creativity. One of the components we used was Text-to-Speech (TTS).

What is Text to Speech (TTS)

Text to Speech, also known as speech synthesis, converts written text into audio that is read aloud with a natural-sounding pronunciation. In other words, it is a computerized voice that turns written text into speech.

Why use TTS

There are many reasons to use TTS in your application, and it can bring real benefits to your users:

  • It opens doors to anyone looking for easier ways to access digital content
  • It extends the reach of your content as an accessibility feature that helps people:
    • with literacy difficulties
    • with learning disabilities
    • with reduced vision
    • learning a language
  • It is convenient for those who want to be read to
  • It supports auditory learners, who take in information best by listening
  • It can provide a multisensory reading experience that combines seeing with hearing
  • It enables an enhanced end-user experience while keeping costs low
  • It allows people who cannot speak to communicate by voice

How to Use TTS in OutSystems

So let’s get to the interesting part of this post: using TTS in your application. Note that I will cover TTS in the context of mobile applications, using a Forge component, the Text-to-Speech Plugin.

Before we start, I must say that this plugin is simple to use. It will bring many benefits to your application, and also some fun, especially when testing. The plugin exposes several actions, and we will go through each of them.

Note: if you don't know how to use an existing plugin in the Forge repository, see this document that will help you to add a plugin to your application.

You can convert text from:

  • input entered by the user
  • hardcoded text
  • text stored in the database

To use TTS in your application you will need to consume the plugin’s actions.

Put simply, the actions let you:

  • Play the Text to Speech
  • Stop the Text to Speech
  • Change the language it speaks
  • Change the spoken voice

Let’s see how to use each of them. I will start with the most essential action, TextToSpeech.

  1. TextToSpeech

With TextToSpeech the synthesized voice sounds a little robotic, and I must say that it works mainly for English. You will probably not like how it pronounces other languages. Or maybe you will, just to hear how funny it sounds.

This action has two inputs:

  • Text: of type Text, the text to convert into sound
  • OnEndCallback (optional): of type Object, the action to perform after the speech is over

In short, it is an easy-to-use action that produces good results.

Note that the output returned by the JavaScript is of type Object. This is because the TTS action receives an object in OnEndCallback.

The OnEndCallback action can do much more than show a feedback message. You can trigger more specific events and actions to make the experience more distinctive.
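To make the callback idea concrete, here is a minimal sketch in plain JavaScript. It is only an illustration: `fakeSpeak` is a hypothetical stand-in for the TextToSpeech action (it is not the plugin's real API, and it finishes immediately instead of playing audio), and `speakQueue` is a made-up helper that uses the end callback to read several sentences one after another.

```javascript
// Hypothetical stand-in for the TextToSpeech action: it "speaks" the text
// and then invokes the optional end callback. Not the plugin's real API.
function fakeSpeak(text, onEndCallback) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('Text must be a non-empty string');
  }
  // A real engine would play audio here and call back when playback ends.
  if (typeof onEndCallback === 'function') {
    onEndCallback({ text: text, finished: true });
  }
}

// Reads a list of sentences in order by chaining end callbacks.
// `speak` is any function with the shape speak(text, onEndCallback).
function speakQueue(sentences, speak, onAllDone) {
  var index = 0;
  function next() {
    if (index >= sentences.length) {
      // Everything was spoken: report how many sentences were read.
      if (typeof onAllDone === 'function') onAllDone(index);
      return;
    }
    var sentence = sentences[index];
    index += 1;
    speak(sentence, next); // the end callback triggers the next sentence
  }
  next();
}
```

The same pattern works for anything you want to happen only after the speech is over, such as re-enabling a button or moving to the next screen.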

  2. TextToSpeechWithOptions

TextToSpeechWithOptions works like TextToSpeech, but the big difference is that you can set parameters that make the experience better for the user, especially for non-English speakers. You can define the Locale (e.g. pt-PT, fr-FR, es-ES), so the text is spoken with the phonetics and accent of that language, which sounds closer to a native speaker and is much more appealing. This is certainly the action to use in a multilingual application. You can also change the speed at which it speaks and the type of voice you hear.
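As a rough sketch of how those parameters could be prepared before calling the action, the helper below builds an options object with safe defaults. The function name, the field names (`text`, `locale`, `rate`) and the defaults (`en-US`, rate `1.0`) are assumptions made for this example, not the plugin's documented inputs.

```javascript
// Builds an options object for a TextToSpeechWithOptions-style call.
// Field names and defaults are assumptions for illustration only.
function buildSpeechOptions(text, locale, rate) {
  // Accepts locales shaped like pt-PT, fr-FR or es-ES.
  var localePattern = /^[a-z]{2,3}-[A-Z]{2}$/;
  var safeLocale = localePattern.test(locale || '') ? locale : 'en-US';
  // Falls back to normal speed when the rate is missing or not positive.
  var safeRate = (typeof rate === 'number' && rate > 0) ? rate : 1.0;
  return { text: String(text), locale: safeLocale, rate: safeRate };
}
```

For example, `buildSpeechOptions('Bom dia', 'pt-PT', 0.8)` keeps the Portuguese locale and the slower speed, while a malformed locale such as `'portuguese'` falls back to the assumed default.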

  3. CancelTextToSpeech

What if you start listening to a text read by TTS and don’t want to hear it anymore? It’s simple! The CancelTextToSpeech action does exactly what its name says: it cancels, or stops, the task of converting text to speech. Here you can also have an OnEndCallback that executes after the TTS has stopped.

  4. CheckLanguages

The end user can decide which language the TTS synthesizer should speak.

The CheckLanguages action returns a list of the Locales that TTS can synthesize, each spoken with the correct accent for that language.

Unfortunately, the plugin does not have a Locale for every language, so in some cases (e.g. Zulu, South Africa, zu-ZA) no Locale will be available.
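Since a Locale may be missing, it helps to decide up front what to fall back to. The sketch below assumes the list of available Locales has already been turned into a plain array of strings (the exact output shape of the action is not shown here, so that is an assumption), and `pickLocale` is a hypothetical helper name.

```javascript
// Chooses the best available locale for a requested one.
// `availableLocales` is assumed to be an array of strings such as 'pt-PT'.
function pickLocale(availableLocales, requested, fallback) {
  var wanted = String(requested).toLowerCase();

  // 1. Exact match, ignoring case.
  for (var i = 0; i < availableLocales.length; i++) {
    if (availableLocales[i].toLowerCase() === wanted) {
      return availableLocales[i];
    }
  }

  // 2. Same language, different region (e.g. pt-BR when pt-PT is missing).
  var language = wanted.split('-')[0];
  for (var j = 0; j < availableLocales.length; j++) {
    if (availableLocales[j].toLowerCase().split('-')[0] === language) {
      return availableLocales[j];
    }
  }

  // 3. Nothing suitable: use the caller's fallback.
  return fallback;
}
```

With `['en-US', 'pt-BR', 'fr-FR']` available, a request for `zu-ZA` would return the fallback, and a request for `pt-PT` would settle for `pt-BR`.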

  5. GetVoices

Finally, the speech synthesizer can speak with different voices. The voice options depend on the Locale chosen: some Locales have few voices, while others, such as English, have a whole range. It is fantastic to be able to choose the type of voice you want to use in your TTS.

Once you have picked a Locale and a voice that appeals to you, you can configure TextToSpeechWithOptions according to your preferences and have a fantastic user experience in your application.
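If you want to let the user pick a voice per language, grouping the voices by Locale is a handy first step. This is a sketch under assumptions: each voice is represented as an object with `name` and `locale` fields, which is a shape invented for this example rather than the documented output of GetVoices, and the voice names in the usage note are made up.

```javascript
// Groups voices by locale so a UI can show the choices per language.
// The { name, locale } shape is an assumption for illustration only.
function groupVoicesByLocale(voices) {
  var byLocale = {};
  voices.forEach(function (voice) {
    if (!byLocale[voice.locale]) {
      byLocale[voice.locale] = [];
    }
    byLocale[voice.locale].push(voice.name);
  });
  return byLocale;
}
```

Given three voices, two for `en-US` and one for `pt-PT`, the result has one key per Locale, each holding the list of voice names to offer in a dropdown.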

What to do if you have text with HTML in it?

If your text contains HTML, you will notice that TTS reads the HTML tags aloud. But don’t worry. The solution is to remove the HTML before passing the text to the synthesizer, which can be done as follows:

  • Name the Container that will hold the text and insert the Expression with the text.
  • Convert the HTML text to plain text using the following JavaScript:
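// Render the HTML inside the named container, then clean a copy of the string.
document.getElementById($parameters.ElementId).innerHTML = $parameters.HTMLText;
var html = $parameters.HTMLText;
// Remove <style> and <script> blocks first, including their contents.
html = html.replace(/<style([\s\S]*?)<\/style>/gi, '');
html = html.replace(/<script([\s\S]*?)<\/script>/gi, '');
// Turn structural tags into line breaks, list markers or spaces.
html = html.replace(/<\/div>/ig, '\n');
html = html.replace(/<\/li>/ig, '\n');
html = html.replace(/<li>/ig, '  *  ');
html = html.replace(/<\/ul>/ig, '\n');
html = html.replace(/<\/p>/ig, '\n');
html = html.replace(/<br\s*[\/]?>/gi, " ");
// Only now strip every remaining tag; doing this earlier would skip the rules above.
html = html.replace(/<\/?[^>]+>/ig, " ");
// Replace non-breaking space entities with regular spaces.
html = html.replace(/&nbsp;/g, ' ');
$parameters.Text = html;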

So the flow is: first remove the HTML tags, then call the TextToSpeech action. This way TTS reads the desired text correctly.

There are other strategies for removing tags from an HTML string to produce clean text, and I challenge you to share them in the comments.
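As one example of a different strategy, here is a minimal sketch that scans the string character by character instead of using regular expressions. `stripTags` is a hypothetical helper, and it is deliberately simple: it does not drop the contents of script or style blocks, does not decode entities, and would misread a literal "<" in the text as the start of a tag.

```javascript
// Removes tags by scanning the string and skipping everything between
// '<' and '>'. Each tag becomes a space so neighbouring words stay apart.
function stripTags(html) {
  var text = '';
  var insideTag = false;
  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    if (ch === '<') {
      insideTag = true;
      text += ' ';
    } else if (ch === '>' && insideTag) {
      insideTag = false;
    } else if (!insideTag) {
      text += ch;
    }
  }
  // Collapse the extra whitespace left behind by the removed tags.
  return text.replace(/\s+/g, ' ').trim();
}
```

In a browser or WebView, another option is to let the DOM do the work, for instance by parsing the string with DOMParser and reading the textContent of the resulting body.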

Important: for plugins to be available to your application, you need to generate and install (or update) the native application after you start using the plugin. Otherwise, the plugin binaries are not in the application package and the plugin won’t work.

Conclusion

This is a fantastic component. It has many benefits in terms of accessibility, convenience, and learning support. TTS is simple to use and a pleasure to develop with, especially when testing. It can bring to your application users who would otherwise have to look for a friendlier alternative.
