Make Your Computer Talk using IBM Watson

Make Your Computer Talk using IBM Watson

Using the IBM Watson Text To Speech API to synthesize speech from text in many different languages.

·

8 min read

“Once a new technology rolls over you, if you're not part of the steamroller, you're part of the road.” -Stewart Brand, Writer

Objectives

At the end of this article you should be able to:

  • Access the IBM Watson Text-to-Speech API.

  • Synthesize speech from a sentence.

  • Synthesize speech from a text file.

  • Synthesize speech from text in a variety of different languages.

IBM Watson is a PAAS that is used to develop, deploy and scale AI-powered applications. Today we are going to use Watson to synthesize speech.

Before we begin I'll give a light introduction to what text-to-speech synthesizing means.

Below is a picture of a robot holding up a book AI READING.jpg

Text-to-speech (TTS) is a type of assistive technology that reads digital text aloud. It's sometimes called “read aloud” technology. In simple words, it is a technology that helps you make your computer read out loud.

Before we begin the project let's look at

The Prerequisites for this Article

  • Knowledge of python and use of pip
  • Internet connection.
  • An email address.
  • A text editor or Jupyter notebook.

You also need to have an IBMID, if you don't have one, click here to get one. getting an IBMID is essential because we will be using the IBM text-to-speech API to build our software if you have never used a PAAS don't fret I'll walk you through it.

Now you have gotten an IBMID let's...

Get our API Key

Below I'm going to explain how we are going to get our API Key for the project

First": You go to the IBM Cloud Catalog below is the IBM catalog landing page IBM CATALOG Landing.png

"Second": You click on the Services link, once you do that you will see a drop-down menu called category which lists all of the service categories offered by IBM Watson Below is a picture of the services link ZOOMING SERVICES LINK IN CATALOG.png

Below is a picture showing the services categories SERVICES AND CATEGORIES PICTURE.png

"Third": In the category drop down you will see a lot of services like Compute, Containers, Networking, Storage etc. Click on the AI and Machine Learning link.

Below is a picture showing the AI and Machine Learning link box under the categories dropdown SERVICES AND CATEGORIES PICTURE.png

"Fourth": Once you click on the AI and Machine Learning link you will then be given a Catalog of Watson's AI and Machine Learning services

Below is a picture showing the AI and ML services page of IBM Watson TTS SERVICE LINK IN AI and ML PAGE.png

"Fifth": Amongst the services offered in the AI and Machine Learning category you will see services such as Watson Assistant, Watson Studio, Knowledge Studio, etc click on the Text to Speech service.

Below is a picture of the text to speech service in the AI and Machine Learning category TTS SERVICE LINK IN AI and ML PAGE.png

"Sixth:" Once you have clicked on the text to speech service you will be taken to the page displayed below TTS PAGE BEFORE LOG IN.png

Now I'm going to assume you have an IBMID but if you do not click here to get one. Now you have an IBMID you click on the log in link located at the top right corner of the page displayed below. Once you log in you will see a blue button with create on it click on it you will be taken to the page displayed below where you will be shown your API key and URL ENDPOINT WITH API and URL.png Congratulations you have gotten access to the IBM Watson Text to Speech API.

Now that we have gotten access to our API let's

Get building

Before we get to building, if you at any point in the article get confused take a look at the project's code in its GitHub repository by clicking here

Project Flow

Below is the flow which the project is going to take

  • Install Dependencies:Here we are going to install the necessary python packages needed to use the API.

  • Authenticate:Here we are going to provide our API credentials and link to the API.

  • Convert String:Here we are going to transform simple words into spoken words!!

  • Convert a File:Here we are going to transform a whole text file into spoken words!!

  • Using new Language Models:Here we are going to transform the text into different languages!!

Let us begin

Install dependencies

I am assuming you have python, pip and a text editor or jupyter notebooks installed.

If you have the above-listed let's begin, to make use of the IBM Text-to-Speech API we are going to install the IBM-Watson library, by typing in the command listed below in your terminal or command line.

pip install ibm-watson

If you are using the jupyter notebook environment you can put down the command listed below in a cell.

!pip install ibm-watson

If you are done installing then you can move on to the next step.

Authenticate

In this section, we are going to link to the IBM Watson Text-to-speech API.

To begin we are going to put in the URL and the API key given to us on the Text-to-service page.

Below is a picture showing the page containing the API key and the URL ENDPOINT WITH API and URL.png

url = 'https://api.eu-gb.text-to-speech.watson.cloud.ibm.com/instances/d04b5ea5-e6b9-4'

apikey = 'McPLvaUHWywOhceFBH4inrgnvirrwoomckw[omlap'

#I have used a fake url and apikey.

Now we are going to import the necessary tools

from ibm_watson import TextToSpeechV1

from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

Next we are going to set up our text-to-speech service

#setup service
authenticator = IAMAuthenticator(apikey)
#tts service
tts = TextToSpeechV1(authenticator=authenticator)
#service url
tts.set_service_url(url)

Now we have successfully authenticated and set up our service, next up we are going to...

Convert a String

Here we are going to turn the string (words) hello world (why hello world? why not?😏) into voice and turn and save it to an mp3 file so you can play it over and over again

with open('./speech_string.mp3','wb') as audio_file:
    res = tts.synthesize('Hello World', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)

In the above code, we opened a file named speech_string.mp3(I'm bad at naming things😢) and then we synthesized the sentence hello world into speech using a female US voice called Allison, (IBM Watson has a lot of different voices and languages, we will get to how you can make speech with all of them when we get to the using new language models section) Now you can go to your project folder where you will see an mp3 file named speech_string.mp3 play it and you will hear the words hello word spoken.

Now you've seen the power of the IBM Watson Text-to-Speech API against words, let's see if it can...

Convert a File

Here we are going to transform a whole text file into spoken words.

First, we are going to create a text file and name it text_file, and then we are going to type stuff into it. But it shouldn't make it too large because as lite users the API won't let us synthesize large text files.

with open('text_file.txt','r') as text_file:
    text = text_file.readlines()

In the code above we have opened the text file called text_file, read the text inside the file and named the text gotten from the file with the variable name text.

Now to see the text you have read from text_file, just run the variable text

text

Now you will notice that your code contains \n and they are used to show a new line in the text read, they are used for clarification. But unfortunately, it won't help clarify anything for our code so we are going to remove them.

text = [line.replace('\n', '') for line in text]

text = ''.join(str(line)for line in text)

In the code above we replaced all the \n notations and replaced them with an empty string and then we looped the replacement for every line in the text so now you can recheck your text file.

#to check for the replacement of \n

text

Now we have cleaned up our text we are going to turn the text into speech and save it inside a .mp3 file which we will call speech_file.

with open('./speech_file.mp3', 'wb') as audio_file:
    res = tts.synthesize(text, accept='audio/mp3', voice='en-US_KevinV3Voice').get_result()
    audio_file.write(res.content)

The code might take some time to load compared to when we were working with only a sentence, because of the increase in the number of words. Now go to your root folder and search for an audio file called speech_file.mp3 and you will find all the text you had in the text file read out to you

We have been able to turn a sentence into speech, we have also been able to an entire text file into speech now we are going to look at making speech in different languages using...

Language Models

In this section, we are going to look at how to make speech in French and other languages. To make speech in other languages all we have to do is change the voices, to show that I'll resynthesize our earlier text file into french

#converting to new language(french)
with open('./speech_file(french).mp3', 'wb') as audio_file:
    res = tts.synthesize(text, accept='audio/mp3', voice='fr-FR_ReneeV3Voice').get_result()
    audio_file.write(res.content)

As you can see above the only difference between the code snippet and the one we did earlier is the difference in voice name, for the one above we used en-US_KevinV3Voice while for this one we used fr-FR_ReneeV3Voice

To see proof of this go to your project folder and look for speech_file(french).mp3 and play it, you should hear the same speech but this time in french.

So in summary to synthesize speech in different languages all you have to do is change the voice.

To get a complete set of voices available for the IBM Watson Text-to-Speech service click here .

You can also check out the comprehensive IBM Watson Text-to-Speech docs by clicking here.

Happy Building.