Google have launched a Cloud ‘text-to-speech’ service together with their AI subsidiary DeepMind. With this launch, Google aims to give developers access to the natural speech technology used across Google Assistant, Maps and Search.
According to Google, their cloud-based ‘text-to-speech’ service allows customers to choose from 32 different voices in 12 languages. They also believe that the product can be used to power voice response systems in call centers, let IoT devices talk back to users, and convert text-based media into spoken formats.
Speaking about this in detail, Dan Aharon, Product Manager, Cloud AI at Google, commented:
Cloud Text-to-Speech lets developers choose from 32 different voices from 12 languages and variants. We have also increased the resolution of each sample from 8 bits to 16 bits, producing higher quality audio for a more human sound.
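To put the move from 8 bits to 16 bits in perspective, the short sketch below computes how many distinct amplitude values each sample can take and the rough theoretical dynamic range. It assumes plain uniform (linear) quantization, which the quote does not specify, and the function names are illustrative only.

```python
import math

def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude values a sample of this bit depth can hold."""
    return 2 ** bits

def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range in dB, assuming uniform (linear) quantization."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (8, 16):
    levels = quantization_levels(bits)
    print(f"{bits}-bit: {levels} levels, about {dynamic_range_db(bits):.1f} dB")
```

Under that assumption, each sample goes from 256 possible values to 65,536, which is why higher bit depth is associated with cleaner, more natural audio.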
Google claims that the ‘text-to-speech’ service correctly pronounces complex text such as names, dates, times and addresses, resulting in authentic-sounding speech. They also say that DeepMind’s AI prowess will be leveraged to convert text into speech much faster.
According to Google, the DeepMind technology behind this text-to-speech conversion is called WaveNet. They also say that the new version of WaveNet generates raw waveforms 1,000 times faster than the original model and can produce one second of speech in 50 milliseconds.
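The speed figures can be checked with some back-of-the-envelope arithmetic. The sketch below uses only the two numbers quoted by Google; the function names are illustrative, and the implied time for the original model is an inference that assumes the 1,000x speedup applies to the same measurement, not a figure Google stated.

```python
def real_time_factor(synthesis_ms: float, audio_ms: float = 1000.0) -> float:
    """Seconds of audio produced per second of compute."""
    return audio_ms / synthesis_ms

def implied_original_ms(synthesis_ms: float, speedup: float) -> float:
    """Time the original model would need, if the speedup applies uniformly (assumption)."""
    return synthesis_ms * speedup

rtf = real_time_factor(50)                 # 50 ms to generate 1 s of speech
original_ms = implied_original_ms(50, 1000)

print(f"New WaveNet: {rtf:.0f}x faster than real time")
print(f"Implied original: about {original_ms / 1000:.0f} s per second of speech")
```

In other words, 50 milliseconds per second of speech is 20 times faster than real time, whereas the implied figure for the original model would have been far too slow for interactive use.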
In closing, Google said that they will be competing with AWS’ Polly, which offers 47 voices.