Generate TTS using Playback()

PitzKey · May 25, 2021, 10:01pm

Hey everyone,

I came across something interesting today and I thought it may be helpful to others.

Normally when we generate TTS, it is something like:

/path/to/script "text we want to generate" params more_params filename

This returns:

/path/to/filename

So we'd usually call the script with its arguments using System().
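For comparison, the usual System() approach might look like the sketch below. It assumes the script writes its audio to /path/to/filename as above, and it relies on Playback() taking the path without a file extension.

```
; System() blocks until the script exits and sets ${SYSTEMSTATUS}
; (SUCCESS, FAILURE or APPERROR), which can be checked before playback.
exten => s,n,System(/path/to/script "text we want to generate" params more_params filename)
exten => s,n,Playback(/path/to/filename)
```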

TIL that you can do the following instead:

exten => s,n,Playback(${SHELL(/path/to/script "text we want to generate" params more_params filename)})

You can also do the same with Read()
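A minimal sketch of the Read() variant, assuming the same script and arguments. The variable name digits and the 4-digit limit are only placeholders for illustration.

```
; Use the generated file as the prompt and collect up to 4 digits into ${digits}.
exten => s,n,Read(digits,${SHELL(/path/to/script "text we want to generate" params more_params filename)},4)
```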

The cool thing about this is that Asterisk waits until the script has returned the /path/to/filename. If there's a lot of text, generation can take a second or two, and playback starts only once the script has completed.

Enjoy!

dicko · May 25, 2021, 10:19pm

There is a defined API interface to res_speech

Things that work as advertised

lumenvox
unimrcp
vosk
google
amazon
watson
sphinx

Choose your engine and analyse the cost

Using one of these, which are non-blocking, you can return on either detection or timeout

quickstart

PitzKey · May 25, 2021, 10:32pm

Cost is an issue.

We use Weblate to manage the text and language translations, so the script will only generate a new TTS file if the text string was updated. Otherwise, it’ll reuse the same file that was previously generated.

Obviously, some TTS data, such as names and street addresses, is generated on each call and cannot be reused. We clean up those files with a hangup handler.
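Roughly, the cleanup could be wired up like the sketch below. The context names tts-demo and tts-cleanup, the TTS_FILE variable and the .wav extension are assumptions for illustration, not our actual dialplan. It also assumes the script prints the path without a trailing newline.

```
[tts-demo]
; Generate the per-call file once and remember its path for cleanup.
exten => s,1,Set(TTS_FILE=${SHELL(/path/to/script "text we want to generate" params more_params filename)})
 same => n,Set(CHANNEL(hangup_handler_push)=tts-cleanup,s,1)
 same => n,Playback(${TTS_FILE})
 same => n,Hangup()

[tts-cleanup]
; Runs on the channel at hangup; removes the generated audio file.
exten => s,1,System(rm -f "${TTS_FILE}.wav")
 same => n,Return()
```

Hangup handlers are invoked like Gosub() routines, so the handler has to end with Return().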

So this is essentially the reason why we use Playback() instead of the built-in TTS engine.

dicko · May 25, 2021, 10:40pm

vosk is pretty good and totally free. It also provides the resources you can best use in your dialplan,

revisit

For a 5-minute solution, perhaps TTS in a Docker container, no muss, no fuss

(It likes cores and memory, though)

PitzKey · May 25, 2021, 10:46pm

I’ll look into it. I like Google TTS a lot…

dicko · May 25, 2021, 10:47pm

As do I, but its costs accelerate quickly after 60 minutes per month.

A full-on TTS/STT setup looks to me like unimrcp as a core; that way you can choose your engine in a catholic fashion

TheWebMachine · May 27, 2021, 7:06am

A while back we worked to add AWS’ Polly TTS to FreePBX (https://twm.tips/wikipollytts). When calling TTS via the usual methods, either via the GUI TTS module or custom dialplan AGI(agi://127.0.0.1/propolys-tts.agi,"What I want to say.",polly,/usr/bin/node), a checksum is used to determine if the sound file has already been generated in order to reduce calls to and cost of Polly.

You can also substitute channel variables into the TTS string a la ${varName} allowing for re-generation of only the constantly changing bits via multiple AGI calls in series. So it is possible to utilize the included TTS functionality for simplicity while minimizing expense and reusing sound files where appropriate.
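As a rough sketch of the "multiple AGI calls in series" idea, using the same AGI call form as above: the phrase text and the CONTACT_NAME channel variable are hypothetical, and the exact quoting the AGI script expects is an assumption.

```
; Fixed phrase: the text is identical on every call, so the checksum lets the existing file be reused.
exten => s,n,AGI(agi://127.0.0.1/propolys-tts.agi,"Your appointment is with",polly,/usr/bin/node)
; Changing part: only this short string needs a new file when the value changes.
exten => s,n,AGI(agi://127.0.0.1/propolys-tts.agi,"${CONTACT_NAME}",polly,/usr/bin/node)
```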

PitzKey · May 27, 2021, 7:19am

That's exactly what we are doing, but within Playback() or Read()
