Generate TTS using Playback()

Hey everyone,

I came across something interesting today and I thought it may be helpful to others.

Normally when we generate TTS, it is something like:

/path/to/script "text we want to generate" params more_params filename

This returns the path to the generated file, i.e. /path/to/filename.

So, we’d usually call the script with its arguments using System()

TIL that you can do the following:

exten => s,n,Playback(${SHELL(/path/to/script "text we want to generate" params more_params filename)})

You can also do the same with Read()

The cool thing about this is that Asterisk waits until the script has returned the /path/to/filename. If there’s a lot of text, generation can take a second or two, and Asterisk only starts the playback once the script has completed.



There is a defined API interface to res_speech

Things that work as advertised


Choose your engine and analyse the cost

Using one of these, which are non-blocking, you can return either on detect or on timeout.


Cost is an issue.

We use Weblate to manage the text and language translations, so the script will only generate a new TTS file if the text string was updated. Otherwise, it’ll reuse the same file that was previously generated.
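
As a rough illustration of that reuse logic, here is a minimal sketch assuming the script is written in Python. The names `tts_path`, `synthesize` and `CACHE_DIR` are hypothetical, and `synthesize` is only a stand-in for whatever TTS engine call you actually use. The file name is derived from a hash of the voice and the text, so an updated string gets a new file and an unchanged string reuses the old one.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cache location; point this at your own sounds directory.
CACHE_DIR = Path(tempfile.gettempdir()) / "tts_cache"


def synthesize(text: str, voice: str, out_file: Path) -> None:
    """Placeholder for the real TTS engine call (an assumption, not a real API)."""
    out_file.write_bytes(b"")  # a real implementation would write audio here


def tts_path(text: str, voice: str = "en-US", ext: str = "wav",
             synth=synthesize, cache_dir: Path = CACHE_DIR) -> str:
    """Return the sound file path without extension, generating it only if missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Same voice + same text always maps to the same file name.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    out_file = cache_dir / f"{key}.{ext}"
    if not out_file.exists():
        synth(text, voice, out_file)
    # Playback() takes the file name without its extension.
    return str(out_file.with_suffix(""))
```

A script built on this could end with something like `print(tts_path("text we want to generate"), end="")`, so the only thing on stdout is the path that Playback() should receive.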

Obviously, there is other TTS data, such as names, street addresses, etc., that we generate on each call and that cannot be reused. We clean up these files with a hangup handler.
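
The cleanup step could look roughly like the sketch below. It assumes, purely for illustration, that the per-call files live in their own directory and are named `<call id>-<something>`. The names `cleanup_call_files` and `CALL_DIR` are hypothetical, and the hangup handler would be what invokes it with the call's ID.

```python
import glob
import tempfile
from pathlib import Path

# Hypothetical directory holding the per-call, non-reusable TTS files.
CALL_DIR = Path(tempfile.gettempdir()) / "tts_per_call"


def cleanup_call_files(call_id: str, call_dir: Path = CALL_DIR) -> int:
    """Delete every file named '<call_id>-*' in call_dir; return how many were removed."""
    removed = 0
    if not call_dir.is_dir():
        return removed
    # Escape the ID so characters like '[' or '*' are matched literally.
    pattern = glob.escape(call_id) + "-*"
    for path in call_dir.glob(pattern):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```

Keeping the per-call files apart from the reusable ones means the cleanup can never touch a cached prompt by accident.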

So this is essentially the reason why we use Playback() instead of the built-in TTS engine.

Vosk is pretty good and totally free. It also provides the resources you can best use in your dialplan.


For a 5-minute solution, perhaps TTS in a Docker container: no muss, no fuss.

(It likes cores and memory, though.)

I’ll look into it. I like Google TTS a lot…

As do I, but its costs accelerate quickly after 60 minutes per month.

A full-on TTS/STT setup looks to me like unimrcp as a core. That way you can choose your engine in a catholic fashion.

A while back we worked to add AWS’ Polly TTS to FreePBX. When calling TTS via the usual methods, either via the GUI TTS module or custom dialplan AGI(agi://,"What I want to say.",polly,/usr/bin/node), a checksum is used to determine if the sound file has already been generated in order to reduce calls to and cost of Polly.

You can also substitute channel variables into the TTS string a la ${varName} allowing for re-generation of only the constantly changing bits via multiple AGI calls in series. So it is possible to utilize the included TTS functionality for simplicity while minimizing expense and reusing sound files where appropriate.
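
To make the idea of regenerating only the changing bits concrete, here is a small sketch of splitting one prompt into static and variable pieces. This is an illustration only, not how the FreePBX module works internally. `split_segments` is a hypothetical helper, and the `${varName}` syntax is taken from the post above. Static pieces can be generated once and reused, while the variable pieces are generated per call.

```python
import re

# Matches ${varName} placeholders.
_VAR = re.compile(r"\$\{(\w+)\}")


def _speakable(text: str) -> bool:
    """True if the text contains at least one letter or digit."""
    return any(ch.isalnum() for ch in text)


def split_segments(template: str, variables: dict) -> list:
    """Split a TTS template into (text, is_static) pairs in spoken order.

    Raises KeyError if the template uses a variable that was not supplied.
    """
    segments = []
    pos = 0
    for match in _VAR.finditer(template):
        static = template[pos:match.start()].strip()
        if _speakable(static):
            segments.append((static, True))
        segments.append((str(variables[match.group(1)]), False))
        pos = match.end()
    tail = template[pos:].strip()
    if _speakable(tail):  # drop leftovers such as a lone "."
        segments.append((tail, True))
    return segments
```

Each pair would then map to one TTS request played back in series, with only the `is_static == False` pieces costing a new engine call on every call.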

That’s exactly what we are doing, but within Playback() or Read()
