I came across something interesting today and I thought it may be helpful to others.
Normally, when we generate TTS, the call looks something like this:
/path/to/script "text we want to generate" params more_params filename
This returns the path of the generated file:
/path/to/filename
So, we’d usually call the script with its arguments using System().
TIL that you can do the following:
exten => s,n,Playback(${SHELL(/path/to/script "text we want to generate" params more_params filename)})
You can also do the same with Read().
The cool thing about this is that Asterisk waits until the script has returned /path/to/filename. If there’s a lot of text, generation can take a second or two, and playback starts only once the script has completed.
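For anyone wanting to try the Read() variant, here is a minimal sketch. It assumes the script prints only the file path with no trailing newline (as far as I know, SHELL() passes the command's output through as-is). TTS_FILE and DIGITS are just hypothetical variable names.

```
; Store the returned path so it can be reused later in the call,
; then play it as the prompt and collect a single digit.
exten => s,n,Set(TTS_FILE=${SHELL(/path/to/script "text we want to generate" params more_params filename)})
exten => s,n,Read(DIGITS,${TTS_FILE},1)
```

If the script does end its output with a newline, trimming the last character with ${TTS_FILE:0:-1} should take care of it.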
We use Weblate to manage the text and language translations, so the script will only generate a new TTS file if the text string was updated. Otherwise, it’ll reuse the same file that was previously generated.
Obviously, some TTS data, such as names and street addresses, is generated on each call and cannot be reused, so we clean up those files with a hangup handler.
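A cleanup along those lines might look roughly like the sketch below. The context names (tts-example, tts-cleanup), the TTS_FILE variable and the .wav extension are all assumptions for illustration, so adjust them to whatever your script actually writes.

```
[tts-example]
exten => s,1,Answer()
; Generate the per-call file and remember its path (without extension).
 same => n,Set(TTS_FILE=${SHELL(/path/to/script "text we want to generate" params more_params filename)})
; Register the handler before playback so it still runs if the caller hangs up early.
 same => n,Set(CHANNEL(hangup_handler_push)=tts-cleanup,s,1)
 same => n,Playback(${TTS_FILE})
 same => n,Hangup()

[tts-cleanup]
; Runs on the same channel at hangup, so TTS_FILE is still set here.
exten => s,1,System(rm -f ${TTS_FILE}.wav)
 same => n,Return()
```

Pushing the handler before Playback() rather than after means the file is removed even when the call drops mid-prompt.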
This is essentially why we use Playback() instead of the built-in TTS engine.
A while back we worked to add AWS’ Polly TTS to FreePBX (https://twm.tips/wikipollytts). When TTS is called via the usual methods, either the GUI TTS module or custom dialplan such as AGI(agi://127.0.0.1/propolys-tts.agi,"What I want to say.",polly,/usr/bin/node), a checksum is used to determine whether the sound file has already been generated, which reduces both the number of calls to Polly and their cost.
You can also substitute channel variables into the TTS string, a la ${varName}, so that only the constantly changing bits are regenerated, via multiple AGI calls in series. So it is possible to use the included TTS functionality for simplicity while minimizing expense and reusing sound files where appropriate.
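As a rough illustration of splitting a prompt that way, the sketch below uses two AGI calls in series. The wording of the fixed phrase is made up, and ${varName} stands in for whatever channel variable holds the per-call value.

```
; Fixed phrase: the text never changes, so the checksum matches and the existing file is reused.
exten => s,n,AGI(agi://127.0.0.1/propolys-tts.agi,"Your order will be delivered to",polly,/usr/bin/node)
; Variable part: the text differs per call, so only this short piece is sent to Polly.
exten => s,n,AGI(agi://127.0.0.1/propolys-tts.agi,"${varName}",polly,/usr/bin/node)
```

Keeping the variable part in its own call means the longer fixed phrase is only ever generated once.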