TextToSpeech Module Updated - Need a little help and advice

JakFrost · November 8, 2009, 8:11am

Original Source: http://www.freepbx.org/trac/ticket/1012

Below is a little write-up of my experience with FreePBX and with updating the Text To Speech module. There’s a technical question at the end if you just want to skip to the part where you help me solve the last problem. Updated: Problem solution found.

I hope that you like long stories, because this is a doozy…

Background
(A little story of need…)

I’m building a PBX with a Digital Assistant menu system (aka IVR) for a friend’s business, to put a little buffer between the callers and my friends who work there. The hope is to stem a lot of the repetitive questions about general topics such as Hours, Address, Directions, and Services, and also to provide more specific answers about some of the detailed services of the business itself that people always end up asking about.

I built a great power-saving but strong server in a small Mini-ITX format for ~$300 USD: an Intel Atom 330 (Dual Core 1.6 GHz - x86 or x64) on an Intel 945 chipset with 1 Gbit, 2-SATA, 1-PATA, 1GB DDR2 RAM, and an OCZ Agility 30GB SSD for ultra-fast but affordable performance, inside an Apex Mini-ITX case, with a slow and quiet 120mm fan and a Zalman Fanmate 2 to lower the RPM on the chipset fan. I wrote up the details in a Slashdot.org post. The system is a champ, very low power usage and completely (no exaggeration) silent at 1-foot distance from the ear.

I paired it with the Digium TDM402B (2 FXO, up to 4 FXO) phone card for PSTN line connectivity and Linksys PAP2T for 2-FXS output during their transition from analog to VoIP phones.

The software side is to use Elastix PBX 1.5.2 based on Asterisk 1.4 with FreePBX 2.5 Web GUI (2.6 after update) running on CentOS 5.4 Linux. Since everything is open source that saves a lot of money and also the hardware requirements for this system are low so it would work great in the little server.

Digital Assistant Menu System
(Buffer from the unwashed masses…)

Now, some of the more detailed information in the IVR will change over time. Since this is a business, my friends are pretty busy most of the time, and they are also not amazingly computer savvy, but they could manage their way around the FreePBX IVR interface to update the IVR if instructed how to do so.

The choice comes down to using recorded human voices for all the IVR menu options and information, some of which can be long and occasionally changing, or using a text-to-speech voice synthesis solution. During my demonstration to them I had to record some voices into the IVR menu to show them how the system would work and I found that I had to re-record some of the longer menu options such as Directions many times just to get them decent. This was a bit of a chore to do correctly and frankly for a real business there is a need for a person with a very good voice to do the recordings, but then the problem of maintenance and updates comes in since the original person will likely be unavailable.

Voice Synthesis Engines
(Turing Test not required…)

So after checking out the voice synthesis engines available for Linux that work with Asterisk I found three, Text2wave, Flite, and Swift based on Cepstral voices. The first two sounded pretty poor but were understandable and the Cepstral voices sounded much better but not perfect. We tried a few voices and we liked the Diane and Allison voices with the Liquid Love effect and Slow speed to bring on a sexy announcer voice. The little web demo sounded good enough to use so the idea started to take shape and people became a little excited.

I also came across Speech Synthesis Markup Language (SSML) Version 1.0 and Using SSML with Cepstral Voices so this opened up ideas on how to improve and customize the voice.

Idea Implementation
(The little things that annoy you…)

So I built the hardware, installed Elastix, did all the OS updates and FreePBX update to 2.6.

After a bit of searching I also found the original TTS 1.0 (Text To Speech) module by XO. I installed it and it worked with the Cepstral swift engine after adding a simple soft-link from “/usr/local/bin/swift” to “/opt/swift/bin/swift” to get the program into the PATH.
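The soft-link step can be sketched as a small shell helper. The two paths are the ones named above; the function name `link_engine` is made up for illustration, and the real command has to be run as root.

```shell
#!/bin/sh
# Hypothetical helper: expose an engine binary on the PATH via a symlink.
# $1 = real binary            (e.g. /opt/swift/bin/swift)
# $2 = link in a PATH folder  (e.g. /usr/local/bin/swift)
link_engine() {
    target=$1
    link=$2
    if [ ! -x "$target" ]; then
        echo "engine binary not found or not executable: $target" >&2
        return 1
    fi
    # -s makes a symbolic link, -f replaces an existing link of the same name
    ln -sf "$target" "$link"
}

# On the system described in this post the call would be:
#   link_engine /opt/swift/bin/swift /usr/local/bin/swift
```

Note the argument order of `ln -s`: the existing file comes first, the name of the new link second.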

However, I quickly found that the module was made as a proof-of-concept by XO in 2006 and was not really intended for general usage, much less production system deployment. It had quite a bunch of limitations: no punctuation or SSML markup of any kind was allowed in the text field, and the text was limited to 250 bytes because it was stored in the database. The module also named the saved text and wave files illegibly using an MD5 sum, and it did not delete the files when Text To Speech entries were deleted in the GUI, leaving a trail of huge, illegible, and impossible to clean up Wav files in the system.

Inexperienced With Just About Everything
(Staring into the night, deaf and dumb…)

So at this point I knew that if I was going to make this voice synthesis thing work and be maintainable by my friends who are just general computer users that the module would have to be updated to remove the no-punctuation or markup limitations and make it self-cleaning when deleting or updating entries. The problem was that I knew nothing about Asterisk, FreePBX, MySQL, Apache, and I never wrote anything with PHP, AGI, and practically knew nothing about Linux except how to navigate the folders from the command line.

But like Plato said, “Necessity, who is the mother of invention.” and I had one major case of necessity and need to get this thing done for my friends who were getting a little bit more and more annoyed with the general public calling in the same questions every hour because I told them I could solve their problem quickly and for cheap. Like they say, “Soldiers fight in War not for their country but for their brothers” and I wasn’t going to let my friends take any more bullets to the brain by the general public’s constant calling. I decided to sit down and update the module by learning through Trial By Fire.

I had already taken a step in the right direction a day earlier by making a very small modification to the original 1.0 module: adding base64_encode() and base64_decode() calls around the text so that markup and punctuation could pass through the AGI interface. The remaining limits, on the length of the text and the lack of file management, would require a much bigger change.
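To illustrate why the base64 wrapping helps: the module did this in PHP with base64_encode() and base64_decode(), but the same round trip can be shown with the `base64` command-line tool. This is only a sketch; the function names and the sample text are made up, and `base64 -d` assumes GNU coreutils or a compatible tool.

```shell
#!/bin/sh
# Encode text so that quotes, commas and XML markup survive as one plain
# argument; the result contains only A-Z a-z 0-9 + / and =.
encode_text() {
    # tr removes the line wrapping that some base64 tools add
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Reverse step, as the AGI side would do before handing text to the engine.
decode_text() {
    printf '%s' "$1" | base64 -d
}

text='Hello, <emphasis level="strong">world</emphasis>! Open 9-5.'
encoded=$(encode_text "$text")
echo "$encoded"
decode_text "$encoded"; echo
```

The encoded form is roughly a third longer than the original, so it does nothing for the 250-byte limit; it only gets punctuation through safely.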

Cursing At Computer
(My Postman thinks I have Tourette’s syndrome…)

I spent the next 22-hours straight at my computer learning to modify the module scripts. The process went from total and utter frustration, loud angry cursing at the screen, punching the desk, and screaming at the computer about everything just not working at all to slow understanding of the system, some more cursing, screaming, and hitting, to deeper understanding and mumbling complaints, to low euphoria mixed with complete and utter tiredness and exhaustion as things started to work.

I basically had to update and rewrite most of the scripts in the module to add comments everywhere so that I and other people after me could understand what is happening where. I had to do a major rewrite and clean-up of all the code so that variables are consistently named everywhere and I had to remove all references to illegible acronyms and replace them with legible names, especially the name of the module itself going from cryptic for new users “tts” to easily understandable by everyone “texttospeech”.

I updated the module and solved all the limitations that I wanted and could solve except for a single one that I could not solve and because I ran out of time I had to leave behind unsolved. The updated module now does what I want it to do exactly, and it is very user friendly so that it is usable by my friends and other people who use FreePBX Web GUI and want easy to implement Text To Speech synthesis.

Thanks For The Foundation
(Standing on the shoulders of giants…)

As much as I cursed and screamed, I have to say that the original TTS module that XO released as a proof-of-concept was a great piece of work that I built on to get the functionality that I required. I wasn’t very thrilled to read and have to modify complicated code in a language unfamiliar to me without any comments at all, with no white-space between any of the variable strings or sections of code, and with confusing and inconsistent variable names thrown around like confetti.

However, without the original module it would have taken me a lot more time to get the functionality out of Asterisk that I wanted with the voice synthesis engines and it would probably have required manual editing of the Asterisk Dial Plan “extensions_additional.conf” file by hand for every single IVR menu item and this would be an unmaintainable solution for my friends who wouldn’t learn how to create or modify Asterisk Dial Plans.

I want to thank XO for his work though because it allowed me to learn and to improve on it to get something that I needed.

Integration To Nowhere
(Parts just don’t always fit…)

Frankly I was surprised that when I started this project there was all this talk and documentation on the internet about Asterisk based PBX and articles about Cepstral Text To Speech synthesis on NerdVittles.com. However when it really came down to implementing all this stuff I found that there was no easy integration of Asterisk and voice synthesis without having to resort to Dial Plan modifications by hand.

This seemed strange to me because FreePBX Web GUI was being touted as the real way of administering Asterisk and this was the solution chosen by Trixbox and Elastix distributions. It was disappointing to find that both of these solutions didn’t have an easy to use way through a module of actually implementing useful text to speech voice synthesis in any of the dial plan configurations.

Share The Bounty
(Richard Stallman is a real hero…)

Now that this module is updated and improved without the original limits I want to share it out there with everyone so I will arrange to have it uploaded into the Third Party module repository later, for testing and after the last problem is solved. I’ll upload it to the original Ticket 1012 as version 1.2.

The Last Unsolved Problem
(There’s always one at the end of a project…)

Update: Problem was found with missing newline after last SQL statement line.

The problem that I was left with is that the “install.sql” file never seems to be executed when the module is installed through the FreePBX Module Admin GUI, so the “texttospeech” table is never created in the “asterisk” MySQL database. There are no errors shown during the install process on the web output either. However, when I manually execute the “install.sql” script through “mysql” it works just fine and the table is created, so it seems like there is nothing wrong with the file or SQL commands. When I uninstall the module, though, the “uninstall.sql” does get executed correctly and the table is deleted successfully.

I am now wondering whether the problem is with my module, with something in the “install.sql” file itself that I cannot find, or whether FreePBX has a bug somewhere. I’m going to look for the solution myself, but in the meantime I’m hoping that someone more experienced and knowledgeable might offer me some advice or help on why this script just doesn’t execute.

install.sql

CREATE TABLE IF NOT EXISTS `texttospeech`
(
	`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
	`name` VARCHAR( 50 ) NOT NULL ,
	`engine` VARCHAR( 50 ) ,
	`goto` VARCHAR( 50 )
)
ENGINE = MYISAM ;

install.php

[php]

<?php

// Asterisk Lib Folder Get
if ( ( isset( $amp_conf['ASTVARLIBDIR'] ) ? $amp_conf['ASTVARLIBDIR'] : '' ) == '' )
{
	$astlib_path = "/var/lib/asterisk";
}
else
{
	$astlib_path = $amp_conf['ASTVARLIBDIR'];
}

// Text To Speech AGI Script Copy
if ( copy( $amp_conf['AMPWEBROOT'] . "/admin/modules/texttospeech/agi-bin/texttospeech.agi", $astlib_path . "/agi-bin/texttospeech.agi" ) )
{
	chmod( $astlib_path . "/agi-bin/texttospeech.agi", 0764 );
}
else
{
	echo _( "Text To Speech AGI install failed." );
}

?>

[/php]

It is possible that everything is fine with the module, but that in installing and uninstalling it dozens of times during development I messed up something internally and now the “install.sql” won’t execute for some reason. I’ve already created an uninstall shell script to manually delete the files from the locations below in case I mess up and install a file with broken PHP syntax that causes the FreePBX Web GUI to go blank.

uninstall.sh

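# Remove the module folder and its cached copy
rm -rf /var/www/html/admin/modules/texttospeech
rm -rf /var/www/html/admin/modules/_cache/texttospeech
# Remove the AGI script that install.php copied into agi-bin
rm -f /var/lib/asterisk/agi-bin/texttospeech.agi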

What could be causing the problem? Do you have any ideas?

mickecarlsson · November 8, 2009, 11:05am

The reason that install.sql is not run is that the SQL script should be incorporated into install.php.

The function that called install.sql was removed a long time ago.

As it turned out, the function is still in there to have an install.sql.

So the module maker needs to add into his install.php something like this:

global $db;

$sql = "
CREATE TABLE IF NOT EXISTS `sometable` (
      `column1` varchar( 45 ) NOT NULL default '-1',
      `column2` varchar( 30 ) NOT NULL default '',
      `column3` varchar( 150 ) NOT NULL default '',
      `column4` int( 1 ) NOT NULL default '0',
      PRIMARY KEY ( `column1` , `column2` , `column3` )
)";

$check = $db->query($sql);
if(DB::IsError($check)) {
        die_freepbx("Can not create sometable table\n");
}

Same goes for the uninstall.php: it must contain code for dropping the table.

Text edited to remove inaccurate information.

JakFrost · November 8, 2009, 4:56pm

I still don’t quite understand PHP code injected inside HTML web pages, and it confuses me, especially when looking at someone else’s syntax with its mixing of variable names and different quotation styles. It took me hours and hours of trial and error, testing and looking through the PHP manual for references, to figure out what was going on. This was all based on a fairly simple and small TTS module that doesn’t have very many lines of code and only a single web page with two text boxes. Luckily I didn’t have to modify the design of the web page itself and I didn’t have to figure out how PHP passes variables from the web page to the function script through REQUEST.

Permissions

I came across what I think was a permissions issue with the AGI script since I didn’t have the correct chmod it wouldn’t get copied to the agi-bin/ folder. After that I just set the permissions on all the files in my folder to og+rwx during testing to avoid permissions problems.

I have to clean them up a little before I release the package, no need to make the README and module.xml files a+x. I don’t even know if the .php files need g+w or even g+x on them and I have no idea if the .sql files need g+w and a+x either.

Can someone let me know if there is a reference somewhere for FreePBX modules that lists the permissions required on the files in the package?

.:
total 48
drwxrwxr-x 2 asterisk asterisk 4096 Nov  7 07:35 agi-bin
-rwxrwxr-x 1 asterisk asterisk 7810 Nov  7 07:35 functions.inc.php
-rwxrwxr-x 1 asterisk asterisk  449 Nov  7 07:35 install.php
-rwxrwxr-x 1 asterisk asterisk  199 Nov  7 07:35 install.sql
-rwxrwxr-x 1 asterisk asterisk 2527 Nov  7 07:35 module.xml
-rwxrwxr-x 1 asterisk asterisk 5786 Nov  7 07:35 page.texttospeech.php
-rwxrwxr-x 1 asterisk asterisk 4305 Nov  7 07:35 README
-rwxrwxr-x 1 asterisk asterisk  424 Nov  7 07:35 uninstall.php
-rwxrwxr-x 1 asterisk asterisk   37 Nov  7 07:35 uninstall.sql

./agi-bin:
total 4
-rwxrwxr-x 1 asterisk asterisk 3459 Nov  7 07:35 texttospeech.agi
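Until someone points to an official reference, a conservative clean-up could look like the sketch below. The modes are an assumption based on ordinary Unix conventions, not a documented FreePBX requirement, and the function name `tidy_permissions` is made up.

```shell
#!/bin/sh
# Hypothetical clean-up of a module folder before packaging.
# $1 = module folder (e.g. /var/www/html/admin/modules/texttospeech)
tidy_permissions() {
    dir=$1
    # Directories need the execute bit so they can be entered
    find "$dir" -type d -exec chmod 755 {} +
    # README, module.xml, .php and .sql files are read, not run directly
    # (assumption: PHP files are loaded by the web server, not executed as scripts)
    find "$dir" -type f -exec chmod 644 {} +
    # The AGI script is launched by Asterisk, so it keeps the execute bit
    find "$dir" -type f -name '*.agi' -exec chmod 755 {} +
}
```

If group write access turns out to be needed, the 644 and 755 modes can be loosened to 664 and 775 without making README or module.xml executable.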

JakFrost · November 8, 2009, 5:37pm

I could have sworn that the original TTS module worked properly after I installed it so it means that the “install.sql” was executed correctly by FreePBX during the install (or the tts table existed in asterisk database already somehow) and it didn’t need to have been referenced from “install.php”. Also the “uninstall.php” probably worked correctly.

The only issue is that since I’ve installed and uninstalled the original and my version of the module so many times my Elastix installation is non-pristine so I don’t know what happened. Now that I have learned what happens during the install process, at least I know about module files being copied, agi-bin file having to be manually copied, and the “install.sql” being supposedly executed (unless the tts table existed in the asterisk database before TTS was installed). I’ll build a new Elastix virtual machine and update it to be at the same versions as mine, then I’ll snapshot the image and do an install and check manually what changed in the database and if the “install.sql” executes. I can always go back to the pristine snapshot and repeat.

I’ll do a little checking to see what happens and confirm what MickeCarlsson said about the “install.sql” not being executed anymore. My memory is so cloudy right now after so much work that I just need to check and see for myself.

If there is a need then I will add the proper code into “install.php”, but frankly I would like to avoid that and keep the SQL stuff in the .sql files. At worst, I’ll pull the SQL statements out of the .sql files when the .php files get executed during install and uninstall, so as to keep the PHP and SQL code separate and in their proper file types. (I’m a stickler for consistency.)

JakFrost · November 8, 2009, 5:43pm

I just came across a new eSpeak Speech Synthesizer and there is an Asterisk module, asterisk-espeak, so I’m going to update the TextToSpeech module to add this engine to it.

I’ll look around for other engines to add support for to make this a full featured module.

JakFrost · November 9, 2009, 2:15am

The original TTS 1.0 module installs and uninstalls correctly and the MySQL ‘tts’ table is created and deleted automatically.

I did a lot of testing today on why the “install.sql” wasn’t working and I found the problem! There was no trailing newline at the end of the line! This is why the “install.sql” was not working but “uninstall.sql” was working in my test package.

The install.sql and uninstall.sql files require a newline “\n” after the last SQL statement otherwise they will not execute during module installation and uninstallation.
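A quick way to check for the missing newline, and to add it only when needed, is sketched below. The function names are made up; the file names are the ones from this module.

```shell
#!/bin/sh
# Return 0 if the file ends with a newline, 1 otherwise.
# $(...) strips trailing newlines, so a final "\n" yields an empty string.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Append the missing newline only when it is not already there.
fix_trailing_newline() {
    ends_with_newline "$1" || printf '\n' >> "$1"
}

# Typical use from the module folder before packaging:
#   fix_trailing_newline install.sql
#   fix_trailing_newline uninstall.sql
```

Running the fix twice is harmless, since the second call finds the newline already in place and leaves the file alone.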

Now I’m going to clean it up a little bit, add more comments, and add that eSpeak engine feature. I’m going to figure out a way to do a dependency check on the old TTS 1.0 and 1.1 packages to either do an upgrade or require an uninstall to make things consistent.

I updated the Module Tutorial wiki to make a note of this issue.

wiseoldowl · November 8, 2009, 10:29am

First of all I want to commend you on the work you’ve put into this, and thank you for contributing back to the community. You’re obviously a lot smarter than I - seriously, you “knew nothing about Asterisk, FreePBX, MySQL, Apache, and I never wrote anything with PHP, AGI, and practically knew nothing about Linux except how to navigate the folders from the command line” and yet you managed to pick up enough of all that to rewrite this module in a relatively short amount of time? I’m in awe of you - I’ve been working with this stuff for years now and the only part of that I really understand is FreePBX and maybe a little Asterisk dial plan syntax. PHP in particular is almost incomprehensible to me (I can maybe follow along if I’m looking at someone else’s code and it’s not too complex, but that’s about it).

So I feel like any suggestion I might make is something that you would think obvious, but hey, after working on something for 22 hours straight even the best minds become a bit dulled! So what I’m wondering is if maybe it’s a permissions or ownership issue. In my experience, when a script works fine when you execute it manually (particularly if you are logged in as root) yet refuses to run when called from another program, it’s almost always a permissions/ownership issue.

Sorry for suggesting the blindingly obvious, but that was the first thing that popped into my head. I’d check the permissions/ownership of install.sql in some other module and see if yours are the same.

JakFrost · November 9, 2009, 6:41am

Oh great. I was tidying up the code and now I broke the module since it no longer detects any sound engines at all through the ‘which’ command from the PATH.
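For reference, the detection idea itself is simple enough to test outside the module. This is a stand-alone sketch, not the module’s PHP code; it uses the POSIX `command -v` in place of `which`, and the engine list is the four engines mentioned in this thread.

```shell
#!/bin/sh
# Print the name of every known engine binary that is found on the PATH.
detect_engines() {
    for engine in swift flite text2wave espeak; do
        if command -v "$engine" >/dev/null 2>&1; then
            echo "$engine"
        fi
    done
}

detect_engines
```

One thing worth checking when detection works from a login shell but not from the web GUI is the PATH of the user the web server runs as, since it can be shorter than root’s.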

I was getting ready for a release for beta testing but nothing works. I was eager to try out the espeak engine which I just added support for. Now I have to go back to pulling my hair out, since I can’t scream right now because it’s the middle of the night. It’s funny and sad. I’ll need a vacation after this amount of stress…

JakFrost · November 9, 2009, 10:41am

1.2.0.0 - Major Release

Download: http://www.freepbx.org/trac/ticket/1012

Readme: /var/www/html/admin/modules/texttospeech/README

Modder: JakFrost

Modified: 2009-11-07

Add: Deletion of old sound files when Text To Speech entries are deleted or updated.

Add: Addition of the espeak voice engine.

Add: README file in “/var/www/html/admin/modules/texttospeech/” with detailed descriptions, changelog, and instructions on how to install all the available sound engines.

Change: Changed all external “tts” acronyms to legible “texttospeech”.

Change: Legible filenames on sound files instead of illegible MD5 sum hash names.

Change: Text stored singly in text file of unlimited size supporting plain text and XML with all punctuation and markup.

Change: Passing of Name and Engine through AGI calls in “extensions_additional.conf” and not the actual Text, removing limits on size and punctuation usage.

Change: Major code clean-up and re-write, addition of comments to the code and updates to all function and variable names to create consistency.

Remove: No more base64 text encoding necessary for AGI pass-through that was added in quicky version 1.1.

Remove: Hash information is no longer useful when sound and text files are deleted on entry updates and deletions.

Limit: Name for TextToSpeech entry is still limited to AlphaNum (a-z A-Z 0-9) and Underscore (_) and Dash (-) with no spaces due to problems with spaces or special characters in “$AGI->stream_file( $soundfile )” and “$AGI->exec( Playback, $soundfile )” calls and the inability of these AGI calls to accept single or double quoted file names or PHP “escapeshellarg” or “escapeshellcmd” encoded filenames. Nothing I can do about this limitation in Asterisk 1.4.
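The name rule above is easy to express as a check. A minimal sketch, with a made-up function name, that accepts only the characters listed in the limitation:

```shell
#!/bin/sh
# Return 0 when the name uses only a-z A-Z 0-9, underscore and dash.
is_valid_name() {
    case "$1" in
        '')                 return 1 ;;  # empty names are rejected
        *[!A-Za-z0-9_-]*)   return 1 ;;  # any other character is rejected
        *)                  return 0 ;;
    esac
}

is_valid_name 'Main_Menu-1' && echo "Main_Menu-1: ok"
is_valid_name 'Main Menu'   || echo "Main Menu: rejected (space)"
```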

JakFrost · November 29, 2009, 7:45pm

Here are some ideas that I had for future improvements. They would be nice to have for a full-featured module with a lot of configuration and control options but they are not strictly required since the module works well as it is with the default settings.

Most of the improvements are for the GUI, so they are for the comfort of the user, because the underlying implementation of the module now lets you just cut-and-paste what you want, except for editing the command line arguments.

Unfortunately, I won’t be able to get to work on these improvements for a little while since I have to go back to non-Linux non-FreePBX studying for a while but these ideas are intriguing enough that I’ll have to find some time soon to get them done since they will be interesting to implement for myself.

(If you’re a developer please give me some time to get back to implementing these ideas since I want to do this myself so I can learn more PHP scripting, something that I’ve wanted to do for a while. That means that I don’t really want anyone to step in right now to implement them and release a new module update. If you have ideas on how to implement them though post them up please.)

Additional improvement ideas are welcome!

Additional Command Line Arguments

Provide a text field to add additional command line arguments for each entry to customize the sound file with specific settings for the engine.

Not difficult to implement, the data in the text field would be base64 encoded in the PHP and decoded in the AGI file and it would be “escapeshellarg” enclosed for safety when passing arguments.

This value would have to be saved into the database for future retrieval and this would require the creation of a new “upgrade” procedure in the “install” files so that if it detects a previous version of the module it will try to do a SQL command “ALTER TABLE” to add an “arguments” column into the database, instead of “CREATE TABLE” command. Could be done in SQL file preferably or at worst through “install.php”.

Engine Arguments Help

A context sensitive text box or a plain text line could list the available options depending on what engine is selected in the drop-down box.

It would be great to parse the “--help” output of each engine and dynamically create a list of available command line arguments. An easier solution would be just to capture the help output and present it to the user in a help text box as a beneficial reminder to him while he’s typing in the commands.

The browser text field AutoComplete would come in handy for this field since it would show the user the previously used arguments.

Remember Last Used Engine

The PHP web page would remember the last used engine that was selected during the last entry Add so that the engine drop-down selection box would have that same engine already selected for new additions.

Just have to test how to save the $engine selection into $engine_choice_last and make it persistent across page reloads in same session. Didn’t try it yet so it might be easy to do, or hard. Don’t know enough about PHP or web page session scripting to know solution or difficulty right now.

Preferred Engine Checkbox

Preferred Engine checkbox could be added to make an engine preferred so on new returns to the PHP page it would retrieve the previously saved Preferred Engine and have it selected during a new Add.

This could be saved into a “/etc/asterisk/texttospeech.conf” file or somewhere else for saving and retrieval across new sessions.

SSML Markup Insertion Buttons for Text Area Field

Forum style buttons could be added to the top of the text area that would include the ability to add SSML XML markup directly into the text area field and enclose any selected text in it. Some of the buttons could also have drop-downs to let the user select options for the markup tags.

An example would be the Emphasis tag ‘<emphasis level="strong">Some text</emphasis>’ and the user could select from the “level” options of “strong”, “moderate”, “none” and “reduced”.

The availability and list of SSML tags and buttons to insert them would be dynamically available depending on which engine the user selects from the drop-down box. The Engine selection box would have to be moved to the top of the page to let the user logically select the Engine first and then fill in the text fields.

I have to look around the web to see if there is a GPL friendly JavaScript code release somewhere by some web page developer that I could borrow entirely with credit to him or at least get the correct ideas on how to do it.

BBCode and vbCode forum pages include the exact feature I want but I don’t know if they are GPL compatible or not. At the very least their implementation might shed a light on how I could get this accomplished so I can code my own solution. (Luckily copyright doesn’t prevent the sharing of ideas (at least not yet), just the work produced by those ideas.)

Ideally the buttons for the tags would be dynamically created from the SSML DTD but that is probably the wrong solution knowing how different engines use only certain tags and also probably provide their own additional options for the tags or additional tags themselves. Since the engine vendors don’t publish the SSML DTDs for their engines themselves it is not possible to parse the DTD to build the list of buttons. Also engine versions will change in the future so new tags and options will be available but there won’t be any DTDs to update them except for vendor web page manuals.

This dynamic parsing of the DTD is the solution that I like the most to make because it sounds like it is future upgradable automatically but it is just not possible without having the engine vendors publish different DTDs for each version of the engine that they make. Oh well, good 'ol hardcoding will be the choice now but it will become obsolete with new versions of engines. Users will have to manually type in new tags in the future.

Because engine versions will change there would be additional code required to present the correct tags not only for each engine but also for each version of the engine that could be used. That’s probably not practical so my guess is that the tags from the latest version will be the only ones available. Not so bad as long as somebody updates this module every so often.

Delete Button Instead of Text Line

The delete action needs to be moved down to a Delete button by the Submit button and not as a linked line at the top of the page.

Sound File Creation After Page Submit Instead of During AGI Call

The sound file should be created after the Text To Speech entry is either created or edited so that it is ready and waiting in the /sounds/texttospeech/ folder to be used.

I just realized that the original design, where the sound file is created by the AGI script right before the AGI->stream_file() method is called, is not the best from a design point of view because there is a “$AGI->wait_for_digit( 1000 );” call in the script creating an unnecessary delay.

I don’t see a need that the AGI script should be the one creating the sound file when the Page script is the one that already handles all the chores of database updates and text file creation. The AGI script should just receive the filename of the sound file as a parameter and issue the simple “AGI->stream_file()” or “AGI->exec( Playback, )” call and play the sound file.

This change would remove a lot of code from the AGI script making it much simpler and faster to execute. Actually, this change would remove the need for the AGI script completely, and the Page script would just create the Dial-plan entry to do “AGI->stream_file( $soundfile )” since the sound file would be created ahead of time.

Also the Page script could provide the result of the sound file creation command to let the user know if the creation was successful or if it failed. Much better time to tell the user there is a problem while he’s looking at the GUI than for him to find out from the asterisk debug log as he’s wondering why the Text To Speech entry in his IVR isn’t playing.

I just hope that there isn’t some kind of a permissions issue that would prevent the Page file from doing an “exec()” to create the sound file, but I’ll see that when I try to make the change.
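Roughly, the create-at-submit step would boil down to something like the sketch below, shown in shell, not in the PHP the Page script would actually use. The function names are made up, and the engine flags are written as I understand them from each tool’s usage text, so they should be checked against the installed versions before relying on them.

```shell
#!/bin/sh
# Hypothetical sketch: run the chosen engine on a text file.
# $1 = engine name, $2 = text file to read, $3 = wave file to create
run_engine() {
    case "$1" in
        swift)     swift -f "$2" -o "$3" ;;
        flite)     flite -f "$2" -o "$3" ;;
        espeak)    espeak -f "$2" -w "$3" ;;
        text2wave) text2wave "$2" -o "$3" ;;
        *)         echo "unknown engine: $1" >&2; return 2 ;;
    esac
}

# Create the sound file and report the result right away, so the user
# sees a failure in the GUI and not later in the Asterisk log.
make_sound_file() {
    if run_engine "$1" "$2" "$3" && [ -s "$3" ]; then
        echo "created: $3"
    else
        echo "could not create: $3" >&2
        return 1
    fi
}
```

Checking that the output file exists and is not empty, in addition to the exit status, guards against an engine that exits cleanly without writing anything.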

Play The Recording

The System Recordings module has the ability to play back the recording and I want to borrow that code to allow users to play back and listen to the created Text To Speech sound file to proof-hear it.

Dependency Check

The System Recordings module also has a dependency check to tell you which other destinations use that recording. I want to implement this also in my module.

Return to IVR Improvements

I implemented the Return to IVR functionality but it has a certain flaw in it since if you use Text To Speech for all your messages and you select the Return to IVR option then you will go back to the IVR but you won’t hear the options message, this is because the Text To Speech entry that was the source of the menu options was a destination before the IVR so you never go back that far when you return.

One way of fixing this would be to change the Return to IVR message to jump back an extra destination to the Text To Speech entry, but that would break systems that use the Announcement inside the IVR for the options message. Another solution would be a “Return to Text To Speech” option to let you jump back past the IVR into the previous Text To Speech entry to get the options.

The way that I implement the return functionality is to explicitly set the Destination of my Text To Speech messages to the Text To Speech options message, so that the call can go forward again into the IVR. It is not as efficient as the “Return” method but it works and it is very specific since you know exactly which IVR it goes back to.

The problem with implementing “Return to IVR” is that the IVR and Announcement modules cooperate so that the IVR menu plays the Announcement inside the IVR menu, instead of before it. To get the same “Return to IVR” functionality to work with the Text To Speech module I would have to edit the IVR module to allow announcements for message options to come from either the Announcement or the Text To Speech module.

I had to use a crude little hack of updating the “IVR_CONTEXT” Asterisk variable to get Return to IVR functionality to work technically, even though it doesn’t work correctly because it skips the Text To Speech options message.

The problem is that I can’t just make Return to IVR jump back to the previous Text To Speech instead of the IVR in case there is someone out there that uses a mix of Text To Speech for messages and Announcements for IVR.

Phone Directory Names Pronounced by Text To Speech

The current Phone Directory spells each name one letter at a time, but this could be improved by using voice synthesis to pronounce the name. Even if it means mangling the pronunciation of the person’s name, it is easier to understand a mispronounced name than to try to figure it out over 20 seconds of letter-by-letter spelling.

Dynamic Pronunciation versus Cached File Usage

It might be useful for some people to have an option in the Text To Speech module to choose between Dynamic vs. Cached. The Cached option is the current way of writing the sound file to the disk after the user hits the Submit button. The Dynamic way would use and require the installation of the voice synthesis engine’s Asterisk applications, but the README instructions already include those for all four of the engines. The Dynamic option would then use the “Swift()”, “Flite()”, “Text2Wave()”, and “Espeak()” Asterisk application functions to pronounce the text in the Text To Speech entry without having to use the Asterisk “Playback()” or “Background()” functions that require pre-created sound files.
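A rough sketch of how the Cached side of that choice could work, as a shell helper. Everything here is an assumption made for illustration: the function name, the "tts-" file prefix, and naming the file after a cksum of the text are not taken from the module. The synthesizer is passed in as a command that receives the output file and the text.

```shell
# Hypothetical cache: one sound file per distinct text, named by checksum.
# Usage: cached_sound_file CACHE_DIR TEXT SYNTH_COMMAND [ARGS...]
cached_sound_file() {
    cache_dir=$1
    text=$2
    shift 2
    # cksum is a POSIX CRC; a stronger hash could be swapped in.
    key=$(printf '%s' "$text" | cksum | cut -d' ' -f1)
    file="$cache_dir/tts-$key.wav"
    if [ ! -s "$file" ]; then
        # Cache miss: run the synthesizer once for this text.
        "$@" "$file" "$text" || return 1
    fi
    # Hit or miss, print the path of the file to play.
    printf '%s\n' "$file"
}
```

The Dynamic option would skip this step entirely and spend CPU on every call instead, which is the trade-off described above.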

FreePBX Integration

This is not really a module suggestion but more of a wish for future FreePBX improvements. It would be nice to have better integration inside FreePBX with the Text To Speech module so that the IVR announcement list could also list the Text To Speech entries from the module. The Phone Directory pronunciations could be done with the voice synthesis engine application functions “Swift()”, “Flite()”, “Text2Wave()”, and “Espeak()”. Another integration would be an entry in the FreePBX General Settings list to specify which voice synthesis engine to use, so that all the announcements could be made with whatever engine you prefer.

Asterisk Sound File Replacement

Asterisk uses tons of sound files for all the little pronunciations but they are limited since they are all made with the same generic voice. It would be great to send a code patch to the Asterisk future development tree to have the option of using voice synthesis for all the little announcements to get rid of all the pre-recorded stuff. The main reason for this would be to allow the usage of dynamically created voice synthesis and also to customize the engine that would be used and the settings for that engine. You could then configure Asterisk to use Festival or Flite for free voice synthesis, or use a commercial engine like Cepstral with a custom voice, sound effects, and other options to have all the sounds be dynamically created.

The Asterisk settings could also include a Dynamic or Cached option so that you can choose if you want to use up a lot of CPU to do the announcements or instead cache them to disk for lower utilization at the cost of storage space.

Licensing of commercial voice engines becomes an issue for truly dynamic usage, such as giving the user real-time information from an internal data source like a database of his account information. Pre-caching, though, would be a legal way of replacing the Asterisk sound files with new ones created by the selected voice synthesis engine.

Feature Codes or Extensions

I need to add access to the “$fcc” object from FreePBX to implement Feature Codes for all the Text To Speech entries for direct access to them during a call or to transfer callers to individual Text To Speech entries directly.

This has always been a feature that I wanted, to throw people back into the menu, since they slam “0” to get to the operator and then inevitably their tiny-brains ask the first and second questions that are answered by the menu, “What are your Hours?” and “What is your Address?”. Because of their rudeness in not using the IVR menu to get the answers themselves, and then annoying my friends at the business with questions answered by the menu, the payback will be to throw them back into the exact items in the menu when they do that.

This exact thing just happened to my friend yesterday while I was standing next to him. The caller got through to the operator extension and then asked about Hours and Address, which are options 1 and 2 on the menu at the very beginning.

For now my friend can use Blind Transfer “##” and throw them back into the Incoming Call Simulation “7777” extension to make them hear the menu options again, but it would be better to just throw them directly into the Text To Speech Hours menu item.

JakFrost · November 9, 2009, 9:31pm

Little post as a reminder to myself to fix this when I get home.

README

195c195
< cd espeak-1.41.01/src
---
> cd espeak-1.41.01-source/src

296a296,297
> echo /usr/local/lib/> /etc/ld.so.conf
> ldconfig

JakFrost · November 16, 2009, 4:17am

----------------------------------------------------------------------------
1.2.0.1 - 2009-11-15 - JakFrost
----------------------------------------------------------------------------

Fix: Fixed the README file instructions for installation of the engines, some lines were missing, some were mistyped.

Fix: Fixed a CRITICAL error in the README installation instructions for the Flite speech engine: a typo in the command that appends the “/usr/local/lib/” library path to the ld.so.conf file used by the “ldconfig” command. The correct command uses the append “>>” redirect instead of the create “>” redirect. The bad command overwrote the file and so removed all the “/etc/ld.so.conf.d/*” files from the “ldconfig” command. The result was system-wide error messages such as “PHP Warning: PHP Startup: Unable to load dynamic library >> ‘/usr/lib/php/modules/mysql.so’ - /usr/lib/libmysqlclient.so.15”. The correct command is: “echo /usr/local/lib/>> /etc/ld.so.conf” with double “>>”. If you experienced problems or error messages due to this bad command you can easily fix the situation by adding the line “include ld.so.conf.d/*.conf” to the top of the “/etc/ld.so.conf” file and executing “ldconfig” again to re-add all the shared library paths.

Change: Changed Cepstral “./install.sh” line to add “agree” to license and “/opt/swift” as default path to install to without prompt.
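The “>” versus “>>” mistake from the Flite fix above is easy to try safely on a scratch file. This is only a sketch: the helper name add_lib_path is made up for illustration, and the scratch file stands in for /etc/ld.so.conf so nothing on the real system is touched.

```shell
# Hypothetical helper: append a library path to a conf file with ">>",
# and only if that exact line is not already present.
add_lib_path() {
    conf_file=$1
    lib_path=$2
    grep -qxF "$lib_path" "$conf_file" 2>/dev/null || echo "$lib_path" >> "$conf_file"
}

# Scratch stand-in for /etc/ld.so.conf, starting with the stock include line.
scratch=$(mktemp)
echo 'include ld.so.conf.d/*.conf' > "$scratch"

add_lib_path "$scratch" /usr/local/lib/   # ">>" keeps the include line
add_lib_path "$scratch" /usr/local/lib/   # second call changes nothing
cat "$scratch"
```

A single “>” in place of “>>” would have left only the /usr/local/lib/ line in the file, which is exactly how the include line was lost.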

kevinb · November 17, 2009, 5:47pm

Jak,
GREAT module work btw! Makes inbound announcements REALLY easy! However, on my system at least, there are some issues. This happens on both versions 1.2.0 and 1.2.0.1. The asterisk base directory is not being detected, so when the texttospeech.agi runs it is not giving the proper full path to swift and is trying to read and write to/from non-existent directories. I have verified this by editing the agi file and adding the asterisk dir in and then dialing in and it then works properly. However, it seems that if I go into the fpbx module and change the text it rewrites the agi and the changes are gone? Thanks and I hope it’s an easy fix!

Kevin

JakFrost · November 17, 2009, 6:44pm

The Asterisk path detection is very poorly implemented so that will change next release. Also the entire AGI is going away because it is pointless so that should address that problem.

I’m rewriting the module this week because it is useless for IVR menu usage since it doesn’t let the user skip and go to an extension directly; the user has to wait for the message to finish or manually skip it with # and then hit the extension. This is due to a bad implementation of stream_file() in the AGI that will be fixed. Instead the PHP page will use Asterisk’s internal Playback() for no-skip and Background() for skip and direct extension access in a new context. This will make the module useful for IVR.
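A minimal sketch of the Playback() versus Background() choice, written in shell only for illustration (the module itself builds its dialplan from PHP). The function name, the “s,n” extension and priority, and the sound file name are assumptions; only the two Asterisk application names come from the plan above.

```shell
# Hypothetical generator for the one dialplan line that plays a
# Text To Speech sound file.
# Usage: tts_dialplan_line SOUND_FILE ALLOW_SKIP   (ALLOW_SKIP is yes or no)
tts_dialplan_line() {
    sound_file=$1
    allow_skip=$2
    if [ "$allow_skip" = "yes" ]; then
        app=Background   # caller can key in an extension while it plays
    else
        app=Playback     # caller has to hear the whole message
    fi
    printf 'exten => s,n,%s(%s)\n' "$app" "$sound_file"
}

tts_dialplan_line custom/tts-hours yes
tts_dialplan_line custom/tts-hours no
```

The extra Background() arguments for language and context are left out here on purpose.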

kevinb · November 17, 2009, 6:46pm

Very good Jak. I have a system here to beta on if that helps at all. Thanks!

Kevin

JakFrost · November 17, 2009, 9:49pm

Linux Distros

What Linux distribution are you running by the way so I can see where the problem is showing up on your end?

I’m using Elastix 1.5.2 (32-bit) with FreePBX 2.6 with Asterisk 1.4.26 running on CentOS 5.4 (Red Hat) myself. When I find some extra time I will create virtual machines and install Trixbox 2.6 and 2.8 and AsteriskNOW to test it with FreePBX on those distros.

Rushing To Finish

I’ve been coding the updates to this mod pretty much for my own usage but I’m trying to be a good coder so I’m avoiding hardcoding paths and trying to use Asterisk system variables as much as possible, at least the ones I know about from FreePBX exposure.

Upgrade Code

I still have to write code to do clean upgrades from 1.0/1.1 to 1.2, and soon to 1.3, but I haven’t had enough time to do that since I’m rushing to write my code to implement an IVR menu for my friend’s business, which has been waiting almost 2 months for it.

Logic Rewrite for AGI Removal

This rewrite will remove the AGI script since it’s unnecessary and I’m trying to style this module after the Announcement and IVR modules already in FreePBX 2.6 so that the layout, design, and options are all the same so that they could be familiar to everyone. I’m also going to be borrowing PHP code from those modules to make this module as close in design to the core FreePBX modules as possible so that if this module becomes popular it might be taken up into the main FreePBX development trunk later.

Coding Style

While I’m mentioning all of this, I don’t really like some of the coding styles used in the FreePBX modules, especially the usage of all “lowercasevariablenames” instead of “camelCase” variable names that are easier to read. Also the lack of horizontal and vertical white space and the absence of any section or action comments make reading and using the core module code quite difficult for a PHP newbie like myself. I’ll at least keep the logic the same as the core modules so that if there is some need to strip comments and collapse white space to get the module into the main trunk that can be done with a few RegExes easily later.

All this stuff is coming, I just need to find the time to do it. It’s an interesting project to me since it involves Linux exposure and PHP coding and that’s something that I’ve been meaning to get into for over 10 years, and now I’m doing it.

kevinb · November 17, 2009, 9:47pm

Well I for one am glad you’re doing it. It’s something I could definitely use and I’m sure others will as well! I’m running PBX in a Flash 32-bit here, CentOS 5.3, Asterisk 1.4.21.2 with FreePBX 2.5.2.2

Thanks!

Kevin

JakFrost · November 19, 2009, 2:52am

I’ll add PBX in a Flash to my list of VMs to create and I’ll also have to test on FreePBX 2.5 since I’m using 2.6 now.

Rushed Rewrite Finished, So Much More To Do

I’ve been rushing in the last three days to get this module updated to allow simple skipping ability to the Text To Speech entries. I had to remove the whole AGI script and move everything into “functions.inc.php” and “page.texttospeech.php” but I did it. Now the new module uses Asterisk’s native Playback() for no-skip and Background() for skip ability.

I was trying to integrate additional features from the Announcement module into mine but I was only able to implement the Allow Skip and Don’t Answer Channel features. I left out the Repeat Message function since I didn’t see it as very useful; I had added it and then removed it while debugging, but I might re-add it. I tried very hard to get the Return to IVR feature to work since I wanted the Text To Speech entries to be used inside IVR menus as end points, but I dropped that feature also during debugging because I couldn’t get it to work properly in time due to my limited, or non-existent, knowledge of Asterisk Dialplans.

I really wanted to make the Allow Skip feature work with the “Background($file,$options,$language,$context)” application by using $context so that if the Text To Speech entry is used to say the options of an IVR menu, the user can press one of the buttons and he will skip the rest of the recording and jump to the extension in the new “$context”. I tried to make it work but it just didn’t work for me after hours of debugging so unfortunately I just had to drop that feature. This was the most valuable and important feature I wanted to implement, to let the user skip and jump to a new context and extension without having to finish the playback but I just couldn’t do it. I might attempt again when I have some more time and see if I can get something to work after I learn more about Asterisk Dialplan programming.

For now at least there is no more AGI script necessary. The Skip to IVR Menu option can still be used with this module but it has to be done through the creation of the sound file, then a System Recording, then Announcement, and finally added to the IVR menu as an Announcement at the beginning of the menu or as one of the Destinations.

No Clean Upgrades

I would really like to implement a clean upgrade procedure for this module so that it can alter the database and rename any files required but right now there isn’t enough time to do that so these next few versions are not cleanly upgradable. Especially the new 1.3.0.0 that changes the database format.

You will have to uninstall 1.2.0.1 or earlier and install this one clean because the “install.sql” file won’t be executed to update the database and I didn’t include the “ALTER” code into “install.php” yet.

If I can get enough time, I will write the upgrade code and maybe even go back and implement the Return to IVR and Skip to Menu Extension options.

I’m going to release it as 1.3.0.0 for now as a Work-In-Progress version.

JakFrost · November 19, 2009, 9:33pm

Note: This version improves the usage even more. You can now use Text To Speech to say the options of an IVR menu and allow Direct Dialing to those options. Text To Speech can also be used as a destination from an IVR to an option to say a message. In this case the Text To Speech entry should set the Destination back to the Text To Speech entry that says the menu options. It should not use the Return to IVR option otherwise the menu options will not be played to the user because the system will jump directly to the IVR to await a key press without saying the options. Just remember to chain Text To Speech menu options, to IVR, to Text To Speech message, to Text To Speech menu options.

Add: Added the Wait Before and After options to allow adding a pause in the speech for the number of seconds required.

Add: Added the Direct Dial to allow direct access to destination IVR extension.

Add: Added the Return to IVR option to allow return to a calling IVR when finished.

Limit: Previous version uninstall required since database schema is changed again. I didn’t code the upgrade code yet since these are Work-In-Progress versions.

JakFrost · November 19, 2009, 9:40pm

Now that this new version is out you can choose how to implement Text To Speech entries in your IVR menu.

You can use the classic Announcement version:

Create: System Recordings (using “texttospeech/” recording)

Announcement (Welcome message) ->
IVR (Announcement says Options)
Destination: Announcement (Message) ->
Announcement (Message)
Destination: Return to IVR ->
…

Or the new Text To Speech version:

Text To Speech (Welcome message) ->
Text To Speech (Menu options) ->
IVR
Destination: Text To Speech (Message) ->
Text To Speech (Message)
Destination: Text To Speech (Menu options)
…

You can also use the newly added Wait Before and After options to put in pauses to avoid run-on talking.