Original Source: http://www.freepbx.org/trac/ticket/1012
Below is a little write-up of my experience with FreePBX and with updating the Text To Speech module. There’s a technical question at the end if you just want to skip to the part where you help me solve the last problem. Updated: Problem solution found.
I hope that you like long stories, because this is a doozy…
Background
(A little story of need…)
I’m building a PBX with a Digital Assistant menu system (aka IVR) for a friend’s business. The goal is to put a little buffer between the callers and my friends who work there, to hopefully stem a lot of the repetitive questions about general topics such as Hours, Address, Directions, and Services, and to also provide more specific answers about some detailed services of the business itself that people always end up asking about.
I built a great power-saving but strong server in a small Mini-ITX format for ~$300 USD and wrote up the details in a Slashdot.org post. It has an Intel Atom 330 (dual core 1.6 GHz, x86 or x64) on an Intel 945 chipset with 1 Gbit Ethernet, 2 SATA ports, and 1 PATA port, 1GB of DDR2 RAM, and an OCZ Agility 30GB SSD for ultra-fast but affordable performance. It all sits inside an Apex Mini-ITX case with a slow and quiet 120mm fan, plus a Zalman Fanmate 2 to lower the RPM on the chipset fan. The system is a champ: very low power usage and completely (no exaggeration) silent at 1-foot distance from the ear.
I paired it with the Digium TDM402B (2 FXO, up to 4 FXO) phone card for PSTN line connectivity and Linksys PAP2T for 2-FXS output during their transition from analog to VoIP phones.
On the software side I’m using Elastix PBX 1.5.2, based on Asterisk 1.4 with the FreePBX 2.5 Web GUI (2.6 after update), running on CentOS 5.4 Linux. Since everything is open source, that saves a lot of money, and the hardware requirements are low, so it works great on the little server.
Digital Assistant Menu System
(Buffer from the unwashed masses…)
Now, some of the more detailed information in the IVR will change over time. Since this is a business, my friends are pretty busy most of the time, and they are not amazingly computer savvy, but they could manage their way around the FreePBX IVR interface to update the IVR if instructed how to do so.
The choice comes down to using recorded human voices for all the IVR menu options and information, some of which can be long and occasionally changing, or using a text-to-speech voice synthesis solution. During my demonstration to them I had to record some voices into the IVR menu to show them how the system would work and I found that I had to re-record some of the longer menu options such as Directions many times just to get them decent. This was a bit of a chore to do correctly and frankly for a real business there is a need for a person with a very good voice to do the recordings, but then the problem of maintenance and updates comes in since the original person will likely be unavailable.
Voice Synthesis Engines
(Turing Test not required…)
So after checking out the voice synthesis engines available for Linux that work with Asterisk I found three: Text2wave, Flite, and Swift (based on Cepstral voices). The first two sounded pretty poor but were understandable, and the Cepstral voices sounded much better but not perfect. We tried a few voices and we liked the Diane and Allison voices with the Liquid Love effect and Slow speed to bring on a sexy announcer voice. The little web demo sounded good enough to use, so the idea started to take shape and people became a little excited.
I also came across Speech Synthesis Markup Language (SSML) Version 1.0 and Using SSML with Cepstral Voices so this opened up ideas on how to improve and customize the voice.
Idea Implementation
(The little things that annoy you…)
So I built the hardware, installed Elastix, did all the OS updates and FreePBX update to 2.6.
After a bit of searching I also found the original TTS 1.0 (Text To Speech) module by XO. I installed it and it worked with the Cepstral swift engine after adding a simple soft link from “/usr/local/bin/swift” to “/opt/swift/bin/swift” to get the program into the PATH.
However, I quickly found that the module was made as a proof-of-concept by XO in 2006 and was not really intended for general usage, much less production system deployment. It had quite a bunch of limitations: no punctuation or SSML markup of any kind was allowed in the text field, and the text was limited to 250 bytes because it was stored in the database. The module also named the saved text and wave files illegibly using an MD5 sum, and it did not delete the files when the Text To Speech entries were deleted in the GUI, leaving a trail of huge, illegible, and impossible-to-clean-up Wav files in the system.
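For anyone repeating that step, here is a minimal sketch of the soft link in shell. The helper name link_into_path is made up for illustration; only the two paths come from my setup, and the real command needs root.

```shell
#!/bin/sh
# link_into_path TARGET LINK (hypothetical helper name):
# create a symbolic link at LINK that points at TARGET.
link_into_path() {
    target=$1
    link=$2
    # -s asks ln for a symbolic (soft) link instead of a hard link.
    ln -s "$target" "$link"
}

# On the PBX itself, run as root, the call would be:
#   link_into_path /opt/swift/bin/swift /usr/local/bin/swift
```

The link lives in a directory that is already in the PATH and points at the real binary, so nothing in the Cepstral install has to move.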
Inexperienced With Just About Everything
(Staring into the night, deaf and dumb…)
So at this point I knew that if I was going to make this voice synthesis thing work and be maintainable by my friends, who are just general computer users, the module would have to be updated to remove the no-punctuation and no-markup limitations and to clean up after itself when entries are deleted or updated. The problem was that I knew nothing about Asterisk, FreePBX, MySQL, or Apache, had never written anything with PHP or AGI, and knew practically nothing about Linux except how to navigate the folders from the command line.
But like Plato said, “Necessity, who is the mother of invention.” and I had one major case of necessity and need to get this thing done for my friends who were getting a little bit more and more annoyed with the general public calling in the same questions every hour because I told them I could solve their problem quickly and for cheap. Like they say, “Soldiers fight in War not for their country but for their brothers” and I wasn’t going to let my friends take any more bullets to the brain by the general public’s constant calling. I decided to sit down and update the module by learning through Trial By Fire.
I had already taken a first step in the right direction a day earlier by making a very small modification to the original 1.0 module: adding base64_encode() and base64_decode() calls around the text so that markup and punctuation can pass through the AGI interface. The remaining limits, the length of the text and the lack of file management, would require a much bigger change.
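To show why the encoding helps, here is a small shell sketch of the same round trip. This is not the module's code (the module uses the PHP functions); the helper names and the sample text are made up, and it assumes the GNU coreutils base64 tool.

```shell
#!/bin/sh
# encode_text: read text on stdin, print it as a single line of base64.
encode_text() {
    # base64 may wrap long output, so strip the line breaks.
    base64 | tr -d '\n'
}

# decode_text: read base64 on stdin, print the original text.
decode_text() {
    base64 -d
}

# Sample text with markup, quotes and punctuation (made up for this demo).
text='<speak>Hello, "caller"! We are open 9-5; closed on Sundays.</speak>'

encoded=$(printf '%s' "$text" | encode_text)
decoded=$(printf '%s' "$encoded" | decode_text)

printf 'encoded: %s\n' "$encoded"
printf 'decoded: %s\n' "$decoded"
```

The encoded form contains only letters, digits, “+”, “/” and “=”, so there are no quotes, commas or angle brackets left to trip up anything between the GUI and the speech engine.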
Cursing At Computer
(My Postman thinks I have Tourette’s syndrome…)
I spent the next 22 hours straight at my computer learning to modify the module scripts. The process went from total and utter frustration (loud angry cursing at the screen, punching the desk, and screaming at the computer about everything just not working at all), to slow understanding of the system with some more cursing, screaming, and hitting, to deeper understanding and mumbled complaints, to low euphoria mixed with complete and utter exhaustion as things started to work.
I basically had to update and rewrite most of the scripts in the module to add comments everywhere so that I and other people after me could understand what is happening where. I had to do a major rewrite and clean-up of all the code so that variables are consistently named everywhere, and I had to remove all references to illegible acronyms and replace them with legible names, especially the name of the module itself, which went from “tts”, cryptic for new users, to “texttospeech”, easily understandable by everyone.
I updated the module and solved all the limitations that I wanted to solve except for a single one, which I had to leave unsolved because I ran out of time. The updated module now does exactly what I want it to do, and it is user friendly enough to be usable by my friends and by other people who use the FreePBX Web GUI and want easy-to-implement Text To Speech synthesis.
Thanks For The Foundation
(Standing on the shoulders of giants…)
As much as I cursed and screamed, I have to say that the original TTS module that XO released as a proof-of-concept was a great piece of work that I built on to get the functionality that I required. I wasn’t very thrilled to have to read and modify complicated code in a language unfamiliar to me, without any comments at all, with no white-space between any of the variable strings or sections of code, and with confusing and inconsistent variable names thrown around like confetti.
However, without the original module it would have taken me a lot more time to get the functionality out of Asterisk that I wanted with the voice synthesis engines. It would probably have required editing the Asterisk Dial Plan file “extensions_additional.conf” by hand for every single IVR menu item, which would be an unmaintainable solution for my friends, who wouldn’t learn how to create or modify Asterisk Dial Plans.
I want to thank XO for his work though because it allowed me to learn and to improve on it to get something that I needed.
Integration To Nowhere
(Parts just don’t always fit…)
Frankly I was surprised that when I started this project there was all this talk and documentation on the internet about Asterisk based PBX and articles about Cepstral Text To Speech synthesis on NerdVittles.com. However when it really came down to implementing all this stuff I found that there was no easy integration of Asterisk and voice synthesis without having to resort to Dial Plan modifications by hand.
This seemed strange to me because the FreePBX Web GUI was being touted as the real way of administering Asterisk, and it was the solution chosen by the Trixbox and Elastix distributions. It was disappointing to find that neither of these distributions had an easy-to-use module for actually implementing useful text to speech voice synthesis in any of the dial plan configurations.
Share The Bounty
(Richard Stallman is a real hero…)
Now that this module is updated and improved without the original limits I want to share it out there with everyone so I will arrange to have it uploaded into the Third Party module repository later, for testing and after the last problem is solved. I’ll upload it to the original Ticket 1012 as version 1.2.
The Last Unsolved Problem
(There’s always one at the end of a project…)
Update: The problem turned out to be a missing newline after the last line of the SQL statement.
The problem that I was left with is that the “install.sql” file never seems to be executed when the module is installed through the FreePBX Module Admin GUI, so the “texttospeech” table is never created in the “asterisk” MySQL database. There are no errors shown in the web output during the install process either. However, when I manually execute the “install.sql” script through “mysql” it works just fine and the table is created, so it seems like there is nothing wrong with the file or the SQL commands. When I uninstall the module, “uninstall.sql” does get executed correctly and the table is deleted successfully.
I am now wondering whether the problem is with my module, with something in the “install.sql” file itself that I cannot find, or with a bug somewhere in FreePBX. I’m going to look for the solution to the problem myself, but in the meantime I’m hoping that someone more experienced and knowledgeable might offer me some advice or help on why this script just doesn’t execute.
install.sql
CREATE TABLE IF NOT EXISTS `texttospeech`
(
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`name` VARCHAR( 50 ) NOT NULL ,
`engine` VARCHAR( 50 ) ,
`goto` VARCHAR( 50 )
)
ENGINE = MYISAM ;
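Since the culprit turned out to be a missing newline after that last line, here is a rough shell sketch for checking and fixing any such file. The two function names are made up for illustration, and nothing here is part of FreePBX.

```shell
#!/bin/sh
# ends_with_newline FILE: succeed only if FILE is non-empty and its
# last byte is a newline character.
ends_with_newline() {
    [ -s "$1" ] || return 1
    # Command substitution strips a trailing newline, so an empty
    # result means the last byte of the file was a newline.
    [ -z "$(tail -c 1 "$1")" ]
}

# ensure_trailing_newline FILE: append a newline only when it is missing.
# (An empty file counts as "missing" and gets one newline appended.)
ensure_trailing_newline() {
    ends_with_newline "$1" || printf '\n' >> "$1"
}

# Example use from inside the module folder:
#   ensure_trailing_newline install.sql
```

Running it a second time changes nothing, so it is safe to call on every SQL file before packaging the module.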
install.php
[php]
<?php
// Asterisk Lib Folder Get
if ( ( isset( $amp_conf['ASTVARLIBDIR'] ) ? $amp_conf['ASTVARLIBDIR'] : '' ) == '' )
{
    $astlib_path = "/var/lib/asterisk";
}
else
{
    $astlib_path = $amp_conf['ASTVARLIBDIR'];
}

// Text To Speech AGI Script Copy
if ( copy( $amp_conf['AMPWEBROOT'] . "/admin/modules/texttospeech/agi-bin/texttospeech.agi", $astlib_path . "/agi-bin/texttospeech.agi" ) )
{
    chmod( $astlib_path . "/agi-bin/texttospeech.agi", 0764 );
}
else
{
    echo _( "Text To Speech AGI install failed." );
}
?>
[/php]
It is possible that everything is fine with the module, but that while developing, installing, and uninstalling it dozens of times I messed up something internally and now “install.sql” won’t execute for some reason. I’ve already created an uninstall shell script to manually delete the files from the locations below in case I mess up and install a file with broken PHP syntax that causes the FreePBX Web GUI to go blank.
rm -rf /var/www/html/admin/modules/texttospeech
rm -rf /var/www/html/admin/modules/_cache/texttospeech
rm -f /var/lib/asterisk/agi-bin/texttospeech.agi
What could be causing the problem? Do you have any ideas?