It’s been about 6 months since the engineering team last posted a blog on FreePBX and let me tell you, it’s been a busy 6 months. Last time we posted was shortly before the Digium/Sangoma merger. You’ll be happy to know that we still have a great team working on FreePBX as we did before the merger. Best of all, now that Digium and Sangoma are one team we are able to work on complex issues together that end up benefiting the open source community as a whole. Performance in FreePBX has always been a complex, multifaceted issue.
Performance problems can stem from many sources, such as individual hardware, bad hard disks, corrupt RAM, or loading too many virtual machines on a single host. Our support team deals with so many different types of issues that it can be hard to track down the root cause at times. However, there are also performance improvements that can come directly from FreePBX’s internal open source code. We will document four of these changes in this post.
The Asterisk Gateway Interface, also known as AGI, is a language-independent API for call processing. AGIs allow external scripts to manipulate Asterisk which lets Asterisk perform tasks that would otherwise be difficult or impossible.
Two types of AGI are in active use in Asterisk today: AGI() and FastAGI(). Both have existed in Asterisk since around version 1.0. In its early days, all FreePBX required was a LAMP stack: Linux, Asterisk, MySQL and PHP. Using FastAGI at the time meant running a daemon that continually listens in the background on TCP port 4573, and effectively loading every AGI into that server at run time. Back in 2007 this was a huge mountain to climb, so the developers of FreePBX (then known as Asterisk Management Portal) decided to use regular AGI instead. So was born dialparties.agi, one of the first PHP AGI scripts used in FreePBX. Standard AGI is the simplest and most widely used form of AGI. Standard AGI scripts run on the local PBX and communicate with Asterisk through file descriptors (namely STDIN and STDOUT). Standard AGI allows the use of all AGI commands, and is what this article will be discussing.
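To make the STDIN/STDOUT exchange concrete, here is a minimal sketch of a standard AGI script in Python. It assumes the usual AGI conventions: Asterisk first writes a block of `agi_key: value` lines terminated by a blank line, then answers each command with a status line such as `200 result=1`. The function names are hypothetical and this is not how dialparties.agi itself is written (that script is PHP).

```python
import io


def read_agi_env(stdin):
    """Read the 'agi_key: value' lines Asterisk sends when the script starts.

    The block ends at the first blank line (or at end of input).
    """
    env = {}
    while True:
        line = stdin.readline()
        if not line.strip():
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def agi_command(stdin, stdout, command):
    """Write one AGI command and return Asterisk's raw response line."""
    stdout.write(command + "\n")
    stdout.flush()
    return stdin.readline().strip()


def run_agi(stdin, stdout):
    """Greet the caller in the Asterisk console using the VERBOSE command."""
    env = read_agi_env(stdin)
    caller = env.get("agi_callerid", "unknown")
    return agi_command(stdin, stdout, f'VERBOSE "Call from {caller}" 1')


# Simulated session: StringIO objects stand in for the pipes Asterisk provides.
# When launched by Asterisk, the call would be run_agi(sys.stdin, sys.stdout).
fake_asterisk = io.StringIO(
    "agi_request: demo.agi\nagi_callerid: 1001\n\n200 result=1\n"
)
captured = io.StringIO()
reply = run_agi(fake_asterisk, captured)
```

Every call that reaches a script like this costs Asterisk a fork() of a new process, which is the detail that matters later in this post.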
Fast AGI is the AGI over TCP sockets protocol. It allows for all AGI functionality except EAGI, and is provided as a solution to developers who need to run resource intensive AGI programs. By running the bulk of the AGI logic on another server, the Asterisk server itself can process calls and not worry about handling complex computation for other services. This is the recommended protocol for large applications.
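For comparison, a FastAGI server speaks the same line protocol over a TCP connection rather than over pipes to a forked process. The sketch below uses Python's standard `socketserver` module; the handler and `make_server` names are hypothetical, and it assumes Asterisk passes the requested script name in the `agi_network_script` variable. It is an illustration of the protocol shape, not the server FreePBX ships.

```python
import socketserver


class FastAGIHandler(socketserver.StreamRequestHandler):
    """Handle one FastAGI session: read the variables, send one command."""

    def handle(self):
        env = {}
        while True:
            line = self.rfile.readline().decode("utf-8", "replace").strip()
            if not line:  # blank line (or closed socket) ends the variable block
                break
            key, _, value = line.partition(":")
            env[key.strip()] = value.strip()

        script = env.get("agi_network_script", "")
        command = f'VERBOSE "FastAGI request for {script}" 1\n'
        self.wfile.write(command.encode("utf-8"))
        self.wfile.flush()
        self.rfile.readline()  # Asterisk's "200 result=..." reply


def make_server(host="127.0.0.1", port=4573):
    """Build a threaded TCP server; 4573 is the FastAGI port named above."""
    return socketserver.ThreadingTCPServer((host, port), FastAGIHandler)


# To serve requests: make_server().serve_forever()
```

Because the server is a long-running process, Asterisk only opens a socket per call and never has to spawn anything itself.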
Over the next several years of FreePBX development, more AGI scripts were added to FreePBX to do tasks that developers either didn’t want to do in Asterisk directly, or couldn’t get Asterisk to do with straight dialplan. This never caused serious issues in FreePBX until the last three years, when Sangoma Support started to notice a significant impact on systems that had a high volume of calls and had been running for long periods without restarting Asterisk. The main issue users experienced was dropped audio frames in conference rooms. This led one of our lead engineers to open a ticket on the Asterisk bug tracker.
With the help of the Asterisk team and through our own investigations we came to a few conclusions. First we set up a test system, against which we generated 250,000 AGI calls. This load would get Asterisk into a broken state. In this broken state the MoH is extremely choppy, and conferences also have dropped audio frames. After a restart of Asterisk, the MoH is no longer interrupted and conference bridges are fine.
Josh Colp of Sangoma provided more information to us:
If ConfBridge didn’t send a frame for 100 milliseconds then either every channel was blocked for that period of time (in which case you would see a flood of frames as it catches up) or the timerfd file descriptor didn’t wake up within 20ms as it was told to and instead woke up 100ms later and then resumed at a 20ms interval, or the transcoding process in the mixing loop took 100ms to execute.
This allowed our engineer to narrow down the problem:
Awesome! So, thanks to your input, it seems like we’ve narrowed down where the problem is. For some reason, that timerfd isn’t being woken up correctly when an AGI is launched.
Our engineer Rob was then able to create a few test cases by running the timing test command inside Asterisk while generating the 250,000 AGI calls.
So timerfd, for whatever reason, took 5ms longer [than normal]. — Josh Colp
You’ll see that the ‘Active’ machine slipped twice. This, actually, confirms what I posted earlier — running a large number of AGI’s causes timing to slip — Rob Thomas
Rob then determined there was an issue with timerfd and Asterisk on CentOS machines.
It is an issue with the timerfd timing source freezing when asterisk uses the fork() system call to spawn an AGI. The workaround is to use dahdi timing, or, switch to using FastAGI, which does not require a fork(). — Rob Thomas
It was then determined internally that we needed to find a way to convert FreePBX modules to FastAGI. This was originally no easy feat. However, the FreePBX code base uses the same PHP function call to create every AGI: ext_agi(). What if we could use that centralized call to launch FastAGIs instead?
Next we needed to create an AGI server. What we originally thought we should do was go and update all old AGIs to utilize the fastAGI server but in doing so we might break any non FreePBX Distro instances out there. Instead we came up with a way for FreePBX to be able to do a hybrid way of launching AGIs on the system and communicating with Asterisk.
The idea was to create a NodeJS daemon that merely acts as a thin proxy back to Asterisk and launches AGIs just as Asterisk did. Then, if this setting is enabled in FreePBX and the daemon is running, all calls to ext_agi() are changed to use fastagi() instead (with a call to the locally listening server).
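The switch described here can be pictured as a single decision made wherever the dialplan line for an AGI is generated. The following is only a sketch in Python under stated assumptions: the function name, its parameters, and the default host and port are hypothetical, and the real FreePBX code is PHP and may build the dialplan differently. It relies on the Asterisk convention that an `agi://host/script` argument makes the AGI application use FastAGI.

```python
def agi_application(script, fastagi_enabled, daemon_running,
                    host="127.0.0.1", port=4573):
    """Return the dialplan application string used to launch an AGI script.

    With the setting enabled and the local daemon up, point Asterisk at the
    FastAGI proxy; otherwise fall back to a regular (forked) AGI so systems
    without the daemon keep working.
    """
    if fastagi_enabled and daemon_running:
        return f"AGI(agi://{host}:{port}/{script})"
    return f"AGI({script})"


proxied = agi_application("dialparties.agi", True, True)
fallback = agi_application("dialparties.agi", True, False)
```

Keeping the fallback is what makes the approach "hybrid": nothing breaks on an install where the daemon is absent or stopped.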
The performance gains were nothing short of outstanding. Here is a comment from a customer who averages 25 to 30 simultaneous calls, sent before the FastAGI server was enabled.
The CPU still climbs from 200% to 400% depending of the lines in use (we average 25 to 30 simultaneous calls). The more lines are in use, the higher the CPU climbs and the sound gets worse.
Pre FreePBX FastAGI Asterisk at 25–30 simultaneous calls, notice the 212% CPU
Packet loss in FreePBX 14 before the implementation of the FastAGI Proxy
After our support team enabled the FastAGI server the customer sent back the following response:
I have monitored the system since last week’s update and everything seems to be working fine. We had a peak of 29 simultaneous calls this morning and the CPU did not go above 60%. At that time, I did a tcpdump and checked the stats in wireshark and I observed only 0.1% packet loss which is the best stat we had so far. At the present time, we have 24 active calls and the CPU is stable between 15% and 20% (some occasional peaks at around 60%). I think it is safe to say that the issue is resolved.
Post FreePBX FastAGI Proxy Implementation in FreePBX 14
You can enable the FastAGI server in open source FreePBX today. In FreePBX 14 it is disabled by default; once you enable it in Advanced Settings you will need to run fwconsole reload and fwconsole restart. In FreePBX 15 the FastAGI server is enabled by default. In both FreePBX 14 and 15, the FastAGI server requires the PM2 module to maintain its stability and uptime.
The advanced setting to enable the FastAGI Server
In addition to the work we performed with FastAGI another sticking point in our cloud environments was the dashboard scheduler cron that runs once a minute. The dashboard scheduler cron is used to display statistics from your system such as the one below.
PBXact is the commercial platform for FreePBX
You can also disable dashboard scheduler in Advanced settings
However, for systems that do not want to disable the collection of system statistics, the resulting queries against MySQL can be staggering. As you can see below, every minute there is a huge uptick in the number of queries hitting the MySQL server.
Insert operations per second
Closer view of queries per second of different clients
After some initial investigation we determined that dashboard was keeping data for over 90 days and continually reprocessing it. So if there was data from a month ago, dashboard would take that data and reprocess it every minute. This means you’d be continually averaging the last three months, the last 12 weeks, and the last 90 days. As you can imagine, there is no point in reprocessing data from the past because it never changes. Working through the code base, we were able to tell the dashboard scheduler to only update and average the most current data groupings. Instead of processing data from months ago, you are now only processing the most recent hour, day, week and month. This results in a huge performance increase for the system and a huge decrease in the number of SQL queries per second.
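The idea of leaving closed time periods alone can be sketched in a few lines of Python. This is an illustration only, not the Dashboard module's code: the class name and the hourly bucket size are hypothetical, and it assumes samples arrive in timestamp order.

```python
class RollingAverages:
    """Average samples per time bucket, touching only the current bucket."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.finalized = {}        # bucket start -> average, never recomputed
        self.current_start = None  # start time of the bucket still open
        self.current_sum = 0.0
        self.current_count = 0

    def add(self, timestamp, value):
        """Record one sample; close the previous bucket when time moves on."""
        start = timestamp - (timestamp % self.bucket_seconds)
        if self.current_start is not None and start != self.current_start:
            self.finalized[self.current_start] = (
                self.current_sum / self.current_count
            )
            self.current_sum, self.current_count = 0.0, 0
        self.current_start = start
        self.current_sum += value
        self.current_count += 1

    def averages(self):
        """Return finalized averages plus the live average of the open bucket."""
        result = dict(self.finalized)
        if self.current_count:
            result[self.current_start] = self.current_sum / self.current_count
        return result


stats = RollingAverages()
for ts, load in [(0, 10), (1800, 20), (3600, 40)]:
    stats.add(ts, load)
hourly = stats.averages()
```

Each new sample costs a constant amount of work regardless of how many days of history are retained, which is the property the scheduler change was after.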
Can you tell from this screenshot below when we installed the updated dashboard module?
The new dashboard module is installed at the end of the graph around 13:10
Another look at the data pre and post upgrade of the dashboard module
Data over a wider range
The most impressive stats are on the hosts as a whole. The red lines below mark when our cloud staff did a massive grouping of dashboard upgrades. You can see the drastic results below.
We immediately go from about 1.2k Queries per second to about 0.8k
Another graph on a different host shows about 1.4 kps down to 0.9 kps
These changes are in the latest versions of the Dashboard module for FreePBX 13, 14 and 15 (respectively 126.96.36.199 and 188.8.131.52).
The next big issue we noticed in regards to performance was how FreePBX’s internal Key Value Store works. The Key Value Store was introduced in FreePBX 12 as a way to store data without maintaining tables or making direct database calls.
The Key Value Store was originally designed so that every key was stored in the same database table. Use of the feature outgrew that design as we moved from version 12 into versions 13 and 14. In FreePBX 14 we broke out of the “1 table to rule them all” design, so that the Key Value Store engine creates a separate table for each module, plus a separate table for values greater than 4096 bytes, to optimize performance.
But there were more steps we knew we could take. Starting in Framework 184.108.40.206 and Framework 220.127.116.11 the Key Value Store implements in-memory caching. This means that multiple calls to the same ‘key’ data will not make multiple calls to the database. This creates huge performance gains in code that calls the same key more than once but does not cache it locally, decreasing the number of SQL SELECT calls that need to be made. It also speeds up Key Value methods that get data in other ways, such as getting all the keys for a single module: since the values are loaded from MySQL on the first query, the keys are already in memory.
The last major benefit is that you don’t need to worry about running out of RAM with this new caching system, as anything over 4096 bytes is not stored in RAM at startup. Anything over 4096 bytes is only loaded in RAM on request.
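A rough sketch of this caching behavior, in Python rather than the PHP FreePBX uses, might look like the following. The class and attribute names are hypothetical and a plain dict stands in for the MySQL table; the only details taken from the text are that small values are loaded together on the first query, repeated reads are served from memory, and values over 4096 bytes are fetched only when asked for.

```python
class CachedKVStore:
    """Read-through cache in front of a key/value backend (a dict here)."""

    LARGE_VALUE_BYTES = 4096  # threshold described in the text

    def __init__(self, backend):
        self._backend = backend     # stand-in for the database table
        self._cache = {}
        self._small_loaded = False
        self.db_reads = 0           # counts simulated trips to the database

    def _load_small_values(self):
        """One bulk read that caches every value at or under the threshold."""
        self.db_reads += 1
        for key, value in self._backend.items():
            if len(value.encode("utf-8")) <= self.LARGE_VALUE_BYTES:
                self._cache[key] = value
        self._small_loaded = True

    def get(self, key):
        if not self._small_loaded:
            self._load_small_values()
        if key in self._cache:
            return self._cache[key]
        # Large (or unknown) keys are fetched individually, on request only.
        self.db_reads += 1
        value = self._backend.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def keys(self):
        """List keys; after the first query this needs no further lookups here."""
        if not self._small_loaded:
            self._load_small_values()
        return sorted(self._backend)


store = CachedKVStore({"timeout": "30", "blob": "x" * 5000})
first = store.get("timeout")
second = store.get("timeout")   # served from memory
reads_after_small = store.db_reads
big = store.get("blob")         # fetched on demand, then cached
store.get("blob")
reads_after_big = store.db_reads
```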
Finally, the biggest change of them all. If you know anything about the inner workings of FreePBX you might have heard of an AGI file called “dialparties.agi”. This file is launched by Asterisk for any ring group call or any direct extension call that has Find Me/Follow Me enabled. It is not considered a resource-intensive file, but you can imagine the issues it causes given how heavily FreePBX uses AGIs, especially when you consider the audio cutouts caused by the previously discussed AGI forking issue.
Also, developers have long wanted a way to “splice” (add) dialplan directly into dialparties.agi. Unfortunately, there has never been a way to do this unless one completely forked out the Core module.
All of this changed in FreePBX 15, where we made the effort to turn dialparties.agi into straight Asterisk dialplan. This means that every FreePBX 15 system uses the new dialplan-based dialparties. Any third-party module can easily hook into the dialparties dialplan and, best of all, Asterisk no longer launches an AGI for about 80% of calls. This behavior is controlled by an advanced setting, which is enabled by default, and you should never need to disable it.
Dialparties.agi Dialplan setting
As we creep closer to FreePBX 15’s release (we’ll have a blog on the status of that soon) we are proud to be able to provide these open source performance improvements. We discovered, diagnosed, and fixed these issues in our own cloud infrastructure, and are pleased the efforts will be of benefit to the community as a whole. Sangoma is proud to be supporting the FreePBX project and we look forward to providing you with more updates as the year progresses.