Keeping FreePBX servers in sync

warm-spare
freepbx

(Greg Militello) #1

I am looking to add some resilience to my FreePBX server. I am currently running FreePBX in AWS with approximately 200 users connecting to it daily. We handle this load easily today, but as our company grows I am worried about an outage causing a major disruption.

I would like to add a warm spare, but I don’t think that adding a warm spare would solve the issue if the primary server gets overloaded. The warm spare would then get hit with a similar call volume and fail as well.

During Astricon I watched the talk on Kamailio and think it could provide a way to increase our resiliency. I could use Kamailio as a load balancer in front of a group of FreePBX servers, and as the servers get overloaded I could use AWS CloudWatch to spin up a new server and have it register with Kamailio. The one thing I am confused about is how we would keep all these FreePBX servers in sync. How can I ensure that changes made on one server propagate to the other FreePBX servers behind my Kamailio instance?
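To make the load-balancing idea concrete, here is a minimal sketch of the dispatch decision, assuming each node has a fixed call capacity. It is not Kamailio configuration; the `Backend` class, the `max_calls` limit, and the node names are all hypothetical and only model the behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    """One FreePBX/Asterisk node behind the load balancer (hypothetical model)."""
    name: str
    max_calls: int          # assumed per-node capacity limit
    active_calls: int = 0

    @property
    def has_capacity(self) -> bool:
        return self.active_calls < self.max_calls


def pick_backend(backends: list[Backend]) -> Optional[Backend]:
    """Return the least-loaded backend that still has capacity, or None."""
    candidates = [b for b in backends if b.has_capacity]
    if not candidates:
        # Every node is full: this is where a new node would have to be started.
        return None
    return min(candidates, key=lambda b: b.active_calls / b.max_calls)


def route_calls(backends: list[Backend], new_calls: int) -> int:
    """Place calls one at a time; return how many could not be placed."""
    rejected = 0
    for _ in range(new_calls):
        target = pick_backend(backends)
        if target is None:
            rejected += 1
        else:
            target.active_calls += 1
    return rejected


pool = [Backend("pbx-1", max_calls=100), Backend("pbx-2", max_calls=100)]
rejected = route_calls(pool, 250)
print([(b.name, b.active_calls) for b in pool], "rejected:", rejected)
```

With two nodes of 100 calls each and 250 incoming calls, both nodes fill up and 50 calls are rejected. The sketch only decides where a call goes; it says nothing about how the nodes share configuration or call state, which is the open question here.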

Has anyone solved this issue before or have any suggestions?


(Itzik) #2

Hi Greg,

Asterisk can easily handle triple the call volume you mentioned, given the right resources (at least in my experience).

The only time we had Asterisk crash [on a server that had enough resources] was when a client insisted that they needed full access to develop ARI/AMI apps and refused to let our team review the applications before deploying them. As you can imagine, Asterisk crashed a couple of times, and everyone was mad…

I would still encourage you to set up a warm spare, because there are scenarios where some issue on the PBX drains all the resources. It is not necessarily related to Asterisk, but it can cause channels to hang or act weird.
In such cases, you’d want to run core stop gracefully, which sends all new calls to the warm spare without killing active calls. You can then investigate the issue properly without interrupting any calls, and start Asterisk again once the issue is resolved.
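The drain step can be scripted. The sketch below is only an illustration: it assumes the `asterisk` binary is on the PATH of the PBX, and that `core show channels count` prints a line of the form "N active calls". The sample text is made up to match that assumed shape and was not captured from a real system.

```python
import re
import subprocess

# CLI commands passed to the local Asterisk instance via "asterisk -rx".
DRAIN_COMMAND = ["asterisk", "-rx", "core stop gracefully"]
COUNT_COMMAND = ["asterisk", "-rx", "core show channels count"]


def parse_active_calls(cli_output: str) -> int:
    """Extract the active call count from 'core show channels count' output."""
    match = re.search(r"(\d+)\s+active calls?", cli_output)
    if match is None:
        raise ValueError("no 'active calls' line found in CLI output")
    return int(match.group(1))


def run_cli(command: list[str]) -> str:
    """Run an Asterisk CLI command on this machine and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Assumed output shape, for illustration only.
sample = "4 active channels\n2 active calls\n57 calls processed\n"
print(parse_active_calls(sample))
```

On a real PBX you would call `run_cli(COUNT_COMMAND)` to see how many calls are still up before and after issuing `run_cli(DRAIN_COMMAND)`.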

We have clients whose primary server is on site and works just fine, but everyone eventually has that one outage that lasts a few hours, and that is when the warm spare pays off 100%.

So, back to your question: at Astricon there was a nice talk from Jöran Vinzens on that specific issue, describing how they built a call controller with ARI to handle calls on multiple servers.

When load balancing, both servers need to know the device/extension state at all times so they can route calls properly when call waiting is enabled or disabled, and maintain queue order, priority, ring strategy, and so on.
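A small sketch of why that shared state matters, assuming a simplified two-state device model. The `PbxNode` class and the dictionary used as a state store are hypothetical stand-ins, not an Asterisk API.

```python
from enum import Enum


class DeviceState(Enum):
    IDLE = "idle"
    IN_USE = "in_use"


class PbxNode:
    """A PBX node that routes calls using whatever state store it is given."""

    def __init__(self, name: str, states: dict[str, DeviceState]):
        self.name = name
        self.states = states  # private to this node, or shared with others

    def offer_call(self, extension: str, call_waiting: bool = False) -> str:
        state = self.states.get(extension, DeviceState.IDLE)
        if state is DeviceState.IN_USE and not call_waiting:
            return "busy"
        self.states[extension] = DeviceState.IN_USE
        return "ringing"


# Case 1: each node keeps its own state, so node B cannot see that
# extension 101 is already on a call placed through node A.
node_a = PbxNode("a", {})
node_b = PbxNode("b", {})
isolated = (node_a.offer_call("101"), node_b.offer_call("101"))

# Case 2: both nodes consult the same state store, so the second call
# is treated as busy because call waiting is disabled.
shared: dict[str, DeviceState] = {}
node_c = PbxNode("c", shared)
node_d = PbxNode("d", shared)
synced = (node_c.offer_call("101"), node_d.offer_call("101"))

print(isolated)  # ('ringing', 'ringing')
print(synced)    # ('ringing', 'busy')
```

In the isolated case the second node rings an extension that is already in use even though call waiting is off. Queue order, priority, and ring strategy depend on the same kind of shared view.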

AFAIK PJSIP has some functionality to publish channel and device state, and maybe some other things.
See references: https://wiki.asterisk.org/wiki/display/AST/Asterisk+16+Configuration_res_pjsip_publish_asterisk
https://wiki.asterisk.org/wiki/display/AST/Publishing+Extension+State

I am not sure, though, whether it’s doable with PJSIP publish. The other way to do it is to add a fourth and a fifth server: one running ARI as the call controller, and one acting as an ARI proxy.

So to answer your question: something like this doesn’t exist out of the box, and building it will require a lot of work.

At AstriDevCon we touched briefly on the interest in media/channel recovery and failover, which exists in FreeSWITCH and is doable with Asterisk.
If I understand correctly, the way it works is that both servers know the channel states, RTP streams, and probably more at all times, and once the primary dies the secondary recovers the calls by re-inviting the channels and media.
So if and when that functionality is built into Asterisk by default, you’d probably be able to use those tools to build a load-balancing setup that syncs the necessary information so calls can be routed properly.

I hope that helps a bit.


#3

What do you mean by that?

200 extensions is not a big system; quite a few have more than 1000.

200 calls per day is nothing; most systems are handling more than that.

But if you mean 200 concurrent calls, that’s a big system. If it’s just connecting calls, it’s not hard. But if it’s recording, transcoding, encrypting, etc., careful design is required and may not be possible on a single Asterisk, regardless of how many cores and how much memory it has.


#4

Quite some time ago I looked into this quite deeply. The ‘state’ of Asterisk is now maintained in the asteriskdb sqlite3 table (it was Berkeley DB many years ago, and it worked then :wink: ).

The way the database functions are written is fundamentally flawed for this scenario: they are multi-thread safe, but not multi-user safe.
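The difference between the two can be shown with a toy model. This is not Asterisk's code; `SharedFile` and `Process` are hypothetical stand-ins, and the two updates are interleaved by hand so the result is deterministic. The assumption being illustrated is that each process guards the database with a lock that only its own threads can see.

```python
import threading


class SharedFile:
    """Stands in for a database file that several processes can open."""

    def __init__(self) -> None:
        self.value = 0


class Process:
    """Hypothetical process whose lock is visible only to its own threads."""

    def __init__(self, store: SharedFile) -> None:
        self.store = store
        self.lock = threading.Lock()  # private to this process
        self._cached = 0

    def begin_update(self) -> None:
        self.lock.acquire()
        self._cached = self.store.value  # read the current value

    def finish_update(self) -> None:
        self.store.value = self._cached + 1  # modify and write back
        self.lock.release()


store = SharedFile()
p1, p2 = Process(store), Process(store)

# Each process holds its own lock, so neither blocks the other.
p1.begin_update()
p2.begin_update()
p1.finish_update()
p2.finish_update()

print(store.value)  # 1, not 2: one increment was lost
```

Both updates complete without waiting for each other, and one of them is silently overwritten. Within a single process the same lock would have serialized the two updates.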

The corosync module exposed some state, but not enough, and was never completed. Even just reading the database after trying to lock it (the locks are ignored) sooner or later leaves the database in an unrecoverable state.

So syncing active call states is IMHO ‘doomed’ until such states are exposed by some Asterisk-based API as yet unknown to me.

FreeSWITCH/FusionPBX and Kamailio are likely better platforms for this scenario.

JM2CWAE