Diverse Datacenters for a FreePBX HA node

WB3FFV · June 21, 2014, 4:56am

I haven’t really worked with the HA module for FreePBX Distro yet, but I am wondering whether there are any issues with deploying one of the servers in one datacenter and the other server in a different datacenter. Before anyone says you can’t do that because you don’t have the bandwidth, let me state that I have multiple full GigE links interconnecting the two datacenters, so there is LAN-speed capacity between the diverse locations.

I am looking to build up a diverse and fault-tolerant PBX implementation, so that if one location falls off the map, the other just keeps on running. Nowadays, with (pick your disaster), it makes good sense to create a network that can keep running after the loss of a single location. I have just been asked to make the PBXs a part of that disaster plan for a couple of people, so I want to see if there is any reason I can’t just do this with the servers in diverse datacenters…

-Howard

SkykingOH · June 21, 2014, 5:38am

I am not so much concerned with speed as topology. Can you put the servers in the same VLAN?

dicko · June 21, 2014, 6:12am

Exactly as SK says. Layer 2 is one thing, but layer 3 really needs BGP in place to properly locate your “floating” IP to the rest of the world. Do you have that? Maybe add a couple of SIP proxies in both places in front of your HA boxen to better handle that possible SPOF, and have an impeccable and authoritative DNS service as well. Given that, it should work quite well.

WB3FFV · June 21, 2014, 11:38am

Yep, I have a Layer-2 network running between the two locations (actually more than two), and am using VLANs for all kinds of services between the locations. So it would be simple to have the HA cluster in the same LAN and IP range if desired. I didn’t see any reason this shouldn’t work, but figured I would toss it out to the group and see if anyone thought of anything that wasn’t coming to mind…

WB3FFV · June 21, 2014, 11:46am

dicko, yeah, I have a fully meshed IP network using BGP, and we have our own IP space and announce it to the world as well. So BGP is a non-issue for us, though I figured the HA cluster would probably run at Layer 2 on a common network segment. I haven’t done it yet, but I wanted to build up a Linux cluster using most of the tech I believe the FreePBX solution uses to create the HA environment.

I also thought about using SIP proxies and just having the HA cluster in one location, but the first thing I got hit with was: what happens if the location both servers are in gets squashed? Then you have proxies and no servers, so the phones all go down again. So in my mind, the way to avoid that was to place each half of the HA cluster in diverse datacenters.

Does the HA solution from Schmooze allow more than two machines in the cluster? If so, it could even be expanded to support more than two machines in the fail-over environment, say two for the HA at the main location, and then an additional one at a remote datacenter…

dicko · June 21, 2014, 2:12pm

Then I think you have it all in place. The file system chosen for most HA Asterisk solutions has traditionally been DRBD in Master/Slave mode, which Schmooze chose also. That path is self-limiting to two nodes, but there is no reason you can’t use iSCSI or glusterfs (my choice) instead to build a multinode cluster; corosync of course has no problem either way. That way you can keep all the machines running at the same time. I add Asterisk’s realtime corosync state synchronising process, use SIP proxies for registration instead of Asterisk’s, and use a shared device for astetcdir and astspooldir, and IWFM.

xrobau · June 21, 2014, 8:52pm

[quote=“dicko, post:6, topic:22754”]
there is no reason you can’t use ISCSI or glusterfs (my choice) [/quote]

Don’t use iSCSI for multi-user stuff without running something like GFS on top of it. I’m sure you know that already, just that when I saw that I panicked slightly.

Gluster is really really good, except for when it breaks, at which point it’s terrible and you hate it 8)

All the rest is 100% awesome, but it wasn’t something I felt I could put into a plug-and-play HA service and not spend a week or so training every site up on what to do when it goes wrong.

The FreePBX HA service is pretty much self-healing, and because it’s built on very simple blocks, it’s normally easy to fix if it does go bad beyond its own self-healing capabilities.

xrobau · June 21, 2014, 9:01pm

Oh, and to answer the other question, it’s not something we recommend, but it -will- work with a small amount of tweaking (you need to change the DRBD replication protocol).

At the moment, the system calls work like this:

System: I want to write this block to disk
DRBD: OK. I’m writing this block to disk.
Host A: I’ve got the packet, I’m going to write it to disk. I’ll tell you when I’m done.
Host B: I’ve got the packet, I’m going to write it to disk. I’ll tell you when I’m done.
Host A: I’ve written it to disk, but I haven’t flushed it yet, it could still be in RAM
Host B: I’ve written it to disk, but I haven’t flushed it yet, it could still be in RAM
Host A: I’ve flushed it to disk. It’s there, I’m sure of it.
Host B: I’ve flushed it to disk. It’s there, I’m sure of it.
DRBD: I’ve written it to disk.
System: Thanks, I’ll keep going now!

This is Protocol C. It works fine for machines that have negligible network latency between them, but when there’s even minor latency it can really slow things down, because every write waits for the remote host to confirm its flush.

You can manually change it to Protocol A, which will probably be sufficient, and removes any issues with latency.
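To make the latency difference concrete, here is a rough Python sketch of when DRBD reports a write as complete under each replication protocol. The `Link` class, the function name, and all timings are hypothetical values chosen for illustration; this is a simplified model of the acknowledgement rules, not a measurement of FreePBX HA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical timings, in milliseconds, for one replicated write."""
    local_disk_ms: float   # primary writes and flushes the block
    remote_disk_ms: float  # secondary writes and flushes the block
    rtt_ms: float          # network round trip between the two hosts


def write_ack_latency_ms(protocol: str, link: Link) -> float:
    """Approximate time until the write is reported complete to the system."""
    if protocol == "A":
        # Asynchronous: complete once the local write is done and the
        # replication packet is in the local TCP send buffer.
        return link.local_disk_ms
    if protocol == "B":
        # Memory-synchronous: also wait until the peer has received the packet.
        return max(link.local_disk_ms, link.rtt_ms)
    if protocol == "C":
        # Synchronous: wait until the peer confirms its own disk write.
        return max(link.local_disk_ms, link.rtt_ms + link.remote_disk_ms)
    raise ValueError(f"unknown DRBD protocol: {protocol!r}")


# Assumed example links: same rack versus two datacenters.
same_rack = Link(local_disk_ms=2.0, remote_disk_ms=2.0, rtt_ms=0.5)
cross_dc = Link(local_disk_ms=2.0, remote_disk_ms=2.0, rtt_ms=10.0)

for name, link in (("same rack", same_rack), ("cross DC", cross_dc)):
    for protocol in ("A", "B", "C"):
        print(f"{name:9s} protocol {protocol}: "
              f"{write_ack_latency_ms(protocol, link):.1f} ms")
```

In this model Protocol A's acknowledgement time does not depend on the round trip at all, which is why it suits a stretched cluster. The trade-off is that writes acknowledged locally but not yet received by the peer can be lost if the primary fails. The protocol is selected with the `protocol` keyword in the DRBD resource configuration; where FreePBX HA keeps that file is not stated in this thread.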

–Rob

dicko · June 21, 2014, 9:16pm

I agree about iSCSI. No breaks for me with glusterfs in the past 9 months (gestation period?), but for best performance I use the NFS mount against the local machine’s glusterfs node. It won’t disappear, and it performs much better for small file changes like you get in the astspooldir and astetcdir.

You can also use the same SQLite database in astdbdir for all machines, but a few patches are needed to main/db.c to make it better handle multiple connections for the db “read” functions, as they don’t properly release the lock when exiting. I’m not completely happy with my patches yet though, so no disclosure there yet. Add real-time registration into SQLite and you’re nearly multi, but chan_sip is a bear to get to play properly because of the same locking problem. pjsip looks a little easier, but I haven’t spent much time there as yet.
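The kind of locking problem described here can be reproduced in miniature. The following Python sketch assumes SQLite's default rollback-journal mode and uses a hypothetical `kv` table and key; it illustrates general SQLite behaviour and is not a rendering of Asterisk's main/db.c or of the unpublished patches. A reader that never ends its transaction keeps a shared lock, and a writer cannot commit until that lock is released.

```python
import os
import sqlite3
import tempfile


def demo_reader_blocks_writer():
    """Return (writer_was_blocked, final_value) for a two-connection test."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")  # hypothetical file name

        # Create a small key/value table and one row.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
        setup.execute("INSERT INTO kv VALUES ('DEVICE/100', 'idle')")
        setup.commit()
        setup.close()

        # isolation_level=None: no implicit transactions from the module.
        # timeout=0: fail immediately instead of waiting for a lock.
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)

        # The reader opens a transaction, reads, and "forgets" to end it,
        # so its shared lock on the database file stays in place.
        reader.execute("BEGIN")
        reader.execute("SELECT value FROM kv").fetchall()

        update = "UPDATE kv SET value = 'busy' WHERE key = 'DEVICE/100'"
        try:
            writer.execute(update)
            blocked = False
        except sqlite3.OperationalError:
            # Typically reported as "database is locked".
            blocked = True

        # Once the reader ends its transaction, the same write goes through.
        reader.execute("COMMIT")
        writer.execute(update)
        final = writer.execute("SELECT value FROM kv").fetchone()[0]

        reader.close()
        writer.close()
        return blocked, final


if __name__ == "__main__":
    print(demo_reader_blocks_writer())
```

SQLite relies on the underlying file system's file locking, so on a shared or network mount this behaviour also depends on that locking being implemented correctly.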

dicko · June 21, 2014, 9:22pm

And of course a third replica for the glusterfs share (maybe a geo-replicated one for “belt and braces” and those other “Oh S&^T!” days).

sspritzer · February 20, 2015, 2:10am

Hi Howard,

I am very interested in using FreePBX HA with my PBXs in separate datacenters.

Did you get this sort of deployment to work?

If yes, how did you do it?

I have a fully meshed IP network using BGP as the routing protocol.

Thanks!

-Steve