I haven’t really worked with the HA module for the FreePBX Distro yet, but I am wondering whether there are any issues with deploying one of the servers in one datacenter and the other server in a different datacenter. Before anyone says “you can’t do that, you don’t have the bandwidth to do it,” let me state that I have multiple full GigE links interconnecting the two datacenters, so there is LAN-speed capacity between the diverse locations.
I am looking to build a diverse and fault-tolerant PBX implementation, so that if one location falls off the map, the other just keeps on running. Nowadays, with (pick your disaster), it makes good sense to create a network that can keep running through the loss of a single location. I have just been asked to make the PBXs part of that disaster plan for a couple of people, so I want to see if there is any reason I can’t do this with the servers in diverse datacenters…
Exactly as SK says: layer 2 is one thing, but at layer 3 you really need BGP in place to properly advertise your “floating” IP to the rest of the world. Do you have that? Maybe add a couple of SIP proxies in both places in front of your HA boxen to better handle that possible SPOF, and have an impeccable and authoritative DNS service as well. Given all that, it should work quite well.
Yep, I have a layer-2 network running between the two locations (actually more than two), and I am using VLANs for all kinds of services between the locations. So it would be simple to have the HA cluster in the same LAN and IP range if desired. I didn’t see any reason this shouldn’t work, but figured I would toss it out to the group and see if anyone thought of anything that wasn’t coming to mind…
dicko, yeah, I have a fully meshed IP network using BGP, and we have our own IP space and announce it to the world as well, so BGP is a non-issue for us. Granted, I figured the HA cluster would probably run at layer 2 on a common network segment. I haven’t done it yet, but I wanted to build a Linux cluster using most of the tech I believe the FreePBX solution is utilizing to create the HA environment.
I also thought about the concept of using SIP proxies and just having the HA cluster in one location, but the first thing I got hit with was: what happens if the location both servers are in gets squashed? Then you have proxies and no servers, and the phones all go down again. So in my mind, the way to avoid that was to place each half of the HA cluster in diverse datacenters.
Does the HA solution from Schmooze allow more than two machines in the cluster? If so, that could even be expanded to support more than two machines in the failover environment, say two for HA at the main location and then an additional one at a remote datacenter…
Then I think you have it all in place. The file system chosen for most HA Asterisk solutions has traditionally been DRBD in master/slave mode, which Schmooze chose also; that path is self-limiting to two nodes, but there is no reason you can’t use iSCSI or GlusterFS (my choice) instead to build a multi-node cluster, and corosync of course has no problem either way. That way you can keep all the machines running at the same time. I add Asterisk’s realtime corosync state-synchronising process, use SIP proxies for registration instead of Asterisk’s, and use a shared device for your astetcdir and astspooldir, and IWFM.
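For anyone heading down the GlusterFS road, a minimal sketch of a three-node replicated volume might look like this. Hostnames (pbx1–pbx3), the brick path, and the volume name are all placeholders, and this assumes glusterd is already installed and running on each node, so treat it as a starting point rather than a recipe:

```
# From pbx1: add the other nodes to the trusted pool (hypothetical hostnames)
gluster peer probe pbx2
gluster peer probe pbx3

# Create a 3-way replicated volume, one brick per node, and start it
gluster volume create pbxdata replica 3 \
    pbx1:/bricks/pbxdata pbx2:/bricks/pbxdata pbx3:/bricks/pbxdata
gluster volume start pbxdata

# Mount via the native FUSE client on each node
mount -t glusterfs localhost:/pbxdata /mnt/pbxdata
```

With replica 3 every node holds a full copy, which is what lets you keep all machines running at once instead of the master/slave pairing DRBD imposes.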
Oh, and to answer the other question: it’s not something we recommend, but it -will- work with a small amount of tweaking (you need to change the DRBD replication protocol).
At the moment, the system calls work like this:
System: I want to write this block to disk
DRBD: OK. I’m writing this block to disk.
Host A: I’ve got the packet, I’m going to write it to disk. I’ll tell you when I’m done.
Host B: I’ve got the packet, I’m going to write it to disk. I’ll tell you when I’m done.
Host A: I’ve written it to disk, but I haven’t flushed it yet, it could still be in RAM
Host B: I’ve written it to disk, but I haven’t flushed it yet, it could still be in RAM
Host A: I’ve flushed it to disk. It’s there, I’m sure of it.
Host B: I’ve flushed it to disk. It’s there, I’m sure of it.
DRBD: I’ve written it to disk.
System: Thanks, I’ll keep going now!
This works fine for machines with negligible network latency between them, but even a minor latency can really slow things down. This is Protocol C.
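That behaviour is set per resource in the DRBD configuration. A hedged sketch of the tweak, using DRBD 8.4-style syntax and a placeholder resource name (check the docs for your DRBD version, as older releases put `protocol` at the resource level rather than inside `net`):

```
resource r0 {
    net {
        # Protocol A: a write completes once it has hit the local disk and
        # the local TCP send buffer; the peer acknowledges asynchronously.
        # Protocol C (the usual HA default) waits for the remote flush,
        # which is what makes WAN latency hurt so much.
        protocol A;
    }
    # ... remainder of the resource stanza (disks, hosts) omitted
}
```

The trade-off is that with Protocol A a site loss can cost you the last few in-flight writes, which for Asterisk spool and config data is usually an acceptable price for usable WAN performance.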
I agree about iSCSI; no breakage for me with GlusterFS in the past 9 months (gestation period?), but for best performance I use an NFS mount against the local machine’s GlusterFS node. It won’t disappear, and it performs much better for small file changes like the ones you get in astspooldir and astetcdir.
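That local-node NFS trick is just pointing the mount at localhost instead of a remote server, so the NFS endpoint lives and dies with the machine itself. A hedged example fstab line, with the volume name and mount point as placeholders (Gluster’s built-in NFS server speaks NFSv3, hence `vers=3`):

```
localhost:/pbxdata  /var/spool/asterisk  nfs  defaults,_netdev,vers=3  0 0
```

The win over the FUSE client here is NFS client-side caching of attributes and small reads, which suits the many tiny files Asterisk churns through in astspooldir and astetcdir.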
You can also use the same SQLite database in astdbdir for all machines, but a few patches are needed to main/db.c to make it better handle multiple connections for the db “read” functions, as they don’t properly release the lock when exiting. I’m not completely happy with my patches yet, though, so no disclosure there yet. Add real-time registration into SQLite and you’re nearly multi-master, but chan_sip is a bear to get to play properly with the same locking problem; pjsip looks a little easier, but I haven’t spent much time there as yet.