SIP trunk failover strategies

So I am looking for some configuration strategies that some of you may have implemented to handle automatic trunk fail over to a secondary internet connection.

I’ve been trying to implement something that would work for a small business for a couple of years now and I just can’t figure out anything that works 100% of the time.

All of our PBXs are hardware boxes running on the client’s site (Asterisk 16) using their internet infrastructure, with a primary and secondary internet connection running through a firewall (usually a Ubiquiti USG) that handles the fail over if the primary internet connection goes down. The actual fail over works great every time there is an issue with the primary internet connection.

The SIP trunk provider that we like to use is Vitelity.

When setting up a trunk we prefer to use IP authentication. The limitation is that when the primary connection fails, the public IP address for the connection changes by definition, so the External IP setting in Asterisk SIP Settings is no longer valid and calls don’t really work.

I know there are some workarounds to get this IP address updated automatically in the settings through some scripts but I would prefer if we didn’t have to modify/maintain a weird one off aspect to the systems that we implement.
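For readers wondering what such a script usually amounts to, here is a minimal Python sketch of the detection half only. It is an illustration under assumptions, not something posted in this thread: `fetch_public_ip` and `apply_new_ip` are hypothetical callables you would have to supply (for example a lookup against some "what is my IP" service, and whatever mechanism you trust to push the new value into the PBX).

```python
import ipaddress
from typing import Callable, Optional


def detect_ip_change(fetch_public_ip: Callable[[], str],
                     last_known_ip: Optional[str]) -> Optional[str]:
    """Return the new public IP if it differs from last_known_ip, else None."""
    raw = fetch_public_ip().strip()
    # Validate the reply so an error page or empty body is never
    # mistaken for an address (raises ValueError on garbage).
    current = str(ipaddress.ip_address(raw))
    return current if current != last_known_ip else None


def run_once(fetch_public_ip: Callable[[], str],
             last_known_ip: Optional[str],
             apply_new_ip: Callable[[str], None]) -> Optional[str]:
    """One polling step: call apply_new_ip only when the address changed."""
    try:
        changed = detect_ip_change(fetch_public_ip, last_known_ip)
    except (OSError, ValueError):
        # Lookup failed or returned garbage: keep the old value, retry later.
        return last_known_ip
    if changed is not None:
        apply_new_ip(changed)  # hypothetical hook, e.g. update settings and reload
        return changed
    return last_known_ip


if __name__ == "__main__":
    # Simulated failover: the lookup now reports the backup WAN address.
    applied = []
    new_ip = run_once(lambda: "198.51.100.7\n", "203.0.113.10", applied.append)
    print(new_ip, applied)
```

Something like this would have to run on a timer (cron or a systemd timer), which is exactly the one-off piece of maintenance described above.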

So that leaves us with trunk registrations. This would be fantastic if registrations weren’t as flaky as they are. Most of the time, once an internet outage is over and we are able to connect back into the system, we find the trunk in a weird state: it says the registration request has been sent, but the registration never completes, so incoming and outgoing calls stop working completely. We usually have to do a manual core reload to get the trunk to register and work properly again, even after the primary connection is back up and going.
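The manual check described here could in principle be automated with a small watchdog. The Python sketch below only shows the parsing step, and it rests on assumptions: the sample text imitates the column layout of `asterisk -rx "sip show registry"` on chan_sip as I remember it, and the layout may differ on your version (PJSIP uses a different command and format entirely). The host names are made up.

```python
import subprocess
from typing import List

# Sample shaped like chan_sip's "sip show registry" output (layout assumed).
SAMPLE = """\
Host                                    dnsmgr Username       Refresh State                Reg.Time
sip.example.net:5060                    N      trunkuser          105 Request Sent
backup.example.net:5060                 N      trunkuser          105 Registered           Tue, 02 Jan 2024 10:00:00
2 SIP registrations.
"""


def stuck_registrations(registry_output: str) -> List[str]:
    """Return the Host column of every registration that is not 'Registered'."""
    stuck = []
    for line in registry_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the short summary line.
        if len(fields) < 5 or fields[0] == "Host":
            continue
        # Whole-field comparison, so "Unregistered" does not count as a match.
        if "Registered" not in fields:
            stuck.append(fields[0])
    return stuck


def read_registry() -> str:
    """Ask the local Asterisk for its registry table (not called in the demo)."""
    result = subprocess.run(["asterisk", "-rx", "sip show registry"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(stuck_registrations(SAMPLE))
```

A cron job could call `read_registry()`, pass the result to `stuck_registrations()`, and trigger the same `core reload` that is currently done by hand if a trunk stays stuck across two consecutive checks. That is still a workaround, just an unattended one.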

Any better strategies out there on how to implement a fail over configuration in FreePBX that will work all of the time with minimal weird workarounds and minimal manual intervention?

PS. On a somewhat related note, I wish I had the coding skills to try to implement somewhat better Asterisk SNMP support. I would love it if Asterisk reported the trunk state over SNMP so that trunks could be monitored more easily for outages or latency issues.

Honestly, there are lots of hackish things that you can do to solve this particular problem, but SD-WAN is by far the best modern way of handling traffic across multiple/redundant connections. It’s not necessarily the cheapest, but it certainly does the job well.

BGP (not hackish) is also effective if you can fit it into your network infrastructure.

I would love it if SD-WAN was an option for some of our clients but they are usually the size where a second internet connection is the most they are able to and willing to do. SD-WAN is usually way out of the price range that they are willing and able to afford.

Or is there an affordable option that I am just not aware of that would be a good fit for a small business?

Dicko,

Maybe there is something that I don’t understand about how BGP works and gets implemented but I am not sure how this would apply for my use case.

Maybe I wasn’t clear enough about the use case so I’ll try to elaborate a bit more.

We use these for a small business (usually between 10 to 15 extensions) that have two internet connections coming into their office from two separate ISPs that go into a single firewall that then handles the fail over to a secondary connection in case of an outage of the primary connection.

The fail over part usually works fantastic. It seems that it’s the idiosyncrasies of SIP and NAT that keep me from being able to implement a simple fail over for the trunks as well.

Very simply, BGP (Border Gateway Protocol) allows, with the collusion of your various network providers, an update of the internet route to your static IP address in real time if any one route fails. You always have the same IP, so static IP routes always get to your PBX.

Man, I so wish that was an option.

These are very inexpensive connections and don’t really support that level of integration. We are talking about people spending maybe between $50 to $100 each month for their phone service and another $50 to $100 for each of their internet connections.

And I completely appreciate that I am probably at a price range where you just don’t get redundancy.

It just all feels like I am almost there but there are just a few hurdles about how this all fits together that I don’t get to solve at this price range without spending a whole lot more money on top just to get redundancy.

Most good VSPs will allow you to fail over to another IP address if the primary disappears. Most VoIP phones also allow a secondary server as a failover if the primary disappears.

Voyant does as well.

The problem really is not on the trunk provider side. It’s that, because of NAT (with the phone system sitting behind a firewall), the external IP address needs to be set manually under “Asterisk SIP Settings”, and in FreePBX/Asterisk there isn’t a way to change it when the connection fails over to the secondary internet connection that doesn’t involve a hacky custom script.

The phones sit on the same LAN as the phone system so there is no need for the phones to fail over to a new PBX with a different IP.

Really, the case I am trying to solve for is this: if we lose the connection through our primary ISP between our on-prem PBX and our VSP, I’d like the PBX to automatically and seamlessly just work over the backup connection, which the firewall is already failing over to just fine.

Even being able to set the external IP in Asterisk for each trunk individually would, I think, be a potential solution here, but alas, it is not to be.

PS. Hiding that previous statement as maybe that wouldn’t be a good solution as there still wouldn’t be a way for the system to know that the firewall has failed the internet over to the secondary connection. But this is why I feel like I am super close to the solution, yet so so far.

Personally, at $6 a month, I find a cloud solution with its always (almost) available public IP just fine. But yes, if you are behind a NAT you will need to do something hacky, because FreePBX is in itself just a PBX and not a router. So that hackiness will always be up to you to monitor and adjust to your network’s public availability.

But if you have a flaky network then perhaps dynamic DNS and registration rather than IP would be a better choice for you and easier to manage.

You are probably right about having the PBX hosted instead of on site. There are just some things that are nice about having it local to the network but I guess it’s probably a trade off between that and redundancy.

Vitelity has failover IP for IP authentication.
The public IP in FreePBX can be set to a dynamic DNS name. Problem solved?

First, I didn’t know you could use an FQDN there, thanks for that tip.

Second, I don’t think using DDNS is a fast enough solution for this. I need it to fail over correctly and instantaneously, not wait for the various systems to work through their different timeouts until the new IP value is used in those settings.
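To put rough numbers on the "different timeouts" concern, here is a small worked example in Python. Every figure in it is a hypothetical placeholder, not a measurement from this thread: substitute your firewall’s detection time, your DDNS client’s update interval, the TTL on the DNS record, and how often Asterisk re-resolves the host name (chan_sip has an `externrefresh` setting for this, as far as I know).

```python
# Stage name -> worst-case delay in seconds. All values are hypothetical.
STAGES = {
    "firewall detects WAN failure and switches": 10,
    "DDNS client notices and pushes the new IP": 60,
    "cached DNS record expires (TTL)": 60,
    "Asterisk re-resolves the external host name": 10,
}


def worst_case_delay_seconds(stages: dict) -> int:
    """The stages run one after another, so the worst case is their sum."""
    return sum(stages.values())


if __name__ == "__main__":
    for name, seconds in STAGES.items():
        print(f"{seconds:>4}s  {name}")
    print(f"{worst_case_delay_seconds(STAGES):>4}s  total worst case")
```

With these placeholder values the worst case adds up to 140 seconds, which shows why a DDNS-based approach can be workable but is never instantaneous.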

I’m not aware of any VSPs accepting FQDNs for IP authentication.

Such a mechanism is too open to fraud. DNS is notoriously flaky and vulnerable to hijacking, IP addresses not so much, for obvious reasons :wink:

I don’t know anything about Vitelity’s functions or cost, as I use VOIP.MS, so I’m just tossing this out based on what I can do with another service provider.

Why not just get two trunks? Yes, each will have its own unique number (DID). Each trunk can still authenticate with a static IP [I’m assuming both ISP connections have fixed IPs]. On the FreePBX side, set both trunks as potential trunks for the outbound route. On the SIP provider side, if your primary DID goes down, have it fail over to the secondary DID. For outbound, I can set my secondary DID to display my primary DID, which I set at the SIP service provider. Not sure if this makes a lot of sense or would work in your use case.

Thanks,

Heh, I don’t think I am explaining myself well.

The problem is not with the VSP, as they support fail over to secondary IPs. The problem is really that, because of NAT, Asterisk requires you to manually set your external IP address in its settings, and when it sends packets out it advertises whatever IP is set there. If that’s any different from the public IP address you are actually on (depending on which connection you are currently using to get out to the internet), SIP breaks.
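A simplified Python sketch of why a stale external address hurts. This is an illustration, not Asterisk’s actual code: the function names, the `pbx` user part and the port numbers are made up. The idea is that the configured address is written into the SIP Contact header and the SDP connection line, and the far end uses those to decide where to send signalling and audio back.

```python
def advertised_fields(externip: str, sip_port: int, rtp_port: int) -> dict:
    """Build the address-bearing parts of an outgoing SIP message with SDP."""
    return {
        # Where the provider should send later SIP requests.
        "contact": f"Contact: <sip:pbx@{externip}:{sip_port}>",
        # Where the provider should send audio (RTP).
        "sdp_connection": f"c=IN IP4 {externip}",
        "sdp_media": f"m=audio {rtp_port} RTP/AVP 0",
    }


def is_stale(externip: str, active_wan_ip: str) -> bool:
    """True when the advertised address is not the one traffic really leaves from."""
    return externip != active_wan_ip


if __name__ == "__main__":
    configured = "203.0.113.10"    # primary WAN address set in SIP Settings
    after_failover = "198.51.100.7"  # backup WAN address now in use
    for line in advertised_fields(configured, 5060, 10000).values():
        print(line)
    print("stale:", is_stale(configured, after_failover))
```

After a failover the packets leave from the backup address but still advertise the primary one, so replies and audio may be sent to a link that is down. Some providers work around this by replying to the source address they actually see, which may be why setups that rely on registration alone can appear to work without the setting.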

You are correct, Asterisk only knows where to send stuff if you tell it; if it changes, you need to retell it.

I am not actually sure what it uses that setting for. It’s certainly not to tell it where to send stuff to. I think it’s for the other endpoint to know where to send stuff back?

And I am pretty damn sure what Asterisk uses for routing: you need either ‘externip=1.2.3.4’ or ‘externhost=your.dns.server’.

‘externip=1.2.3.4’ will not be “resolved” by DNS; ‘externhost=a.b.c’ will be. But if you use externhost, it is not Asterisk that keeps that name pointing at the right address, it is your network setup: you will need a ‘helper’ (dynamic DNS).

I guess I don’t completely understand. Are you talking about the SIP Settings - External Address value?

I haven’t set that value for over four years and I don’t have a problem receiving calls or dialing out. From my understanding, when I register with VOIP.MS that link basically becomes a control channel, whereas my PBX opens ports to VOIP.MS for making and receiving calls. My firewall doesn’t forward any ports back to the PBX. I don’t have any extensions outside my PBX’s local network, so maybe that’s why I’m not having an issue.

Thanks,