I have a Fortigate using SD-WAN, which automatically switches between two Internet providers when one goes down.
The private LAN-side IP addresses of the FreePBX server never change. The only thing that changes is the WAN-side ISP IP address.
When Fortigate switches to the other ISP, of course FreePBX won’t be registered until the next SIP registration period, which is fine.
However, when that time comes, FreePBX can’t register. The output of pjsip show registrations says “Rejected”. It then retries every 60 seconds, over and over, and never succeeds.
During that time, I can ping the online SIP provider from the FreePBX server, and I can run telnet sip.provider 5061 and get a connection, so the network path is good. It’s just that something in FreePBX won’t let it connect.
Interestingly, as soon as I reboot FreePBX (or run fwconsole reload), the registration succeeds. That’s the only thing I do to “fix” it: I restart FreePBX.
So what is it about Asterisk that makes registration fail when my external ISP IP address changes? FreePBX never even sees that address, and its LAN IP address is never affected. Yet a reboot or reload fixes it immediately, every time.
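The telnet test can be reproduced in a few lines of Python. This is only a sketch: the function name fresh_tcp_probe is hypothetical, and "sip.provider" is the placeholder hostname from the post. The point it illustrates is that every call opens a brand-new TCP connection (new source port, new NAT session), so a success says nothing about a connection Asterisk opened before the failover.

```python
import socket


def fresh_tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Open a brand-new TCP connection, the way `telnet host port` does.

    Returns True if the three-way handshake completes. Each call creates a
    new socket, so this only tests whether NEW connections work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, refused, unreachable, and timed-out connects
        return False
```

Usage: fresh_tcp_probe("sip.provider", 5061) returning True while the registration stays “Rejected” is exactly the symptom described in the post.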
You don’t have an autonomous system number, so your old IP address is not valid via the new ISP. You are going to have to wait for DNS to update, and for both the provider and Asterisk to refresh their domain name and reverse DNS lookups, before everything is consistent.
If you are using TCP or TLS, my suspicion is that what is actually breaking is the provider’s reverse DNS lookup on you. With TLS, that may cause the provider to fail to recognize you as the subject of the SSL certificate. If you are using plain TCP on a non-standard port, it might still be doing an authentication check on the REGISTER.
It isn’t a DNS issue. I’m not using any DNS/hostnames on my side at all. This is an outbound registration, so the online provider never needs to initiate connections back to me; it simply uses the existing connection created by the outbound registration itself.
I am using TLS on port 5061, but again, I am not using any DNS or hostnames. Also, it is rebooting my PBX that fixes it, not anything happening on the SIP provider’s side and not waiting a certain amount of time. If I reboot immediately after the switchover, it works immediately. If I wait 10 minutes after the switchover without rebooting, it still doesn’t work until I reboot.
I think the issue is that Asterisk actually cannot reach the SIP provider’s server at all. My guess is that Asterisk keeps the connection/session to the provider open, and when it re-registers it doesn’t create a new session; it reuses the existing one. It needs to actually close that connection and open a new one. That would explain why ping and telnet work, since they create new sessions each time they run. It would also explain why simply restarting Asterisk fixes it, since a restart clears out all sessions.
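The connection-reuse hypothesis can be illustrated with a toy model. Everything here is hypothetical (the ReusingTransport class, the connect callback, the retry-once policy); it is not how Asterisk’s PJSIP transport layer is written, just a sketch of the difference between reusing a cached connection forever and tearing it down when it stops answering.

```python
from typing import Callable, Dict, Protocol, Tuple


class Connection(Protocol):
    def request(self, payload: bytes) -> bytes: ...
    def close(self) -> None: ...


class ReusingTransport:
    """Toy SIP-style transport keeping one cached connection per (host, port).

    reconnect_on_failure=False models the suspected behaviour (every retry
    goes down the same dead flow); True models the proposed fix.
    """

    def __init__(self, connect: Callable[[str, int], Connection],
                 reconnect_on_failure: bool = True) -> None:
        self._connect = connect
        self._reconnect = reconnect_on_failure
        self._cache: Dict[Tuple[str, int], Connection] = {}

    def send(self, host: str, port: int, payload: bytes) -> bytes:
        key = (host, port)
        conn = self._cache.get(key)
        if conn is None:
            conn = self._cache[key] = self._connect(host, port)
        try:
            return conn.request(payload)
        except (TimeoutError, ConnectionError):
            if not self._reconnect:
                raise  # stale connection stays cached; the next retry hits it again
            # Drop the stale flow and retry once on a brand-new connection
            del self._cache[key]
            conn.close()
            conn = self._cache[key] = self._connect(host, port)
            return conn.request(payload)

    def reset(self) -> None:
        """Roughly what a restart/reload does: forget every cached connection."""
        for conn in self._cache.values():
            conn.close()
        self._cache.clear()
```

With reconnect_on_failure=False, retries keep failing until reset() is called, which mirrors the “only a reload fixes it” behaviour in the post.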
That all being said, I’m still not sure of a good automatic workaround.
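One possible automatic workaround, sketched under assumptions: a script run periodically (for example from cron) that asks Asterisk for pjsip show registrations and runs fwconsole reload if any registration shows Rejected. The function names are hypothetical, and the parser assumes the status appears as a whitespace-separated field on each registration row; check that against your own Asterisk version’s output. Note that it will also reload on genuine rejections (for example bad credentials), so in practice you would want to rate-limit it.

```python
import subprocess
from typing import List


def rejected_registrations(show_output: str) -> List[str]:
    """Return the first column of every row whose status field is Rejected.

    ASSUMPTION: rows are whitespace-separated and the status ("Rejected",
    "Registered", ...) appears as its own field after the first column.
    """
    rejected = []
    for line in show_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and "Rejected" in fields[1:]:
            rejected.append(fields[0])
    return rejected


def check_and_reload() -> bool:
    """Reload FreePBX if any outbound registration is Rejected."""
    out = subprocess.run(
        ["asterisk", "-rx", "pjsip show registrations"],
        capture_output=True, text=True, check=True,
    ).stdout
    if rejected_registrations(out):
        # Automates the manual fix from the thread
        subprocess.run(["fwconsole", "reload"], check=True)
        return True
    return False
```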
That makes sense, but, if I remember correctly, it is what RFC 3261 says it should do for TCP-based connections.
I think you have to fix this in the NAT router or the switch, by getting it to return an ICMP Destination Unreachable. One possibility is that something downstream is returning an ICMP message, but it passes through a router (or similar) that is paranoid about ICMP and drops it, so the ICMP never gets back to you.
I suppose it is also possible that the switch is changing the source IP address, in which case the provider will see a rogue packet. They should answer that with a TCP RESET, though, so that is perhaps less likely than a lost ICMP.
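To see why a lost ICMP matters: an endpoint only learns quickly that a TCP connection is dead if something comes back (an RST or an ICMP error). If nothing comes back, only a timer can notice, and on Linux the default retransmission limit can take on the order of 15 minutes to expire. The mapping below is a hypothetical helper showing how those three cases typically surface to an application as Python socket exceptions.

```python
import errno
import socket


def failure_signal(exc: BaseException) -> str:
    """Map a socket exception to the network signal that most likely caused it."""
    if isinstance(exc, ConnectionRefusedError):
        return "TCP RST in answer to a SYN (nothing listening)"
    if isinstance(exc, ConnectionResetError):
        return "TCP RST on an established connection"
    if isinstance(exc, OSError) and exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "ICMP Destination Unreachable"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "silence: nothing came back, so only a timer can notice"
    return "unknown"
```

Only the last case leaves a stack reusing the old connection with no prompt signal, which is consistent with the failure persisting 10 minutes after the switchover.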
Obviously the cleanest solution is proper multi-homing, with your own ASN and Border Gateway Protocol (BGP), but you may be too small a user to justify that.
Definitely too small for an ASN. I’ll do a packet capture to see exactly what is happening with the registrations at the NAT router.
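A complementary capture can be taken on the FreePBX host itself. The sketch below only builds a tcpdump command line; the interface name eth0 and the output filename are hypothetical. Because the registration uses TLS, the SIP messages themselves will be encrypted, but retransmissions, RSTs and any ICMP errors on port 5061 remain visible, which is what distinguishes the theories above.

```python
import shlex
from typing import List


def capture_command(interface: str, sip_port: int = 5061,
                    outfile: str = "sip-failover.pcap") -> List[str]:
    """Build a tcpdump argv recording the SIP/TLS flow plus any ICMP."""
    bpf = f"tcp port {sip_port} or icmp"
    # -nn: no name/port resolution; -w: write raw packets for later analysis
    return ["tcpdump", "-i", interface, "-nn", "-w", outfile, bpf]


if __name__ == "__main__":
    # "eth0" is a hypothetical interface name on the FreePBX host
    print(shlex.join(capture_command("eth0")))
```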