Lost SIP Trunking Registration?

Typically, on the handful of occasions when we have lost connectivity to our SIP trunking provider, I see events like this logged.

[2019-10-12 21:22:26] NOTICE[28800] chan_sip.c: Peer '5264232744GW2' is now UNREACHABLE!  Last qualify: 39
[2019-10-12 21:22:30] NOTICE[28800] chan_sip.c: Peer '5294877463GW2' is now UNREACHABLE!  Last qualify: 40
[2019-10-12 21:22:32] NOTICE[28800] chan_sip.c: Peer '5294877463GW1' is now UNREACHABLE!  Last qualify: 26
[2019-10-12 21:22:36] NOTICE[28800] chan_sip.c: Peer '5264232744GW2' is now Reachable. (154ms / 2000ms)
[2019-10-12 21:22:40] NOTICE[28800] chan_sip.c: Peer '5294877463GW2' is now Reachable. (39ms / 2000ms)
[2019-10-12 21:22:52] NOTICE[28800] chan_sip.c: Peer '5264232744GW1' is now UNREACHABLE!  Last qualify: 26
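
If it helps anyone compare incidents, here is a rough Python sketch that pairs up the UNREACHABLE/Reachable notices per peer to show how long each qualify outage lasted. The regex is written against the exact chan_sip lines quoted above, which is an assumption about the format; other Asterisk versions or logger settings may differ. `SAMPLE` and `outage_windows` are names made up for this illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# The six qualify notices quoted above, used as sample input.
SAMPLE = """\
[2019-10-12 21:22:26] NOTICE[28800] chan_sip.c: Peer '5264232744GW2' is now UNREACHABLE!  Last qualify: 39
[2019-10-12 21:22:30] NOTICE[28800] chan_sip.c: Peer '5294877463GW2' is now UNREACHABLE!  Last qualify: 40
[2019-10-12 21:22:32] NOTICE[28800] chan_sip.c: Peer '5294877463GW1' is now UNREACHABLE!  Last qualify: 26
[2019-10-12 21:22:36] NOTICE[28800] chan_sip.c: Peer '5264232744GW2' is now Reachable. (154ms / 2000ms)
[2019-10-12 21:22:40] NOTICE[28800] chan_sip.c: Peer '5294877463GW2' is now Reachable. (39ms / 2000ms)
[2019-10-12 21:22:52] NOTICE[28800] chan_sip.c: Peer '5264232744GW1' is now UNREACHABLE!  Last qualify: 26
"""

# Assumed line format: timestamp, NOTICE[thread], chan_sip.c, peer name, state.
QUALIFY_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] NOTICE\[\d+\] chan_sip\.c: "
    r"Peer '(?P<peer>[^']+)' is now (?P<state>UNREACHABLE|Reachable)"
)


def outage_windows(lines):
    """Return {peer: [(down_at, up_at_or_None), ...]} from qualify notices."""
    down_since = {}
    windows = defaultdict(list)
    for line in lines:
        m = QUALIFY_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a qualify notice
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        peer = m["peer"]
        if m["state"] == "UNREACHABLE":
            # Keep the first "down" time if the notice repeats.
            down_since.setdefault(peer, ts)
        elif peer in down_since:
            windows[peer].append((down_since.pop(peer), ts))
    # Peers that never came back within the excerpt stay open-ended.
    for peer, ts in down_since.items():
        windows[peer].append((ts, None))
    return dict(windows)
```

Run against `SAMPLE.splitlines()`, both GW2 peers come back after 10 seconds, while the two GW1 peers have no matching Reachable line in the excerpt and are reported with `None` as the end time.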

Today we had a hiccup. My SIP trunking provider sent me an automated SMS message informing me that SIP trunking registration had been lost. When I immediately hopped into their web portal, it likewise showed the trunk as unregistered. Then after a couple of seconds we registered again.

When I looked at the Asterisk logs, I didn’t see any of the typical lost registration events like above. All I saw around the time of the blip was the following:

[2020-03-05 13:06:56] VERBOSE[3605][C-00003612] pbx.c: Executing [[email protected]:19] Set("SIP/5264232744GW1-00006b69", "FAXOPT(faxdetect)=yes") in new stack
[2020-03-05 13:06:56] VERBOSE[3605][C-00003612] pbx.c: Executing [[email protected]:20] Answer("SIP/5264232744GW1-00006b69", "") in new stack
[2020-03-05 13:06:57] VERBOSE[3605][C-00003612] pbx.c: Executing [[email protected]:21] Wait("SIP/5264232744GW1-00006b69", "4") in new stack
[2020-03-05 13:07:01] WARNING[3605][C-00003612] channel.c: Exceptionally long queue length queuing to SIP/5264232744GW1-00006b69
[2020-03-05 13:07:01] WARNING[3605][C-00003612] channel.c: Exceptionally long queue length queuing to SIP/5264232744GW1-00006b69
[2020-03-05 13:07:01] WARNING[3605][C-00003612] channel.c: Exceptionally long queue length queuing to SIP/5264232744GW1-00006b69
[2020-03-05 13:07:01] WARNING[3605][C-00003612] channel.c: Exceptionally long queue length queuing to SIP/5264232744GW1-00006b69

That one warning repeated dozens of times (same timestamp) until the next step in the dialplan finally kicked in, specifically the Goto into the day/night call flow control.

[2020-03-05 13:07:01] VERBOSE[3605][C-00003612] pbx.c: Executing [[email protected]:22] Goto("SIP/5264232744GW1-00006b69", "app-daynight,0,1") in new stack
[2020-03-05 13:07:01] VERBOSE[3605][C-00003612] pbx_builtins.c: Goto (app-daynight,0,1)
[2020-03-05 13:07:01] VERBOSE[3605][C-00003612] pbx.c: Executing [[email protected]:1] GotoIf("SIP/5264232744GW1-00006b69", "0?timeconditions,2,1:timeconditions,1,1") in new stack
[2020-03-05 13:07:01] VERBOSE[3605][C-00003612] pbx_builtins.c: Goto (timeconditions,1,1)

Looking at the utilization around this time, it’s not like FreePBX was being slammed with concurrent calls, heavy CPU load, etc. First time I recall seeing this. Any suggestions?

I think it’s a sleepy-thread issue under the hood: threads just not waking up when they should.

But the issue is complicated by separate “peers” which are the same remote IP.

Maybe limit the qualifies to only one of these peers?

This way you aren’t re-pinging the same IP at the same time for N peers…

There are a total of two SIP trunks, both associated with one SIP provider. Each trunk registers against a gateway 1 DNS name and a gateway 2 DNS name, so there is some duplication. Even if I eliminated the gateway 2 DNS name, wouldn’t there still be a dupe, since both SIP trunks register against the same gateway 1 DNS name?

Kind of… chan_sip uses a single thread for UDP traffic, scheduled items, and some other things. If handling of something in it is blocked, then it will ripple and delay other things such as outbound registrations and the handling of SIP traffic.
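
To make the ripple effect concrete, here is a toy Python model of one thread working through scheduled items. It is only an illustration of the general idea, not how Asterisk’s scheduler is actually implemented; the task names and durations (in milliseconds) are invented for the example.

```python
import heapq

# Hypothetical workload: (due_ms, name, duration_ms).
EXAMPLE_TASKS = [
    (0, "blocked handler", 5000),     # something that stalls the thread for 5 s
    (1000, "REGISTER refresh", 10),   # due 1 s in
    (2000, "OPTIONS qualify", 10),    # due 2 s in
]


def run_single_thread(tasks):
    """Simulate one thread running tasks in due-time order.

    Returns {name: delay_ms}, i.e. how late each task started.
    """
    queue = list(tasks)
    heapq.heapify(queue)  # earliest due time first
    clock = 0
    delays = {}
    while queue:
        due, name, duration = heapq.heappop(queue)
        start = max(clock, due)  # cannot start until the thread is free
        delays[name] = start - due
        clock = start + duration
    return delays
```

With `EXAMPLE_TASKS`, the REGISTER refresh starts 4000 ms late and the OPTIONS qualify 3010 ms late, even though neither of them is slow on its own.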

I guess my real question is whether I should change anything in my SIP trunk configurations. This is what SIP.US automatically provisioned in FreePBX through their integrated module. I would think it beneficial to have some form of redundancy, with each trunk first looking for GW1 and then for GW2, right?

Drop some of the qualifies and other “extra” SIP signaling packets, e.g. MWI polling, so you don’t work the thread so hard?

Move to static IPs so you don’t need to worry about traffic state (as much) in your firewall?

Switch to PJSIP? :wink:

Also, how many phones?

@gregarican - the thing about chan_sip is that each peer can only point at one IP address. That means that, even if the name resolves to a dozen addresses, it’s only ever going to use one. PJSIP doesn’t have this restriction: it can point to two (or considerably more) different IP addresses and use as many of them as you’d care to set up. That’s one of the (many) reasons that PJSIP is preferred over chan_sip.

On a personal config setup note - I never use FQDNs. I always use IP addresses, if only to avoid the problems that a flaky DNS Server can cause.

Given the above, I’d say “Yes, switch your config to PJ-SIP and point the system at all of the IP addresses associated with your provider’s inbound addresses.”
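
If you want to see which addresses a gateway name currently resolves to before hard-coding them, a small Python sketch like the one below can list them. It only does an ordinary address lookup through the system resolver, so it will not show anything the provider publishes through SRV or NAPTR records, and the provider’s own documentation remains the authority on which IPs to allow. The function name `resolve_all` and the default port 5060 are assumptions for this example.

```python
import socket


def resolve_all(host, port=5060):
    """Return the sorted, de-duplicated IP addresses a host name resolves to."""
    # SOCK_DGRAM limits the results to one entry per address for UDP.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Call it with each gateway DNS name from your trunk settings, for example `resolve_all("gw1.example.com")`, where the host name is a placeholder.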

Thanks for all of the clarification! I will look to implement these changes soon. In a little over 2 years’ worth of rolling out FreePBX to our site locations, so far this was the only hiccup I’ve seen of this nature. So all in all I’m more than pleased and think that this recommended configuration change can only further solidify things. Appreciate the help!

While I’m behind the move to PJSIP, understand that it might not be the solution to this problem. When you register with SIP.us, you are giving them your current location details (user:IP:port). They will hold that registration for as long as your Expire setting tells them to (or for their own value, if they override it).

Now, when you see Asterisk telling you that the trunk is UNREACHABLE, that means you are qualifying the trunk and Asterisk is not getting a response to the keepalive messages it is sending to SIP.us.

When SIP.us says you are no longer registered, that means the location entry they had for you was dropped. That could be because the Expire time was reached without a new REGISTER to refresh it. It could also mean they are trying to do keepalives (much like Asterisk does to them) and are not getting responses. As a result, the location is considered “dead” and dropped.

Is this PBX behind NAT?

The FreePBX has an internal, private LAN IP. I don’t expose it as a public service via NAT for inbound access on my firewall.

Then yes, the PBX is behind NAT, and the symptoms you are describing are 100% NAT symptoms. You may have to create NAT rules in the firewall for this to work properly. It sounds like the NAT hole opened by the outbound request (the registration) is being lost in the router, which is why the PBX isn’t getting keepalive responses back, or keepalive requests from the provider.
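
As a back-of-the-envelope check on the NAT hole, the Python sketch below compares how often the PBX sends something to the provider with how long the firewall keeps an idle UDP mapping. All numbers in the usage note are hypothetical; check your firewall’s actual UDP timeout and your trunk’s real expiry and qualify settings. The function names and the 5-second safety margin are made up for this example.

```python
def max_silent_gap(register_expiry_s, qualify_interval_s=None):
    """Longest expected gap (seconds) between outbound packets to the provider.

    Uses the registration expiry as an upper bound on the refresh interval,
    and the qualify interval if qualify is enabled.
    """
    gaps = [register_expiry_s]
    if qualify_interval_s is not None:
        gaps.append(qualify_interval_s)
    return min(gaps)


def pinhole_survives(nat_udp_timeout_s, silent_gap_s, margin_s=5):
    """True if traffic is frequent enough to keep the NAT mapping open."""
    return silent_gap_s + margin_s <= nat_udp_timeout_s
```

For instance, with an assumed 120-second UDP timeout, a 3600-second registration and a 60-second qualify, `pinhole_survives(120, max_silent_gap(3600, 60))` is true. Drop the qualify and the same check, `pinhole_survives(120, max_silent_gap(3600))`, is false, which is worth keeping in mind before turning qualifies off.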

What type of router is being used?

My firewall is a Cisco ASA. The router is a Cisco 1921, managed by AT&T. This is the only instance I have seen of this issue in 2+ years, so I’m not terribly concerned. But I would like to clean things up and be running optimally. Everyone’s suggestions are certainly helpful in that regard!
