Inbound phone SIP registrations fail with flaky WAN link

ecarlseen · August 28, 2018, 4:59pm

We have a WAN link that we’ve been having issues with (still fighting with the carriers involved), but it’s creating a problem with FreePBX - inbound SIP phone registrations start failing on the extensions beyond that WAN link. Rebooting the server or invoking an HA failover clears the issue. SIP reload doesn’t help. I’m wondering if this is an Asterisk bug or if there’s a FreePBX function that’s causing bouncy connections to start being rejected (FreePBX firewall is disabled). I’m not seeing anything obvious in the logs or using the Asterisk SIP debug commands. Currently running FreePBX 13.0.195.4 from the official distro.

BlazeStudios · August 29, 2018, 12:44pm

So you’re saying that external users are connecting their SIP phones to your FreePBX system via the Internet and the Internet connection where the PBX resides is having a bunch of issues which in turn are causing those said external phones to have issues connecting to the PBX?

When you say “start failing”, what do you mean by that? They can no longer REGISTER or they are not getting calls?

And no, there is no bug or feature that causes “bouncy” connections to be rejected. You need to provide more details to get actionable answers.

cynjut · August 29, 2018, 3:29pm

There isn’t a bug, but we have seen the firewall lock addresses out when the connections bounce like this. Granted, it’s been a while since anyone has reported it, so it could be fixed (or contingent on UCP, perhaps) but we have heard reports where phones that connect and disconnect more than some threshold over a given period of time get locked out of the system by the FreePBX firewall.

This statement, however, is completely true. You’re (@ecarlseen) making us guess about your guessing, which is an enormous waste of everyone’s time. Give us some details and let us help you solve the problem (if it is, in fact, solvable within the FreePBX universe) once and for all.

BlazeStudios · August 29, 2018, 4:20pm

I would agree with your assessment about the System Firewall’s thresholds. I’ve seen that problem before as well. But since the OP stated it wasn’t enabled, I decided not to comment on scenarios that wouldn’t apply.

ecarlseen · August 29, 2018, 5:05pm

They’re connected over an MPLS link, not the Internet but that should be irrelevant.

Specifically, Asterisk receives SIP registration packets from the phones and responds with 401/unauthorized. We know the authentication settings on the phones are correct because as soon as Asterisk is completely restarted (not a configuration reload) or an HA failover initiated the registrations go back to functioning normally. Cranking up the SIP debug levels on the Asterisk console doesn’t shed any additional light on the situation - there’s nothing that says “Oh, hey, we hit this odd rule and we’re rejecting” - it just rejects as if the authentication information was actually incorrect.

We’ll keep looking into it. I was mainly wondering if there was some odd setting that I was unaware of that might cause this.

BlazeStudios · August 29, 2018, 5:39pm

The SIP authentication process should have a 401 in it. It’s what happens after that 401 that is important.

Initial REGISTER/INVITEs do not contain any Auth headers in them. The SIP server should respond with a 401 Unauthorized reply. This is standard and normal behavior.

The endpoint should then send a NEW request (the CSeq will increment by 1). This time it will have the Auth headers, which the system will check. If they are correct, the system will save the location and send back a 200 OK. If they are incorrect, the system will send back a 403 Forbidden.
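To make the challenge/response step concrete, here is a minimal Python sketch of how a phone could derive the `response` value in its Auth header from the nonce in the 401. It assumes plain RFC 2617 MD5 digest without `qop`, which may not match every phone or PBX configuration. The extension, realm, password, URI, and nonce values are all made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 of a string, as used by digest auth."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """RFC 2617 digest response without qop (an assumption, see above)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # credentials hash
    ha2 = md5_hex(f"{method}:{uri}")                 # request hash
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce


# Hypothetical values: none of these come from the original poster's system.
first = digest_response("1001", "asterisk", "secret",
                        "REGISTER", "sip:pbx.example.com", "nonce-a")
second = digest_response("1001", "asterisk", "secret",
                         "REGISTER", "sip:pbx.example.com", "nonce-b")

# Same credentials, different nonce, different response.
print(first != second)
```

The point of the sketch is that the response depends on the nonce the PBX handed out, so a response built from an old nonce will not match even when the password is correct.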

What you need to look at is a SIP debug (sip set debug on OR pjsip set logger on) and look at the REGISTER attempts. Keep track of the CSeq: header and that number. If you see something like

REGISTER from device (w CSeq: 1)
401 Unauthorized from PBX
REGISTER from device (w CSeq: 1)

This is an immediate problem. That means the device did NOT receive the 401. It is actually re-sending the REGISTER again. That would be an issue within the network.

If you see:
REGISTER from device (w CSeq: 1)
401 Unauthorized from PBX
REGISTER from device (w CSeq: 2)

The next reply should be a 403 or a 200. If you’re seeing a 403, then something is up. If you are seeing a 200 OK and there are still issues, then it’s possible the endpoint didn’t get the 200 OK. Again, that would point back to network issues.
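The two patterns above can be written down as a small Python sketch. It does not parse real Asterisk debug output: it assumes you have already reduced the trace by hand to `(message, CSeq)` pairs, and the `diagnose` function and its wording are hypothetical, only meant to illustrate the decision steps.

```python
from typing import List, Tuple

# (message, CSeq number), e.g. ("REGISTER", 1) or ("401", 1)
Event = Tuple[str, int]


def diagnose(events: List[Event]) -> str:
    """Classify a hand-reduced REGISTER trace; only the first 401 is examined."""
    for i, (msg, cseq) in enumerate(events):
        if msg != "401":
            continue
        rest = events[i + 1:]
        # CSeq of the first REGISTER the device sends after the challenge.
        next_reg = next((c for m, c in rest if m == "REGISTER"), None)
        if next_reg is None:
            return "no follow-up REGISTER seen after the 401"
        if next_reg == cseq:
            return "same CSeq resent: device likely never received the 401 (network)"
        # Final reply to the authenticated REGISTER, matched by CSeq.
        final = next((m for m, c in rest
                      if m in ("200", "403") and c == next_reg), None)
        if final == "200":
            return "200 OK sent: if the phone still fails, the 200 may be lost (network)"
        if final == "403":
            return "403 Forbidden: PBX rejected the credentials"
        return "no final reply seen for the authenticated REGISTER"
    return "no 401 challenge seen"


print(diagnose([("REGISTER", 1), ("401", 1), ("REGISTER", 1)]))
print(diagnose([("REGISTER", 1), ("401", 1), ("REGISTER", 2), ("200", 2)]))
```

The first call models the retransmission case and the second models a normal successful registration.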

---- Post follow-up ----

After the device’s initial REGISTER, all subsequent re-REGISTERs will already contain Auth headers, carrying the nonce used by the previous registration. You should still see the same 401 Unauthorized challenge and the CSeq go up. The system challenges because that nonce is expired/old, which prompts a REGISTER with fresh Auth headers using the new nonce.

sorvani · August 29, 2018, 6:43pm

Happened to me last month when my ISP went flaky. The issue is not just the firewall. The Intrusion Detection system flagged me. Previously the firewall had flagged me when this happened.

I have no idea why the ID system does not parse the whitelisted addresses from the firewall module, because my IP (by DNS name) was already marked trusted in the firewall module.
