SIP clients unregistering randomly: stale nonce error

Steveg70 · October 27, 2021, 12:17am

I have about 200 clients registering, and a handful of them seem to drop registration randomly. We get errors on the console showing…

[2021-10-26 23:29:26] NOTICE[7582]: chan_sip.c:17471 check_auth: Correct auth, but based on stale nonce received from ‘“CLIENTA” sip:[email protected];tag=1820798efde56ed6o0’
[2021-10-26 23:29:27] NOTICE[7582]: chan_sip.c:17471 check_auth: Correct auth, but based on stale nonce received from ‘“CLIENTG” <sip:[email protected];tag=56d93b6a93922684o0’
[2021-10-26 23:29:27] NOTICE[7582]: chan_sip.c:17471 check_auth: Correct auth, but based on stale nonce received from ‘“CLIENTW” sip:[email protected];tag=351a7fda11b42c09o0’
[2021-10-26 23:29:29] NOTICE[7582]: chan_sip.c:17471 check_auth: Correct auth, but based on stale nonce received from ‘“CLIENTZ” sip:[email protected];tag=5d0640ccbcf1163bo0’

We have done the usual: checked that nat is set to yes, disabled ALG, and made sure the client is authorized through the firewall. The client registers fine but loses registration after a few minutes.

We shortened the session expiry to 300, then to 60; neither helped.
We tried pedantic=no; this also did not help.

Extensions are using chan_sip

thanks in advance

Stewart1 · October 27, 2021, 1:26am

I’ve not seen this with FreePBX, but have on other systems. The problem arises when the round-trip network delay exceeds the SIP timer T1 value, which defaults to 0.5 seconds.

For example, client sends REGISTER, server responds with 401, sending nonce A. 0.5 s later, client (having not received a response) retransmits REGISTER. Server responds with 401, sending nonce B. Client receives the first 401 and sends REGISTER with Authorization header, using nonce A. Server complains because the current nonce is B.

Running tcpdump on the PBX should show whether this is what is happening.
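To make the race described above concrete, here is a toy Python model. It is only a sketch: `ToyRegistrar` and `simulate` are made-up names, the registrar is assumed to remember just the latest nonce it issued (as in the example), packets are assumed to take exactly half the round-trip time in each direction, and only the first retransmission is modelled. It is not how chan_sip is actually implemented.

```python
import itertools


class ToyRegistrar:
    """Hypothetical registrar that remembers only the most recent nonce."""

    def __init__(self):
        self._counter = itertools.count()
        self.current_nonce = None

    def challenge(self):
        # Every unauthenticated REGISTER gets a 401 with a fresh nonce.
        self.current_nonce = f"nonce-{next(self._counter)}"
        return self.current_nonce

    def check(self, nonce):
        return "ok" if nonce == self.current_nonce else "stale"


def simulate(rtt, t1=0.5):
    """Return 'ok' or 'stale' for one registration attempt.

    rtt: round-trip time in seconds (assumed symmetric).
    t1:  client retransmission timer in seconds.
    """
    one_way = rtt / 2
    server = ToyRegistrar()

    # (time the request reaches the server, kind of request)
    arrivals = [(one_way, "register")]
    if rtt > t1:
        # The first 401 has not arrived when T1 fires, so the client
        # retransmits the unauthenticated REGISTER.
        arrivals.append((t1 + one_way, "register"))
    # The first 401 reaches the client at t = rtt; the authenticated
    # REGISTER built from it reaches the server one_way later.
    arrivals.append((rtt + one_way, "auth"))
    arrivals.sort()

    first_nonce = None
    for _, kind in arrivals:
        if kind == "register":
            nonce = server.challenge()
            if first_nonce is None:
                first_nonce = nonce
        else:
            # The client answers the FIRST challenge it received.
            return server.check(first_nonce)
```

Under these assumptions, `simulate(0.02)` (a 20 ms round trip) returns `"ok"` and `simulate(0.8)` returns `"stale"`; raising the timer above the round-trip time, e.g. `simulate(0.8, t1=1.0)`, returns `"ok"` again.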

A long network delay could be caused by distance (e.g. Baghdad to Bangalore may sometimes be routed via the US), by connecting via GEO satellite or 3G mobile data, or (most likely) by bufferbloat, where the client’s connection is saturated by non-VoIP traffic.

I believe that pjsip is more robust in this regard; why are you still using chan_sip?

Or, try increasing SIP T1 on the client to 1.0 or even 2.0 seconds.
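For reference, a client that follows the default RFC 3261 rules for a non-INVITE request over UDP retransmits after T1, then doubles the interval each time up to T2 (4 s by default), and gives up after 64*T1. The sketch below computes that schedule; `register_retransmit_times` is a name invented here, and a particular device may not follow the defaults exactly.

```python
def register_retransmit_times(t1=0.5, t2=4.0):
    """Times (seconds after the first send) at which an unanswered
    non-INVITE request such as REGISTER would be retransmitted over UDP,
    assuming the default RFC 3261 timer behaviour."""
    timeout = 64 * t1          # transaction gives up at this point
    times = []
    interval = t1              # first retransmission after T1
    now = interval
    while now < timeout:
        times.append(now)
        interval = min(2 * interval, t2)   # double, capped at T2
        now += interval
    return times


for t1 in (0.5, 1.0, 2.0):
    print(t1, register_retransmit_times(t1)[:4])
```

The first entry is the one that matters for the stale nonce race: with T1 = 0.5 the first retransmission goes out at 0.5 s, with T1 = 1.0 at 1.0 s, and with T1 = 2.0 at 2.0 s, so a larger T1 tolerates a longer round trip before a duplicate REGISTER is sent.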

Possibly, the errors you are seeing get promptly recovered and the problem is caused by something else.

If you still have trouble, please post: Client device make/model or app/version? Router/firewall at client location? New system, or one that just started failing? If PBX is itself behind a NAT, provide details. Paste ~5 seconds of the Asterisk log demonstrating a failure at pastebin.freepbx.org, including pjsip logger or sip debug, showing at least 2 seconds before and 2 seconds after the error entry. Is there any pattern to the failing clients, e.g. they are all behind the same NAT or behind the same NAT make/model?
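To look for a pattern among the failing clients, one option is to count the stale nonce notices per client in the Asterisk log. The following is a rough sketch that assumes the NOTICE format shown in the first post; `count_stale_nonce` is a hypothetical helper, and the regular expression accepts both straight quotes and the curly quotes the forum software substitutes.

```python
import re
from collections import Counter

# Matches the chan_sip notice and captures the quoted display name.
STALE_RE = re.compile(
    r"check_auth: Correct auth, but based on stale nonce received from "
    r"['‘]?[\"“](?P<name>[^\"”]+)[\"”]"
)


def count_stale_nonce(lines):
    """Return a Counter mapping client display name -> number of notices."""
    counts = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python stale_nonce.py < /path/to/asterisk/log
    for name, n in count_stale_nonce(sys.stdin).most_common():
        print(f"{n:6d}  {name}")
```

Comparing the busiest names against the list of sites, routers, or firmware versions may show whether the affected clients have something in common.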

Steveg70 · October 27, 2021, 7:20am

The server the clients connect to is behind a 10G fiber connection on a public IP. Clients are on our own network, with ping times under 20 ms for the most part.

FreePBX firewall only (with client subnets trusted); the client side is a router (usually TP-Link with ALG off) and NAT. The device is a Cisco SPA112 or SPA122 ATA.

I will run tcpdump on the server today to see if I can sniff out a single client… but as it stands it looks like about 50 out of 200 clients are having this issue.

The issue seems to be related to these 50 or so clients… most seem to have that stale nonce error popping up frequently. Perhaps chan_sip is having issues dealing with 200 clients?

I used chan_sip because some devices were not able to connect with pjsip.

Steveg70 · October 29, 2021, 1:15am

Just an update for anyone else seeing this issue… we had to reboot the server for something else, and once it was freshly rebooted the errors stopped. We have been watching it now for 24 hrs and they have not returned.

We have been migrating a lot of users from our old platform to this one and haven't rebooted or restarted Asterisk in quite some time… Hoping this was the issue and it doesn't return.

system · November 29, 2021, 1:15am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.