System crash - 4 phones can call, but do not ring when called to


#1

Hi, thanks in advance for any help.

I had a system crash yesterday afternoon and had it back up and running within an hour with no issues.

Today it seems that I have 4 extensions (out of roughtly 80) that can make calls, but do not receive them.

When someone calls, the actual phone does not ring at all and the calle either gets the message that they are ‘on the phone’ or it just rings until voicemail (which works).

I have even tried deleting and recreating one of the accounts with the exact same result. All other extensions work just fine.

Help me obi wan kenobi!


#2

I have noticed I’m getting a LOT (every 10 seconds) of securityEvent=“ChallengeSent” for the extensions in question.

[2021-03-18 19:28:39] SECURITY[2664] res_security_log.c: SecurityEvent=“ChallengeSent”,EventTV=“2021-03-18T19:28:39.722-0400”,Severity=“Informational”,Service=“PJSIP”,EventVersion=“1”,AccountID=“119”,SessionID=“xxx”,LocalAddress=“IPV4/UDP/xxx.xxx.xx.xx/xxxx”,RemoteAddress=“IPV4/UDP/xx.xx.xxx.xxx/xxxx”,Challenge=""


#3

That’s a network problem, If the extension is not available it won’t respond to the challenge


#4

Appreciate for the reply dicko. I have been kind of digging down that road, but cannot for the life of me seem to get anywhere. I think it is registering, there is another successful message after that (I will post) but it seems to happen often. I also noticed that the 4 extensions that are doing this are all having this same behavior. They just keep registering.

I’d like to add a few items.

  1. The same extension that is throwing the above log can both dial out and receive voicemail. Also if placed into DND it responds right away (no ringing still) and says the user is unavailable etc.
  2. There are many phones on the same network as the one throwing that log and they works fine.
  3. I have tried the extension from 3 different networks, all with the same result AND while having another extension connected to the same server. The ‘other / comapare extension’ works just fine.
  4. I have tried soft phones and replacing the physical phone, same issue.
  5. I have disabled the firewall on the FPBX system (for a min phew) - same results.
  6. I have completely deleted the extension twice and tried all of the above again. same results…

A little More Log

[2021-03-18 20:28:06] SECURITY[2664] res_security_log.c: SecurityEvent=“ChallengeSent”,EventTV=“2021-03-18T20:28:06.350-0400”,Severity=“Informational”,Service=“PJSIP”,EventVersion=“1”,AccountID=“119”,SessionID=“94386YzU3N2IxNmE4MzdjN2Y0NDI2MmI4ZTM3YzhmMmYzNWY”,LocalAddress=“IPV4/TLS/xxx.xxx.xx.xx/xxxx”,RemoteAddress=“IPV4/TLS/xxx.x.xxx.xxx/63049”,Challenge=""
[2021-03-18 20:28:06] SECURITY[2664] res_security_log.c: SecurityEvent=“SuccessfulAuth”,EventTV=“2021-03-18T20:28:06.371-0400”,Severity=“Informational”,Service=“PJSIP”,EventVersion=“1”,AccountID=“119”,SessionID=“94386YzU3N2IxNmE4MzdjN2Y0NDI2MmI4ZTM3YzhmMmYzNWY”,LocalAddress=“IPV4/TLS/xxx.xxx.xx.xx/5161”,RemoteAddress=“IPV4/TLS/xxx.x.xxx.xxx/63049”,UsingPassword=“1”
[2021-03-18 20:28:06] SECURITY[2664] res_security_log.c: SecurityEvent=“ChallengeSent”,EventTV=“2021-03-18T20:28:06.452-0400”,Severity=“Informational”,Service=“PJSIP”,EventVersion=“1”,AccountID=“119”,SessionID=“94386MDQ2MzEwNjYxMGJlZmQ5NWQwMjdlM2M2ZDU2ZWQ3MGI”,LocalAddress=“IPV4/TLS/xxx.xxx.xx.xx/5161”,RemoteAddress=“IPV4/TLS/xxx.x.xxx.xxx/63049”,Challenge=""
[2021-03-18 20:28:06] SECURITY[2664] res_security_log.c: SecurityEvent=“SuccessfulAuth”,EventTV=“2021-03-18T20:28:06.472-0400”,Severity=“Informational”,Service=“PJSIP”,EventVersion=“1”,AccountID=“119”,SessionID=“94386MDQ2MzEwNjYxMGJlZmQ5NWQwMjdlM2M2ZDU2ZWQ3MGI”,LocalAddress=“IPV4/TLS/xxx.xxx.xx.xx/5161”,RemoteAddress=“IPV4/TLS/xxx.x.xxx.xxx/63049”,UsingPassword=“1”

Any suggestions as to what else to check?


#5

sngrep is easier to reduce the clutter


#6

Thanks Dicko,

If you have any other suggestions please let me know. I will try to learn sngrep in the meantime.

Thanks!


#7

No, In the meantime I will wait for you to "try to learn . . . ", when in doubt, just press ?


#8

Ok I feel a tad foolish. I searched it and saw that it was packet capture etc and figured I would be learning a wireshark equiv and sending you a cap file.

Looks pretty great, just not sure now what to grab.


#9

2021/03/18 22:19:12.361561 xxx.x.xxx.xxx:5060 -> xxx.xxx.xx.xx:5060
REGISTER sip:xxx.xxx.xx.xx:5160 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.126;branch=z9hG4bK1a1814e35ACD2F26
From: “Al M” sip:119@xxx.xxx.xx.xx:5160;tag=8CB7E3CD-3CFB8C38
To: sip:119@xxx.xxx.xx.xx:5160
CSeq: 19 REGISTER
Call-ID: 0b16edbdfdce36893fc8254ea627f114
Contact: sip:119@192.168.1.126;methods=“INVITE,ACK,BYE,CANCEL,OPTIONS,INFO,MESSAGE,SUBSCRIBE,NOTIFY,PRACK,UPDATE,REFER”
User-Agent: PolycomVVX-VVX_411-UA/6.3.1.11465
Accept-Language: en
Authorization: Digest username=“119”, realm=“asterisk”, nonce=“1616120292/77bfc7b91d89b381a0ab059837c0a590”, qop=auth, cnonce=“4v6Y5ujoaqYV0VV”, nc=00000002, opaque=“00cdeec749e42d34”, uri=“sip:xxx.xxx.xx.xx:5160”, response=“583072d2a3c18177eca9ae729fbc
08”, algorithm=MD5
Max-Forwards: 70
Expires: 120
Content-Length: 0

I then get a 401:

I also then get a 401 it looks like:

2021/03/18 22:20:12.361367 xxx.xxx.xx.xx:5160 -> xxx.x.xxx.xxx:5060
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 192.168.1.126;rport=5060;received=xxx.x.xxx.xxx;branch=z9hG4bK8cd442696FF61C44
Call-ID: 0b16edbdfdce36893fc8254ea627f114
From: “Al M” sip:119@xxx.xxx.xx.xx;tag=8CB7E3CD-3CFB8C38
To: sip:119@xxx.xxx.xx.xx;tag=z9hG4bK8cd442696FF61C44
CSeq: 21 REGISTER
WWW-Authenticate: Digest realm=“asterisk”,nonce=“1616120412/205fbeb831328dd860ed7d3758518b46”,opaque=“155ad70877a69084”,stale=true,algorithm=md5,qop=“auth”
Server: FPBX-15.0.17.24(16.15.1)
Content-Length: 0


#10

New info:

If I create a new extension and use the exact same phone and provision it the exact same way via EPM it works just fine.

If I delete the buggy extension and then recreate it, it still does not work.


(Dirk2358) #11

That’s pretty normal for Register process: device sends Register and server requires authentication by sending 401. Now the device has to send a register containing a Digest. Does it send it? Does the device know that it has to do authenticated Register?
If I compare the 401 with the 401 asterisk sends here, the only difference I can see, is, that the Digest contains a “stale” parameter.


#12

Hi Dirk,

Yep: Authorization: Digest username=“119”, realm=“asterisk”, nonce=“1616132501/b2c74d89725069903a31d526f8483c66”, qop=auth, cnonce=“gqzmYNZHEsQ5E57”, nc=00000002, opaque=“013df18972288e83”, uri=“s
01:42:41.972242 x <qqqqqqqqqqqqqqqqqqqqqqqqqq x x:xxx.xxx.xx.xx:5160”, response=“2dd532254ab482ae34c64ea511b0a505”, algorithm=MD5

I keep thinking that somehow something is stuck on or a file is corrupt.

I have literally wiped any trace I can find of extension 119 from the system. I can create a new (120) and use the same phone - works instantly and perfectly.

If I re-create 119 and use the same phone again it cannot receive calls, but makes them fine.

I have also tested this with softphones / bria both local network and remote, all the same.

This seems to be going on for about 5 phones on the system since the crash.

I also just found this, not sure if it is the culprit, but debug doesn’t sound right :wink:

Debug: Found PJSIP Destination PJSIP/119") in new stack
– Executing [dstring@macro-dial-one:10] GotoIf(“PJSIP/100-00000003”, “0?doset”) in new stack
– Executing [dstring@macro-dial-one:11] NoOp(“PJSIP/100-00000003”, "Debug: Updating PJSIP Destination with PJSIP_DIAL_CONTACTS") in new stack
– Executing [dstring@macro-dial-one:12] Set(“PJSIP/100-00000003”, “THISDIAL=”) in new stack
– Executing [dstring@macro-dial-one:13] ExecIf(“PJSIP/100-00000003”, “1?Set(DIALSTATUS=CHANUNAVAIL)”) in new stack
– Executing [dstring@macro-dial-one:14] GotoIf(“PJSIP/100-00000003”, “1?skipset”) in new stack
– Goto (macro-dial-one,dstring,16)
– Executing [dstring@macro-dial-one:16] Set(“PJSIP/100-00000003”, “ITER=2”) in new stack
– Executing [dstring@macro-dial-one:17] GotoIf(“PJSIP/100-00000003”, “1?begin”) in new stack


(Dirk2358) #13

Ok. This means, the phone responds to the 401, but asterisk doesn’t send a 200 OK? Does it send anything? Does it send something other to the device?

Taking into account that the same device works perfectly with another number, I think there could be a database inconsistency or the config file is somewhat broken or even both.

Did you compare the output of pjsip show endpoint [your working|broken endpoint]? Are there any relevant differences? Did you compare the complete configuration file entries for both devices?


#14

Something is wrong here: 5160 is the port used by chan_sip and should not appear anywhere in the config for a pjsip extension or for the phone.

That does sound like corrupted database or config file somewhere. Rule out something cached locally by rebooting the server, and reprovisioning the phone after factory resetting it. If you still have trouble, search both the config files read by the phone and those in /etc/asterisk/pjsip* for 5160.


#15

Thanks guys,

I have deleted, rebooted, factory reset phone, manually combed through all of the PJSIP conf files in /etc/asterisk, deleted the user as well as the extension, used the 119 as a ring group (which worked) and then deleted the ring group and recreated the 119 doing all of the above… still does not work.

I create a new extension (120), works perfect on the same phone, zero issues and instant.

I have also tried the above from completely different networks, all with the same result.

I do know it is on 5160, that’s how it’s been setup for 3 years, its all setup correctly (with the wrong port), just whoever set it up originally used that port for pjsip. I can almost 100% say this is not the issue, especially as everything else works just fine, including the SAME phone with a different extension from the same network.

Any other ideas? I literally spent 19+ hours on this yesterday, I’m at a loss of what else to try.

I have poked around in the DB, but with 350+ tables I’m a bit stumped where to look.


#16

Try from a shell

rasterisk -x "dialplan show 119@"

(Lorne Gaetz) #17

Provide a call trace of a failed call via pastebin:
https://wiki.freepbx.org/display/SUP/Providing+Great+Debug#ProvidingGreatDebug-AsteriskLogs-PartII


#18

Hi Dicko,

It found a lot of items. I’m not quite sure I understand how it’s pulling them. I see, for example, it returns that its found in extensions_additional.conf, but when I open that file and search it, there is NO 119 in the file at all (other than part of a DID).


#19

Hi Igaetz,

The trace is here - https://pastebin.com/V0uNKcdS

Thanks.


#20

Where I to guess , first choice would be that zulu is interfering, but I don’t use zulu.