Strange issue

andersonhaulage · April 21, 2016, 7:18pm

Network has not changed. This PBX has been in production since November and I’m only seeing this in the last couple of weeks. I’m stumped. Random endpoints, random times.

Can you think of a monitoring tool that can be used on the phones using TCP? I’d really love to see if the lag comes from the network in general, or the PBX NIC in particular.

dicko · April 21, 2016, 7:27pm

The delay is due to the endpoints not replying quickly enough to a SIP OPTIONS request (which is generally UDP, not TCP). Just choose one extension that “disappears” and, from the Asterisk CLI, run

sip set debug peer nnn

and watch the flow.

xrobau · April 21, 2016, 7:49pm

Look at this from a different angle then. Which endpoints are not playing up, and what’s the common factor with them?

Is a switch on the way out? Maybe a certain brand of phone has automatically upgraded its firmware and has a bug. Maybe they HAVEN’T upgraded their firmware and there’s a date-related bug that needs a firmware upgrade. Maybe there’s something on their network segment spamming multicast SIP packets and exhausting their CPU?

Also, reading back, you’re confused between ICMP PING (which is a kernel level thing) and a SIP Qualify (which is an application level thing).

An ICMP Ping is you asking the very bottom of the operating system of the phone ‘are you running?’ while a SIP Qualify requires the entire phone to be up and talking SIP correctly, and to actually think about the packet that’s been sent to it and respond sensibly.
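To see that difference from the PBX side, here is a minimal Python sketch that sends a single SIP OPTIONS over UDP and times the reply, which is roughly what a qualify does. It is not how Asterisk implements qualify. The function names (`build_options`, `parse_status`, `probe`), the `probe` user part and the 2 second timeout are all assumptions made for illustration.

```python
import socket
import time
import uuid


def build_options(target_ip, target_port, local_ip, local_port, user="probe"):
    """Build a minimal SIP OPTIONS request as bytes (user name is made up)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic cookie prefix
    lines = [
        f"OPTIONS sip:{target_ip}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_ip}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Return the numeric status code of a SIP response, or None if it is not one."""
    first = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first.split(" ", 2)
    if len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].isdigit():
        return int(parts[1])
    return None


def probe(target_ip, target_port=5060, timeout=2.0):
    """Send one OPTIONS; return (status, rtt_ms), or (None, None) if no usable reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((target_ip, target_port))
            local_ip, local_port = sock.getsockname()
            request = build_options(target_ip, target_port, local_ip, local_port)
            start = time.monotonic()
            sock.send(request)
            data = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return None, None
        rtt_ms = (time.monotonic() - start) * 1000.0
        return parse_status(data), rtt_ms
```

Calling `probe("192.168.1.50")` (a hypothetical phone address) in a loop next to a plain ICMP ping should show whether a phone keeps answering pings while being slow or silent on SIP. Any SIP status code in the reply, even an error, means the phone's SIP stack is alive.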

andersonhaulage · April 21, 2016, 8:06pm

Problem is, it’s random. All sets are affected at some point. Some more than others, but all are. All but mine are on the same firmware. I brought mine up to the latest firmware yesterday and also set it to qualify=no. I still got “No Service” a couple of times. I do multicast paging. While sets are at “No Service”, they can still receive pages and make calls. Calls to them go straight to VM.

johnens · April 21, 2016, 8:10pm

Having a similar issue on one of our systems also. Running Aastra 6737i phones. Have some older 57i phones also, and they don’t seem to be having any issues. All the phones are running current firmware.

Very intermittent here too. Began happening fairly recently, but I can’t quite put my finger on exactly when.

andersonhaulage · April 21, 2016, 8:19pm

Could it be a recent FreePBX update?

johnens · April 21, 2016, 8:21pm

That’s what I was leaning towards. Like yourself, there haven’t been any changes on our local network either.

In the past, I’ve always had more of these goofy issues with the Aastra phones. Back when we ran Polycom, they seemed much more solid.

xrobau · April 21, 2016, 8:32pm

Extremely unlikely.

The only possibility that jumps to mind is that some application is spamming the phones with some sort of traffic. Can you do a tcpdump to see what is being sent to and received from the phones, and see if that matches up with what asterisk thinks it’s seeing?

andersonhaulage · April 25, 2016, 8:03pm

Still monitoring, but I may have solved it. The VM was sitting on a, shall we say, less than enterprise-grade datastore. I was able to finish the setup of the final SAN and moved the VM. It’s been about 5 hours and I have not had a phone drop.

andersonhaulage · April 26, 2016, 2:44pm

So nearly 24 hours. Had one extension go unreachable around 0500 this morning. But that’s during backups so network and datastore are being hammered hard. Besides, 0500, WGAS?

Looks like I’m OK.
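For anyone who wants to check whether drops really cluster around a backup window, a small Python sketch like the one below can bucket "unreachable" events by hour. It assumes the Asterisk log has lines shaped like the made-up samples in `SAMPLE`; the exact wording and timestamp format on a given system may differ, so the regex is an assumption to adjust, and `count_by_hour` is just an illustrative name.

```python
import re
from collections import Counter

# Assumed line shape: "[YYYY-MM-DD HH:MM:SS] ... Peer 'NNN' is now UNREACHABLE..."
UNREACHABLE = re.compile(
    r"^\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\].*"
    r"Peer '(?P<peer>[^']+)' is now UNREACHABLE"
)

# Made-up sample lines for illustration only, not taken from a real log.
SAMPLE = [
    "[2016-04-26 05:02:13] NOTICE[2345] chan_sip.c: Peer '104' is now UNREACHABLE!  Last qualify: 12",
    "[2016-04-26 05:02:24] NOTICE[2345] chan_sip.c: Peer '104' is now Reachable. (14ms / 2000ms)",
    "[2016-04-26 05:10:40] NOTICE[2345] chan_sip.c: Peer '117' is now UNREACHABLE!  Last qualify: 9",
    "[2016-04-26 13:31:02] NOTICE[2345] chan_sip.c: Peer '104' is now UNREACHABLE!  Last qualify: 11",
]


def count_by_hour(lines):
    """Count unreachable events per hour of day and per peer."""
    by_hour = Counter()
    by_peer = Counter()
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            by_hour[int(match.group("hour"))] += 1
            by_peer[match.group("peer")] += 1
    return by_hour, by_peer


by_hour, by_peer = count_by_hour(SAMPLE)
for hour in sorted(by_hour):
    print(f"{hour:02d}:00  {by_hour[hour]} drop(s)")
```

To run it against a real log, pass an open file object to `count_by_hour` instead of `SAMPLE`. A spike in one hour points at scheduled load such as backups; an even spread points elsewhere.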

gherbstman · August 29, 2016, 12:11pm

Have you found a cause? I am having a similar problem. Asterisk randomly stops responding to REGISTER requests.

andersonhaulage · August 29, 2016, 2:39pm

I never found a cause, but moving the VM to a faster datastore fixed the issue.