Short Gap in Audio at Start of Outbound Calls

I have a strange situation that, frankly, I’m out of ideas on how to troubleshoot.

We have a PBXact-100 system running in high availability mode, version 13 (the last version that supported the HA module that was sold to us) with 2 physical servers. We have approximately 80 phones, 68 of which are the old Cisco 7960 IP phones flashed with version P0S3-8-12-00 SIP firmware with the remainder being various model Sangoma S-series phones. When putting the system in, I initially wanted to slowly migrate people to more recent IP phones, but pretty much everybody likes the Cisco 7960 phones better than the Sangoma, some managers even having me replace their Sangoma phones with a Cisco 7960. (From my perspective, the Sangoma phones have a LOT more hardware issues per phone than do the old Ciscos, so I’ve had no problem at all with this.) Due to circumstances when I installed the phone system, we also have a considerable number of spare Cisco 7960 phones - at the low hardware failure rate they have, I won’t have to worry about having to buy more phones any time before civilization ends. All of the phones are on their own VLAN with the PCs hanging off the back of the phones being on their own VLAN. The switches are Cisco 3750 providing PoE for all the phones with a Meraki MX64 as the edge device, which also provides routing between VLANs.

A few months ago, our Meraki MX64 edge device had its firmware updated to version 17.10.5 from 17.10.2 to fix an unrelated problem. The day after this happened, the 7960 phones began having this issue. (We know this precise day as the parent of an employee passed and he was off that week; the manager had to work on the floor and noticed it the last day he was covering, which happens to be the day after the firmware update happened.) Unfortunately, nobody told me about this issue until over a month later and Meraki does not allow you to roll back firmware after two weeks of it being applied to the device. :roll_eyes: I have been in contact with Meraki support and have tried both the 18.107.2 and 17.10.7 firmwares with no impact at all on the issue. We are currently running 17.10.7 as the 17 branch is the ‘gold standard’ for this device according to Meraki support.

The problem is that intermittently when making an outbound call, you will get 1-2 seconds of normal audio, followed by 1-3 seconds of silence, and then the audio is normal for the rest of the call. It does not matter if you’re on the phone for a minute or for two hours - the audio is clean after the initial few seconds. Inbound calls do not exhibit this issue, nor do outbound calls to local extensions or to the PBX itself. The problem is completely random from what I have been able to tell - sometimes it’ll happen on 3 calls in a row, other times you might make 5 calls before it happens once. The average seems to be that it will happen once about every 4 calls. It is ONLY affecting the Cisco 7960 phones - none of the Sangoma s305, s405, s500, and s705 phones have exhibited this. I did not observe this and had no reports of it happening in the 5 years we’ve been using these phones before the Meraki firmware upgrade to 17.10.5.

To troubleshoot, I first looked at the call recordings on our PBX - they were clean. Then I contacted our SIP trunk provider - they confirmed that there is no gap when the calls hit their server. Then I loaded up a packet sniffer on the PBX, reproduced the problem, transferred the capture file to my PC, reconstructed the RTP streams, and upon listening to them discovered that there’s no gap in the audio. So I moved to the Meraki and repeated the same steps - the audio in the capture done from there is clean as well. Determined to figure out what is going on, I connected a 5-port managed switch at the phone with port mirroring turned on, connected a laptop, began a capture, and reproduced the issue. To my amazement, the reconstructed RTP audio stream is again clean, with no indication of there being any kind of gap or silence on the call! I compared a capture of a clean call to one that had the silence and there was no difference that I could tell. (To me, this would indicate a problem inside the phone itself as it’s receiving audio without a gap, but the Cisco SIP firmware for these 7960 phones hasn’t changed in years, it affected every 7960 we have simultaneously, & I cannot get past the fact that it began the morning after the night the Meraki firmware was updated… seems way too much of a coincidence to be coincidental, even though I cannot think of what the MX64 might be doing to cause such a strange, specific, and device-specific malfunction.)
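A reconstructed stream that sounds clean may still contain header-level discontinuities that a listening test does not reveal, such as an SSRC change or a timestamp jump partway through the first few seconds. Whether that is what happens here is purely a guess, but it can be checked. The following is a minimal sketch, assuming the UDP payloads of one inbound RTP stream have already been extracted from the capture in arrival order (that extraction step is not shown), and assuming G.711 with 20 ms packets, i.e. 160 samples per packet. The names `parse_rtp_header` and `find_discontinuities` are made up for this example.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    return RtpHeader(b1 & 0x7F, bool(b1 & 0x80), seq, ts, ssrc)


def find_discontinuities(packets, samples_per_packet=160):
    """Return (index, description) pairs for header-level discontinuities.

    samples_per_packet=160 is an assumption (G.711, 20 ms packets).
    """
    events = []
    prev = None
    for i, raw in enumerate(packets):
        hdr = parse_rtp_header(raw)
        if prev is not None:
            if hdr.ssrc != prev.ssrc:
                # A new SSRC is a new stream; skip the step checks.
                events.append(
                    (i, f"SSRC changed from {prev.ssrc:#010x} to {hdr.ssrc:#010x}")
                )
            else:
                if hdr.payload_type != prev.payload_type:
                    events.append(
                        (i, f"payload type changed from {prev.payload_type} "
                            f"to {hdr.payload_type}")
                    )
                # Mask the differences so counter wraparound is not flagged.
                seq_step = (hdr.sequence - prev.sequence) & 0xFFFF
                ts_step = (hdr.timestamp - prev.timestamp) & 0xFFFFFFFF
                if seq_step != 1:
                    events.append((i, f"sequence step {seq_step} (expected 1)"))
                if ts_step != samples_per_packet:
                    events.append(
                        (i, f"timestamp step {ts_step} "
                            f"(expected {samples_per_packet})")
                    )
        prev = hdr
    return events
```

Running this over the first few seconds of a call that had the gap and a call that did not, and comparing the reported events, would show whether the two streams really are identical at the header level. Note that DTMF (telephone-event) packets legitimately repeat timestamps, so they would show up as timestamp-step events in this simple version.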

Finally, I connected a 7960 directly to an open LAN port on the PBX. After enabling the interface, the phone booted, registered, and was able to make calls. Unfortunately, there was no audio either way, most likely due to some kind of firewall issue somewhere. Instead, I connected the phone to the same switch the PBX servers are on, using a switch port that is configured to have both data & voice on the same VLAN, just as the PBX servers do. I tested this phone and made 14 calls that did not have the gap before the 15th had it. Further testing showed that about one out of every 15 calls having the gap is the average in this configuration. To me, having the problem happen about a quarter as frequently as it does when going through the Meraki from phone->PBX points the finger back at the MX64 as being the source of the issue, but the fact that it happens at all disputes that idea. (Even though calls go through the Meraki when being sent out to our SIP provider’s servers.)

I’ve sent a Cisco 7960 phone with SIP firmware to someone who has a newer version of FreePBX and he’s going to report back after he’s had a chance to test it in his environment. (Having a different version of the PBX and a different edge device is going to make it difficult to track down what caused any differences, but I’m down to grasping at straws.) No matter what he observes, I honestly don’t know where to go from here. The problem must be inside the 7960 phone because the packet capture done while tapped into the network at the phone itself doesn’t have the silence. At the same time, the problem can’t be inside the phone as it started with a firmware upgrade to the Meraki & wiring a phone directly to a switch on the voice VLAN cuts its frequency to a quarter of what it is going through the Meraki from phone to PBX. Meraki support says that they’ve received no other reports of SIP issues with the firmwares we are on and has basically washed their hands of the problem.

Would be grateful for any suggestions of what to do next.

if you’ve taken things to the extent you described by confirming audio at all those locations it sounds like you have to make a choice between the meraki mx and the 79xx series … I think meraki wins that one

I just went through a similar debug with captures being taken directly from phones and the PBX at the same time to see where the audio was breaking down - in the end we saw the audio leave the device and never make it to the PBX … not much more you can do but look at what’s in between for the answer, and in my case it was a fortinet; in your case the phone isn’t processing what’s being delivered

I understand where you’re coming from Chris, but it’s extremely frustrating / confusing / worrisome that a firmware update to a router can destroy local communications like this… and in such a specific way. Even if I replace all of the phones malfunctioning because of this, what’s to stop the same thing from happening again in the near future???

.

Over the past week, I have continued troubleshooting this and have uncovered some interesting tidbits.

  1. Tried alternate SIP drivers on the PBX - both the chan_sip & PJSIP drivers behave the same way. (Which I kind of expected with the audio arriving at the phone clean, but wanted to test with as strange an issue as this has been.)

  2. Tried a Cisco 7961 phone. I have a few of these phones that were sent to me instead of the 7960 model and was unable to get them to work when installing the phone server. Figured that with as much experience as I’ve had with VOIP since, I’d be able to make them work. Got the SIP firmware on it and could see the server, but was unable to get the phone to register… asterisk kept complaining about an incorrect password. I gave up, the illusion of my being semi-competent when it comes to VOIP shattered. :cry: :laughing:

  3. Totally on a fluke, I found that the frequency of the gap occurring seems to be related to the destination you’re calling. (!!!) I’ve been testing with the toll free number of one of our vendors because that’s what the department head gave me as an example. I mistakenly discarded my notes from that day and he was in a meeting, so I started testing with a local store that has an automated recording pick up calls. I called this store 45 times with each of the chan_sip and the pjsip drivers without it happening once! :open_mouth: (Also tried a different switch port during this time, wondering if that might have something to do with it, but was the same.) I then tested extensively using a local pharmacy and it was fine. Thinking it might be a local vs long distance thing, I looked up the number for a pharmacy in a different area - it was also fine. By this time (takes a bit of time to dial a phone 160 times, even hanging up after 5-7 seconds :rofl: ), the manager was out of his meeting and I was able to get the vendor phone number. Tested using that number with the same phone… five calls later, the gap happened. :flushed: I got another toll free number with an automated answer and the very first call had the gap. :thinking: Went to the manager & asked if it’s only toll-free numbers doing it and he said that was not the case. I’m completely perplexed by this, especially with what I found next.

  4. I remember installing the chan_sccp module on our system years ago, but could never make it work. I dug into this again and was able to figure out that the ‘Allow Networks’ and ‘Local Networks’ settings apparently weren’t being applied, so the phone’s attempts to register were being rejected no matter what I put in these fields in the SCCP Server Config. Found a bug report online that seemed similar, but the source I had on the server was dated significantly after the fix, so I dunno what’s going on. Not being willing to give up, I found where this was happening and put in a quick / dirty hack to allow it, recompiled, reinstalled, and tested. The phone registered and I began testing. There’s no gap in the audio.
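
To put a number on the destination dependence in point 3: 90 calls to the local store (45 on each driver) with zero gaps would be very unlikely if those calls had the same gap rate as the vendor number. The following is a rough sketch, assuming each call is independent and has a fixed chance of showing the gap, which is a simplification. The function names are made up for this example, and the two rates tried are the "about 1 in 4" and "about 1 in 15" averages mentioned earlier.

```python
def prob_no_gaps(calls: int, gap_rate: float) -> float:
    """Chance of zero gaps in `calls` independent calls at a fixed gap rate."""
    return (1.0 - gap_rate) ** calls


def rate_upper_bound(calls: int, confidence: float = 0.95) -> float:
    """Largest gap rate still consistent with zero gaps in `calls` calls.

    Solves (1 - p) ** calls == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / calls)


if __name__ == "__main__":
    clean_calls = 90  # 45 calls on chan_sip + 45 on pjsip, no gaps
    for label, rate in (("1 in 4", 1 / 4), ("1 in 15", 1 / 15)):
        p = prob_no_gaps(clean_calls, rate)
        print(f"{clean_calls} clean calls at a {label} gap rate: p = {p:.3g}")
    bound = rate_upper_bound(clean_calls)
    print(f"95% upper bound on the gap rate for that number: {bound:.1%}")
```

Under those assumptions, a run of 90 clean calls is wildly improbable at a 1 in 4 rate and still improbable at 1 in 15, so the difference between destinations looks real rather than a lucky streak.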

.

So something in the change Meraki made to their firmware caused only outgoing calls to specific numbers to have a short gap in the audio at the front of the call, only for a specific brand/model of phone, and only if the phone is running a particular firmware. :thinking: :man_facepalming: :man_shrugging: :man_shrugging: :man_shrugging:

Swapped out a SIP phone with one running SCCP yesterday to test. Honestly don’t know if the chan_sccp module is solid enough to use in production, but it’s really the only option I can see short of replacing 70+ phones with a different make/model phone. (Which are going to have their own quirks / problems, as evidenced by the 2 s305’s we had left from a now-closed off-site location, which read the caller ID aloud with the ring for every incoming call, which is goofy to say the least. I’ve not found a way to make it shut up and just ring; nobody’s responded to the thread I opened about it, so it’s probably built into the phone without a way to turn it off.) Would like to stay with SIP if possible as that’s what the PBX is based on without add-on modules, but not if it’s going to have goofy stuff like this happen.

Can anyone see anything I missed or should be trying??? Have to say that I’m really not looking forward to flashing nearly 70 phones and rebuilding each of the extensions… :persevere: