Intermittent One Way Audio, AT&T Support Says Due To High Jitter, Not sure how to fix

jdennis · September 29, 2022, 6:14pm

We have SIP trunks through AT&T at a central location with one FreePBX server (distro v15 with pjsip). We have 36 locations, all with multiple phones that connect directly to our centralized FreePBX server. Each location uses whatever the local cable provider is for its internet (Spectrum, Comcast, etc.).

Earlier this week, locations started complaining that on some incoming calls they could hear the caller, but the caller couldn’t hear them. Some locations are affected, some are not. It’s not every call, only some calls.

We still have a trouble ticket open with AT&T on this issue and keep providing “sample” calls that have the issue and they continue to tell us it’s due to “high jitter”. Here is a copy/paste of the jitter they told us for one of the problem ones.

Minimum Jitter(ms) 9.39
Average Jitter(ms) 314.25
Maximum Jitter(ms) 1062.69

The only noteworthy recent change is that about 2 weeks ago we migrated the FreePBX server, which runs on a VM, from one hypervisor to another. They are both on-site HP Proliant DL360 Gen10 servers; the newer one has a slightly newer CPU (Intel Xeon Silver 4210R) and SSD drives that the VM now runs on. It also has a different Intel 4-port gigabit network card.

Yesterday we discovered that the dedicated port we have the FreePBX VM running on defaulted to “flow control” for RX. Using ethtool we turned off all flow control and rebooted FreePBX to make sure the change was in effect. We thought we nailed it with that, however the problem persists.
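Since ethtool settings applied at runtime do not necessarily persist across a reboot, it may be worth re-checking the pause (flow control) state afterwards. Below is a minimal Python sketch that parses the text printed by `ethtool -a <interface>`. The interface name `eth0` and the sample output are assumptions for illustration; the exact output layout can vary by driver and ethtool version.

```python
import subprocess


def read_pause_output(interface: str) -> str:
    """Run `ethtool -a <interface>` and return its text output (needs ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-a", interface], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_pause_settings(ethtool_output: str) -> dict:
    """Turn 'Key: on/off' lines into a dict such as {'rx': True, 'tx': False}."""
    settings = {}
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip().lower()
        if sep and value in ("on", "off"):
            settings[key.strip().lower()] = (value == "on")
    return settings


def flow_control_disabled(settings: dict) -> bool:
    """True only if both RX and TX pause are explicitly reported as off."""
    return settings.get("rx") is False and settings.get("tx") is False


# Illustrative sample only (assumed typical layout, not taken from this server).
SAMPLE = "Pause parameters for eth0:\nAutonegotiate:\toff\nRX:\t\ton\nTX:\t\toff\n"

if __name__ == "__main__":
    print(flow_control_disabled(parse_pause_settings(SAMPLE)))
```

On the real host, `parse_pause_settings(read_pause_output("eth0"))` would replace the sample string, with the interface name adjusted to match.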

Tonight we are going to try rebooting our Meraki MX100 firewall and Meraki core switch as a hail mary. (The server’s local IP address is the same, but it now has a different MAC address, so we’re wondering if something cached is causing issues.)

Anyone have any suggestions on things to try and/or look at to get this issue resolved? AT&T makes it sound like the “high jitter” isn’t due to their service and it’s a problem on our end. I’ve never had to deal with a “jitter” problem like this so I’m a bit stumped. Thank you in advance for any suggestions you might have.

Edit: Probably worth noting that phones at our HQ location that are on the local network with the phone server are NOT having this issue. It’s only the phones connecting to our phone server over the internet that are experiencing this one way audio issue.

david55 · September 29, 2022, 6:44pm

Are you able to get a pcap from the routers on either side of Asterisk to see where the jitter is being introduced? I’d suggest that the problem is either between the 36 locations and the Asterisk machine, possibly in the upstream networks, or is the result of the Asterisk VM not having sufficient priority for access to the host. (I think it is generally advisable to dedicate a core solely to the Asterisk VM.)

An average jitter of 314ms suggests a much larger propagation delay which will be difficult for users to tolerate, even if nothing is actually dropped.
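For readers unfamiliar with what a jitter figure measures, here is a minimal sketch of the interarrival jitter estimator defined in RFC 3550. It assumes an 8 kHz RTP clock and arrival times in milliseconds; the numbers AT&T quoted may have been computed differently, so this is only meant to show the general idea.

```python
def interarrival_jitter(arrivals_ms, rtp_timestamps, clock_hz=8000):
    """Return the running RFC 3550 jitter estimate (in ms) after each packet.

    arrivals_ms:     local arrival time of each packet, in milliseconds
    rtp_timestamps:  RTP timestamp field of each packet, in clock ticks
    """
    jitter = 0.0
    prev_transit = None
    history = []
    for arrival, ts in zip(arrivals_ms, rtp_timestamps):
        # Relative transit time: arrival time minus the sender's media time.
        transit = arrival - ts * 1000.0 / clock_hz
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as in RFC 3550.
            jitter += (d - jitter) / 16.0
        prev_transit = transit
        history.append(jitter)
    return history


if __name__ == "__main__":
    # Third packet arrives 16 ms later than its timestamp implies.
    print(interarrival_jitter([0, 20, 56], [0, 160, 320]))
```

Because each new deviation only contributes one sixteenth of its size, an average in the hundreds of milliseconds implies sustained, large variations in arrival time rather than an occasional late packet.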

If the local phones aren’t having problems, I’d suggest congestion between you and the 36 locations, as VM scheduling should affect the local ones as well.

You might want to google “buffer bloat”, although that might suggest a cause, rather than a solution.

jdennis · September 30, 2022, 12:16pm

Yes, we can do a packet capture on our Meraki MX100 firewall that is at the location where the FreePBX server is hosted. The tricky part is going to be to do the capture while the intermittent issue is happening. Going to try that today.

jdennis · September 30, 2022, 1:59pm

One thing I was thinking about this morning is that we use a “ring all” call queue for our locations, some of which have as many as 15 different phones. Someone in the forums previously described this as pretty much DDOS’ing ourselves. Does anyone think that could be causing the high jitter, and that maybe it’s normal for us when a call is picked up?

We’ve explored alternatives to the “ring all” and haven’t come up with anything that we feel will work for our users, but we are open to ideas.

david55 · September 30, 2022, 2:48pm

DDOS gets overused. You will be generating similar traffic levels outbound, which will also be consuming network resources.

Whether ring all will generate a lot of traffic will depend on whether the remote phones generate early media. Whether that traffic is excessive will depend on how you have dimensioned your network connection. In rich countries, VoIP is not generally a significant user of network bandwidth and it is much more likely that someone streaming HD movies, or software updates, is overloading some level of the network. The overload may be from bottlenecks in the provider’s network which are shared with other customers.
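As a rough back-of-the-envelope check on the ring-all worry, the sketch below estimates the worst case in which every ringing phone sends early media. It assumes G.711 with 20 ms packetization (160 payload bytes per packet), which the thread does not state, and counts only RTP, UDP and IPv4 headers.

```python
def rtp_stream_bps(payload_bytes=160, packet_ms=20, header_bytes=12 + 8 + 20):
    """Bits per second for one RTP stream at the IP layer.

    header_bytes = RTP (12) + UDP (8) + IPv4 (20); link-layer overhead is ignored.
    """
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def ring_all_early_media_bps(phones, per_stream_bps):
    """Worst case: every ringing phone sends one early-media stream."""
    return phones * per_stream_bps


if __name__ == "__main__":
    per_stream = rtp_stream_bps()
    total = ring_all_early_media_bps(15, per_stream)
    print(f"{per_stream / 1000:.0f} kbit/s per stream, {total / 1e6:.1f} Mbit/s for 15 phones")
```

Under these assumptions a 15-phone ring group tops out at a little over 1 Mbit/s in one direction, and only if the phones really do send early media.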

Stewart1 · September 30, 2022, 6:57pm

I recommend starting with a packet capture on the PBX into a ring of files. You can let it run 24/7 and when trouble is reported, you should have enough time to find a file demonstrating the problem, before it gets overwritten. See

jdennis · September 30, 2022, 7:24pm

Thank you for this suggestion! We are closed on weekends, so I’ll start doing this on Monday.
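The ring-of-files capture suggested above can be sketched as a small Python helper that builds a tcpdump command line. The interface name, output path, file size, file count and the `udp` filter are all assumptions to adjust; `-C` rotates the output file after roughly that many million bytes and `-W` caps the number of files so the oldest gets overwritten.

```python
import shlex


def ring_capture_command(interface, out_file, file_megabytes=100,
                         file_count=50, capture_filter="udp"):
    """Build a tcpdump argument list that writes into a fixed-size ring of files."""
    return [
        "tcpdump",
        "-i", interface,
        "-s", "0",                   # 0 selects tcpdump's default (full-size) snapshot length
        "-C", str(file_megabytes),   # start a new file after about this many million bytes
        "-W", str(file_count),       # keep at most this many files, then wrap around
        "-w", out_file,              # base name; tcpdump appends a number per file
        capture_filter,              # "udp" covers RTP and SIP over UDP, not SIP over TCP/TLS
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(ring_capture_command("eth0", "/var/tmp/pbx.pcap")))
```

With the defaults shown, the ring holds about 5 GB; how many hours that covers depends entirely on call volume, so the sizes may need tuning before relying on it.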

jdennis · October 3, 2022, 3:33pm

So after looking at the packet captures from the tcpdump in wireshark for 2 samples that had the one way audio issue, they both show a bunch of packets with “incorrect timestamp”. It’s on the packets going from our phone server to the AT&T sip trunks. I’ve tried googling it and searching in these forums and not sure what needs to be done to correct that symptom.

david55 · October 3, 2022, 6:49pm

The Wireshark summary isn’t very helpful without information on the nature of the incorrectness, and the stage in the call to which it relates.

Are the problem frames “marked”? Whilst I’m not sure of the current position on this, when Asterisk switches between different streams it used to use the marker bit, which has the meaning “this is a good place to reset any accumulated jitter buffer”, not “this is a change to a completely different sequence of timestamps”. I think that was the result of its not tracking SSRC values properly internally, as they are an RTP concept, but Asterisk was originally written for ISDN.

We found that some PABXes didn’t like this but if we faked an SSRC change they were happy. This was done on Asterisk 1.6, and not fed back, as we were locked into an early version. I’m now retired and don’t have access to the code change, but, in any case, it is very unlikely that a patch would apply cleanly to a current version.

jdennis · October 3, 2022, 8:56pm

No, the problem frame(s) aren’t “marked”. Unfortunately I’m getting a bit in over my head here. Still not sure how to get this one figured out.

david55 · October 3, 2022, 9:07pm

You really need to provide us with the wireshark RTP analysis for the problem stream. I suspect that is going to be easier than asking for a description of the anomaly.

jdennis · October 4, 2022, 12:20pm

Ok, the attached files I believe are what you are asking for. I’ve never used wireshark before until a few days ago, so let me know if you need more than this or in another format. Thanks for your help.

rtp_packet_data.tgz (40.0 KB)

david55 · October 4, 2022, 2:03pm

"Packet","Sequence","Delta (ms)","Jitter (ms)","Skew","Bandwidth","Marker","Status"
"10.5.0.2:12976 → 10.5.0.5:18046 (0x3f725d8d)"
41868,19157,19.974999999998545,0.09123562002359628,-0.10800000000017462,81.6,"","OK"
41905,19158,20.00800000000163,0.08603339377222338,-0.11600000000180444,81.6,"","OK"
41945,19159,20.016999999999825,0.08171880666144851,-0.13300000000162981,80,"","OK"
41973,19160,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","OK"
42174,19161,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
42224,19162,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
42280,19163,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
42319,19164,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"

My memory was playing tricks and I thought this would include the timestamps, but you obviously need to open up the individual packets for those. However, I suspect you will find that they all have the same value as sequence 19160.
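Once the timestamps have been read out of the individual packets, the "same value as the previous packet" guess can be checked mechanically. The sketch below flags packets whose sequence number advanced while the RTP timestamp did not. It is a simplified check, not Wireshark's exact rule, and the timestamp values in the example are invented because the thread does not include the real ones.

```python
def flag_stalled_timestamps(packets):
    """Return sequence numbers whose RTP timestamp failed to advance.

    packets: iterable of (sequence, rtp_timestamp) tuples for a single SSRC,
             in capture order.
    """
    flagged = []
    prev_seq = prev_ts = None
    for seq, ts in packets:
        if prev_seq is not None:
            seq_step = (seq - prev_seq) & 0xFFFF        # 16-bit wraparound
            ts_step = (ts - prev_ts) & 0xFFFFFFFF       # 32-bit wraparound
            if ts_step >= 0x80000000:                   # interpret as signed
                ts_step -= 0x100000000
            # Sequence moved forward but media time stood still or went back.
            if 0 < seq_step < 0x8000 and ts_step <= 0:
                flagged.append(seq)
        prev_seq, prev_ts = seq, ts
    return flagged


if __name__ == "__main__":
    # Hypothetical timestamps: 160 ticks per packet until 19160, then frozen.
    sample = [(19159, 1000), (19160, 1160), (19161, 1160), (19162, 1160)]
    print(flag_stalled_timestamps(sample))
```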

The very precise timing means that they are coming from very close, so presumably being created by Asterisk. I’m guessing from the fact that they are slightly slow, that Asterisk is filling in. I’m not sure if that is the result of a loss of input, or because it is turning comfort noise into literal silence. (If the stream is unencrypted, you should be able to see the silence in the data, or whether it is doing music on hold.)

I’m not aware of Asterisk filling in for a simple underrun, so comfort noise may be a better guess. You need to correlate these with the corresponding incoming packets, and the call state, to try and understand why Asterisk might be sourcing frames locally.

Music on hold is so common, that I can’t believe that Asterisk would send wrong timestamps for that, without someone complaining before.

I think the skew is constant because it isn’t being calculated whilst the timestamp is considered invalid, as I’d actually expect it to be heading off in the negative direction given each packet is 92 microseconds later than the previous one.
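The two observations above (the 92 microsecond figure and the frozen statistics) can be reproduced from the exported rows. This sketch parses the eight data rows quoted earlier and assumes a nominal 20 ms packet interval, which matches the deltas but is not stated explicitly in the export.

```python
import csv
import io

FIELDS = ["packet", "sequence", "delta_ms", "jitter_ms", "skew",
          "bandwidth", "marker", "status"]

# Data rows copied from the RTP analysis export quoted in this thread.
ROWS = '''41868,19157,19.974999999998545,0.09123562002359628,-0.10800000000017462,81.6,"","OK"
41905,19158,20.00800000000163,0.08603339377222338,-0.11600000000180444,81.6,"","OK"
41945,19159,20.016999999999825,0.08171880666144851,-0.13300000000162981,80,"","OK"
41973,19160,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","OK"
42174,19161,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
42224,19162,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
42280,19163,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
42319,19164,20.091999999996915,0.08236138124491517,-0.2249999999985448,80,"","Incorrect timestamp"
'''


def load_rows(text):
    """Parse the exported CSV rows into a list of dicts keyed by FIELDS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS))


def late_by_us(row, nominal_ms=20.0):
    """How many microseconds the reported delta exceeds the nominal interval."""
    return round((float(row["delta_ms"]) - nominal_ms) * 1000)


def stats_frozen(rows):
    """True if every flagged row repeats the delta/jitter/skew of the last OK row."""
    last_ok = [r for r in rows if r["status"] == "OK"][-1]
    flagged = [r for r in rows if r["status"] == "Incorrect timestamp"]
    keys = ("delta_ms", "jitter_ms", "skew")
    return bool(flagged) and all(r[k] == last_ok[k] for r in flagged for k in keys)


if __name__ == "__main__":
    rows = load_rows(ROWS)
    print(late_by_us(rows[4]), stats_frozen(rows))
```

The identical delta, jitter and skew on every flagged row is consistent with the analysis simply repeating the last computed values for those packets rather than measuring them.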

One other thought: have you tried to enable a jitter buffer in Asterisk? That is an unusual thing to do with VoIP on both sides, but it might result in Asterisk filling in for an underrun, in which case it is confusing the issue and should be turned off for clearer debugging.

jdennis · October 4, 2022, 2:21pm

No, I haven’t enabled a jitter buffer, unless there is one on by default I’m not aware of. We are using digium phones with pjsip.

Stewart1 · October 4, 2022, 2:42pm

OK, so Asterisk is sending ‘bad’ RTP. Where did this come from (a remote extension, announcement generated by Asterisk, voicemail playback, etc.)? If from a remote extension (Asterisk was simply relaying the RTP), was the incoming RTP likewise impaired?

If so, then we need to look at the source to see whether the branch-to-headquarters network path was overloaded or otherwise malfunctioning.

OTOH, if the incoming stream was good, we need to see whether Asterisk had a CPU or other resource shortage at the time, or if it was handling the relay incorrectly.

If there were multiple calls in progress at the time of the trouble, find out whether the other RTP streams sent by Asterisk were also impaired.

david55 · October 4, 2022, 2:48pm

The frames with the bad timestamps have absolutely no jitter, so I think they have to have been originated from Asterisk.

Again the total lack of jitter tends to contra-indicate this, although maybe the same jitter applies to the capture clock. I suppose a resource limitation could create a situation where Asterisk sources frames, but it would be the sourcing of frames that resulted in bad time stamps, not the resource starvation.

Stewart1 · October 4, 2022, 3:00pm

Yes, but

So at least some of the troubles occur during conversation between two people. The OP needs to tell us what was going on during the failure he observed. He should both listen to the RTP and look at what the Asterisk log shows at the time of the trouble.

jdennis · October 4, 2022, 3:17pm

Looking in the “full” asterisk log, I see this showing up a lot. 3,209 times so far today.

[2022-10-04 10:45:35] WARNING[4821][C-00000507] taskprocessor.c: The ‘stasis/p:channel:all-00002104’ task processor queue reached 500 scheduled tasks again.

Also there are a LOT of these errors, not sure if these are normal or not.
[2022-10-04 10:45:06] ERROR[4643][C-00000502] pbx_functions.c: Function SIP_HEADER not registered
[2022-10-04 10:45:06] ERROR[4643][C-00000502] res_pjsip_header_funcs.c: This function requires a PJSIP channel.
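To see which messages dominate the log and which task processors are backing up, the lines can be tallied with a short script. This is a sketch based only on the line layout visible in the three lines quoted above; the regular expressions are assumptions and may need adjusting for other message types. Both straight and typographic quotes are accepted because forum software often converts them.

```python
import re
from collections import Counter

# [timestamp] LEVEL[thread][optional call id] source_file: message
LINE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\s+(?P<level>[A-Z]+)\[\d+\](?:\[[^\]]+\])?\s+"
    r"(?P<source>[\w.]+):\s+(?P<message>.*)$"
)
TASKPROC_RE = re.compile(
    r"The ['\u2018](?P<name>[^'\u2019]+)['\u2019] task processor queue "
    r"reached (?P<depth>\d+) scheduled tasks"
)


def summarize(lines):
    """Count log lines per (level, source file) and per task processor name."""
    by_source = Counter()
    by_taskprocessor = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the expected layout
        by_source[(match["level"], match["source"])] += 1
        hit = TASKPROC_RE.search(match["message"])
        if hit:
            by_taskprocessor[hit["name"]] += 1
    return by_source, by_taskprocessor


# The three lines quoted in this thread.
SAMPLE_LINES = [
    "[2022-10-04 10:45:35] WARNING[4821][C-00000507] taskprocessor.c: The "
    "\u2018stasis/p:channel:all-00002104\u2019 task processor queue reached 500 scheduled tasks again.",
    "[2022-10-04 10:45:06] ERROR[4643][C-00000502] pbx_functions.c: Function SIP_HEADER not registered",
    "[2022-10-04 10:45:06] ERROR[4643][C-00000502] res_pjsip_header_funcs.c: This function requires a PJSIP channel.",
]

if __name__ == "__main__":
    sources, processors = summarize(SAMPLE_LINES)
    print(sources.most_common())
    print(processors.most_common())
```

Feeding it the whole file, for example `summarize(open(path, encoding="utf-8", errors="replace"))` with `path` pointing at the "full" log, would show whether the warnings cluster around the times of the reported one-way audio calls once grouped by timestamp.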

The sample that I had attached previously is a digium phone at a remote location connecting to our centralized FreePBX system at a different location where our AT&T sip trunks are at. It was a customer calling in to talk to our sales counter. On some of the problem ones I can hear the audio on both ends when I play it back in wireshark, but they can’t hear each other on the call.

david55 · October 4, 2022, 3:26pm

The task processor warning does indicate an overload. However, the normal mitigation strategy is to refuse calls.

The SIP_HEADER errors are simply because FreePBX tries doing things both the chan_sip and chan_pjsip way, without even checking whether SIP is being used.

jdennis · October 4, 2022, 3:32pm

The thing that is so confusing about this is many of our large branches are using our system all day every day and never experiencing this issue. I still have a ticket open with AT&T and continue to send them call samples to review and they have yet to do anything to address this. It feels like it could be an issue on their end, but I’m not sure?

We’ve considered migrating our FreePBX VM back to the previous hardware it was running on, but my concern there is that we use the commercial Parking Lot Pro module, so we’d have to unregister it from the current hardware and re-register it on the old hardware (and you can only do that so many times before you have to contact them). Furthermore, the older hardware is out of warranty, so we need/want to get it off there anyway.

Does anyone think the slightly newer CPU and network card in the new hardware could be causing this sort of issue? Other than that, the hardware is very very similar. HP Proliant DL360 Gen10. I don’t want to have to do another after hours migration if it’s extremely unlikely the hardware is causing this.