Clock Drift

dave69s · April 21, 2016, 5:27pm

I’m having audio loss issues on both ends of my calls. I reviewed a tcpdump in Wireshark, and I see a significant amount of clock drift and jitter on the stream from the server to the device. Our network was recently updated, and I know the FreePBX server, network switches and routers each have a different ntp server specified (thanks to my network admin).

I’m thinking this is a clock sync issue on the network because there is no visible packet loss in the streams.

Does this sound reasonable? And does setting them all to use the same ntp server seem like it could resolve this? I know the router and FreePBX server have never been using the same ntp server, so this happening now is a surprise.

Thanks!

xrobau · April 21, 2016, 7:55pm

‘Clock Drift’ is only relevant to PSTN connections, when you want to ensure that there’s perfect synchronization with your provider with the ‘ticks’ that their hardware is providing.

This is not relevant to wall clock time in any way.

What exactly do you mean by that? Please paste examples.

dave69s · April 21, 2016, 8:12pm

This is what I get when I do an analysis on the RTP stream between the server and the device. Server is .43, device is .168.

172.18.112.43:11680 to 172.18.115.168:11798

Forward

SSRC: 0x0db4b730
Max Delta: 1218.54 ms @ 60066
Max Jitter: 65011663.30 ms
Mean Jitter: 603403.30 ms
Max Skew: 536870063.75 ms
RTP Packets: 1782
Expected: 1782
Lost: 0 (0.00 %)
Seq Errs: 0
Duration: 35.61 s
Clock Drift: -379906 ms
Freq Drift: -77352 Hz (-1066.91 %)

Reverse

SSRC: 0x0db4b730
Max Delta: 20.49 ms @ 60082
Max Jitter: 0.17 ms
Mean Jitter: 0.07 ms
Max Skew: -1.19 ms
RTP Packets: 1791
Expected: 1791
Lost: 0 (0.00 %)
Seq Errs: 0
Duration: 35.80 s
Clock Drift: -27590 ms
Freq Drift: 1835 Hz (-77.07 %)
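
For readers following along: the jitter figure in an RTP stream analysis is usually based on the interarrival jitter estimate from RFC 3550. The Python sketch below is only an illustration, not Wireshark's actual code. The function name and the 8000 Hz clock rate (typical for G.711) are assumptions. It shows how a single jump in the RTP timestamps can push the jitter figure into the millions of milliseconds even when no packet is lost or reordered. Whether that is what happened in this capture is not confirmed anywhere in the thread.

```python
def interarrival_jitter(arrivals_s, rtp_timestamps, clock_rate=8000):
    """Running RFC 3550 style jitter estimate, in ms, after each packet.

    arrivals_s     -- capture arrival times in seconds
    rtp_timestamps -- RTP timestamp field of each packet
    clock_rate     -- assumed RTP clock rate in Hz (8000 is an assumption)
    32-bit timestamp wraparound is ignored to keep the sketch short.
    """
    jitter = 0.0  # kept in RTP timestamp units
    history = []
    for i in range(1, len(arrivals_s)):
        arrival_delta = (arrivals_s[i] - arrivals_s[i - 1]) * clock_rate
        timestamp_delta = rtp_timestamps[i] - rtp_timestamps[i - 1]
        d = arrival_delta - timestamp_delta
        # Exponential smoothing with gain 1/16, as in RFC 3550
        jitter += (abs(d) - jitter) / 16.0
        history.append(jitter * 1000.0 / clock_rate)
    return history


# A steady 20 ms stream: arrival spacing matches timestamp spacing.
arrivals = [k * 0.02 for k in range(50)]
steady = interarrival_jitter(arrivals, [k * 160 for k in range(50)])

# Same arrivals, but the sender's timestamps jump once mid-stream.
jumped = [k * 160 for k in range(25)] + [2**29 + k * 160 for k in range(25, 50)]
jumpy = interarrival_jitter(arrivals, jumped)

print(max(steady), max(jumpy))
```

In the second case every packet still arrives on time and in sequence, yet the estimate spikes and then decays slowly, which also inflates the mean.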

xrobau · April 21, 2016, 8:53pm

Seems fine to me. Nothing out of sequence, and from the rest of the numbers provided, it looks like they’re basically random, so they’ll be no use to you at all.

So, you’re saying that you have two devices on the same network segment, and you’re having audio issues between them? What are these two devices, exactly?

dave69s · April 21, 2016, 9:29pm

I’m actually seeing the issue on calls between IP phones internally, as well as when talking with external customers. Audio will sometimes drop out for as long as 5 seconds, and both sides hear it. I’ve done several tcpdumps from the server and what I posted is pretty much all I’m seeing. None of my network devices (switches, routers, etc.) are showing dropped packets. I have 2 different SIP providers, and neither is seeing any issues on their end. I also see the issue during calls on either carrier, so I ruled that out.

xrobau · April 21, 2016, 9:31pm

That is a physical network issue, or, the machine running Asterisk is actually freezing.

Edit: As an educated guess, this is a spanning tree issue. Speak to your network guy and get them to fix it.

dave69s · April 21, 2016, 9:34pm

That’s been my thought as well. I swapped over to my backup server a couple of weeks ago and still have the problem. So, I’m thinking network. I was thinking maybe a duplicate IP somewhere, or a misconfigured router/switch.

digitalb · April 21, 2016, 9:52pm

We have been seeing this exact issue too for about 2 months across several PBX systems and haven’t been able to narrow it down. It also seems to mainly affect the new Aastra models (686xi).

So far our network engineers haven’t been able to narrow it down either.

What version of FreePBX and Asterisk are you running?

dicko · April 21, 2016, 10:04pm

One initial diagnostic might be to turn RTP debugging on:

rasterisk -x 'rtp set debug on'

Identify both legs of a call with the problem (2964 = originating leg, 2980 = destination leg)

grep -E '\[2980\]|\[2964\]' /var/log/asterisk/full | less

look for out of order or missing packets.
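
Once the sequence numbers for one leg have been pulled out of the log or a capture, the check for missing or out-of-order packets can be sketched in a few lines of Python. This is a hedged illustration: the function name is hypothetical, and it assumes you already have the sequence numbers as a list of integers in arrival order. RTP sequence numbers are 16 bits, so the sketch treats 65535 followed by 0 as a normal step.

```python
def sequence_problems(seqs):
    """Report every step between consecutive RTP sequence numbers
    that is not a plain +1 (modulo 65536)."""
    problems = []
    for prev, cur in zip(seqs, seqs[1:]):
        step = (cur - prev) % 65536
        if step == 1:
            continue  # normal increment, including the 65535 -> 0 wrap
        if step == 0:
            problems.append((prev, cur, "duplicate"))
        elif step < 32768:
            problems.append((prev, cur, f"gap of {step - 1}"))
        else:
            problems.append((prev, cur, "out of order"))
    return problems


print(sequence_problems([9844, 9845, 9846, 9847, 9848]))
print(sequence_problems([65534, 65535, 0, 1]))
print(sequence_problems([1, 3, 2]))
```

An empty result means the stream is complete and in order, so any audio dropouts would have to come from timing rather than loss.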

(I’ve seen it on Asterisk 13.7.2 with Aastras.)

dave69s · April 21, 2016, 10:17pm

My network admin just told me he is seeing time drifting 55 seconds over 4 days. He’s going to work on that tonight.
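
To put that figure in perspective, here is a small worked calculation in Python. It is only arithmetic on numbers already in this thread (55 seconds over 4 days, and the 35.61 s stream duration from the Wireshark analysis); the function names are made up for the example.

```python
def drift_ppm(drift_s, period_s):
    """Clock drift expressed in parts per million."""
    return drift_s / period_s * 1e6


def drift_over_interval_ms(drift_s, period_s, interval_s):
    """How far a clock with that drift rate wanders during interval_s, in ms."""
    return drift_s / period_s * interval_s * 1000.0


four_days_s = 4 * 24 * 3600  # 345600 seconds

rate = drift_ppm(55.0, four_days_s)
per_call = drift_over_interval_ms(55.0, four_days_s, 35.61)

print(f"{rate:.1f} ppm, {per_call:.2f} ms over a 35.61 s call")
```

That works out to roughly 159 ppm, or under 6 ms across the captured call. A wall clock drifting at that rate is worth fixing, but it is several orders of magnitude too small to account for a reported clock drift of -379906 ms on a 35 second stream.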

I’m using FreePBX 12 w/Asterisk 11. I have 200 Yealink T41P phones.

cynjut · April 22, 2016, 1:14pm

From my NetBSD developer “time served”, I know that one of the most common causes of significant clock drift is dropping interrupts. If the machine is busy enough to drop interrupts, you have a real problem, especially with real-time applications like VOIP.

If the server is drifting that much, you need to look at the server’s performance. I’d be willing to bet you have a card (or a hard drive) that’s losing its mind.

dave69s · April 22, 2016, 3:17pm

I debugged the RTP using Wireshark, and saw something very strange on an echo test. The RTP showed back and forth, between the phone and server, as expected, but there was a point in which I had a rush of about 50 packets go from the phone to the server, followed by a rush of the same number of packets going back to the phone. Then, everything went back to the normal single packet back and forth. Make sense?

During that call, I experienced a huge delay at the time in which the packets did that strange one way burst.
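
A pattern like this can be picked out of a capture automatically rather than by eye. The Python sketch below is a rough illustration with assumed names and thresholds: it takes the arrival times of one direction of the stream, flags any gap much longer than the packetization interval, and counts how many packets then arrive back to back. The 20 ms interval matches the Max Delta on the healthy direction posted earlier; the factor of 5 and the "quarter of a packet time" burst test are arbitrary choices.

```python
def find_stalls(arrivals_s, ptime_s=0.020, stall_factor=5.0):
    """Return (index, gap_seconds, burst_len) for each stall.

    index     -- position of the first packet after the gap
    burst_len -- packets arriving nearly back to back, starting at index
    """
    stalls = []
    n = len(arrivals_s)
    i = 1
    while i < n:
        gap = arrivals_s[i] - arrivals_s[i - 1]
        if gap > stall_factor * ptime_s:
            j = i
            # Extend the burst while packets arrive much faster than ptime
            while j + 1 < n and arrivals_s[j + 1] - arrivals_s[j] < ptime_s / 4:
                j += 1
            stalls.append((i, gap, j - i + 1))
            i = j + 1
        else:
            i += 1
    return stalls


# Synthetic example: 10 normal packets, a 1 s stall, 50 packets in a rush,
# then 10 more normal packets.
normal = [k * 0.02 for k in range(10)]
start = normal[-1] + 1.0
burst = [start + k * 0.0001 for k in range(50)]
tail = [burst[-1] + 0.02 * (k + 1) for k in range(10)]
stalls = find_stalls(normal + burst + tail)
print(stalls)
```

At 20 ms per packet, a rush of about 50 packets corresponds to roughly one second of audio that was held somewhere and then released at once.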

Dicko … is there something I can gain from the rtp debug in Asterisk that I can’t via Wireshark? I have about 12K calls a day, so identifying the two legs out of all of those is tricky. Right now, I’m using Wireshark because I can do a tcpdump to specify the source/dest IP addresses.

Thanks for everyone’s feedback!

dave69s · April 22, 2016, 3:18pm

BTW … my network guy came back today and said all configurations are correct and he sees no issues. I know I haven’t seen any spanning tree issues personally, and the clock issue is the only one I know of.

cynjut · April 22, 2016, 4:11pm

Remember those dropped interrupts we were talking about a few minutes ago.

Since the server isn’t responding to your packets in order (and a timely fashion), I’d lean toward a problem with your hardware.

Have your network guy do more than come in, look at the switch, and tell you he’s too busy to find the problem. This could be packet storming, Christmas Tree packets, or any one of dozens of other network overloads. It could also be a switch port or a server card starting to fail.

The more you tell us, the more it sounds like server or network infrastructure hardware going out. If your IT guy doesn’t want to fix it, it might also be time to find a new IT guy.

dicko · April 22, 2016, 4:31pm

rtp set debug ip a.b.c.d

dave69s · April 25, 2016, 3:13pm

I did additional analysis and what I’m finding is that I experience jitter of around 300 ms and a drop in bandwidth of about 80% during these times of choppiness. My tcpdump consistently shows it with the source being the phone server and the destination being the phone.
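
One way to measure a drop like that from a capture is to count packets per second in the affected direction. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, packet count is used as a stand-in for bandwidth (reasonable only if every packet carries the same payload size), and the sample data is synthetic.

```python
from collections import Counter


def packets_per_second(arrivals_s):
    """Count packets in each whole second, measured from the first packet."""
    if not arrivals_s:
        return []
    first = arrivals_s[0]
    counts = Counter(int(t - first) for t in arrivals_s)
    return [counts.get(sec, 0) for sec in range(max(counts) + 1)]


# Synthetic data: one normal second at 20 ms spacing (50 packets),
# then a degraded second with only 10 packets.
sample = [k * 0.02 for k in range(50)] + [1 + k * 0.1 for k in range(10)]
rates = packets_per_second(sample)
drop = 1 - rates[1] / rates[0]
print(rates, f"{drop:.0%} drop")
```

Running this against both the server-side capture and a capture taken at the phone's switch port would show whether the rate already falls before the packets leave the server.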

If the tcpdump is being done on the server, wouldn’t this show the issue as being on the server because the source is the actual server?

dave69s · April 25, 2016, 10:07pm

One more note, I had the same issue on my primary server so I failed over to this one. That leads me to believe it’s not a server hardware issue.

cynjut · April 25, 2016, 10:35pm

Network it is then. As you eliminate the possible, the impossible becomes more and more likely.

xrobau · April 25, 2016, 10:41pm

The interesting thing is that there are no packets lost. You see the sequence number increments perfectly, 9844, 9845, 9846, 9847, 9848.

It’s just that between 9846 and 9847 there were 200 OTHER packets that weren’t displayed in that filter. I’d be looking at what those are.

dave69s · April 25, 2016, 10:59pm

Dave … I’m still with you on network, but I’m not sure why I’m seeing these results heading out of my interface. I wouldn’t expect to see them on a tcpdump inside the server. Am I wrong? This wouldn’t be the first time for that!

Rob … The 200 or so packets are on the other side of the stream. What you’re seeing is the phone server to the phone. The other 200 are the phone to the phone server. They show on the Forward tab in the Wireshark analysis.