Clock Drift

cynjut · April 26, 2016, 1:20pm

tcpdump should capture anything that appears on the interface. It runs in promiscuous mode and sees inbound packets before the firewall processes them. That’s why it can mislead people: they see traffic coming into their machine, but it never gets processed because the firewall mangles the traffic after tcpdump has already seen it on the inbound leg of the interface.

dave69s · April 26, 2016, 2:18pm

Dave, so if I see the jitter on the outbound, can I assume it’s the server causing it? I’m sorry if I’m naive on some of this, it’s the first time I’ve had to dive down to this level to identify an issue.

cynjut · April 26, 2016, 2:54pm

It isn’t quite as simple as one would hope.

The tcpdump program gives you real-time access to the network interface on the server. You can watch the interface and see what’s happening, but the information you see in the dump is what tells you where the problem is.

For example, if you are watching the interface and see a huge flood of messages from the server to one of your other devices that shuts everything else down, THAT would be where the jitter would come from.

dave69s · April 26, 2016, 7:25pm

I definitely see that in my capture. When I do an analysis on the RTP stream I can see the packet number where the jitter begins. Going back to that point in my full capture, I see the traffic rush in from the phone, then rush back out to the phone about 200 packets later.
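
For readers following along, here is a rough sketch of how an RTP analyzer typically arrives at that jitter figure, using the running interarrival jitter estimate described in RFC 3550. The function name, the 8000 Hz clock rate, and the sample data are assumptions for illustration, not values from this capture, and timestamp wraparound is ignored.

```python
def interarrival_jitter(packets, clock_rate=8000):
    """Running RFC 3550-style jitter estimate.

    packets: list of (arrival_time_seconds, rtp_timestamp) in arrival order.
    clock_rate: RTP clock in Hz (8000 is an assumption, typical for G.711).
    Returns one jitter value in milliseconds per packet after the first.
    Note: RTP timestamp wraparound is not handled in this sketch.
    """
    jitter = 0.0  # kept in RTP timestamp units
    history = []
    prev = None
    for arrival, rtp_ts in packets:
        if prev is not None:
            prev_arrival, prev_ts = prev
            # D = (arrival spacing) - (send spacing), both in timestamp units
            d = (arrival - prev_arrival) * clock_rate - (rtp_ts - prev_ts)
            # Smooth with gain 1/16, as in RFC 3550
            jitter += (abs(d) - jitter) / 16.0
            history.append(jitter / clock_rate * 1000.0)
        prev = (arrival, rtp_ts)
    return history


# Hypothetical data: 20 ms packets, with the third one arriving 200 ms late.
sample = [(0.00, 0), (0.02, 160), (0.24, 320), (0.26, 480)]
for index, value in enumerate(interarrival_jitter(sample), start=2):
    print(f"packet {index}: jitter ~ {value:.2f} ms")
```

A burst where packets are held and then released together shows up as a sharp step in this estimate, which is the point to line up against the full capture.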

cynjut · April 26, 2016, 10:53pm

If there is no other traffic during that period, the problem is then likely confined to the server. If there is other traffic on the port at the same time (causing the delay, for example) then the source of that traffic (regardless of where) is the culprit.

As a suggestion, try looking at your swap performance. This sounds like a process getting swapped out or otherwise dropping the ball.
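
One way to check this on a Linux server is to compare the kernel's swap-in and swap-out page counters before and after a jittery call. The sketch below assumes a Linux host where `/proc/vmstat` exposes the `pswpin` and `pswpout` fields; the helper names are made up for this example.

```python
def parse_swap_counters(vmstat_text):
    """Pull the pswpin/pswpout page counters out of /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two snapshots."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def read_swap_counters(path="/proc/vmstat"):
    """Read the live counters (Linux only; the path is an assumption)."""
    with open(path) as handle:
        return parse_swap_counters(handle.read())


if __name__ == "__main__":
    import time

    first = read_swap_counters()
    time.sleep(10)  # place a test call during this window
    print(swap_delta(first, read_swap_counters()))
```

If both deltas stay at zero while the jitter occurs, swapping is probably not the cause and the search can move on.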

dave69s · April 27, 2016, 3:57pm

Interesting. I do have an AMI call that happens right before a call enters a queue. It does a web service lookup to determine which queue to place the call in based on the ANI. I could potentially be getting a thread locking issue.

Any thoughts on that?

I’m going to look at my swap performance now.

cynjut · April 27, 2016, 5:56pm

Thread locking is less likely than the system simply waiting (since it’s a single thread) for your web service call to complete before it continues. Check the performance of your web service call: if it takes a couple of seconds to complete, you might have found a likely culprit.
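
A quick way to run that check is to time the lookup in isolation, outside of Asterisk. This is only a sketch: `fake_lookup` is a hypothetical stand-in for the real web service call, and the 10 ms sleep is an arbitrary placeholder for network and processing time.

```python
import time


def time_call(func, *args, repeats=5):
    """Call func(*args) several times; return (fastest, slowest) in ms."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        durations.append((time.perf_counter() - start) * 1000.0)
    return min(durations), max(durations)


def fake_lookup(ani):
    """Hypothetical stand-in for the web service lookup keyed on the ANI."""
    time.sleep(0.01)  # placeholder latency, not a measured value
    return "queue-a"


if __name__ == "__main__":
    fastest, slowest = time_call(fake_lookup, "5551234")
    print(f"fastest {fastest:.1f} ms, slowest {slowest:.1f} ms")
```

The slowest run matters more than the average here, since a single slow lookup on a blocking thread is enough to stall the call that is waiting on it.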

dave69s · April 27, 2016, 10:06pm

Thanks so much for all your help! I’ve seen different answers about whether the AMI requests are single threaded. Are you referring to Asterisk’s single thread, the service’s thread, or the OS itself?

cynjut · April 29, 2016, 1:54pm

You should assume that every call process is a single thread, regardless of the underlying technology.

There are several factors in Asterisk’s processing model that affect how a call is actually processed, so while Asterisk as a whole may or may not be multithreaded, the processing of a single call (based on hardware interface or UDP datagram stream) is likely to be a single logical process.

dave69s · May 2, 2016, 5:05pm

Thanks again for your help! I believe I’ve narrowed it down to the web service mentioned. I temporarily disabled that and the issue seems to have been resolved. I’m working on moving that lookup to directly query the database. My tests got that down to about 8ms instead of the 200 it was taking.

digitalb · May 2, 2016, 5:32pm

how did you disable the offending web service?

dave69s · May 2, 2016, 5:46pm

It was inside a custom destination I created. I just took that destination out temporarily.

xrobau · May 3, 2016, 2:02am

How exactly were you doing this lookup? AMI is AMI, and Dialplan is Dialplan, and they’re totally different things, so saying you’re doing AMI in Dialplan is confusing.