Clock Drift

tcpdump should capture anything that appears on the interface - by default it puts the interface in promiscuous mode, and on the inbound side it sees packets before the firewall processes them. That’s why it can mislead people - they see traffic coming into their machine, but it never gets processed because the firewall drops or mangles it after tcpdump has already captured it on the inbound leg of the interface.

Dave, so if I see the jitter on the outbound, can I assume it’s the server causing it? I’m sorry if I’m naive on some of this, it’s the first time I’ve had to dive down to this level to identify an issue.

It isn’t quite as simple as one would hope.

The tcpdump program gives you real-time access to the network interface on the server. You can watch the interface and see what’s happening, but the information you see in the dump is what tells you where the problem is.

For example, if you are watching the interface and see a huge flood of messages from the server to one of your other devices that shuts everything else down, THAT would be where the jitter would come from.

I definitely see that in my capture. When I do an analysis on the RTP stream I see the packet number at which the jitter begins. Going back to that point in my full capture, I see traffic rush in from the phone, then rush back out to the phone about 200 packets later.

If there is no other traffic during that period, the problem is then likely confined to the server. If there is other traffic on the port at the same time (causing the delay, for example) then the source of that traffic (regardless of where) is the culprit.
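To pin down the packet where the jitter starts without relying on a GUI analysis, the interarrival jitter estimate from RFC 3550 can be recomputed from the capture. The following is only a minimal sketch: it assumes you have already exported (arrival time, RTP timestamp) pairs from the capture, that the codec uses an 8000 Hz RTP clock (true for G.711, an assumption for anything else), and it ignores RTP timestamp wraparound. The function name is made up for illustration.

```python
def rtp_interarrival_jitter(packets, clock_rate=8000):
    """Return the running RFC 3550 jitter estimate (seconds) after each packet.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) tuples,
             in the order they were captured.
    clock_rate: RTP clock in Hz; 8000 is an assumption that fits G.711 audio.
    """
    jitter = 0.0
    estimates = []
    prev = None
    for arrival, rtp_ts in packets:
        if prev is not None:
            prev_arrival, prev_ts = prev
            # D: how much the arrival spacing differs from the spacing
            # the sender stamped into the RTP timestamps.
            d = (arrival - prev_arrival) - (rtp_ts - prev_ts) / clock_rate
            # RFC 3550 smoothing: J = J + (|D| - J) / 16
            jitter += (abs(d) - jitter) / 16.0
        estimates.append(jitter)
        prev = (arrival, rtp_ts)
    return estimates
```

With 20 ms packets, a single packet that arrives 200 ms late moves the estimate from 0 to 12.5 ms in one step, so the index where the estimate jumps is the place to look in the full capture.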

As a suggestion, try looking at your swap performance. This sounds like a process getting swapped out or otherwise dropping the ball.
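One rough way to check for swapping on a Linux server is to sample the cumulative `pswpin` and `pswpout` counters in `/proc/vmstat` around the time the jitter appears. This is a sketch under the assumption that the server runs Linux; the function names are invented for illustration, and a non-zero delta only suggests swap activity, it does not prove that Asterisk was the process affected.

```python
import time

def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters

def swap_delta(before, after):
    """Pages swapped in/out between two samples of the counters."""
    return {key: after.get(key, 0) - before.get(key, 0)
            for key in ("pswpin", "pswpout")}

def sample_swap_activity(interval=1.0, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart, and return the delta."""
    with open(path) as handle:
        before = parse_swap_counters(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_swap_counters(handle.read())
    return swap_delta(before, after)
```

Calling `sample_swap_activity(5.0)` while a test call is in progress gives the number of pages swapped in and out during those five seconds; both values staying at zero makes swapping an unlikely cause.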

Interesting. I do have an AMI call that happens right before a call enters a queue. It does a webservice lookup to determine which queue to place the call in based on the ANI. I could potentially be getting a thread locking issue.

Any thoughts on that?

I’m going to look at my swap performance now.

Thread locking is less likely than the system, which handles the call in a single thread, waiting for your web service call to complete before it continues. Check the performance of your web service lookup - if it takes a couple of seconds to complete, you might have found a likely culprit.
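A simple way to check the lookup is to time it in isolation, outside Asterisk. The sketch below is an assumption-laden illustration: `time_call` is a made-up helper, and `fake_lookup` is a hypothetical stand-in that just sleeps, to be replaced with whatever actually calls your web service.

```python
import time

def time_call(func, *args, repeats=5, **kwargs):
    """Call func repeatedly; return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations

def fake_lookup(ani):
    """Hypothetical stand-in for the real ANI-to-queue web service lookup."""
    time.sleep(0.05)  # pretend the service takes about 50 ms
    return "queue-a"

if __name__ == "__main__":
    timings = time_call(fake_lookup, "5551234567", repeats=3)
    print("min %.1f ms, max %.1f ms" % (min(timings), max(timings)))
```

Look at the worst case as well as the typical one: an occasional slow response is enough to stall a call even if most lookups return quickly.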

Thanks so much for all your help! I’ve seen different answers about whether the AMI requests are single threaded. Are you referring to Asterisk’s single thread, the service’s thread, or the OS itself?

You should assume that every call process is a single thread, regardless of the underlying technology.

There are several factors in Asterisk’s processing model that vary the actual processing experience, so while Asterisk as a whole may or may not be multithreaded, the processing of a single call (based on hardware interface or UDP datagram stream) is likely to be a single logical process.
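To see why a blocking step inside a single logical process shows up as a burst in a capture, here is a toy model, not a description of Asterisk internals. It assumes packets arrive every 20 ms, that one thread forwards them, and that the thread stalls for an assumed 200 ms; all names and numbers are illustrative.

```python
def forward_times(arrivals_ms, block_start_ms, block_ms):
    """Model a single-threaded forwarder that is busy for block_ms starting
    at block_start_ms. Packets arriving during the stall are held and all
    go out together when the thread resumes.
    """
    block_end = block_start_ms + block_ms
    sent = []
    for arrival in arrivals_ms:
        if block_start_ms <= arrival < block_end:
            sent.append(block_end)   # held until the blocking call returns
        else:
            sent.append(arrival)     # forwarded immediately
    return sent

if __name__ == "__main__":
    arrivals = list(range(0, 400, 20))            # one packet every 20 ms
    sent = forward_times(arrivals, 100, 200)      # stall from 100 ms to 300 ms
    delays = [s - a for s, a in zip(sent, arrivals)]
    held = sum(1 for d in delays if d > 0)
    print("held packets:", held, "max delay:", max(delays), "ms")
```

In this model ten packets are held and then leave at the same instant, which is the "rushes in, then rushes back out" pattern: inbound traffic looks normal, outbound traffic is bunched.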

Thanks again for your help! I believe I’ve narrowed it down to the web service mentioned. I temporarily disabled that and the issue seems to have been resolved. I’m working on moving that lookup to directly query the database. My tests got that down to about 8 ms instead of the 200 ms it was taking.

How did you disable the offending web service?

It was inside a custom destination I created. I just took that destination out temporarily.

How exactly were you doing this lookup? AMI is AMI, and Dialplan is Dialplan, and they’re totally different things, so saying you’re doing AMI in Dialplan is confusing.