Linux Nerds - How to pinpoint where the Network Problem is?


(Greg Snover) #1

Ok - Almost all my customers are in a similar situation as far as their setup:

Desk Phones -> Local ISP -> Internet -> FreePBX Hosted at Vultr -> Internet -> Trunking Provider

Here is my question - Do you all know any way (with a Linux Box at the Customer Site) that I could isolate where the problem is actually coming from? Some cool Linux utility that looks at the Network Hops and can break them out as to where we are being let down?

I know I could use Traceroute to show me the times on the individual hops, but I need more info than that - basically I need to be able to isolate who is dropping the ball on the connections.

I am hoping there is a utility/suite that I could run on the On-Premise Box and the Co-Located box that would provide the smoking gun - i.e.:

Customer Site to Vultr Miami - Perfect
Vultr Miami to ITSP - High Latency and dropped packets utilizing Zayo (or Level(3) or whoever)

I just tried to track down a problem I had yesterday and it involved three calls with everyone (the ISP and the Hosting company) claiming there were no problems, and yet the ITSP gave me a MOS score graph for the duration of the call showing it started out good and went downhill until it dropped.

Anybody have any cool ideas about this?


(Andrew) #2

You can use mtr, but here’s the rub:

There are hops along your route that do not prioritize ICMP traffic, so you have to keep that in mind when reading the results. Basically what MTR is going to tell you is if there is a hop causing issues that are cascading throughout the rest of the route. If there is a hop that regularly reports back with high ping but the following hops do not, that does not indicate an issue with that hop or your route.

For example, the below MTR shows one hop with very high ping times, but it doesn’t cascade through the rest of the route so there is no issue with that.

HOST: Andrews-MBP-6               Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- router.asus.com            0.0%    10    1.1   1.4   1.0   2.2   0.0
  2.|-- X.X.X.X                    0.0%    10    1.9   1.9   1.7   2.2   0.0
  3.|-- X.X.X.X                    0.0%    10   12.6  11.7   3.4  47.4  13.3
  4.|-- X.X.X.X                   70.0%    10  10876 10832 10785 10876  45.1
  5.|-- X.X.X.X                    0.0%    10   10.0  11.4   8.1  13.7   1.8
  6.|-- X.X.X.X                    0.0%    10   14.3  12.9   8.8  20.5   3.4
  7.|-- X.X.X.X                    0.0%    10    7.5   7.3   7.1   7.7   0.0
  8.|-- X.X.X.X                    0.0%    10    7.5   7.1   6.7   7.5   0.0
  9.|-- X.X.X.X                    0.0%    10    7.7   8.4   7.6  12.1   1.2
 10.|-- X.X.X.X                    0.0%    10    7.3   7.5   7.2   8.6   0.3
 11.|-- X.X.X.X                    0.0%    10    8.0   8.1   7.7   8.6   0.0
 12.|-- X.X.X.X                    0.0%    10   36.2  49.0  35.6 128.4  29.6
 13.|-- X.X.X.X                    0.0%    10   51.9  52.3  51.8  53.8   0.5
 14.|-- X.X.X.X                    0.0%    10   51.4  51.5  51.2  52.3   0.0
 15.|-- X.X.X.X                    0.0%    10   52.0  55.7  51.8  86.7  10.9
 16.|-- X.X.X.X                    0.0%    10   51.8  52.0  51.7  52.8   0.0
 17.|-- X.X.X.X                    0.0%    10   51.5  51.9  51.5  52.3   0.0

Based on my experience, if the problem is poor call quality (I am assuming it is), it is most likely an issue with the local ISP, and MTRs should help you give them data that proves it.
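
To make the "does it cascade?" check less of an eyeball exercise, something like the following could be run over a saved report. It is a rough sketch, not a definitive test: it assumes the default mtr report layout (hop, host, Loss% as the first three columns), it only compares each lossy hop against the final hop, and classify_loss is a name made up for this example.

```shell
# Read an `mtr --report` from stdin and label every lossy intermediate hop.
# "isolated" = the final hop shows no loss, so the hop is probably just
#              deprioritizing ICMP replies.
# "suspect"  = the final hop also shows loss, so the loss may be real.
# Assumption: default column layout, i.e. "N.|-- host Loss% Snt ...".
classify_loss() {
  awk '
    /^[ \t]*[0-9]+\.[|]--/ {
      n++
      hop[n] = $1; sub(/\.[|]--$/, "", hop[n])   # "4.|--" -> "4"
      host[n] = $2
      loss[n] = $3; sub(/%$/, "", loss[n])       # "70.0%" -> "70.0"
    }
    END {
      if (n == 0) exit 1                         # no hop lines found
      final = loss[n] + 0
      for (i = 1; i < n; i++) {
        if (loss[i] + 0 > 0) {
          verdict = (final > 0) ? "suspect" : "isolated"
          printf "hop %s (%s) %s%% %s\n", hop[i], host[i], loss[i], verdict
        }
      }
      printf "final hop (%s) %s%%\n", host[n], loss[n]
    }'
}
```

Fed the report above, this would print hop 4 as isolated, since hop 17 shows 0.0% loss. Usage would be along the lines of: classify_loss < saved-report.log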


(Jared Busch) #3

Use mtr from your PBX back to the site and from the PBX to the provider. I’m not home to grab an example for a bit, but with the right switches you get jitter calculated as well.


(Greg Snover) #4

Cool - That looks like exactly what I was looking for! Thanks!


(Greg Snover) #5

I will do some research and post my setup - Loading an OnSite box now…

Thanks!


#6

When in mtr, press h for help, j to show jitter, and u to switch between ICMP and UDP datagrams.
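
For unattended runs, the same toggles can be set on the command line. A minimal sketch, assuming an mtr build that supports --udp and --order; jitter_report, MTR_BIN, the 10-cycle count, and pbx.example.com are all made up for this example.

```shell
# Binary to run; overridable so the function can be dry-run (e.g. MTR_BIN=echo).
MTR_BIN=${MTR_BIN:-mtr}

# Produce a one-off report that includes the jitter columns.
# $1 = target host or IP, $2 = probe type: icmp (default) or udp
jitter_report() {
  target=$1
  proto=${2:-icmp}
  # L=loss S=sent A=avg B=best W=worst V=stdev
  # J=current jitter M=mean jitter X=worst jitter I=interarrival jitter
  set -- --report --report-cycles 10 --order LSABWVJMXI
  if [ "$proto" = udp ]; then
    set -- "$@" --udp          # same effect as pressing u in the live view
  fi
  "$MTR_BIN" "$@" "$target"
}
```

Example: jitter_report pbx.example.com udp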


#7

You could also use tcpdump to capture the VoIP traffic on the PBX, then use Wireshark to analyze the calls and see which leg of a call is having issues (if not both!).
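
A capture along those lines might look like the sketch below. The ports are assumptions (SIP on UDP 5060 and RTP on UDP 10000-20000, which are common defaults but should be checked against your own PBX settings), and voip_filter, capture_voip, and TCPDUMP_BIN are names invented for the example.

```shell
# Binary to run; overridable for testing.
TCPDUMP_BIN=${TCPDUMP_BIN:-tcpdump}

# Print a capture filter matching SIP signalling plus the RTP media range.
# $1 = SIP port, $2 = first RTP port, $3 = last RTP port (assumed defaults).
voip_filter() {
  printf 'udp port %s or udp portrange %s-%s\n' \
    "${1:-5060}" "${2:-10000}" "${3:-20000}"
}

# Capture full packets on all interfaces into a pcap file until interrupted.
# Needs root. $1 = output file (defaults to voip.pcap in the current directory).
capture_voip() {
  "$TCPDUMP_BIN" -i any -n -s 0 -w "${1:-voip.pcap}" "$(voip_filter)"
}
```

Run capture_voip /tmp/call.pcap during a bad call, stop it with Ctrl-C, then open the file in Wireshark.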

Also, I use a commercial tool called voipmonitor that runs like that all the time across all our PBXs and generates alerts for MOS, fraud, and similar issues. I think the agent component is open source if one doesn’t want to pay for the rest of it.


(Jared Busch) #8

I like this for a live view.

mtr site.domain.com --order LSRDNABWVJMXI

So, think Vultr Chicago and voip.ms Chicago are close?

Chicago Vultr to St Louis site.


(Greg Snover) #9

Nice - I am about to take a box onsite and start it collecting.


(Jared Busch) #10

I have a script that I run from cron or a systemd timer for this purpose, usually once per hour or so. It runs 60 ping cycles before stopping.

./mtrlog.sh

#!/usr/bin/env bash

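# Require exactly one argument: the host to trace.
if [ $# -ne 1 ]
then
  echo "You must specify a single domain or IP." >&2
  echo "Usage: './mtrlog.sh some.domain.com'" >&2
  exit 1
fi

# Run 60 probe cycles and write a timestamped report to the home directory.
mtr --report --report-cycles 60 --order LSRDNABWVJMXI "$1" > "$HOME/${1}_$(date +"%Y%m%d-%H%M%S").log"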

This outputs some.domain.com_YYYYMMDD-HHMMSS.log with this content:

Start: Thu Jul 15 15:15:54 2021
HOST: somepbx.domain.com            Loss%   Snt   Rcv Drop  Last   Avg  Best  Wrst StDev Jttr Javg Jmax Jint
  1.|-- ???                       100.0    60     0   60   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0
  2.|-- 209.222.28.1               0.0%    60    60    0  18.7  17.9   6.8  59.7   6.9  3.5  5.5 46.5 133.9
  3.|-- ???                       100.0    60     0   60   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0
  4.|-- ???                       100.0    60     0   60   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0
  5.|-- ???                       100.0    60     0   60   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0
  6.|-- ae0-100.cr10-chi1.ip4.gtt  0.0%    60    60    0   1.6   3.7   1.1  51.8   7.4  0.4  4.6 50.2 89.4
  7.|-- ae19.cr9-chi1.ip4.gtt.net  0.0%    60    60    0   2.6   4.2   1.0  65.0  10.0  0.6  5.5 63.7 78.4
  8.|-- ip4.gtt.net                0.0%    60    60    0   1.4   2.0   1.2  17.9   2.5  0.2  1.2 16.5 20.3
  9.|-- ???                       100.0    60     0   60   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0
 10.|-- ???                       100.0    60     0   60   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0
 11.|-- 76-73-164-153.knology.net  0.0%    60    60    0   2.7   3.0   2.6  15.7   1.7  0.0  0.6 13.0  5.5
 12.|-- ???                        0.0%     0     0    0   0.0   0.0   0.0   0.0   0.0  0.0  0.0  0.0  0.0

(Greg Snover) #11

That’s cool - I wonder (without too much programming) if it could be set to alarm on any rogue values and send an E-Mail when it happens.
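
Something like this might be enough as a starting point. It is only a sketch under a few assumptions: the report was made with --order LSRDNABWVJMXI (so Loss% is column 3, Rcv is column 5, and Javg is column 13), only the last hop that actually answered is checked, and check_report plus the threshold defaults are made up for the example.

```shell
# Read an mtr report (made with --order LSRDNABWVJMXI) from stdin and print
# one ALERT line if the last responding hop exceeds either threshold.
# Prints nothing when the path looks healthy.
# $1 = max loss in percent, $2 = max average jitter in ms (assumed defaults).
check_report() {
  awk -v max_loss="${1:-1}" -v max_javg="${2:-30}" '
    # Hop lines only, and only hops that received at least one reply,
    # so trailing "???" rows are ignored.
    /^[ \t]*[0-9]+\.[|]--/ && $5 + 0 > 0 {
      host = $2
      loss = $3; sub(/%$/, "", loss)
      javg = $13
    }
    END {
      if (host != "" && (loss + 0 > max_loss + 0 || javg + 0 > max_javg + 0))
        printf "ALERT %s loss=%s%% javg=%sms\n", host, loss, javg
    }'
}

# Example (assumes a working `mail` command and a hypothetical address):
#   alert=$(check_report 1 30 < "$HOME/some.domain.com_20210715-151554.log")
#   [ -n "$alert" ] && printf '%s\n' "$alert" | mail -s "mtr alert" noc@example.com
```

Called at the end of an hourly logging script, that would send mail only for runs that cross a threshold.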

I used to subscribe to a service (when we first got into VoIP and SIP Trunking) where they gave us a box, and we could put it on the prospective client’s network and let it run for a few days and it would tell us if their connection was solid and stable over time - it was quite pricey (but it worked great) and saved our reputation many times - we would insist on network changes/upgrades until it was happy.

Fast-forward many years and it pretty much was always coming back good (the network here in Albuquerque improved) so we cancelled it because it really was pricey.

But that is what I am looking for - mtr will work fine when I have identified a problem and can trace who to blame, but I really kind of want to set up a monitoring “net” that is running 24/7 at all the customer sites, and on all the hosted FreePBX’s - to tell me who is having problems before I get angry calls.

Vultr says they use mtr to monitor their networks, so that is helpful - I am working on a broader solution.

I am tired of always being blamed when calls don’t sound good - I have invested so much time and effort with the back end and the ITSP that it’s almost always the client’s connection or their network, but I have to be able to prove it simply with data that I can give to the offending party so they will fix it.


#12

Perhaps add HEP/EEP monitoring

https://wiki.asterisk.org/wiki/display/AST/Asterisk+13+Configuration_res_hep

would help. Your various PBXs send real-time stats to a HEP capture server; the capture server can then separate each call from each PBX and give you some meat to work with.


(Greg Snover) #13

Checking it out now…


(Itzik) #14

This is a good thread.

At the last Sangoma Open Source Lounge we discussed a couple more helpful tools. I think it’s time for a wiki post with all useful troubleshooting tools and “How-to” snippets.