Hi all,
Though not strictly FreePBX related, I still consider this topic vital to running a sound PBX solution and therefore worthwhile to share here. This will provide detailed insight in the network quality.
Primer
Tshark is Wireshark’s command line counterpart. It has a good deal of Wireshark’s functions and features, amongst which 2 I’m utilizing here: RTP stream statistics and custom selection of (RTCP) packet fields.
Example results
RTP stream statistics will log something like below;
==============================================================
Mon Mar 28 15:00:01 CEST 2011
========================= RTP Streams ========================
Src IP addr Port Dest IP addr Port SSRC Payload Pkts Lost Max Delta(ms) Max Jitter(ms) Mean Jitter(ms) Problems?
10.0.0.94 21560 10.0.0.20 13054 0xFDE2D779 ITU-T G.711 PCMU 5344 0 (0.0%) 24.61 0.74 0.10
10.0.0.20 19900 193.169.138.162 10228 0x113E791C ITU-T G.711 PCMU 5343 0 (0.0%) 26.15 1.00 0.13
193.169.138.162 10228 10.0.0.20 19900 0x77F87D71 ITU-T G.711 PCMU 5344 0 (0.0%) 26.11 1.28 0.29
10.0.0.20 13054 10.0.0.94 21560 0x0C92CC76 ITU-T G.711 PCMU 5344 0 (0.0%) 29.08 1.70 0.31
193.169.139.167 11156 10.0.0.20 12318 0x1C261F02 ITU-T G.711 PCMU 2305 0 (0.0%) 28.40 1.41 0.42
10.0.0.20 19612 10.0.0.68 28782 0x5EC4AEE8 ITU-T G.711 PCMU 2301 0 (0.0%) 28.46 1.45 0.44
10.0.0.68 28782 10.0.0.20 19612 0x0DD89BE3 ITU-T G.711 PCMU 2299 0 (0.0%) 27.37 2.35 0.10
10.0.0.20 12318 193.169.139.167 11156 0x10DB52A4 ITU-T G.711 PCMU 2299 0 (0.0%) 32.70 2.39 0.13
RTCP packet field selection will log something like below;
2011-03-28 15:55:48.790319 10.0.0.20 -> 193.169.138.162 RTCP id == 0x28f32e0b id == 0x3061e0d8 jtr == 19 cumnr == 2 fract == 1
2011-03-28 15:56:08.802233 10.0.0.20 -> 193.169.138.162 RTCP id == 0x28f32e0b id == 0x3061e0d8 jtr == 26 cumnr == 3 fract == 1
2011-03-28 15:56:28.790554 10.0.0.20 -> 193.169.138.162 RTCP id == 0x28f32e0b id == 0x3061e0d8 jtr == 21 cumnr == 4 fract == 1
2011-03-28 16:30:09.073794 193.169.139.167 -> 10.0.0.20 RTCP id == 0x649cfc1c id == 0x1afcadf2 jtr == 50 cumnr == 4 fract == 4
2011-03-28 16:30:14.073483 193.169.139.167 -> 10.0.0.20 RTCP id == 0x649cfc1c id == 0x1afcadf2 jtr == 51 cumnr == 6 fract == 2
2011-03-28 16:30:24.074152 193.169.139.167 -> 10.0.0.20 RTCP id == 0x649cfc1c id == 0x1afcadf2 jtr == 58 cumnr == 7 fract == 1
2011-03-28 16:35:54.982355 10.0.0.20 -> 193.169.138.166 RTCP id == 0x1db75684 id == 0x3407dae3 jtr == 19 cumnr == 1 fract == 1
2011-03-28 16:43:44.978312 10.0.0.20 -> 193.169.138.166 RTCP id == 0x1db75684 id == 0x3407dae3 jtr == 24 cumnr == 2 fract == 1
Where fract is the number of packets lost in this interval (5 second intervals of 250 packets total (50/s)).
Location, location, location
RTP stream statistics are calculated from the RTP media traffic captured by tshark on the pbx itself. So stats on streams sent by the PBX should by quite perfect. But this doesn’t say anything on how the other end will receive the stream. So the stream statistics are primarely usefull for determining the quality of incoming streams.
RTCP packets on the other hand contain statistics themselves, generated by a phone, pbx, trunk provider (if you’re lucky) and sent over to provide quality feedback. Asterisk generates them, Cisco phones generate them, my trunk provider generates them, and so do probably a whole lot of others.
Complications
Because tshark only outputs the RTP stream statistics when ended, I make it restart every 15 mins. This can cause it to be started again in the middle of a conversations, which UDP traffic it might not (immediately) interpret as RTP again. Further, sometimes off the chart figures are seen, but these are easily recognized and can only be ignored.
According to various records on the internet RTCP is far from perfect too: differences in calculation, odd figures, trouble passing NAT (it typically uses the RTP port +1) or even no support at all. Again, anomalies are often recognizable and can only be ignored.
As is true for all statistics; don’t consider the numbers complete or perfect, just give sufficient thought to their origin.
The source
The RTP stream monitor (schedule in cron to run ie. every 15 mins):
# Kill the running process
kill `cat ~/stream.pid`
# Remove any temporary files tshark might have left
rm -f /tmp/ether*
# Clean up files older then 20 days
/usr/bin/find /var/rtpmon -type f -mtime +20 | /usr/bin/xargs /bin/rm -f >/dev/null 2>&1
# Set current day for log file
STREAMDAY=$(date +"%m-%d-%Y")
# Append the date to the log
/bin/date >> /var/rtpmon/streams$STREAMDAY.log
# Start the new monitor process
/usr/sbin/tshark -a duration:897 -qp -z rtp,streams >> /var/rtpmon/streams$STREAMDAY.log &
# Write the started process's PID to file
echo $! > ~/stream.pid
The RTCP monitor (scheduled in cron to restart at midnight) (also note that I’m filtering out RTCP reports that have no packet loss):
# Kill the running process
kill `cat ~/rtcpmon.pid`
# Remove any temporary files tshark might have left
rm -f /tmp/ether*
# Clean up files older then 20 days
/usr/bin/find /var/rtcpmon -type f -mtime +20 | /usr/bin/xargs /bin/rm -f >/dev/null 2>&1
# Set current day for log file
STREAMDAY=$(date +"%m-%d-%Y")
# Append the date to the log
/bin/date >> /var/rtcpmon/rtcpmon$STREAMDAY.log
# Start the new monitor process
tshark -ltad -z proto,colinfo,rtcp.ssrc.fraction,rtcp.ssrc.fraction -z proto,colinfo,rtcp.ssrc.cum_nr,rtcp.ssrc.cum_nr -z proto,colinfo,rtcp.ssrc.jitter,rtcp.ssrc.jitter -z proto,colinfo,rtcp.ssrc.identifier,rtcp.ssrc.identifier -R "rtcp.ssrc.fraction>0" | sed -u 's/rtcp.ssrc.identifier/id/g' | sed -u 's/rtcp.ssrc.jitter/jtr/g' | sed -u 's/rtcp.ssrc.cum_nr/cumnr/g' | sed -u 's/rtcp.ssrc.fraction/fract/g' | sed -u 's/Sender Report Source description//g' >> /var/rtcpmon/rtcpmon$STREAMDAY.log &
# Write the started process's PID to file
echo $! > ~/rtcpmon.pid
I hope anyone finds this usefull.
With best regards,
Bas