Random Dropped Calls

I have a client that has been experiencing dropped calls. They were using an end of life distro (I didn’t setup) that had all sorts of problems on top of the dropped calls. I recommended to build a new server with the current distro 5.211.65-14. Bulk imported DID’s and extensions and recreated everything else.

Calls were still dropping. The only thing in I saw in the logs were “Manager ‘admin’ from 127.0.0.1, hanging up channel: SIP/XXXXXXXX” and something that led me to believe their Watchguard Firewall had an ALG that was disturbing things. We got another static IP and setup a basic router and forwarded my sip and rtp ports. Thought it was fine for a while then suddenly 4 dropped calls in a row.

I’m stumped and don’t know what is causing “Manager ‘admin’ from 127.0.0.1, hanging up channel: SIP/XXXXXX” as that’s the only thing I see now. Any ideas on where to look now?

The manager errors just mean some process is trying to connect to Asterisk that doesn’t have a manager user account. This is not the cause of dropped calls.

Neither is the hangup up SIP channel. This is just housekeeping after the call has already dropped.

I would be looking at RTP debug. Does the audio go away and then the call disconnect in about 20 seconds afterward? If it is you are losing the media stream during the call and you were much closer when checking router and ALG.

That’s what I thought. The fact that I couldn’t find anything on google about the manager made me thing the call was gone and it was just cleaning it up.

The call drops at the same time as the audio so I wasn’t thinking RTP, but I turned it on so I can check when they get dropped calls again. The problem is when I test it, it’s fine. I was on through a softphone via VPN and called myself for over an hour. I don’t think it’s an ALG thing anymore since the firewall is bypassed and I put in a basic Asus RT-N12 which I use in my own setup with no problems.

No the SIP hangup is cleanup. The manager is the AMI (Asterisk Manager Interface). 127.0.0.1 is the loopback address of the computer so we know it’s a local process trying to connect with the wrong credentials. Likely candidates are FOP, Aastra or iSymphony daemons.

I would have SIP trace running to your carrier so that when one happens you can catch it. Use tcpdump to a file https://www.youtube.com/watch?v=OFpQLyQxt84 is a good video on how to do it. Make sure you use the destination filter and only capture packets destined to your ITSP and the answers.

With this data you should be able to solve the mystery.

On a problematic system, I start a capture with a command like:
tcpdump -s 0 -C 100 -W 10 -w rbuf -Z root &
This writes all network traffic into a series of files rbuf0, rbuf1, …, rbuf9. Each is limited to 100 MB and after filling rbuf9, it continues, overwriting rbuf0. You can let this run 24/7, since it won’t use more than 1 GB of disk. When you get a report of a problematic call, copy the file with the proper time range to a temporary and transfer it to your PC. Open it in Wireshark and with Telephony->VoIP calls, view and analyze the failing call. You have everything, including extension and trunk signalling, announcements heard, DTMF sent, and the actual conversation.

No guarantee that you’ll be able to fix the trouble, but at least you will know exactly what went wrong :slight_smile:

2 Likes

Thanks for the help guys, but still no luck. Maybe I just don’t know what I should be looking for. On the RTP debug, I don’t see anything different from a good call and a bad call. The only thing that looks odd to me is the port from the trunks are in the 40000 range and the polycom phones in the 2000 range. They Yealink phones are in the 11000 range but they drop too so I’m assuming these are just ports used internally so it should be fine?

On the tcpdump I tried your method Stewart, which gave me a 2 hour range to fill the 1GB. But I did capture some of the dropped calls, but to my untrained eye, it looks like the call ended normally. “X-Asterisk-HangupCauseCode: 16”

It does look like the pbx is sending the BYE though, but I don’t know what’s initiating it.

explain a bit more about when the call drops. for example is it after putting the call on hold? or does it just drop in the middle of a conversation?

It’s not held or parked. It will drop mid sentence. They have reported to me that sometimes they’ll even drop while dialing into a queue, but I can’t find any of those calls in the logs. They have call recording enabled so we have quite a few examples of the hangup mid word so it’s not like the example Skyking gave earlier. Is it safe to put the cap online if someone can look at it? Or would some of that info compromise the pbx?

On a normally ended call, the PBX receives a BYE from the extension and sends a BYE to the trunk, or vice-versa, depending on which end hangs up first.

If you see that the PBX sends BYE to both extension and trunk, it may have torn the call down because of a technical problem; check the previous minute to see if RTP stopped flowing, a re-invite failed, etc.

Otherwise, the call was probably torn down by an administrative function gone awry, e.g. maximum call duration, operator panel, etc. Do you see anything in the log just before the BYE gets sent?

Stewart is exactly right. Asterisk thinks RTP continuity had been lost so it tears down the call as part of zombie protection.

This is almost always a NAT issue.

Ok so just before the BYE I don’t see anything odd on the RTP debug. This is the last bit from the log.

[2014-07-10 13:09:33] VERBOSE[28250][C-00001c47] res_rtp_asterisk.c: Sent RTP packet to      208.64.**.**:49456 (type 18, seq 001939, ts 682304, len 000020)
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] res_rtp_asterisk.c: Got  RTP packet from    208.64.**.**:47702 (type 18, seq 012299, ts 1963360, len 000020)
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] res_rtp_asterisk.c: Sent RTP packet to      192.168.11.115:2226 (type 18, seq 004074, ts 1963360, len 000020)
[2014-07-10 13:09:33] VERBOSE[28009][C-00001c3c] res_rtp_asterisk.c: Got  RTP packet from    208.64.**.**:46108 (type 00, seq 019409, ts 3102880, len 000160)
[2014-07-10 13:09:33] VERBOSE[21327][C-00001c2c] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:47046 (type 18, seq 031999, ts 5116160, len 000020)
[2014-07-10 13:09:33] VERBOSE[27943][C-00001c2c] res_rtp_asterisk.c: Sent RTP packet to      192.168.11.200:2230 (type 18, seq 027013, ts 5116160, len 000020)
[2014-07-10 13:09:33] VERBOSE[28275][C-00001c48] res_rtp_asterisk.c: Got  RTP packet from    192.168.1.100:11794 (type 00, seq 032202, ts 271600, len 000160)
[2014-07-10 13:09:33] VERBOSE[28275][C-00001c48] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:45616 (type 00, seq 049354, ts 271600, len 000160)
[2014-07-10 13:09:33] VERBOSE[28203][C-00001c42] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:46518 (type 18, seq 008109, ts 1295680, len 000020)
[2014-07-10 13:09:33] VERBOSE[28275][C-00001c48] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:45616 (type 00, seq 001717, ts 274720, len 000160)
[2014-07-10 13:09:33] VERBOSE[28275][C-00001c48] res_rtp_asterisk.c: Sent RTP packet to      192.168.1.100:11794 (type 00, seq 001840, ts 274720, len 000160)
[2014-07-10 13:09:33] VERBOSE[28009][C-00001c3c] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:46108 (type 00, seq 017489, ts 3102840, len 000160)
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] res_rtp_asterisk.c: Got  RTP packet from    192.168.11.115:2226 (type 18, seq 056763, ts 2183091699, len 000020)
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:47702 (type 18, seq 028875, ts 2183091696, len 000020)
[2014-07-10 13:09:33] VERBOSE[28110][C-00001c3a] res_rtp_asterisk.c: Got  RTP packet from    192.168.11.118:2230 (type 00, seq 046847, ts 1340137307, len 000160)
[2014-07-10 13:09:33] VERBOSE[27996][C-00001c3a] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:48740 (type 18, seq 027668, ts 1340137304, len 000020)
[2014-07-10 13:09:33] VERBOSE[27943][C-00001c2c] res_rtp_asterisk.c: Got  RTP packet from    192.168.11.200:2230 (type 18, seq 026077, ts 3901201421, len 000020)
[2014-07-10 13:09:33] VERBOSE[21327][C-00001c2c] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:47046 (type 18, seq 047416, ts 3901201416, len 000020)
[2014-07-10 13:09:33] VERBOSE[28250][C-00001c47] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:49456 (type 18, seq 004303, ts 682560, len 000020)
[2014-07-10 13:09:33] VERBOSE[28203][C-00001c42] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:46518 (type 18, seq 042158, ts 1295544, len 000020)
[2014-07-10 13:09:33] VERBOSE[5191] manager.c:     -- Manager 'admin' from 127.0.0.1, hanging up channel: SIP/128-00002f73
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:     -- Executing [h@macro-dial-one:1] Macro("Local/128@from-queue-00004346;2", "hangupcall,") in new stack
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:     -- Executing [s@macro-hangupcall:1] GotoIf("Local/128@from-queue-00004346;2", "1?theend") in new stack
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:     -- Goto (macro-hangupcall,s,3)
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:     -- Executing [s@macro-hangupcall:3] ExecIf("Local/128@from-queue-00004346;2", "0?Set(CDR(recordingfile)=)") in new stack
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:     -- Executing [s@macro-hangupcall:4] Hangup("Local/128@from-queue-00004346;2", "") in new stack
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] app_macro.c:   == Spawn extension (macro-hangupcall, s, 4) exited non-zero on 'Local/128@from-queue-00004346;2' in macro 'hangupcall'
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:   == Spawn extension (macro-dial-one, h, 1) exited non-zero on 'Local/128@from-queue-00004346;2'
[2014-07-10 13:09:33] VERBOSE[27996][C-00001c3a] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:48740 (type 18, seq 020398, ts 3261120, len 000020)
[2014-07-10 13:09:33] VERBOSE[28110][C-00001c3a] res_rtp_asterisk.c: Sent RTP packet to      192.168.11.118:2230 (type 00, seq 053875, ts 3261120, len 000160)
[2014-07-10 13:09:33] VERBOSE[28250][C-00001c47] res_rtp_asterisk.c: Sent RTP packet to      208.64.***.***:49456 (type 18, seq 001940, ts 682464, len 000020)
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:47702 (type 18, seq 012300, ts 1963520, len 000020)
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] app_macro.c:   == Spawn extension (macro-dial-one, s, 43) exited non-zero on 'Local/128@from-queue-00004346;2' in macro 'dial-one'
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] app_macro.c:   == Spawn extension (macro-exten-vm, s, 14) exited non-zero on 'Local/128@from-queue-00004346;2' in macro 'exten-vm'
[2014-07-10 13:09:33] VERBOSE[28261][C-00001c40] pbx.c:   == Spawn extension (from-internal, 128, 2) exited non-zero on 'Local/128@from-queue-00004346;2'
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:     -- Executing [h@ext-queues:1] Macro("SIP/InboundCallsLA-00002f49", "hangupcall,") in new stack
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:     -- Executing [s@macro-hangupcall:1] GotoIf("SIP/InboundCallsLA-00002f49", "1?theend") in new stack
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:     -- Goto (macro-hangupcall,s,3)
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:     -- Executing [s@macro-hangupcall:3] ExecIf("SIP/InboundCallsLA-00002f49", "0?Set(CDR(recordingfile)=)") in new stack
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:     -- Executing [s@macro-hangupcall:4] Hangup("SIP/InboundCallsLA-00002f49", "") in new stack
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] app_macro.c:   == Spawn extension (macro-hangupcall, s, 4) exited non-zero on 'SIP/InboundCallsLA-00002f49' in macro 'hangupcall'
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:   == Spawn extension (ext-queues, h, 1) exited non-zero on 'SIP/InboundCallsLA-00002f49'
[2014-07-10 13:09:33] VERBOSE[28009][C-00001c3c] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:46108 (type 00, seq 019410, ts 3103040, len 000160)
[2014-07-10 13:09:33] VERBOSE[21327][C-00001c2c] res_rtp_asterisk.c: Got  RTP packet from    208.64.***.***:47046 (type 18, seq 032000, ts 5116320, len 000020)
[2014-07-10 13:09:33] VERBOSE[27943][C-00001c2c] res_rtp_asterisk.c: Sent RTP packet to      192.168.11.200:2230 (type 18, seq 027014, ts 5116320, len 000020)
[2014-07-10 13:09:33] VERBOSE[28154][C-00001c40] pbx.c:   == Spawn extension (ext-queues, 600, 40) exited non-zero on 'SIP/InboundCallsLA-00002f49'
[2014-07-10 13:09:33] VERBOSE[28169][C-00001c40] app_mixmonitor.c:   == MixMonitor close filestream (mixed)
[2014-07-10 13:09:33] VERBOSE[28169][C-00001c40] app_mixmonitor.c:   == End MixMonitor Recording SIP/InboundCallsLA-00002f49

The failed call is on C-00001c40.

From what I can tell, it was still sending packets to and from the istp. The only debugs I see just before that are on a different channel: [2014-07-10 13:09:33] DEBUG[28275][C-00001c48] res_rtp_asterisk.c: - RTP 2833 Event: 00000003 (len = 4)

They do have a licensed FOP2. I never set any max call limit, and the time between dropped calls is random, sometimes 30 sec, sometimes 10 min.

Just so I’m clear, I have nat set to yes on sip settings. This should be correct if I’m behind a router and have exterinip set?

You have to set NAT on, you have a public and a private address. Asterisk has to work that out, that what NAT is for.

i assume that freepbx is behind a firewall, the phones are local and the sip trunks are external? if so, are the trunks registering or are they using ip authentication?

also, are you using the internet to connect to your itsp? and if yes, what speed? are you doing traffic shaping? running out of bandwidth will cause dropped calls.

@SkykingOH, I did have NAT on, but I think I remember reading somewhere that was getting depreciated, that’s why I asked, I must be mistaken.

@bksales, It used to be behind a watchguard firewall, but there were far more problems with that so we’re using a 2’nd static ip running to an Asus router running Tomato.

There is no reg string, it’s using ip authentication. I didn’t have any traffic shaping on the watchguard because I couldn’t figure the thing out. So if we need to do any traffic shaping it would need to be done on the ISP’s modem which I don’t have access to. There are no other devices attached to the Asus router so 100% of the traffic wil be voip. They have a 40mbs down and 20mbs up fiber connection and don’t think they are doing anything that can use that amount of bandwidth but I tell them to check with the ISP to have the PBX have priority on the modem.

i am assuming that you have forwarded the appropriate ports to the pbx and have set the udp timers way up or turned them off.

yes, 5060-5062 and 10001-20000, both udp forwarded to the pbx.

but what about the udp timers? what are they set to?

You know I never touched that so it’s the default Unreplied: 30 Assured: 180.

As a side note , rtp (audio/media) uses pairs of ports starting on an EVEN number, so better 10002-20000 if you need 4999 concurrent calls (adjust appropriately downwards in both your router tables and /etc/asterisk/rtp.conf to suit your needs) .

But my guess is that for quite a few reasons, you should change the ports for webmin if it is for that reason you changed the defaults, in /etc/webmin/miniserv.conf from 10000 to something else, the udp port is for “clustering” multiple webmin boxen, the default tcp port is an open invitation to the knuckle draggers out there, JM2CWAE.