Disconnecting call for lack of RTP activity

clyde277 · September 4, 2018, 8:23pm

Hi folks, so I’ve been getting this log for quite a while now. It doesn’t seem to be an easy puzzle for me, and so I was wondering if I could get some clues from you guys.

The full message from the Asterisk log is this:
[2018-09-04 12:06:49] NOTICE[2482] chan_sip.c: Disconnecting call 'SIP/FlowRoute-0000067a' for lack of RTP activity in 301 seconds

Just so you guys know, I am using a Mikrotik CCR1009-7G-1C-1S+ running the latest software and firmware. We’ve set up packet queuing using the firewall mangle rules, so every packet gets identified and queued based on priority: RTP, SIP, TCP-ACK… down to HTTP, which is the last one. So far 0 RTP and SIP packets have been dropped from our queues. I have disabled SIP-ALG and set the extension NAT Mode to Yes (force_rport,comedia).
My current Asterisk version is 11.25.2 and PBX version is 10.13.66-21.
Let me know if I need to include any other information.
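
For anyone reading along, the NOTICE line above already tells you roughly when audio stopped arriving: the log timestamp minus the reported number of seconds. Below is a minimal Python sketch of that arithmetic. The regex and the helper name `parse_rtp_timeout` are illustrative assumptions based only on the single line quoted in this post, not an official Asterisk log grammar.

```python
import re
from datetime import datetime, timedelta

# Pattern for the chan_sip RTP timeout NOTICE quoted above. The leading "["
# is optional because it is easily lost when copying from a log.
RTP_TIMEOUT_RE = re.compile(
    r"\[?(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] NOTICE\[\d+\].*?"
    r"Disconnecting call '(?P<channel>[^']+)' for lack of RTP activity "
    r"in (?P<secs>\d+) seconds"
)

def parse_rtp_timeout(line):
    """Return (channel, logged_at, last_rtp_at), or None if the line does not match."""
    m = RTP_TIMEOUT_RE.search(line)
    if m is None:
        return None
    logged_at = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    silent_for = timedelta(seconds=int(m.group("secs")))
    # The last RTP packet was seen about `silent_for` before the NOTICE was written.
    return m.group("channel"), logged_at, logged_at - silent_for

line = ("[2018-09-04 12:06:49] NOTICE[2482] chan_sip.c: Disconnecting call "
        "'SIP/FlowRoute-0000067a' for lack of RTP activity in 301 seconds")
channel, logged_at, last_rtp_at = parse_rtp_timeout(line)
print(channel, last_rtp_at)  # SIP/FlowRoute-0000067a 2018-09-04 12:01:48
```

So for this call, the audio most likely stopped around 12:01:48, about five minutes before the disconnect was logged.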

BlazeStudios · September 4, 2018, 8:47pm

That’s a direct copy and paste from the logs? No edit, typo, or missing data in the copy and paste? Because that would mean Asterisk waited over 5 minutes with no RTP on this call before disconnecting it. That is way above the 30 seconds FreePBX uses for the default RTP Timeout.

Show your Flowroute trunk config so we can make sure it’s not messed up.

clyde277 · September 4, 2018, 9:01pm

Here is my trunk config:

type=friend
secret=XXXXXXXXXX
username=YYYYYYYYY
host=sip.flowroute.com
dtmfmode=rfc2833
context=from-trunk
canreinvite=no
allow=ulaw
;allow=g729 ;uncomment this line if you have G.729 licenses installed.
insecure=port,invite
fromdomain=sip.flowroute.com

Yes, that’s a direct copy and paste from the Asterisk log. I set the timeout to 300 seconds because I was having issues with dropped calls, and I thought that increasing the timeout from 30 to 300 might help.

Also, not sure if it’s worth mentioning, but here are my other Media & RTP settings:

Reinvite Behavior: No
RTP Timeout : 300
RTP Hold Timeout: 300
RTP Keep Alive: 0

cynjut · September 5, 2018, 2:12pm

Obvious questions - are you properly forwarding the UDP RTP ports from the external firewall to the PBX, and do you have all of your external connections correctly identified in the Internal Firewall?

clyde277 · September 5, 2018, 9:06pm

That’s a great question. Let me explain.

On the PBX the RTP port range is from 10000 to 20000.
On one Mikrotik the RTP ports are the same, 10000 to 20000.
But on the other Mikrotik the RTP ports are from 8000 to 65000. This is the setup that a VoIP consultant suggested, since he was the one who set up our Mangle and Queue rules on the Mikrotik. The Mangle rule marks packets that come from our PBX public IP. Since the PBX will probably never use any ports above 20000, I should probably change that on this Mikrotik. (I have already changed it on the other Mikrotik in another office.)

On the other side, our Polycom conference phones have an option where we specify the start of the RTP ports, but we can’t specify the end of the range (not sure why). We have about 10 conference phones, and their RTP start port was left at the default, which is 2222. I can certainly change this port, and I am actually running some tests now that I have changed it. But I am not sure this is my issue, since dropped calls are coming from the Cisco phones as well.

So basically, from what I understand, the PBX does not receive any RTP packets from OUR network and then after 300 seconds it just drops the call. But to me this is very odd, because one day one of our recruiters was talking on the phone when she realized after 1 min that the call had dropped. So our network was indeed sending RTP packets to the PBX.

BlazeStudios · September 5, 2018, 10:39pm

You need to be more specific with this. Your first post stated there was a Mikrotik router and now there are two involved.

Where is the PBX located, On-premises or hosted?
If the PBX is hosted, is one of these Mikrotiks in front of it?
Where are the phones located in relation to the PBX? In the same network structure or remote (going over the Internet) to the PBX?
Where does this second Mikrotik fit in?

clyde277 · September 6, 2018, 3:18pm

Sorry, my bad, poor explanation.

The two Mikrotiks are just two different offices that talk to the PBX.
Our PBX is not on-premises. It is cloud-hosted. No Mikrotik is in front of the PBX.
So the phones are remote relative to the PBX. They’re behind NAT, behind the Mikrotiks.

cynjut · September 6, 2018, 3:44pm

So you have “PBX ↔ NAT Router ↔ Internet ↔ NAT Router ↔ Phone” right?

Everything needs to be set up and identified as being behind a NAT, and everything needs to know its external (Internet side) address. I’m going to guess that something, somewhere, is using the “local” address for the RTP return address through the Firewalls and the phones are losing their connections because of that.

clyde277 · September 6, 2018, 4:34pm

It’s actually
PBX <-> Internet <-> NAT Router (Mikrotik) <-> Phone

There is no NAT for the PBX. Our PBX is configured with a Public IP.

As I mentioned, for the phone to be NAT aware I had SIP-ALG enabled. And SIP ALG was handling the process of converting the Private IP of the phone, to our Public IP.
Now I have disabled SIP-ALG, and instead I’m using NAT Mode (Yes) on all extensions.

cynjut · September 6, 2018, 5:20pm

SIP-ALG is OK for phones (not for the PBX, just for phones), most of the time, but the “best practice” we recommend is using the NAT settings in your phone. In order to do this correctly, you need to make sure that the settings for the RTP UDP port range and the “public IP” of the phone are set correctly.

Are you still having problems, or did setting up the NAT settings correctly (including STUN servers for connections that are dynamic) work?

clyde277 · September 6, 2018, 9:04pm

Understood. Just to be clear, this issue happens here and there. Checking the logs, we have had it 4 times in the past week (after disabling SIP-ALG and enabling NAT on the phones). In one instance I know for sure it was a real dropped call, since our recruiter was talking to the other person. I am not entirely sure about the other instances, but I’d like to treat them the same way.

When you say that I need to make sure the settings for the RTP UDP port range and the “public IP” of the phone are set correctly, would you mind explaining that in a bit more detail? I am having a hard time understanding it.

Also, not sure if this helps, but we use Cisco SPA504G for desk phones, and Polycom soundstation IP 5000 for conference phones.

cynjut · September 6, 2018, 11:19pm

The PBX uses ports 10000-20000 for RTP traffic. Your devices should match, although a mismatch seldom causes problems. The other piece is that traffic that traverses the Internet can only go from routable address to routable address. In this case, your phones need to be set up to use a STUN server (to set that external address), or you need to set the external address (the routable address on your firewall) in the phones, so the NAT can get the traffic back to your router.

Since you’re seeing this error on the server, make sure the extensions (in the server) and the phones are all set up for NAT and make sure the phones know their “routable Internet address” as well as their local (non-routable) LAN address.

BlazeStudios · September 7, 2018, 12:40pm

@clyde277 Alright, this is starting to get out of hand with these suggestions. I use Mikrotik 100% of the time in my deployments for phones, FXS gateways, ATAs, PBXs, etc. at the locations I service. I am never required to set up NAT/port forwards, etc., because the Mikrotik deals with the NAT properly. On top of that, I use Queues to prioritize traffic on networks where the Internet connection is not the greatest in the world.

The PBX is hosted off-premises without a firewall in front of it, based on the description. The PBX should have ITS settings to show it is on a public/static IP that is not NAT’d.
The extensions, as previously pointed out, need to have their NAT settings under the Advanced tab for the extension set to Yes.
In regards to RTP, the PBX uses 10000-20000 for ITS audio. The phones DO NOT HAVE TO MATCH THAT. Phones will generally show a single RTP port; this is the port the phone will use for the first SIP call it handles. If it receives/initiates another call, it will add a random (or set) number to the original RTP port setting. So if the phone has 8000, the first call will use 8000/8001 (RTP always takes two ports), then the next call will be 8000+15 so it would use 8015/8016, and so on. That is because most of these phones can support up to 24+ CALLS, which will require 24+ RTP port pairs.
You do NOT need to enter the WAN IP, etc. into the phone’s NAT settings. The Mikrotik will handle all of that just fine, with or without the SIP services enabled on the Mikrotik.
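
The port arithmetic in the RTP point above can be sketched in a few lines of Python. This is only an illustration of the scheme as described in this post: the step of 15 is taken from the 8000 to 8015 example, the real increment is phone-specific (and may be random), and `rtp_port_pair` is a hypothetical helper name.

```python
def rtp_port_pair(base_port, call_index, step=15):
    """Ports used by the Nth concurrent call (0-based) under the scheme above.

    `step` is an assumption taken from the 8000 -> 8015 example; the actual
    increment depends on the phone model and firmware.
    """
    first = base_port + call_index * step
    return first, first + 1  # each call takes two consecutive ports

# First three concurrent calls on a phone whose RTP start port is 8000.
for i in range(3):
    print(i, rtp_port_pair(8000, i))

# Highest ports touched if the phone really carried 24 calls at once.
print(rtp_port_pair(8000, 23))  # (8345, 8346)
```

Under these assumptions a single phone stays within a few hundred ports of its start port, so the phone-side range has no reason to line up with the PBX's 10000-20000.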

This is sounding like an issue with your Queues and how the person who set them up has handled the Packet Marks, the Post/Pre Mangle firewall rules, and perhaps a few other settings. I would need to see the full export of the Mikrotik config to figure out where the issue is.

But the bottom line is, I have Mikrotiks doing EXACTLY what you have described yours is supposed to do and the poor rules for the Queues/Marking can cause these issues.

BlazeStudios · September 8, 2018, 12:44am

@clyde277 OK so I looked at the firewall configs that you sent. They are wrong. The mangle process is not set up properly, at least not how I’ve done it with SIP setups. There should be rules that use the prerouting/postrouting chains, not forward.

But this is looking like your issue is a poorly configured router.

Stewart1 · September 8, 2018, 7:17am

IMO it’s extremely unlikely that this is the cause of or directly related to your dropped calls.

If something goes wrong with RTP in one or both directions, when the party presently speaking pauses and fails to hear a response, he’ll say “hello, are you still there?” a couple of times, decide the line has gone dead and hang up. This will typically occur within 30 seconds. Normally, one or both ends will send BYE and Asterisk will take the call down, long before the 300-second timeout.

So, I suspect that whatever caused the drop also prevented Asterisk from receiving the BYE requests, in spite of multiple retransmissions by the phone, so Asterisk didn’t see a problem until almost 5 minutes later.

With luck, you’ll find relevant entries in the Asterisk log between 5 and 4 minutes prior to the RTP timeout. Or, there may be other clues, e.g. if there are multiple calls in progress at the same office when one drops, do they all drop?

Otherwise, packet captures on the PBX and/or Mikrotik may allow you to see what goes wrong when conversation stops flowing. I typically start a command such as:
tcpdump -s 0 -C 100 -W 100 -w rbuf -Z root &
which will capture all packets to/from the PBX into 100 files of 100 MB each (rbuf00 … rbuf99).

When trouble occurs, depending on your system’s traffic, you’ll have from a few hours to a few days to locate the file or files with the problematic call, download them to your PC and analyze them with Wireshark.
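
To get a feel for how long that ring buffer lasts before the oldest file is overwritten, here is a rough Python estimate. It relies on tcpdump's documented behavior that the `-C` value is in millions of bytes; the 200 kB/s average capture rate is purely a hypothetical figure, so substitute a number measured on your own PBX.

```python
def ring_capacity_bytes(file_count, file_size_millions):
    # tcpdump's -C value is in millions of bytes (1,000,000), per its man page.
    return file_count * file_size_millions * 1_000_000

def coverage_hours(capacity_bytes, avg_capture_bytes_per_sec):
    # Time until the oldest capture file is overwritten, where the rate is
    # how fast the capture files grow on disk.
    return capacity_bytes / avg_capture_bytes_per_sec / 3600

capacity = ring_capacity_bytes(100, 100)   # matches -W 100 -C 100
hours = coverage_hours(capacity, 200_000)  # hypothetical 200 kB/s average
print(capacity, round(hours, 1))           # 10000000000 13.9
```

That is about 10 GB of disk in total, so check free space on the PBX before starting the capture.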

clyde277 · September 13, 2018, 5:16pm

Thanks for taking your time and looking through this, I appreciate it. Sorry for the lack of response, been busy with a bunch of security projects, now I got some time to get back to VoIP.

So you’re saying that instead of forwarding udp ports from 10000 to 20000, I should use pre/postrouting.
My questions are:

Do I need to do this with all my mangle forwarding rules, or just RTP and SIP?
Should I use prerouting when the connection is in an incoming state? And likewise, should I use postrouting when the connection is in an outgoing state?
Do I need to do this for both the connection mark and the packet mark, or just one of them?

The reason I am asking is that, as you know, all the mangle rules are tied to queues, and I don’t want the queues to stop working (I can still figure out whether they’re working from the counters, though).

Thanks again for your help

clyde277 · September 13, 2018, 5:26pm

FYI, I just checked the Asterisk logs and this message hasn’t shown up at all in the past 6 days (fingers crossed).

Do you guys think that if there were really severe issues in the config, either on the PBX or on my network, I would have seen this message more often? I will certainly keep checking the Asterisk logs more frequently to see if I still get this message.

My hope is that disabling SIP-ALG on the Mikrotik and using NAT Mode Yes (force_rport,comedia) on the PBX might have made the UDP connections more stable…?

clyde277 · September 13, 2018, 5:32pm

This is a great suggestion! I have done packet captures on my internal network with port mirroring, but I didn’t get the chance to catch a call being dropped in real time. Still, I think that running tcpdump on my PBX is a good call.

Note that these dropped calls happen very randomly, and they were definitely happening more often when SIP-ALG was enabled. They never happened to a whole group or office at once.

You mentioned that there might be other log entries relevant to this issue between 5 and 4 minutes prior to the RTP timeout. Do you know what those entries would look like, by any chance?

Stewart1 · September 13, 2018, 5:48pm

I don’t know of anything specific, but since it’s likely that the problem started about 5 minutes prior to the RTP timeout, it’s worth looking to see whether Asterisk detected anything then.

On most systems, the relevant log is
/var/log/asterisk/full
or (if it has already been rotated)
/var/log/asterisk/full-(some date)

clyde277 · September 13, 2018, 5:51pm

Ah, got you. Yes, that makes total sense. I hadn’t thought about this before, good call!
And yes, those are the paths where I have been reading the Asterisk logs.

Thanks!