Calls drop at 30 minutes

I posted about this before, but the thread expired because I didn’t get a chance to reproduce the error. Since then I have been able to narrow down a few specifics. On inbound calls, the call drops at almost exactly 30 minutes every time. This time I’ve spent the past 12 hours troubleshooting and have still come up with nothing. These are the logs around the time of the drop:

[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] bridge_channel.c: Channel SIP/SCRUBBED left ‘simple_bridge’ basic-bridge
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] app_macro.c: Spawn extension (macro-dial, s, 24) exited non-zero on ‘SIP/SCRUBBED’ in macro ‘dial’
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Spawn extension (ext-group, 001, 18) exited non-zero on ‘SIP/SCRUBBED’
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Executing [h@ext-group:1] Macro(“SIP/SCRUBBED”, “hangupcall,”) in new stack
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Executing [s@macro-hangupcall:1] GotoIf(“SIP/SCRUBBED”, “1?theend”) in new stack
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx_builtins.c: Goto (macro-hangupcall,s,3)
[2020-05-15 15:54:41] VERBOSE[32691][C-00000006] bridge_channel.c: Channel PJSIP/202-00000026 left ‘simple_bridge’ basic-bridge
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Executing [s@macro-hangupcall:3] ExecIf(“SIP/SCRUBBED”, “0?Set(CDR(recordingfile)=)”) in new stack
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Executing [s@macro-hangupcall:4] NoOp(“SIP/SCRUBBED”, "PJSIP/304-00000028 montior file= ") in new stack
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Executing [s@macro-hangupcall:5] GotoIf(“SIP/SCRUBBED”, “1?skipagi”) in new stack
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx_builtins.c: Goto (macro-hangupcall,s,7)
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Executing [s@macro-hangupcall:7] Hangup(“SIP/SCRUBBED”, “”) in new stack
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] app_macro.c: Spawn extension (macro-hangupcall, s, 7) exited non-zero on ‘SIP/SCRUBBED’ in macro ‘hangupcall’
[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] pbx.c: Spawn extension (ext-group, h, 1) exited non-zero on ‘SIP/SCRUBBED’
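To confirm that a drop really lands on the 30-minute mark, the timestamps in these log lines can be compared mechanically. The following is a minimal Python sketch, not part of the original post. It assumes the leading `[YYYY-MM-DD HH:MM:SS]` stamp shown above; the start-of-call line and the 5-second tolerance are hypothetical values chosen for illustration.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS]" stamp of an Asterisk verbose log line.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")

def log_time(line):
    """Return the timestamp of a log line as a datetime, or None if it has none."""
    m = STAMP.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None

def looks_like_timer_drop(start_line, end_line, limit=1800, tolerance=5):
    """True if the call lasted within `tolerance` seconds of `limit` seconds."""
    start, end = log_time(start_line), log_time(end_line)
    if start is None or end is None:
        return False
    return abs((end - start).total_seconds() - limit) <= tolerance

# End line: abridged from the log above (quotes normalized).
end = "[2020-05-15 15:54:41] VERBOSE[32674][C-00000006] bridge_channel.c: Channel SIP/SCRUBBED left 'simple_bridge' basic-bridge"
# Start line: HYPOTHETICAL, invented only to make the example runnable.
start = "[2020-05-15 15:24:40] VERBOSE[32674][C-00000006] (hypothetical start-of-call line)"

print(looks_like_timer_drop(start, end))
```

A duration that sits this close to 1800 seconds on every call points at a timer (SIP session timer or firewall state timeout) rather than a random network fault.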

This was happening to us with a Sophos UTM 9 firewall. At exactly 30 minutes the call would cut out. No click, no noise, nothing. You would be talking to yourself for a while until you realized the other person had dropped off. In the end it was our firewall, and we could not fix it, so we set up a pfSense firewall just for the FreePBX and the problem went away. A true pain, because it is another device to maintain, but zero drops since doing this a year ago.

You’re the second person to suggest that. We run USG Pros and I just can’t believe that something that expensive doesn’t support it. I figured it had something to do with the state table, but the Vyatta settings under the hood only let me change the timeouts, not how aggressively the state table is rolled.

No issues like this with ZyXEL ATP and USG firewalls, if you want another option.

We tried everything, and our firewalls are not cheap, but free pfSense fixed it. If you have the time, install it on an old box with two network cards, create a few NAT rules, and it’s ready to go. Do some test calls and see if it drops. 99% it is your firewall.

We’re running USGs today…what am I missing? I’d like to keep it simple without the pfsense box but if that’s what it boils down to, I’ll just swallow my pride and do it.

sip or pjsip?

Maybe see if “RTP Keep Alive” in the SIP settings has any effect?
Check your debug on the calls to confirm whether you have Session-Expires: 1800 set. Try session-timers=refuse.

is there any SIP signaling when it drops or does the firewall just drop the forwards from the NAT table?

pjsip

RTP timers don’t seem to faze the issue. I have session timers off, but that too seems to have no impact.

I captured the logs just before the call fails. I had to upload the logs to pastebin due to length. If anything jumps out at you, please let me know.
https://pastebin.freepbx.org/view/6201e711

Please use pastebin.freepbx.org to post logs – you can use pastebin.com if you prefer, but make sure that Paste Expiration is set to Never so future readers of this thread can benefit.

I’m puzzled why the PBX is trying to do the re-invite. Do you have any session-timers parameters set for the trunk? But in any case, I suspect that the 491 is because SIPStation sent a re-invite that somehow didn’t make it through to the PBX.
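For anyone reading a SIP trace for the first time: a re-INVITE is an INVITE sent inside an existing dialog, so its To header already carries a tag, while an initial INVITE has no To tag. A 491 response is "Request Pending". Here is a rough Python sketch of how one might flag both in captured message text. The sample messages and the example.com addresses are hypothetical, and compact header forms are ignored.

```python
def header(message, name):
    """Return the first value of a header (case-insensitive), or None."""
    for line in message.strip().splitlines()[1:]:
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    return None

def is_reinvite(message):
    """True for an INVITE whose To header already has a tag (in-dialog)."""
    lines = message.strip().splitlines()
    if not lines or not lines[0].startswith("INVITE "):
        return False
    to = header(message, "To")
    return to is not None and ";tag=" in to.lower()

def is_request_pending(message):
    """True for a 491 response (status line 'SIP/2.0 491 ...')."""
    lines = message.strip().splitlines()
    return bool(lines) and lines[0].startswith("SIP/2.0 491")

# HYPOTHETICAL sample messages, for illustration only.
initial = "\r\n".join([
    "INVITE sip:1000@example.com SIP/2.0",
    "From: <sip:2000@example.com>;tag=aaa",
    "To: <sip:1000@example.com>",
])
refresh = "\r\n".join([
    "INVITE sip:1000@example.com SIP/2.0",
    "From: <sip:2000@example.com>;tag=aaa",
    "To: <sip:1000@example.com>;tag=bbb",
])
pending = "SIP/2.0 491 Request Pending\r\nCSeq: 104 INVITE"

print(is_reinvite(initial), is_reinvite(refresh), is_request_pending(pending))
```

Running this kind of check over a capture from both sides of the firewall would show whether a re-INVITE from the provider arrives on the WAN side but never reaches the PBX.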

Do you have any trunks other than SIPStation on this PBX? If so, do they fail in the same way?

Please confirm that the Conntrack Module for SIP has been turned off in the USG, and that port 5160 is hard forwarded to the PBX (from anywhere, or from the trunk IPs). Post any other VoIP-specific settings in the USG.

Why are you using chan_sip? If you had trouble with pjsip, post details. Also, post the present settings for the trunk.

Have you tried capturing traffic on the WAN side of the USG? That may show an incoming re-invite that is being mishandled.

This is almost always caused by a system timeout for UDP or SIP sessions. You will want to adjust the following:

  1. Ensure that SIP ALG is disabled in your firewall.
  2. Verify any UDP session timeouts related to the firewall or specific policies.
  3. Specifically map any RTP ports that your carrier may require.
  4. On the firewall, disable any port randomization in the NAT policy for the PBX’s RTP and SIP ports.
  5. Make sure outbound traffic is mapped to the same virtual IP addresses in the outbound policies as in the inbound policies; this only applies if you have additional static IP addresses.

The reason people get this to work with pfSense or similar firewalls and not with higher-end models such as FortiGates, Sophos, and other next-gen firewalls is that pfSense and related firewalls don’t have an Application Layer Gateway built into them.

If you take care of those five items, it will almost always just work. We deal with a great many issues like this, and for almost every client it is one of those five things.

Jeremy

What models? I can pull settings from one similar.

As others have said, disable SIP ALG. Also ADP, entirely.

I attached the firewall settings below. We’re running a USG 4P at this location.


[image: USG firewall settings]

I’ll try and catch up on the previous comments below…

No session timers on the trunk. I even attempted to explicitly disable session timers on the trunk in case SIPStation was doing something funny.

I have 4 trunks, and all fail in the same way. You mentioned chan_sip vs pjsip; that was what was auto-configured when I initialized it with SIPStation.

I did a tcpdump on the interfaces of both the PBX and the USG and, to me at least, it just looks like the call terminated. Very likely I could have missed something though.

See attached screenshots for the firewalls. I’m not entirely sure I’ve ever seen port randomization for NAT policies. We only have one IP for the location, so the VIP policies are non-existent.

well your debug on pastebin shows otherwise unless i’m missing something

Session-Expires: 1800;refresher=uac

  INVITE sip:[email protected]:5060 SIP/2.0
  Via: SIP/2.0/UDP WAN_IP-SCRUBBED:5160;branch=z9hG4bK26806f87;rport
  Max-Forwards: 70
  From: <sip:DID-SCRUBBED@WAN_IP-SCRUBBED:5160>;tag=as2c38befe
  To: “UNKNOWN CALLER” <sip:[email protected]>;tag=D2319Qt73c8Fr
  Contact: <sip:DID-SCRUBBED@WAN_IP-SCRUBBED:5160>
  Call-ID: 61c86a37-1227-1239-af81-c81f66c921dc
  CSeq: 104 INVITE
  User-Agent: FPBX-15.0.16.49(16.9.0)
  Session-Expires: 1800;refresher=uac
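The header value in that INVITE has two parts: the session interval in seconds and an optional `refresher` parameter naming which side must refresh the session. As a small illustration (a sketch, not anything FreePBX or Asterisk ships), the value could be split like this in Python; the function name is made up for the example.

```python
def parse_session_expires(value):
    """Split a Session-Expires value into (interval_seconds, refresher or None)."""
    parts = [p.strip() for p in value.split(";")]
    interval = int(parts[0])
    params = {}
    for p in parts[1:]:
        key, _, val = p.partition("=")
        params[key.strip().lower()] = val.strip().lower()
    return interval, params.get("refresher")

# Value taken from the INVITE quoted above.
print(parse_session_expires("1800;refresher=uac"))  # (1800, 'uac')
```

Here 1800 seconds is exactly the 30 minutes at which the calls drop, and `uac` means the side that sent this request is the one expected to refresh.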

your debug shows chan_sip not PJSIP

what’s going on

Welcome to my world. None of the settings in the GUI match what I’m seeing in the logs. According to CLI, the trunks are legacy SIP but all extensions inside are pjsip. Where would one go hunting for the Session-Expires in the CLI?

That’s a Ubiquiti product, not a ZyXEL product. Ubiquiti is prosumer gear. ZyXEL is SMB and Enterprise; not at all on the same level.

USG in our world stands for UniFi Security Gateway. Sorry if that created confusion.

We once had an issue with session timer on the carrier side. The customer was very far, with poor Internet service from the POP. The carrier session timer set for 30 minutes was not receiving a timely response. The carrier session timer kept cutting the call at 30 minutes until we asked them to turn it off.

We once had the problem that “refresher=uac” was handed out but not acted on by our UAC (that was on Avaya CM 6.3 in a special scenario). So check whether the UAC sends a session refresh after 15 minutes (it always occurs at half the Session-Expires value). If the timer is not refreshed, the session expires after 1800 seconds in this case.
regards,
andre
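The timing described here can be written out as a simplified model: the refresh is due at half the session interval, and each refresh that arrives in time pushes the expiry out by one full interval. The Python sketch below is only that simplified model, with hypothetical function names; real stacks may tear the call down slightly before the nominal expiry.

```python
def refresh_due(interval):
    """Seconds into the session at which the refresher should refresh (half the interval)."""
    return interval // 2

def session_end(interval, refresh_times):
    """Return the second at which the session expires.

    `refresh_times` are offsets (seconds since call start) of refreshes
    that actually reached the other side.
    """
    expiry = interval
    for t in sorted(refresh_times):
        if t >= expiry:
            break  # this refresh came too late; the session had already expired
        expiry = t + interval
    return expiry

print(refresh_due(1800))               # 900, i.e. 15 minutes
print(session_end(1800, []))           # 1800: no refresh, call dies at 30 minutes
print(session_end(1800, [900, 1800]))  # 3600: refreshes keep the call alive
```

In this model, a refresh that is sent but lost or rejected on the way (for example at a firewall) gives the same result as no refresh at all: a drop at 1800 seconds.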

One of my customers had this same issue. I told them to tell employees not to make calls longer than 29 minutes. Apparently this solved the problem as I have not heard back from them since.

FPBX humor
