Intermittent dropped calls

Hi,
Hoping someone here can help.
I have FreePBX 16.0.40.4 and Asterisk 20.4.0 running on two VMs, connected by an IAX2 trunk that passes calls from PBX1 to PBX2. Our SIP trunk is registered on PBX1 and is configured so that, if PBX1 is unreachable, calls fail over to an external line as a backup.
Recently we have been having strange drops where calls appear in the trace on PBX1 and create a recording file, but drop instantly, and the trunk pushes the call to its failover line.
I have checked and my level of understanding does not allow me to identify what could be the issue.
Here is the SIP debug from the relevant seconds:
https://pastebin.freepbx.org/view/18e5afe1

This call is to a DID that should be recognized by an inbound route on PBX1 as destined for a Misc Destination “800”, which the dialplan should send via the IAX2 trunk to PBX2, where it enters a queue.

OVH don’t seem to be seeing your 100 Trying responses, although they do see your Request Terminated. When you get inconsistent behaviour like that, I’d tend to suspect a router with SIP ALG enabled.

Thanks for replying. I’m passing through a Unifi USG Pro.
Just googled it and apparently I should disable “SIP” and “H323” to deactivate ALG (which I have just done… see screenshot).
If this is the issue I will be seriously impressed!
(I’ll need 6-12 hours of calls to see if this is the solution)…

Do you recognize the address 10.7.1.163? I suspect that you may be double-NATted. Does the USG have a public IP address on its WAN interface?

If yes, check in Asterisk SIP Settings, that External Address and Local Networks are correctly set, and not overridden on the chan_sip tab. If you change these, after Submit and Apply Config you must restart Asterisk.

If no, please explain (your modem is configured as a router, ISP is doing NAT, etc.)

Also, this is an apparently new system, but your trunk is still configured with chan_sip. Why?

I don’t recognize 10.7.1.163. It doesn’t correspond at all to any of my subnets at either location and nothing like my external IP’s.
I am unfortunately having to connect to the outside via a French ISP box which cannot be put into a fully bridged mode. So, I have the USG behind that on a single static address in the DMZ. This is as close to bridged as I can get.
Asterisk SIP Settings does indeed have my external IP and internal networks correctly set, but NAT on the chan_sip tab is set to “Yes”. Should this perhaps be “Never” instead?
The system has been working fine for several years… this is quite a recent issue.
To be honest I’m not that knowledgeable, the chan_sip to PJSIP conversion confused me, and I haven’t had time to investigate it properly.
Your support is hugely appreciated.

Calls still being kicked out unfortunately.
I can’t quite see what it could be, as I don’t see how a misconfiguration could cause intermittent interruptions… there seem to be intermittent non-responses… and this is reflected in the user experience on some extensions (a long delay with no sound before eventually connecting… things like that).
Here is another inbound call (from +33 650159870) that was registered then kicked out to the failover.
https://pastebin.freepbx.org/view/451c7c74

The previous problem of 100 Trying not reaching the provider seems to have been corrected.

I believe that in this case, OVH dropped the call because they only allow 5 seconds to receive some progress indication beyond the 100 (180 Ringing, 183 Progress, 200 OK), which did not happen.
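The timing rule described above can be written down as a small check. This is only a sketch: the 5-second limit is the figure I believe applies, not a documented OVH value, and the function name and the event format are made up for illustration.

```python
# Hypothetical helper: did a response beyond 100 Trying arrive in time?
PROGRESS_TIMEOUT_S = 5.0  # assumed limit, not a documented provider value


def progress_in_time(invite_time, responses, timeout=PROGRESS_TIMEOUT_S):
    """responses: iterable of (time_in_seconds, sip_status_code) tuples."""
    for when, status in responses:
        # 100 Trying does not count; 180, 183 or 200 would.
        if 101 <= status < 300 and 0 <= when - invite_time <= timeout:
            return True
    return False


# Only a 100 Trying was sent, so the caller side would give up.
print(progress_in_time(0.0, [(0.1, 100)]))
# A 180 Ringing two seconds after the INVITE would have been enough.
print(progress_in_time(0.0, [(0.1, 100), (2.0, 180)]))
```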

Line 119 of the paste:
[2023-09-14 11:48:40] VERBOSE[5150][C-0000079e] pbx.c: Executing [0493123600@from-trunk:6] Gosub("SIP/ovh0033422480220-00000a9f", "app-blacklist-check,s,1()") in new stack
is followed by no further progress in the dialplan.

On my system:

[app-blacklist-check]
include => app-blacklist-check-custom
exten => s,1(check),GotoIf($["${BLACKLIST()}"="1"]?blacklisted)
exten => s,n,Set(CALLED_BLACKLIST=1)
exten => s,n,Return()
...

The BLACKLIST function just does a local astdb lookup, so I don’t understand why it would hang. Have you made any customizations in this area?
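To see whether other calls pause at the same point, one option is to scan the log for long gaps between consecutive lines of the same call. The following is a rough sketch, not a tested tool: it assumes every relevant line starts like the one quoted above (timestamp, level, thread ID, call ID), and the function names and the 3-second threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime

# Matches the prefix of lines such as
# [2023-09-14 11:48:40] VERBOSE[5150][C-0000079e] pbx.c: Executing ...
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \w+\[\d+\]\[(C-[0-9a-fA-F]+)\]"
)


def find_stalls(lines, min_gap_s=3):
    """Yield (call_id, gap_seconds, line_before_gap) for each long pause."""
    last_seen = {}  # call_id -> (timestamp, line)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # lines without a call ID are ignored
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        call_id = m.group(2)
        if call_id in last_seen:
            prev_when, prev_line = last_seen[call_id]
            gap = (when - prev_when).total_seconds()
            if gap >= min_gap_s:
                yield call_id, gap, prev_line.rstrip()
        last_seen[call_id] = (when, line)


def report(path):
    """Print every pause found in the log file at path."""
    with open(path, errors="replace") as fh:
        for call_id, gap, before in find_stalls(fh):
            print(f"{call_id}: {gap:.0f}s pause after: {before}")
```

Calling `report()` on the Asterisk full log (typically /var/log/asterisk/full, but check your own logger settings) would list the last line logged before each pause. Timestamps only have one-second resolution, so short stalls will not show up.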

I have no customisations to the blacklist module - I don’t use it at all - so I have uninstalled it for now to see if that makes a difference.
I believe the issue lies elsewhere though, as it happens multiple times a day. I also have users saying internal calls (phone to phone on the same LAN) sometimes take a long time before connecting, so I suspect something odd on the network.
I will try to capture more than one example and paste the log extracts to try to identify what they have in common.

And to be sure I’ve changed all my extensions to PJSIP.

Sure, especially since your problem on internal calls would not involve trunk signaling or blacklist at all.

I am guessing that the Asterisk thread processing a dialplan somehow stops running for a while. But is it preempted by another Asterisk thread, by another process in FreePBX, by another VM on the host, or by a host process?

Are you running the FreePBX Distro? If not, which OS, how built, etc? Virtualization system? Is the other PBX on the same host? Other (non-PBX) VMs? Host OS? Significant workload on the host?

Posting the log of a delayed internal call may be useful.

But in this scenario, I’d expect that Asterisk would intermittently stop passing RTP on established calls, i.e. there would be periods of no audio, which you have not reported.

Is there reason to suspect exhaustion of another resource (out of memory, disk full, thread limit reached, etc.)?

When trouble occurs, is there always other activity (calls in progress or being set up)?

I would assume that is a private address, but on a different subnet from your 192.168.60.x LAN and not 10.7.1.163? That address is still bothering me, because Asterisk is presenting it. Is there any output from
grep 10.7.1.163 /etc/asterisk/*.conf
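For what it's worth, this is the reasoning behind my concern, as a quick check with Python's standard `ipaddress` module. The /24 mask for your 192.168.60.x LAN is my assumption.

```python
import ipaddress

addr = ipaddress.ip_address("10.7.1.163")
lan = ipaddress.ip_network("192.168.60.0/24")  # assumed mask for the 192.168.60.x LAN

# Is it a private (RFC 1918) address, i.e. not routable on the internet?
print(addr.is_private)
# Does it belong to the LAN subnet?
print(addr in lan)
# Does it fall in the 10.0.0.0/8 private block?
print(addr in ipaddress.ip_network("10.0.0.0/8"))
```

So the address is private but not part of your LAN, which is why I would like to know which device owns it.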

Hi Stewart
I’m with FreePBX 16.0.40.4, PBX Distro 12.7.8-2306-1.sng7 with Asterisk 20.4.0
I used to have it on a Sangoma FreePBX 60 box but the drives crapped out, so I moved the PBX to a VM (Intel i7, 4 cores, 8GB memory, running CentOS 7.4) on a NAS (QNAP TS-EC880U, 32GB memory), as I figured that would handle it.
I have that identical setup (QNAP NAS, VM) in two geographic locations, and these are my two PBXs. PBX1 is connected to the external (OVH) trunk and handles all the inbound calls, and is connected to the second PBX by an IAX2 bridge.
Changing the ALG setting has already helped I think as we had fewer issues today.
I have just updated all the extensions to PJSIP and I will see how we go…
As to your question re: exhaustion of another resource – I don’t think so. That was my first idea… but I have checked the RAM and CPU monitors and they never go anywhere near something worrying.
As to your question re: another activity during issue: the answer is unfortunately “sometimes”. Not that helpful I know. But as soon as I get an issue (either with a refused call or a delayed internal call) I’ll capture the log and post it here. Ideally find a couple so one can identify the common theme…
Lastly, 10.7.1.163 worried me also a bit as it isn’t in our subnets. Ran a grep on the *.conf files in /etc/asterisk…. Nothing. Where the hell that’s coming from, I have no idea.
Many many thanks for your assistance!
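Since the grep above only looks at the top-level *.conf files, a wider search might be worth a try. Here is a minimal sketch that walks a whole directory tree, including subdirectories and files with other extensions. The function names are made up for illustration, and whether the address is stored in any file at all is an open question.

```python
from pathlib import Path


def find_string(root, needle):
    """Yield (path, line_number, line) for every line containing needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file (permissions, etc.)
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


def report(root="/etc/asterisk", needle="10.7.1.163"):
    """Print every match as path:line_number: line."""
    for path, number, line in find_string(root, needle):
        print(f"{path}:{number}: {line}")
```

Running `report()` as root on the PBX would print any file and line that mentions the address. An empty result would suggest the address is not coming from a file under that directory.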
