Urgent: How to prevent trunks from becoming unreachable

All the VoIP providers I’ve used, namely VoIP.ms and Telnyx, become unreachable every few minutes (according to the FreePBX logs). While a trunk is unreachable, calls drop and outgoing calls can’t be placed.
Can someone please advise on general ways to prevent trunks from becoming unreachable, e.g. which firewall timeout settings to use…

This is happening to all trunks, including IP authentication and standard SIP auth trunks, so it is most likely not a PBX issue. The firewall is a FortiGate. The UDP idle timeout on the firewall is 300 s; I’ve also tried 120 s (no difference).
The logs look like the excerpt below (the same for all providers: the trunk becomes unreachable, then reachable again, on a regular schedule), so it is probably a NAT problem. SIP helper/ALG is turned off on the firewall. What UDP timeout (or other) settings does anyone recommend on the firewall to stop the unreachability?

[2022-06-14 17:22:10] VERBOSE[19452] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 17:24:07] VERBOSE[31230] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 17:53:10] VERBOSE[16847] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 17:55:08] VERBOSE[16847] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 17:59:10] VERBOSE[31230] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 18:00:08] VERBOSE[31230] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 18:26:10] VERBOSE[9123] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 18:27:07] VERBOSE[9123] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 18:28:10] VERBOSE[2397] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 18:29:07] VERBOSE[26835] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
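To put numbers on the “regular schedule”, here is a minimal Python sketch that parses the status-change lines and reports how long each outage lasted and how far apart the outages started. It assumes the exact “[timestamp] ... Endpoint <name> is now <status>” format shown above; the sample data is just the excerpt from this post.

```python
import re
from datetime import datetime

# The excerpt from the question (log viewer line numbers removed).
LOG = """\
[2022-06-14 17:22:10] VERBOSE[19452] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 17:24:07] VERBOSE[31230] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 17:53:10] VERBOSE[16847] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 17:55:08] VERBOSE[16847] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 17:59:10] VERBOSE[31230] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 18:00:08] VERBOSE[31230] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 18:26:10] VERBOSE[9123] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 18:27:07] VERBOSE[9123] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
[2022-06-14 18:28:10] VERBOSE[2397] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Unreachable
[2022-06-14 18:29:07] VERBOSE[26835] res_pjsip/pjsip_configuration.c: Endpoint Telnyx is now Reachable
"""

# Assumes the "[YYYY-MM-DD HH:MM:SS] ... Endpoint <name> is now <status>" format.
STATUS_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*Endpoint (\S+) is now (Reachable|Unreachable)"
)


def parse_events(lines):
    """Return (timestamp, endpoint, status) tuples for every status-change line."""
    events = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2), m.group(3)))
    return events


def summarize(events, endpoint):
    """Return (outage lengths in s, seconds between outage starts) for one endpoint."""
    outages, starts = [], []
    down_since = None
    for ts, ep, status in events:
        if ep != endpoint:
            continue
        if status == "Unreachable":
            down_since = ts
            starts.append(ts)
        elif down_since is not None:
            outages.append(int((ts - down_since).total_seconds()))
            down_since = None
    gaps = [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]
    return outages, gaps


outages, onset_gaps = summarize(parse_events(LOG.splitlines()), "Telnyx")
print(outages)     # [117, 118, 58, 57, 57]
print(onset_gaps)  # [1860, 360, 1620, 120]
```

In this excerpt every outage starts at hh:mm:10 and the starts are exact multiples of 60 s apart, which looks more like something timer-driven (for example a periodic qualify) than random packet loss, although five events are too few to be sure.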

It’s always a network problem. Tell us more about your router setup and connectivity.

FortiGate firewall
Static WAN IP
PBX is not on a VLAN
Ports forwarded correctly for VoIP
Internet connection 100% stable (the internet is not going down and the ISP is not closing any ports; it is just that all the trunks become unreachable and then reachable again, whether IP auth or regular SIP auth)

All IPs and FQDNs of the VoIP providers are 100% whitelisted on the firewall (they are not being blocked; I think something is timing out?)

No IPS, antivirus, etc. on the firewall rules relating to the PBX.

SIP ALG and SIP helper are currently off. Should I try enabling them, and if so, how could that help prevent the unreachability problem?

Changing the qualify, registration expiration, retry interval, etc. settings on the PBX isn’t making any difference; the trunks become unreachable no matter what. It seems to be a network problem further upstream that is disrupting the traffic, e.g. something to do with firewall session timeouts?

From 10,000 feet: diagnose with sngrep. Just look at the registrations if you are not using IP rules; the INVITEs will later show the route for media.

How are you determining that the connection is 100% stable? Web browsing can tolerate quite large temporary outages, or packets stuck in buffers behind high volumes of traffic, but VoIP is a lot more time-critical.

Also, for urgent requirements, peer support forums are a bad choice. Marking things as urgent won’t change how fast you get responses, or whether you get useful ones.

The firewall logs show absolutely no outage. Several servers, camera streaming stations and other applications at this location require the internet and are used constantly, so any slight (sub-second) outage would be noticed and logged. There’s also a dedicated uptime monitor that constantly checks internet status, and it never reports the connection going down. Please believe me: the internet is not going out.
I’m just trying to sort out the unreachability problem.

“Unreachable” means that, as far as Asterisk can tell, the internet has gone away, at least on the path to the other end of the trunk. It results from failing to get a timely response to an OPTIONS request after several retransmissions. You would have to have NAT timeouts of less than a second for them to cause false positives.
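For a sense of how many chances a single OPTIONS gets, this Python sketch computes the generic RFC 3261 retransmission schedule for a non-INVITE request over UDP (Timer E starts at T1 = 500 ms and doubles up to T2 = 4 s; Timer F = 64·T1 ends the transaction). It models the RFC timers only, not Asterisk’s qualify code: PJSIP’s `qualify_timeout` option may make Asterisk give up on a qualify sooner than Timer F would.

```python
T1 = 0.5  # RFC 3261 default round-trip estimate, seconds
T2 = 4.0  # RFC 3261 cap on the non-INVITE retransmit interval, seconds


def non_invite_send_times(t1=T1, t2=T2):
    """Return (send times in s, Timer F in s) for a non-INVITE request over UDP.

    Per RFC 3261 section 17.1.2.2: the first retransmission is after T1, each
    further interval doubles but is capped at T2, and Timer F (64 * T1)
    terminates the transaction with a timeout.
    """
    timer_f = 64 * t1
    times = [0.0]  # the original transmission
    interval = t1
    t = 0.0
    while t + interval < timer_f:
        t += interval
        times.append(t)
        interval = min(2 * interval, t2)
    return times, timer_f


times, timer_f = non_invite_send_times()
print(times)    # [0.0, 0.5, 1.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5, 31.5]
print(timer_f)  # 32.0
```

Each retransmission is a fresh outbound packet through the firewall, so a NAT entry that had expired would normally be recreated immediately; for qualify to fail, every one of these attempts (or however many fit in the qualify timeout) has to go unanswered.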

You don’t have to send OPTIONS, and without them you won’t get any “now unreachable” messages, but you may find that you lose incoming traffic, because there is nothing to keep dynamic NAT and firewall rules live (which you could fix by making them static). However, you say you are having calls dropped. An unreachable status will stop new calls from being started, but if calls are dropping, they are being directly affected by the outage. What is being logged as the cause when this happens?
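The difference between a dynamic NAT/firewall entry and a static forward can be shown with a toy Python model. Both classes are hypothetical simplifications (a real FortiGate tracks full address/port tuples and has its own session logic): the dynamic entry only admits inbound packets if some outbound packet refreshed it within the idle timeout, while the static forward admits them unconditionally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicPinhole:
    """Hypothetical stateful entry: inbound is allowed only if an outbound
    packet refreshed the entry within idle_timeout seconds."""
    idle_timeout: float
    last_outbound: Optional[float] = None

    def outbound(self, now: float) -> None:
        self.last_outbound = now

    def inbound_allowed(self, now: float) -> bool:
        return (self.last_outbound is not None
                and now - self.last_outbound <= self.idle_timeout)


@dataclass
class StaticForward:
    """Hypothetical static port forward: inbound is always allowed."""

    def inbound_allowed(self, now: float) -> bool:
        return True


def invite_gets_through(keepalive_interval, idle_timeout, invite_at):
    """Does an unsolicited inbound INVITE at invite_at pass a dynamic entry?"""
    pin = DynamicPinhole(idle_timeout)
    pin.outbound(0.0)  # e.g. a registration or the first OPTIONS
    if keepalive_interval:
        t = keepalive_interval
        while t <= invite_at:
            pin.outbound(t)  # each keepalive (OPTIONS) refreshes the entry
            t += keepalive_interval
    return pin.inbound_allowed(invite_at)


print(invite_gets_through(None, 300, 400))  # False: entry idled out
print(invite_gets_through(60, 300, 400))    # True: keepalives kept it alive
print(StaticForward().inbound_allowed(400)) # True: no state to expire
```

In this model any keepalive interval shorter than the idle timeout is enough; with a static forward the timeout stops mattering for inbound traffic at all.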

You indicated that you had a lot of high-priority traffic. Is it possible that that traffic is being prioritised in your network, but SIP (and possibly RTP) is not?

Could your service provider be under DDoS attack? Could some part of the internet between you and the ITSP, but not involved in your high-priority traffic, be overloaded?

For chan_pjsip, setting qualify frequency to 0 should stop the OPTIONS and the “unreachable” messages, but this is just hiding a symptom.

I know for a fact that the internet is not going down at all, and I understand that if it were unstable, it would cause problems with VoIP.

What you are saying about OPTIONS makes sense.
Should trunks using IP authentication (not traditional SIP password auth) leave most of the registration expiration, qualify, etc. fields blank, such as below? Maybe the IP auth trunks sending useless qualify/register messages was clogging something, since IP auth trunks don’t need to do any of that.

VoIP traffic, including RTP, has the highest QoS priority in the firewall; everything else is medium/low, so QoS is not the problem.
There isn’t an extreme amount of WAN bandwidth being used on the network.

No DDoS attack.
The unreachability problem is happening with both VoIP.ms and Telnyx, and neither is under a DDoS attack: the advanced failover options set on the provider side still work without problems, etc. I don’t feel I need to explain in detail why there isn’t a DDoS.

Using PJSIP, not chan_sip.

Here is what happens with calls:
Trunk reachable: the call starts, all good. The moment it becomes unreachable, audio is lost. Both inbound and outbound calls are completely down until it becomes reachable again. Once again: during the period of unreachability, nothing works. Nothing in the logs shows any calls reaching either provider.

So what do you think about the OPTIONS and NAT timeout settings mentioned above?

I should also point out that I know it’s not an internet outage or DDoS problem because the trunk almost always goes unreachable after its registration expires, and the trunk using IP auth still becomes unreachable, mostly at regular intervals. Most of the time it takes 1-4 minutes for the trunk to become reachable again.

Update: removing the registration expiration, timeout and OPTIONS values from the IP auth trunks’ config doesn’t change anything; they are still becoming unreachable.

I look forward to suggestions, e.g. regarding the main Asterisk SIP settings, the firewall or NAT, since the qualify and registration expiration settings don’t change anything.

How do I install sngrep on the PBX? I am not using an SBC, and I can’t find the CentOS package anywhere.

It’s pre-installed in the FreePBX Distro. If you built FreePBX from sources, perhaps

will help. If you can’t get it to work easily, just capture with tcpdump and analyze with Wireshark.

Your post is marked ‘urgent’, so I assume that this is a production system that used to work reliably but doesn’t anymore. Do you have any idea what caused it (module update, system update, firewall update, new ISP, etc.)? Can you back out the change until you can understand what is wrong?

If this is a new system, I’m not impressed – it’s causing trouble several times per hour; even minimal testing would have shown that it’s not ready for production.

As @david55 said, qualify failures do not affect in-progress calls in any way. The RTP failures should be very easy to troubleshoot. Is outbound audio affected? Inbound audio? Both directions? Run packet captures on both the PBX and the WAN interface of the firewall. On a test call to or from an internal extension (one on the same LAN as the PBX), what happens when audio stops? Is Asterisk still sending it, is the firewall still passing it to the ISP, etc?
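Once you have the two captures, the question “where does the audio stop?” comes down to finding gaps in each RTP direction. A minimal Python sketch, assuming you have exported the packet arrival times (in seconds) for one direction of one stream into a plain text file, one number per line; the file format and the 0.5 s threshold are assumptions, not anything Wireshark or tcpdump produces by default.

```python
def load_times(path):
    """Read one timestamp (seconds, as a float) per non-empty line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def find_gaps(timestamps, threshold=0.5):
    """Return (last packet before gap, first packet after gap) pairs where
    consecutive packets are more than threshold seconds apart."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > threshold]


# Example: 20 ms packets for one second, then silence until t = 3.0 s.
sample = [round(i * 0.02, 2) for i in range(51)] + [3.0, 3.02]
print(find_gaps(sample))
```

Running the same check on the PBX-side capture and on the firewall WAN capture shows whether the gap already exists where Asterisk sends (or receives) the audio, or only appears on the other side of the firewall.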

UDP timeouts should not be relevant in this situation. Assuming that Asterisk’s RTP port range is the default 10000-20000, the firewall should be configured so all UDP packets arriving from any address and any port with destination port 10000-20000 should be forwarded to the LAN address of the PBX. Likewise, any outbound UDP packets from the PBX with source port 10000-20000 should be passed to the WAN, with source and destination ports unmodified. This should happen regardless of any previous traffic and therefore does not depend on any timeouts.
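The rule described above can be written down as two pure functions, which makes it clear that nothing in it depends on earlier traffic or timers. This is a hypothetical Python sketch: `PBX_LAN` (192.168.1.10) is a placeholder for your PBX’s LAN address, and the 10000-20000 range assumes Asterisk’s default RTP ports.

```python
import ipaddress

# Hypothetical values: substitute your PBX LAN address and Asterisk's RTP range.
PBX_LAN = ipaddress.ip_address("192.168.1.10")
RTP_RANGE = range(10000, 20001)  # 10000-20000 inclusive


def inbound_rule(dst_port):
    """Static forward: any source address and port; UDP destination port in
    the RTP range goes to the PBX on the same port. No state, no timeout."""
    if dst_port in RTP_RANGE:
        return (PBX_LAN, dst_port)
    return None  # not RTP; handled by other rules (e.g. SIP)


def outbound_ok(src_ip, src_port, wan_src_port):
    """Outbound RTP from the PBX should leave the WAN with its source port
    unmodified (compare the PBX-side and WAN-side captures)."""
    return src_ip == PBX_LAN and src_port in RTP_RANGE and wan_src_port == src_port


print(inbound_rule(15000))                  # forwarded to the PBX, port 15000
print(outbound_ok(PBX_LAN, 12000, 12000))   # True: port preserved
print(outbound_ok(PBX_LAN, 12000, 61234))   # False: NAT rewrote the port
```

If the captures show the second case, some NAT setting is rewriting source ports, and the far end may send media back to a port that no longer leads to the PBX.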

Qualify is the only thing that affects the “unreachable” messages and is orthogonal to the use of registration. It is desirable even for IP authenticated endpoints.

I’m not sure, but I suspect leaving fields blank will give you FreePBX defaults.

For the record: no changes have been made.
The trunks started becoming unreachable seemingly at random.

Thanks for pointing me to sngrep; I ran it, and it makes the SIP traffic easy to analyse. The OPTIONS requests were the problem.

When I mentioned that call audio was lost, it was because the trunk went unreachable in the middle of the call. No endpoint was losing RTP; there weren’t any issues with the RTP traffic. So when I say a call dropped or audio was lost, in this case it’s because the trunk became unreachable mid-call.

After a lot more tweaking, I settled on a qualify frequency of 13 s and a registration expiration of 300 s for the regular SIP authentication trunks. Now the regular SIP auth trunks aren’t becoming unreachable anymore; odd that something this simple fixed it. As someone mentioned in another thread, in most cases the qualify frequency needs to be lowered if you are experiencing the unreachable endpoint problem, because NAT traversal poisons the connection or times it out; with a lower qualify frequency, Asterisk constantly sends “are you there?” messages (OPTIONS) to the endpoint, which keeps the connection established. However, I still haven’t been able to stop the IP authentication trunks from becoming unreachable, so I’ve settled on using the regular SIP auth trunks, which seem to be working as intended now with the lowered qualify frequency and registration expiration.

This would only happen if the trunk was really unreachable; it would not happen just because Asterisk declared it unreachable.


Something is wrong with your firewall setup. No ‘connections’ should be required and there should be nothing to time out. A SIP packet from any of your provider servers, or an RTP packet from anywhere, should be hard forwarded to the LAN address of the PBX. No outbound traffic should be needed.

With an IP authentication trunk set up and qualify (and of course registration) disabled, an incoming INVITE should reach the PBX. If not, your router/firewall settings are incorrect. This should not be difficult to track down.
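The reason qualify and registration are irrelevant here is that, in PJSIP, an IP-authenticated trunk is matched purely by the source address of the incoming request (the `identify` section’s `match` setting). The following Python sketch is a simplified model of that lookup, not Asterisk’s implementation; the address ranges are hypothetical RFC 5737 documentation prefixes, not real Telnyx or VoIP.ms addresses.

```python
import ipaddress

# Hypothetical identify table: endpoint name -> networks that identify it.
IDENTIFY = {
    "Telnyx": [ipaddress.ip_network("203.0.113.0/24")],
    "voipms": [ipaddress.ip_network("198.51.100.16/28")],
}


def identify_endpoint(src_ip):
    """Map the source IP of an incoming SIP request to an endpoint name.

    Only the packet's source address is consulted: no qualify state and no
    registration state takes part in the decision.
    """
    addr = ipaddress.ip_address(src_ip)
    for endpoint, networks in IDENTIFY.items():
        if any(addr in net for net in networks):
            return endpoint
    return None  # unidentified; Asterisk would try other identifiers or reject


print(identify_endpoint("203.0.113.5"))    # Telnyx
print(identify_endpoint("198.51.100.20"))  # voipms
print(identify_endpoint("192.0.2.1"))      # None
```

So if an INVITE from the provider’s address arrives at the PBX, it can be matched whether or not qualify is running; if it never arrives, the problem is upstream of Asterisk.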

Regarding the IP auth trunks issue:
All the traffic is reaching the PBX fine (the firewall doesn’t block anything from the providers: the providers are 100% whitelisted, VoIP has top QoS priority, and there are no dropped or rejected packets from the providers). However, Asterisk is declaring the trunk ‘unreachable’ and therefore not accepting any packets until it deems the trunk ‘reachable’ once again.

If you have set Qualify Frequency to 0 (which disables qualify) and the trunk still shows unreachable, that’s a bug which you should report.

If you have qualify enabled but the provider is always ignoring the OPTIONS packets, you need to disable it, since the trunk is not compatible.

If you have qualify enabled but a few responses are getting lost (resulting in unreachable), you should track down where the failure is occurring (provider, ISP, firewall, other networking gear, etc.)
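One way to test the path independently of Asterisk is to send a single OPTIONS by hand and see whether anything comes back. This is a minimal Python sketch under several assumptions: `sip.example.com` is a placeholder host, the probe uses an ephemeral local port (so it exercises a different NAT mapping than Asterisk’s own SIP socket), it sends once without retransmitting, and a provider may simply ignore OPTIONS from an address or port it doesn’t recognise.

```python
import socket
import uuid


def build_options(host, port, local_ip, local_port, call_id=None, tag=None, branch=None):
    """Build a minimal SIP OPTIONS request with the RFC 3261 mandatory headers."""
    call_id = call_id or uuid.uuid4().hex
    tag = tag or uuid.uuid4().hex[:8]
    # RFC 3261 branch parameters must start with the magic cookie "z9hG4bK".
    branch = branch or "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={tag}",
        f"To: <sip:{host}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",  # headers end with an empty line (CRLF CRLF)
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(host, port=5060, timeout=2.0):
    """Send one OPTIONS over UDP; return the response status line, or None if
    nothing arrived in time (or an ICMP port unreachable came back)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # fixes the peer and picks our local address
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(host, port, local_ip, local_port))
        try:
            data = sock.recv(65535)
        except (socket.timeout, ConnectionRefusedError):
            return None
        return data.decode("utf-8", errors="replace").split("\r\n", 1)[0]
```

Calling `probe("sip.example.com")` in a loop from the PBX while the trunk flaps should print a status line such as `SIP/2.0 200 OK` (any response proves the round trip works) or `None` when nothing came back; comparing that with sngrep or tcpdump on the WAN side narrows down where the responses are lost.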

It doesn’t work that way. Declaring it unreachable stops Asterisk from starting an outbound call. It shouldn’t affect inbound calls, and it certainly shouldn’t affect media. If those are affected the forward or reverse path really is broken.


I haven’t tried setting the qualify frequency to 0, so I will try that now, monitor for a while, and report back if it continues to become unreachable.