SIP DID intermittently not connecting, fast busy tone

Newly built FreePBX distro (Asterisk 14) system, set up a couple of weeks ago; still testing and creating IVRs, extensions, etc.
I have two SIP DIDs from two different providers.
All are registered whenever I look in Asterisk Info.
Today I received a provider email that a call to my DID failed (must have been a robocall…).
I made 10 test calls to the DID; about half succeeded and half failed with a fast busy. A call could fail one minute and go through fine three minutes later. The provider says my DID account appeared offline.
I made calls to my other DID with my other provider, and 1 out of 3 calls failed.
Asterisk log files show nothing for the failed calls.
Fail2Ban shows nothing blocked.
Intrusion detection shows nothing blocked.
Internet at office working afaik.

I’d appreciate any tips on how to investigate further.

This is probably a network problem. The fact that you are able to connect back to their server (the report showing you are registered) isn’t actually indicative of whether they can see you. The two legs of your connection are basically independent.

You may also want to look at your network setup to make sure that you don’t have a rogue DNS server setting out there somewhere. If, for example, you told your provider to go to “whatever.com” and whatever.com actually resolves to two different addresses, then every other access to the server will fail. Also, you may want to double-check with your provider that they have the right IP address (instead of a host name) to avoid DNS latency problems.
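A quick way to test the "resolves to two different addresses" theory is to ask the resolver yourself. This is only a minimal sketch: the function names `distinct_addresses` and `resolve` are made up for illustration, and 5060 is assumed as the SIP port.

```python
import socket


def distinct_addresses(addrinfo):
    """Collapse getaddrinfo() results into a sorted list of unique IP strings."""
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({entry[4][0] for entry in addrinfo})


def resolve(hostname, port=5060):
    """Return the distinct IPv4 addresses the system resolver gives for hostname."""
    info = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_DGRAM)
    return distinct_addresses(info)
```

Calling `resolve()` on the host name the provider uses to reach you should return exactly one address. More than one, or an answer that changes between runs, would point at DNS.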

This probably isn’t a problem with FreePBX. It is almost certainly a problem with your Internet connection, router, firewall, or DNS resolution. For it to be your PBX, the logs would show errors on all of these connections. Look in /var/log/asterisk/full and see if the registration (or the network connection) is dropping.
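If the log is long, a small script can pull out just the registration timeouts. This is a rough sketch and assumes chan_sip writes them as `[timestamp] NOTICE[pid] chan_sip.c: ... Registration for '<user@host>' timed out ...`; adjust the pattern if your lines look different. The function name is hypothetical.

```python
import re

# Assumed chan_sip notice format; the text between "Registration for" and
# "timed out" is the registration's user@host and is not inspected here.
TIMEOUT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+NOTICE\[\d+\].*Registration for\s+.+?\s+timed out"
)


def registration_timeouts(lines):
    """Return the timestamp string of every registration-timeout notice."""
    stamps = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            stamps.append(match.group("ts"))
    return stamps
```

To use it, open /var/log/asterisk/full and pass the file object to `registration_timeouts()`. Bursts of timestamps show when the PBX stopped getting answers to its REGISTER requests.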

Nuts.
I register my DIDs via the SIP trunks, Incoming, via FreePbx.
I don’t see any place in the DID provider portals where I’m giving my WAN address (neither IP nor FQDN). I register with them via Asterisk.

I called about 10 times over several minutes, and then inbound failed again for BOTH DIDs.
I see this in FreePBX SIP info:
inbound25.vitelity.net:5060 Y userID 45 Request Sent Tue, 20 Mar 2018 20:59:46
sip.nyc.didlogic.net:5060 Y userID 105 Registered Tue, 20 Mar 2018 21:03:15

Then on Vitelity, I see my DID is ‘unregistered’.
So somehow on my end I’m losing registration?

I enabled the DDNS service via FreePBX; maybe I should not have?
I rescanned the asterisk/full log again and only see the successful inbound calls.
We have a static WAN IP address to our LAN. I had already unplugged WAN2, our backup provider.

Log starts with the end of my last successful inbound DID call, then shows I lose registration.

[2018-03-20 21:00:45] VERBOSE[5617][C-00000013] res_agi.c: Launched AGI Script /var/lib/asterisk/agi-bin/sangomacrm.agi
[2018-03-20 21:00:45] VERBOSE[5617][C-00000013] res_agi.c: <SIP/vitel-inbound-00000012>AGI Script sangomacrm.agi completed, returning 0
[2018-03-20 21:00:45] VERBOSE[5617][C-00000013] pbx.c: Executing [s@crm-hangup:8] Return("SIP/vitel-inbound-00000012", "") in new stack
[2018-03-20 21:00:45] VERBOSE[5617][C-00000013] app_stack.c: Spawn extension (ivr-5, h, 1) exited non-zero on 'SIP/vitel-inbound-00000012'
[2018-03-20 21:00:45] VERBOSE[5617][C-00000013] app_stack.c: SIP/vitel-inbound-00000012 Internal Gosub(crm-hangup,s,1) complete GOSUB_RETVAL=
[2018-03-20 21:00:51] NOTICE[13186] chan_sip.c: -- Registration for '[email protected]' timed out, trying again (Attempt #2)
…keeps showing timeout attempts…

I restarted Asterisk, and FreePBX SIP info shows registered.
The Vitelity portal shows the DID registered.
Calls immediately after the Asterisk restart get a fast busy; after about 15 seconds, an inbound DID call succeeds.

So, I’m losing registration on both SIP DIDs.

Then I just received this email alert from FreePbx

Hi,

The IP 66.241.111.30 has just been banned by Fail2Ban after
8 attempts against SIP on localhost.

Regards,

Fail2Ban

Sounds like you need to add your SIP providers to the ‘trusted’ network in the Integrated Firewall.


OK, getting close. I finally see a Fail2Ban notice, and the provider traced a failed DID call from their side:



After reviewing a test call that failed, I show we are not receiving a response to our INVITES.

66.241.111.30 -> 72.111.33.122 SIP/SDP Request: INVITE sip:[email protected]:5060
And the lines repeat.

So is setting them as trusted in ‘Firewall’ the only place I need to allow both provider IPs?
I see other locations for blocking in FreePbx -
White list?
Intrusion detection?
Anonymous sip?
Router?

Firewall: Usually, but a couple of the others below “don’t hurt”.

White list: Yes, you can add them to the System Admin Whitelist if you want. It’s not technically 100% necessary, but it doesn’t hurt anything.

Intrusion detection: Once they are added to the whitelist, intrusion detection should be taken care of.

Anonymous SIP: No - you should always, without exception, turn off anonymous SIP on a production server. Any server that can make “toll calls” or calls to the PSTN should always have anonymous SIP turned off. Same with Guest Access.

Just to make sure you understand - this is not a place where “if you want” applies. You should never turn on anonymous or guest access to a server that connects to the outside world.

Router: It depends on the router and firewall. In general, if you have a setting for it (the router includes the firewall) then setting these IPs as “trusted” (or whatever term the router people use) will alleviate a lot of possible problems.


Thanks Dave
I put everything in the firewall settings to try to trust the DID providers. Still intermittent failures on both DIDs. Nuts. It is hard to verify any fix since the failures are intermittent.
I can’t find a way to track this on my end to isolate the cause, other than the Fail2Ban ban at one point, which proves the provider’s traffic is getting through the router?

My other DID provider intermittently fails too. Both are saying I fail to respond to INVITEs.
I see nothing in the router logs.
Here is what they say, same as the other:

According to our records, you have placed 8 calls to 17575551212 today. 6 of them completed successfully.
As for the rest, they failed due to our system did not receive any responses to INVITEs from your FPBX.
It could be a temporary loss of connectivity/registration.

Just for grins - try running some “long term” pings. One should go to your SIP provider. One should go to your ISP, and one should go to your router from the PBX.

If you get packet loss to the ITSP, but not the other two, it could be a problem with your ITSP.
If you get packet loss between you and your local ISP, same deal.
If you get packet loss between your local router and your PBX, your local network is the problem.

Let it run for an hour and see what kind of loss you are getting. If anything less than 100% of the packets come back, you may have an intermittent network problem. If that turns out to be the issue, you have isolated the problem and can move on to fixing it.
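Once the three pings have finished, the comparison above can be done mechanically. The sketch below assumes the Linux (iputils) summary line `N packets transmitted, M received, X% packet loss`; the function names and the returned labels are invented for illustration.

```python
import re

# Assumed iputils summary line, e.g.
# "100 packets transmitted, 97 received, 3% packet loss, time 99123ms"
SUMMARY_RE = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<recv>\d+) received"
)


def lost_packets(ping_output):
    """Return how many packets were sent but never answered."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group("sent")) - int(match.group("recv"))


def locate_loss(router_out, isp_out, itsp_out):
    """Name the nearest hop that shows loss, checking from the PBX outward."""
    if lost_packets(router_out) > 0:
        return "local network"
    if lost_packets(isp_out) > 0:
        return "link to ISP"
    if lost_packets(itsp_out) > 0:
        return "ITSP or the path beyond the ISP"
    return "no loss observed"
```

Checking the nearest hop first matters because loss on the LAN shows up in all three pings, so only the closest lossy hop tells you where to look.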

Firewall problems are seldom intermittent, as are problems with the PBX. If it’s working, it should be working all the time. This sounds very much like a network problem to me.


At this point, I wish something had crashed…

Pbx to router
--- 192.168.0.1 ping statistics ---
1836 packets transmitted, 1836 received, 0% packet loss, time 1834753ms
rtt min/avg/max/mdev = 0.394/0.462/0.776/0.035 ms

Pbx to internet provider dns address
--- 68.105.28.11 ping statistics ---
1800 packets transmitted, 1800 received, 0% packet loss, time 1801167ms
rtt min/avg/max/mdev = 9.590/13.997/58.587/2.508 ms

Pbx to ITSP - 35 minutes
--- inbound25.vitelity.net ping statistics ---
2148 packets transmitted, 2148 received, 0% packet loss, time 2149088ms
rtt min/avg/max/mdev = 76.805/81.147/114.908/2.350 ms

--- inbound25.vitelity.net ping statistics ---
7372 packets transmitted, 7372 received, 0% packet loss, time 7378380ms
rtt min/avg/max/mdev = 76.289/82.082/299.080/8.851 ms

Is your PBX directly connected to your voice provider, or do you have a router in between? If there is a router and its firewall is stateful, the router might be closing the inbound connection, which in turn causes incoming calls to fail. You could try setting the keep-alive interval to a smaller value, so that keep-alive packets are sent more often and the connection is not closed.


Thank you
I have a router.
I’ll give it a go. So the keep-alive interval should be smaller.
What keep-alive interval, roughly, do you think I should set or have set on the router?

Add the following parameters to your trunk definition and check whether the situation improves.

You can experiment with the values of qualifyfreq and keepalive, both are defined in seconds.

qualify=yes
qualifyfreq=20
keepalive=20
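To see why these intervals matter, here is a deliberately simplified model, not a description of any particular router. It assumes the router forgets its UDP mapping a fixed number of seconds after the last outbound packet, that only outbound packets refresh the mapping, and that inbound calls arrive at random times. The function name and the example numbers are hypothetical.

```python
def reachable_fraction(send_interval_s, nat_timeout_s):
    """Fraction of time an inbound INVITE would find the NAT mapping still open.

    send_interval_s: seconds between outbound packets from the PBX to the trunk
    nat_timeout_s:   seconds of silence after which the router drops the mapping
    """
    if send_interval_s <= 0 or nat_timeout_s < 0:
        raise ValueError("intervals must be positive")
    # The mapping is open for nat_timeout_s out of every send_interval_s seconds,
    # capped at 100% once packets arrive faster than the mapping expires.
    return min(1.0, nat_timeout_s / send_interval_s)
```

Under these assumptions, a PBX that only talks to the trunk every 120 seconds behind a router with a 60-second timeout would be reachable about half the time (`reachable_fraction(120, 60)` gives 0.5), which matches the "works one minute, fails the next" pattern. With keepalive=20 and any timeout above 20 seconds, the mapping never expires.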


I set the router UDP timeout to 300 seconds.
Then I added the qualify, qualifyfreq, and keepalive settings to the inbound trunks.
I have calls going through, but the failures are so intermittent that I can’t tell whether inbound is failing without placing many manual calls.

My one ITSP can switch me to a different inbound server that can log activity. So they might be able to tell me if I’m dropping in and out over the day.

Will the qualify settings allow me to see in the asterisk log file if I’m dropping in and out?

The trunk parameters I mentioned should be enough. The qualify setting will show you if your PBX loses its connection to the trunk.

Have you performed a SIP debug on the trunk? That could also provide a first step to troubleshoot this issue.


Oops, just edited my response above.

So far…all calls going through. I’d have expected a failure at some point with 20 dials.

Past 24 hours doing ok with random calls. Wish I could somehow monitor inbound dropping.
I have an appt reminder system, I could stress test the system with some spaced out calls over a couple hours and send them to a voicemail.

I suppose now that I have qualify=yes, I should see a lost connection in /var/log/asterisk/full, whereas before I could not?

Yes, you should see whether the trunk is reachable or unreachable. But if the issue is indeed the router closing the connection, the connection is re-established as soon as your FreePBX sends traffic to the trunk.
That is what the keepalive setting is for: it keeps sending packets to the trunk so that the connection stays open.
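With qualify enabled, the reachable/unreachable notices can be paired up to list each outage. This sketch assumes chan_sip logs them as `Peer '<name>' is now UNREACHABLE!` and `Peer '<name>' is now Reachable.` with the usual `[YYYY-MM-DD HH:MM:SS]` prefix; check a real line from your log before relying on the pattern. The function name is made up.

```python
import re
from datetime import datetime

# Assumed chan_sip qualify notices, prefixed by the standard log timestamp.
EVENT_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\].*"
    r"Peer '(?P<peer>[^']+)' is now (?P<state>UNREACHABLE|Reachable)"
)


def outages(lines):
    """Return (peer, went_down, came_back) for each UNREACHABLE/Reachable pair."""
    down_since = {}
    found = []
    for line in lines:
        match = EVENT_RE.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        peer = match.group("peer")
        if match.group("state") == "UNREACHABLE":
            # Keep the first time the peer went down, ignore repeats.
            down_since.setdefault(peer, when)
        elif peer in down_since:
            found.append((peer, down_since.pop(peer), when))
    return found
```

Feeding it the lines of /var/log/asterisk/full gives one tuple per outage, and subtracting the two datetimes gives its length. Peers that are still down at the end of the log are not reported.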


Thanks. Without hard proof otherwise, it seems that, as suspected, the router is closing the connection, and inbound is then unreachable. There is no regular timing pattern I can discern.
At least the past 24 hours are 100% success.
I’ll try some automated appointment reminder calls to the DID, spaced out a bit and make sure I can leave voicemails to the extension.
Thought about switching my router to Tomato with custom dual WAN, but if this works, I’ll leave it be.
Thank you!