Sangoma Firewall banning "work from home" folks with Sangoma phones

bmartindcs · March 26, 2020, 5:48pm

With the surge in working from home, we’re seeing many people who have taken their Sangoma phones home get banned by the Sangoma Firewall after a few days (days during which the phone works perfectly fine).

The interesting thing is that it happens across the board, at random, after a few days. The phones otherwise work and provision perfectly fine up until this random “event”. If we unban the IP of the particular phone, it’s fine for another few days. It’s not clear what triggers it. I’ve looked at the files in /var/log/asterisk and I can see the phones get banned, but not why.

Can anyone point me in the right direction on how to sleuth out WHAT is causing them to be banned, so I can then resolve it? The only thing I have seen is that they get RATE LIMITED (as shown in the Sangoma Firewall interface) before being banned.

One particular system is:

FreePBX 14.0.13.26
Asterisk 13.27.1
Firewall Module 13.0.60.3

brk · March 26, 2020, 6:19pm

Perhaps the phones are provisioning and having problems?

If you’re provisioning over HTTP(S), look in /var/log/httpd (the access and error logs).
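To check whether provisioning traffic could plausibly trip a rate limit, a minimal Python sketch can count requests per client IP per minute in an Apache access log. It assumes the default common/combined log format (client IP first, then the bracketed timestamp); the log file name is also an assumption, since a FreePBX box may log HTTPS requests to a separate file.

```python
import re
from collections import Counter
from datetime import datetime

# Start of an Apache common/combined log line: client IP, identd, user, [timestamp]
LINE_RE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def requests_per_minute(lines):
    """Count requests per (client IP, minute) in Apache access log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not in common/combined format; skip it
        ip, stamp = m.groups()
        # Timestamps look like 26/Mar/2020:14:22:23 -0400
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[(ip, when.replace(second=0))] += 1
    return counts


def busiest_minutes(lines, top=5):
    """Return the `top` (ip, minute, count) tuples with the most requests."""
    return [(ip, minute, n)
            for (ip, minute), n in requests_per_minute(lines).most_common(top)]
```

Passing an open log file (for example one of the files under /var/log/httpd) as `lines` and looking at the top entries for a banned phone’s IP shows whether provisioning requests ever arrive in bursts, rather than once a day.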

I’m not exactly sure what the firewall looks at, but that’s all I can think of for other options.

kierknoby · March 26, 2020, 6:27pm

Can’t you use the built-in VPN?

Ajhad · March 26, 2020, 7:20pm

As a temporary measure, you can whitelist them in Firewall and Intrusion Detection.
That should stop them from being banned until their IP changes.

bmartindcs · March 27, 2020, 10:22pm

I can, and may go that route. I’d prefer to solve the problem if possible, though. For all I know, it will still happen through the VPN unless I whitelist the VPN subnet.

bmartindcs · March 27, 2020, 10:23pm

We have whitelisted them as a stopgap. I would still like to troubleshoot and figure out the actual problem causing this, though. It could be, and likely is, a misconfiguration somewhere, but the logs are not providing clues.

bmartindcs · March 27, 2020, 10:24pm

Provisioning works fine, and the resync interval is once a day, so that can’t be hitting the rate limiter. I’m thinking it’s something with Phone Apps, but I can’t figure out where to look for it in the logs.

FreerPBXer · March 27, 2020, 10:33pm

We’ve had to do this in many cases; the responsive firewall is banning at-home phones with some regularity.

bmartindcs · March 28, 2020, 2:22am

I can’t understand why the logs don’t show WHY though; all I see is the banhammer come down.

sorvani · March 28, 2020, 4:15am

Check /var/log/fail2ban.log for an IP that is banned.

Then check the asterisk log and the apache logs.

[root@pbx ~]# tail /var/log/fail2ban.log
2020-03-26 14:22:23,895 fail2ban.actions[5155]: WARNING [asterisk-iptables] Ban 173.249.41.215
2020-03-26 14:52:24,667 fail2ban.actions[5155]: WARNING [asterisk-iptables] Unban 173.249.41.215
2020-03-26 16:27:14,100 fail2ban.actions[5155]: WARNING [asterisk-iptables] Ban 173.249.41.215
2020-03-26 16:57:14,893 fail2ban.actions[5155]: WARNING [asterisk-iptables] Unban 173.249.41.215
2020-03-26 18:41:43,880 fail2ban.actions[5155]: WARNING [asterisk-iptables] Ban 173.249.41.215
2020-03-26 19:11:44,678 fail2ban.actions[5155]: WARNING [asterisk-iptables] Unban 173.249.41.215
2020-03-26 21:02:43,031 fail2ban.actions[5155]: WARNING [asterisk-iptables] Ban 173.249.41.215
2020-03-26 21:32:43,763 fail2ban.actions[5155]: WARNING [asterisk-iptables] Unban 173.249.41.215
2020-03-27 09:21:55,331 fail2ban.actions[5155]: WARNING [asterisk-iptables] Ban 45.143.220.252
2020-03-27 09:51:56,089 fail2ban.actions[5155]: WARNING [asterisk-iptables] Unban 45.143.220.252

[root@pbx ~]# grep 173.249.41.215 /var/log/httpd/*log*
[root@pbx ~]# grep 173.249.41.215 /var/log/asterisk/full*
/var/log/asterisk/full-20200327:[2020-03-26 06:13:59] NOTICE[29437] res_pjsip/pjsip_distributor.c: Request 'REGISTER' from '"0010" <sip:[email protected]>' failed for '173.249.41.215:44803' (callid: 94761f064695e9784f8902baf300735b) - Failed to authenticate
/var/log/asterisk/full-20200327:[2020-03-26 06:14:00] NOTICE[29437] res_pjsip/pjsip_distributor.c: Request 'REGISTER' from '"0010" <sip:[email protected]>' failed for '173.249.41.215:44803' (callid: 94761f064695e9784f8902baf300735b) - Failed to authenticate
/var/log/asterisk/full-20200327:[2020-03-26 06:14:01] NOTICE[22276] res_pjsip/pjsip_distributor.c: Request 'REGISTER' from '"0010" <sip:[email protected]>' failed for '173.249.41.215:44803' (callid: 94761f064695e9784f8902baf300735b) - No matching endpoint found after 5 tries in 2.620 ms
<snippy snip>
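The ban and unban timestamps give the time windows to search in the Asterisk and Apache logs. A minimal sketch that pairs each Ban with its Unban per jail and IP, assuming the fail2ban line format shown in the excerpt (newer fail2ban versions pad the fields and log NOTICE instead of WARNING, which the regex tolerates):

```python
import re
from collections import defaultdict
from datetime import datetime

# fail2ban action lines, e.g.
# 2020-03-26 14:22:23,895 fail2ban.actions[5155]: WARNING [asterisk-iptables] Ban 173.249.41.215
BAN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+"
    r"fail2ban\.actions\s*\[\d+\]:\s+\w+\s+"
    r"\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\S+)"
)


def ban_history(lines):
    """Map IP -> list of (jail, banned_at, unbanned_at); unbanned_at is None if still banned."""
    open_bans = {}                 # (jail, ip) -> time of the pending Ban
    history = defaultdict(list)
    for line in lines:
        m = BAN_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        key = (m["jail"], m["ip"])
        if m["action"] == "Ban":
            open_bans[key] = when
        elif key in open_bans:     # an Unban that closes a Ban we saw
            history[m["ip"]].append((m["jail"], open_bans.pop(key), when))
    for (jail, ip), when in open_bans.items():
        history[ip].append((jail, when, None))
    return dict(history)
```

Feeding it /var/log/fail2ban.log (an open file works as `lines`) and looking up 173.249.41.215 would list the four roughly 30-minute bans from the excerpt above; each `banned_at` is where to start reading the Asterisk full log.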

bmartindcs · March 28, 2020, 5:52am

Thank you. I did some of that but not all, I’ll double check my work tomorrow and report back.

Ajhad · March 28, 2020, 12:13pm

We have had to do the same.
At the moment, that is the only way we can be sure that the remote phones will not be blocked by Firewall and Intrusion Detection.

ngingras · March 28, 2020, 3:38pm

This is not fail2ban; this is the FreePBX Firewall. I have encountered the same issue with many clients. It seems like any disconnect/reconnect is causing them to get blocked by the FreePBX Firewall.

This is with pjsip endpoints and Responsive Firewall enabled for pjsip. Since there is no real firewall logging, this is difficult to troubleshoot.

Adding their IPs to the firewall prevents the issue, but these users have dynamic IPs, which is exactly what the Responsive Firewall function is supposed to accommodate. I wonder whether it could be something like BLF subscriptions sent before registration being interpreted as failed logins…
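One way to test the BLF theory against the Asterisk full log is to tally the res_pjsip distributor failures for a banned IP by request method and reason: many SUBSCRIBE failures around the ban time would support it, while REGISTER failures arriving from a series of different source ports would point more toward NAT. A sketch, assuming the NOTICE format in the log excerpt earlier in the thread and IPv4 addresses:

```python
import re
from collections import Counter, defaultdict

# res_pjsip distributor failure notices in the Asterisk "full" log, e.g.
# ... res_pjsip/pjsip_distributor.c: Request 'REGISTER' from '...' failed for
# '173.249.41.215:44803' (callid: ...) - Failed to authenticate
FAIL_RE = re.compile(
    r"pjsip_distributor\.c: Request '(?P<method>[A-Z]+)' from .*? "
    r"failed for '(?P<ip>[^:']+):(?P<port>\d+)' \(callid: [^)]*\) - (?P<reason>.+)$"
)


def tally_failures(lines, ip=None):
    """Count (method, reason) pairs and collect source ports, optionally for one IP."""
    tally = Counter()
    ports = defaultdict(set)
    for line in lines:
        m = FAIL_RE.search(line)
        if not m or (ip is not None and m["ip"] != ip):
            continue
        # Drop the variable timing suffix so identical reasons group together
        reason = re.sub(r" after \d+ tries in [\d.]+ ms$", "", m["reason"].strip())
        tally[(m["method"], reason)] += 1
        ports[m["ip"]].add(int(m["port"]))
    return tally, dict(ports)
```

For example, `tally_failures(open("/var/log/asterisk/full"), ip="173.249.41.215")` returns the per-method counts plus every source port that IP used.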

PitzKey · March 28, 2020, 6:45pm

Set up a DDNS client on the remote side, whitelist its hostname in the firewall, and you can then turn off the Responsive Firewall.
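A DDNS-based whitelist only helps if something keeps re-resolving the hostnames as home IPs change. A minimal sketch of that refresh step with hypothetical hostnames; it uses `socket.gethostbyname_ex`, so IPv4 only, and how the resulting addresses get pushed into the FreePBX Firewall isn’t covered in the thread, so that part is left out:

```python
import socket


def resolve_whitelist(hostnames, resolve=socket.gethostbyname_ex):
    """Resolve DDNS hostnames to the sorted IPv4 addresses to trust right now."""
    trusted = {}
    for host in hostnames:
        try:
            _, _, addrs = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            addrs = []   # unresolvable for now: trust nothing for this host
        trusted[host] = sorted(addrs)
    return trusted


def changed_hosts(old, new):
    """Hostnames whose resolved addresses differ between two snapshots."""
    return sorted(h for h in new if old.get(h) != new[h])
```

Run periodically, each new snapshot is compared with the previous one, and only the hosts returned by `changed_hosts` would need their firewall entries updated.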

bmartindcs · March 28, 2020, 7:13pm

That’s not practical at scale.

ngingras · March 28, 2020, 7:21pm

Dynamic DNS may be a good workaround for some users, but not others. In many cases we are going to be asking people to put a DDNS client on their personal devices because their work device uses a VPN and would therefore report the wrong IP address.

I suspect the firewall is being indiscriminate about the kind of traffic it uses to rate limit and block users, and not taking into account whether or not the user has successfully authenticated in the past. These devices are not failing to authenticate, I suspect they are just sending some miscellaneous traffic at the server during the time between when the phone goes offline and when it re-registers.
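The difference between counting all traffic and exempting clients that recently authenticated can be made concrete with a toy model. This is not how the FreePBX Firewall module is implemented (the thread doesn’t document that); the class and its thresholds (10 packets per 60 s, a one-hour grace period) are hypothetical:

```python
from collections import defaultdict, deque


class ResponsiveLimiter:
    """Toy per-IP sliding-window limiter with hypothetical thresholds.

    If trust_registered is True, packets from an IP that completed a successful
    registration within `grace` seconds are not counted against the limit.
    """

    def __init__(self, limit=10, window=60.0, grace=3600.0, trust_registered=True):
        self.limit = limit
        self.window = window
        self.grace = grace
        self.trust_registered = trust_registered
        self.hits = defaultdict(deque)  # ip -> timestamps of counted packets
        self.last_ok = {}               # ip -> time of last successful registration
        self.banned = set()

    def registered(self, ip, now):
        """Record a successful registration from ip at time now."""
        self.last_ok[ip] = now

    def packet(self, ip, now):
        """Process one packet; return False if ip is (or becomes) banned."""
        if ip in self.banned:
            return False
        ok_at = self.last_ok.get(ip)
        if self.trust_registered and ok_at is not None and now - ok_at <= self.grace:
            return True  # recently authenticated: exempt from rate counting
        q = self.hits[ip]
        q.append(now)
        while q and now - q[0] > self.window:
            q.popleft()  # forget packets that fell out of the sliding window
        if len(q) > self.limit:
            self.banned.add(ip)
            return False
        return True


def burst(limiter, ip="203.0.113.7"):
    """Register at t=0, then send 15 packets in 3 s at t=100 (e.g. after a reconnect)."""
    limiter.registered(ip, 0.0)
    for i in range(15):
        limiter.packet(ip, 100.0 + i * 0.2)
    return ip in limiter.banned


print(burst(ResponsiveLimiter(trust_registered=False)))  # True
print(burst(ResponsiveLimiter(trust_registered=True)))   # False
```

In the model, the same reconnect burst bans the phone only when past successful registrations are ignored.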

This is not, it seems, a new issue; here are a few similar threads:

Hopefully it’s something that can be improved in the firewall in the near future.

tmittelstaedt · March 31, 2020, 11:03pm

No, it likely CAN’T be improved, at least not easily, due to a fundamental networking issue. Here is the most likely problem:

When the phone is running at your employee’s site, its traffic goes out to the Internet through an address translator (NAT) in their router. That translator has a set of “connection tracking timeouts”. That is, when the phone registers, it sets up a translation entry in the memory of the user’s router at their house.

If the TCP registration sends keepalives periodically, the user’s router will keep the TCP connection slot open. If it does not, the router will eventually expire the connection slot, and when the phone next tries to send something it will discover it’s unregistered (and may try to register again). I suspect the excessive registration attempts (made while FreePBX thinks the phone is already registered) are triggering the firewall. If someone wants to plug a Linux system into a monitoring port on the Ethernet switch their FreePBX system is on, set up capture filters, and run a trace in Wireshark, they can probably get a more exact description of what is happening.

Different models of phones handle this differently. I have experimented with a number of different models and some of them you can set a configuration that will turn on keepalives and some you can’t. Some router models will pay attention to keepalives and some won’t.

And then there are UDP connection tracking timeouts. Yes, UDP is connectionless, but a “slot” has to be set up in the router’s memory to handle the translation, and that can expire too.
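The effect of an expiring translation slot can be shown with a toy model of a home router’s UDP table: while traffic keeps the slot alive, the PBX sees one public port; once the keepalive interval exceeds the idle timeout, every packet goes out on a freshly allocated port, which to the PBX looks like a different client. The timeout, keepalive values, and sequential port allocation are illustrative; real routers differ:

```python
import itertools


class NatTable:
    """Toy model of a home router's UDP translation table (illustrative only).

    Each internal (ip, port) gets an external port; traffic refreshes the
    mapping, and it is dropped after `timeout` seconds of idle time.
    """

    def __init__(self, timeout, first_port=40000):
        self.timeout = timeout
        self.ports = itertools.count(first_port)
        self.entries = {}  # internal (ip, port) -> (external port, last seen)

    def outbound(self, src, now):
        """Translate one outbound packet from src at time now; return the external port."""
        entry = self.entries.get(src)
        if entry is None or now - entry[1] > self.timeout:
            ext = next(self.ports)  # new or expired mapping: allocate a fresh port
        else:
            ext = entry[0]
        self.entries[src] = (ext, now)
        return ext


def ports_seen_by_pbx(keepalive, timeout, duration):
    """Distinct external source ports the PBX sees for a phone sending every `keepalive` s."""
    nat = NatTable(timeout)
    phone = ("192.168.1.50", 5060)
    seen = []
    t = 0
    while t <= duration:
        port = nat.outbound(phone, t)
        if not seen or seen[-1] != port:
            seen.append(port)
        t += keepalive
    return seen


print(ports_seen_by_pbx(keepalive=20, timeout=30, duration=600))  # [40000]
print(ports_seen_by_pbx(keepalive=60, timeout=30, duration=180))  # [40000, 40001, 40002, 40003]
```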

Here are some “fixes”/hacks you can try:

For the Netgear R7000, put the inside interface into the DMZ zone (the end user must have an R7000, obviously). This is mentioned here:

For dd-wrt the timeouts can be modified in the web interface as discussed here:

https://wiki.dd-wrt.com/wiki/index.php/Router_Slowdown

There are two more discussions on this in the dd-wrt tracker (and one in Linksys’s forum) that explain the pros and cons of fiddling with these settings and why different router firmware sets the defaults it does:

https://svn.dd-wrt.com/ticket/947
https://svn.dd-wrt.com/ticket/3559
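On a Linux-based router (or a Linux box used to reproduce the problem), the connection tracking timeouts are exposed as netfilter sysctls. A small sketch that reads them and flags any that are shorter than the phone’s keepalive interval; the knob names are the standard nf_conntrack ones, while older kernels (including some dd-wrt builds) use ip_conntrack_* names under /proc/sys/net/ipv4/netfilter instead. dd-wrt itself usually has no Python, so there you would simply `cat` the same files:

```python
from pathlib import Path

# Standard netfilter conntrack timeout knobs (seconds) on kernels with nf_conntrack
KNOBS = (
    "nf_conntrack_udp_timeout",
    "nf_conntrack_udp_timeout_stream",
    "nf_conntrack_tcp_timeout_established",
)


def read_conntrack_timeouts(base="/proc/sys/net/netfilter"):
    """Return {knob: seconds} for the knobs that exist under base."""
    found = {}
    for name in KNOBS:
        path = Path(base) / name
        if path.exists():
            found[name] = int(path.read_text().strip())
    return found


def keepalive_too_slow(timeouts, keepalive_seconds):
    """List knobs whose timeout is shorter than the phone's keepalive interval."""
    return sorted(name for name, secs in timeouts.items() if secs < keepalive_seconds)
```

For example, `keepalive_too_slow(read_conntrack_timeouts(), 60)` lists the timeouts a phone sending a keepalive every 60 seconds would outlive.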

What I have done to fix this is the following:

Assign an available private subnet consistent with the office (and with all other remote worker subnets) for the user
Turn off the translator (NAT) in the user’s cable or DSL modem.
Install a router capable of running dd-wrt (or reflash the user’s router if they already own one capable of running dd-wrt)
Set up an OpenVPN connection from the user’s router to an Untangle server at the main office (the dd-wrt router is set up as an OpenVPN client, the Untangle box as an OpenVPN server)
Configure the DHCP server in the user’s router to pass the address of a TFTP server located at the office.
Configure dynamic DNS on the dd-wrt router
Set up an access rule in the OpenVPN server that permits the remotes access only to the FreePBX server and TFTP server
Renumber the user’s home network to fit into the office subnet (renumber to 192.168.101.x, etc.)
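The subnet-assignment steps can be planned with Python’s standard `ipaddress` module: carve per-user /24s out of a larger block and skip anything that overlaps the office LAN or a subnet already in use. The addresses below are hypothetical, chosen so the remote sites land on 192.168.101.x onward, and the router taking the first usable address is an assumption:

```python
import ipaddress


def allocate_remote_subnets(supernet, office, users, prefix=24, taken=()):
    """Assign each user a /prefix inside supernet that overlaps neither the office LAN nor `taken`."""
    used = [ipaddress.ip_network(office)] + [ipaddress.ip_network(n) for n in taken]
    free = (
        net
        for net in ipaddress.ip_network(supernet).subnets(new_prefix=prefix)
        if not any(net.overlaps(u) for u in used)
    )
    plan = {}
    for user in users:
        net = next(free, None)
        if net is None:
            raise ValueError(f"no free /{prefix} left in {supernet} for {user}")
        plan[user] = net
    return plan


# Hypothetical numbering: office LAN 192.168.100.0/24, remote sites carved from 192.168.100.0/22
plan = allocate_remote_subnets("192.168.100.0/22", "192.168.100.0/24", ["alice", "bob", "carol"])
for user, net in plan.items():
    # Assumes each home router takes the first usable address as the LAN gateway
    print(user, net, "router", net[1])
```

With these inputs, alice, bob, and carol get 192.168.101.0/24, 192.168.102.0/24, and 192.168.103.0/24; a fourth user would raise `ValueError` because the /22 is full.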

This solution has no software licensing costs, does NOT require that VoIP be passed through ANY translators, and lets me reprogram the phones easily by changing the configuration files on the TFTP server. It is rock-solid stable and does not require a DDNS client to run on a PC (if you want dynamic DNS, you can do it on the dd-wrt router); I use a free dynamic DNS service that is predefined in modern dd-wrt configurations.

Routers already loaded with dd-wrt can be purchased from dd-wrt or you can download the firmware and reflash your own devices.

Untangle is also free and runs fine on any old Core i3 with 8 GB of RAM you have lying around (just install another Ethernet card in it).

And 99 times out of 100 your dd-wrt router is a superior wifi access point for the end user over their cable company’s POS wifi-enabled cable modem.

ngingras · April 1, 2020, 12:16am

A UDP timeout value that is set too short would cause the phone to go unreachable by the PBX but should not cause the user’s public IP to get blocked by the firewall.

If the phone can stay connected without issue when the user’s public IP is added to the firewall’s Trusted zone, it’s hard to blame the user’s network.

bmartindcs · April 1, 2020, 12:52am

UDP timeouts were increased long ago to mitigate those related issues. I get the other poster’s thinking about the ports changing due to NAT, but it’s hard to say. I’m going to try provisioning via the OpenVPN client to see the results.

FreerPBXer · April 6, 2020, 4:54pm

My understanding is that the IPs from which devices successfully register (the PUBLIC IP, not the internal LAN IP) are supposed to be whitelisted for some period by the Responsive Firewall. This is what seems not to be working properly; if it were happening, re-registrations wouldn’t be a problem.

We have this issue this AM at a client with one remote phone connected via router to router VPN. The remote handset has been in place almost a year without issue, and they just informed us that it’s disconnecting. It was blocked in the Responsive Firewall.

Sangoma folks, I hope that this can get bumped up the list; it’s something that has suddenly become critical with so many people working remotely.