Sangoma Firewall banning "work from home" folks with Sangoma phones

adairw · April 10, 2020, 9:42pm

The way the responsive firewall works is definitely flawed. We are an ISP that offers hosted phone service and have a few PBXs sitting on public IPs. In the beginning we blocked everything except our own subnets and had little trouble. In the last year we started reselling AT&T fiber to customers outside our network. To avoid putting a small VPN router at the home of every customer with phone service, we opened the PBX servers up to the world.

Some endpoints work perfectly for days or weeks before they have problems. Others work forever without any problem. There is no rhyme or reason to why some get blocked and others don’t. Most of our clients are Grandstream ATAs with no call features of any sort: no BLF and, in many cases, no voicemail. They just pass calls to and from the PBX.

The biggest issue I would like to see resolved is this: an endpoint that can correctly register (in other words, it has a proper extension and password) SHOULD NEVER get blocked. Now, if you are throwing random passwords at the PBX, yes, let’s watch you and block you.

I wish this was more configurable. I realize that’s opening a can of worms, and some people would probably get themselves in trouble by incorrectly adjusting the rules, but not having ANY control over, or ANY real visibility into, why these endpoints get blocked at random is frustrating.

bmartindcs · April 10, 2020, 10:12pm

Try going to System Admin -> Intrusion Detection and changing the failure count before blocking and the time window in which it looks for failures.

I’ve been tweaking that and it appears to be helping, though it’s a double-edged sword, since it also controls the banning of bad guys who are obviously trying to probe the system.
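
For anyone who wants to see what those GUI values look like on the running system, here is a minimal sketch that prints the current limits straight from fail2ban. It assumes the Intrusion Detection page maps onto fail2ban's `maxretry`, `findtime`, and `bantime` settings, and that the jail is named `asterisk-iptables` (the name used later in this thread). `show_jail_limits`, `F2B_CLIENT`, and `JAIL` are hypothetical names for illustration only.

```shell
# Hypothetical helper: print the limits fail2ban is currently enforcing.
# Assumption: the jail is called asterisk-iptables; override JAIL if yours differs.
F2B_CLIENT="${F2B_CLIENT:-fail2ban-client}"
JAIL="${JAIL:-asterisk-iptables}"

show_jail_limits() {
    for key in maxretry findtime bantime; do
        # "fail2ban-client get <jail> <key>" prints the live value of that setting.
        printf '%s=%s\n' "$key" "$("$F2B_CLIENT" get "$JAIL" "$key")"
    done
}
```

Running `show_jail_limits` as root before and after a change in the GUI should show whether the new values actually took effect; `fail2ban-client status asterisk-iptables` lists the IPs that are currently banned.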

adairw · April 11, 2020, 12:49am

Going to test this out. Our system gets some regular blocks. Curious whether upping the max retry will at least slow down or limit the invalid blocking.

dcitelecom · April 12, 2020, 9:47pm

We have seen something similar with some of our users.

Typically the remote user gets the passcode wrong when trying to log in to the User Control Panel from home and gets locked out by the firewall. This also locks out the phone.

bmartindcs · April 13, 2020, 3:59am

Unfortunately we’re not giving our users control panel access; the port isn’t even opened at the perimeter firewall (outside the system at the network level)

adairw · April 13, 2020, 2:55pm

Small update here. We upped the max retry to 100 on our public-facing PBX and haven’t had any incorrect blocks yet. Will need more time to be sure.

FreerPBXer · April 14, 2020, 3:01am

That’s helpful to know if we get in a pinch, but I think that’s too many attempts from a single IP. In other situations (not FreePBX) we’ve seen most of an entire class C get blocked in succession by brute force detection, which would appear to be a concerted multi-IP effort. 255x100 over a period of weeks or months adds up.

thx2000 · April 17, 2020, 9:21pm

I spent a ton of time trying to track down what might be causing this. I connected an s705 last week Friday, in the hopes that if the problem occurred for me I could trace down the issue before I had to deploy the phone to the customer. I thought I was out of the woods until I came in yesterday morning and found the phone had been blocked.

It appears, at least in my case, that it is unrelated to fail2ban. The rules that are initiating the ban seem to be derived strictly from iptables. Basically, once a phone registers, an iptables rule is created allowing most traffic from that IP. Once the phone is unregistered, depending on where the firewall monitoring system is at in its cycle, that iptables rule is removed. Then the phone is forced to go through the various rules to see if the first few packets initiate a SIP registration, after which the iptables rule is re-added.

By default, none of this is logged, which makes troubleshooting difficult after the fact. However, based on my reading of the iptables rules, I believe this is what happens: after the whitelist rule is removed for an unregistered phone, the phone may not realize it has been unregistered, and SIP packets or other traffic related to the rest-apps can still hit the PBX while it is in this unregistered state, causing the ban.

To try to circumvent this, I created a couple of iptables rules that continuously whitelist registered IPs for 90 seconds, so that 90 seconds after a phone is unregistered, the PBX will still accept packets from that device and allow it to re-register before another ban is initiated.

This fix has not been put through any extensive testing, so I’m curious to get any and all input from the community. As of right now, this s705 appears to be staying registered, and hack attempts are still getting blocked, but I’ve only had this fix deployed for the last 3 hours.

Here are the additional rules to add to /etc/firewall-4.rules:

-I fpbxknownreg -m recent --set --name KNOWNREG --mask 255.255.255.255 --rsource
-I fpbxfirewall -m recent --update --name KNOWNREG --seconds 90 --reap --mask 255.255.255.255 --rsource -j ACCEPT

If you’re not sure how to enable custom firewall rules, check my last post here: The issue with let's encrypt certificate updating

Hope this helps, and hope this resolves the issue once and for all!

thx2000 · April 17, 2020, 10:54pm

After posting, it occurred to me that my initial approach might have been a little heavy-handed: once a phone was registered, it allowed all traffic from that IP to hit the PBX. This updated method pumps registered IPs through the same restrictions normally applied to registered IPs.

-N addknownreg
-A addknownreg -m recent --set --name KNOWNREG --mask 255.255.255.255 --rsource
-A addknownreg -j LOG --log-prefix "Updating IP in KNOWNREG: " --log-level 4
-A addknownreg -j RETURN

-I fpbxfirewall -m recent --rcheck --name KNOWNREG --seconds 90 --reap --mask 255.255.255.0 --rsource -j fpbxknownreg

-I fpbxknownreg -j addknownreg

I’ve added a logging line in here as well, for troubleshooting purposes. This results in every accepted packet getting logged into /var/log/messages. This could obviously get quite verbose, so if you don’t care to log, and/or after you’ve confirmed the rules are working, you may want to comment out that line by prefixing it with #.
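
To confirm the rules are doing something, one option is to read the `KNOWNREG` list that the `recent` match maintains. This is a minimal sketch, assuming the kernel exposes the list at `/proc/net/xt_recent/KNOWNREG` and that each entry begins with `src=<ip>`. `list_knownreg_ips` is a hypothetical helper name, not part of FreePBX.

```shell
# Hypothetical helper: list the source IPs currently held in an xt_recent list.
# Assumption: entries look like "src=203.0.113.5 ttl: 64 last_seen: ...".
list_knownreg_ips() {
    list_file="${1:-/proc/net/xt_recent/KNOWNREG}"
    # Keep only the address that follows "src=" on each entry line.
    sed -n 's/^src=\([^ ]*\).*/\1/p' "$list_file"
}
```

`list_knownreg_ips` should include the phone's public IP shortly after it registers. While the LOG rule is still enabled, `grep "Updating IP in KNOWNREG" /var/log/messages` shows the matching log lines.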

bmartindcs · April 28, 2020, 5:12am

Following up on this: have you had any long-term proof of the desired outcome here? Your hypothesis sounds like it could be the winner.

So the TL;DR version of the possible cause is that the phone’s whitelisted IP needs to linger on the whitelist for a while after the phone deregisters?

yois · April 7, 2021, 3:14pm

@thx2000 I wish I would have seen this earlier, but there’s another thread addressing this issue, and I submitted a pull request with similar fixes.

You are correct that the bug involved here is that once an IP passes through the whitelist once, it is never entitled to it again, since it never leaves the xt_recent WHITELIST list. And the monitoring service will only grant whitelist status to a packet that attempts to authenticate with a password. When a dynamic IP changes, the device behind the NAT doesn’t know to re-register, and the Qualify keep-alive packets hit the rate limits. This has nothing to do with fail2ban.

The thread is here:

And pull request is here:
https://git.freepbx.org/projects/FREEPBX/repos/firewall/pull-requests/92/overview

We’ve added a whitelist that allows any IP 90 seconds to register, and once it does it is removed from the whitelist so that when it deregisters it will be entitled to it again.

The only unsolved issue with my proposed patch is that fail2ban is running and will ban devices that fail auth. While this isn’t the issue in this thread, I’m suggesting that once a registered device is on an IP, no blocking should occur on that IP. Imagine a site with 100 endpoints where a sleepy sysadmin mistypes credentials on one phone: the entire site will be DoSed by fail2ban. This scenario is far more likely than a brute force attack coming from the same IP where there are legitimate users.

LMK what you think.

dicko · April 7, 2021, 3:44pm

Starting the fail2ban service after my firewall leaves both functional.

I would probably already have added

fail2ban-client set asterisk addignoreip w.x.y.z

for sites with 100 endpoints

yois · April 7, 2021, 3:51pm

That’s a good idea.

Maybe we can script that into RFW, so that once an IP successfully registers, we can run
fail2ban-client set asterisk-iptables addignoreip w.x.y.z

and once the registration drops we can do:
fail2ban-client set asterisk-iptables delignoreip w.x.y.z
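
A rough sketch of what that scripting could look like, independent of how RFW would actually hook it in. It compares two snapshots of registered IPs (one IP per line) and issues `addignoreip` for new arrivals and `delignoreip` for departures. The function name `sync_ignore_list`, the `F2B_CLIENT` and `JAIL` variables, and the snapshot files are all hypothetical; how the list of registered IPs is obtained is left open.

```shell
# Hypothetical helper: keep fail2ban's ignore list in step with registrations.
# Usage: sync_ignore_list PREVIOUS_FILE CURRENT_FILE (both files must exist).
F2B_CLIENT="${F2B_CLIENT:-fail2ban-client}"
JAIL="${JAIL:-asterisk-iptables}"

sync_ignore_list() {
    old="$1"
    new="$2"
    # IPs in CURRENT but not in PREVIOUS have just registered: ignore them.
    grep -vxFf "$old" "$new" | while read -r ip; do
        "$F2B_CLIENT" set "$JAIL" addignoreip "$ip" </dev/null
    done
    # IPs in PREVIOUS but not in CURRENT have dropped: stop ignoring them.
    grep -vxFf "$new" "$old" | while read -r ip; do
        "$F2B_CLIENT" set "$JAIL" delignoreip "$ip" </dev/null
    done
}
```

Setting `F2B_CLIENT=echo` turns this into a dry run that only prints the arguments it would pass.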

If you take a look at the other thread, @BlazeStudios doesn’t like the idea, so I’m really trying to get community input on whether this is worth pursuing. The first issue was a bug, this issue is an ‘enhancement’.

Thoughts?

shomi · April 7, 2021, 3:58pm

I see this all the time. It is very easy to bring an entire office to its knees this way. If you don’t have a public static IP, then whitelisting is not a proper solution either.

dicko · April 7, 2021, 4:04pm

I would not do that for dynamic sites or you will get in a tangle with ‘delignoreip’

For unrecognised addresses I prefer to look at

whois -h whois.cymru.com " -v -f w.x.y.z"

and

curl "https://api.ipdata.co/w.x.y.z/asn?api-key=test"

and make a decision based on likely risk to add the BGP Prefix or route to ACCEPT or DROP.
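
As an illustration of pulling the prefix out of that lookup, here is a minimal sketch. It assumes the Team Cymru verbose reply is pipe-delimited with the BGP prefix in the third column; `bgp_prefix_from_cymru` is a hypothetical helper name, and the sample line in the comment is made up from documentation addresses.

```shell
# Hypothetical helper: extract the BGP prefix from a Team Cymru verbose reply.
# Assumption: the reply is pipe-delimited with the prefix in column 3, e.g.
#   64496 | 203.0.113.5 | 203.0.113.0/24 | ZZ | example | 2020-01-01 | EXAMPLE-AS
bgp_prefix_from_cymru() {
    # Strip padding from column 3 and print it only if it looks like a CIDR block.
    awk -F'|' 'NF >= 3 { gsub(/[[:space:]]/, "", $3); if ($3 ~ /\//) print $3 }'
}
```

Used as `whois -h whois.cymru.com " -v -f w.x.y.z" | bgp_prefix_from_cymru`, this prints just the prefix, which can then be reviewed before deciding on ACCEPT or DROP.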

dicko · April 7, 2021, 5:06pm

Consider whitelisting the route from

curl "https://api.ipdata.co/w.x.y.z/asn?api-key=test"

It is highly likely that any awarded address will always be in that block, and quite unlikely that there will also be a 400lb guy in his mum’s basement in that block who is out to get you.