Firewall update

lgaetz · May 12, 2021, 7:44pm

Forum user @yois has contributed code to the firewall module, I believe as a result of the discussion in this thread.

These changes are published now to Firewall ver. 15.0.8.19 which can be installed from the edge repo with the command:

fwconsole ma upgrade firewall --edge

To revert back to the current stable repo version, use the command:

fwconsole ma downloadinstall firewall --stable

Perhaps @yois will remind us what the change is, and the conditions under which it can be tested.

yois · June 4, 2021, 3:49am

Wow, I’m not sure who just fixed the forum notifications, but I just received a flood of notifications of @ mentions and replies from the past 6 months… Sorry @lgaetz for not answering this sooner! I simply didn’t see it.

I think many of us as end-users of FreePBX have known for some time that the Responsive Firewall module is broken. I, like many of you, would frequently see legitimate traffic blocked by the module, with no logging giving any clue as to why it was happening. This affected both SIP endpoints and trunks. The only way around it was to whitelist IPs where I knew traffic was coming from, or to whitelist entire IP subnets based on ISP when my devices were behind dynamic IPs. This is time-consuming and doesn’t scale.

After digging into the code, I determined what I believe to be the root cause of the problem. The firewall code refreshes itself every 60 seconds. During that refresh, a list of all registered devices and SIP trunks is collected and added to a whitelist, and devices that are no longer registered are removed. For new device connections (which can happen between refresh cycles) there is a monitoring service that watches Asterisk for successful registration packets. Once a successful registration packet is detected, the IP is given a 90 second pass through the Responsive Firewall. During those 90 seconds, the Firewall service will refresh and the IP will end up on a “permanent” whitelist.
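As a rough illustration of the cycle described above, here is a toy Python model. It is only a sketch: the class and method names are made up for this example and are not taken from the firewall module; only the 60 and 90 second figures come from the description.

```python
# Toy model only; names and structure are assumptions, not FreePBX code.
TEMP_PASS_SECONDS = 90   # pass granted after a successful registration
REFRESH_SECONDS = 60     # refresh interval, shown for reference only


class WhitelistModel:
    def __init__(self):
        self.temp_pass = {}      # ip -> time at which the temporary pass expires
        self.permanent = set()   # IPs placed on the whitelist by the refresh

    def on_registration(self, ip, now):
        """The monitoring service saw a successful registration from ip."""
        self.temp_pass[ip] = now + TEMP_PASS_SECONDS

    def refresh(self, registered_ips, now):
        """Periodic refresh: rebuild the whitelist from current registrations."""
        self.permanent = set(registered_ips)
        # Forget temporary passes that have already expired.
        self.temp_pass = {ip: t for ip, t in self.temp_pass.items() if t > now}

    def is_allowed(self, ip, now):
        return ip in self.permanent or self.temp_pass.get(ip, 0) > now


fw = WhitelistModel()
fw.on_registration("203.0.113.10", now=5)       # device registers at t=5
print(fw.is_allowed("203.0.113.10", now=30))    # True: inside the 90 second pass
fw.refresh({"203.0.113.10"}, now=60)            # next refresh picks the device up
print(fw.is_allowed("203.0.113.10", now=200))   # True: pass expired, but whitelisted
fw.refresh(set(), now=240)                      # device is no longer registered
print(fw.is_allowed("203.0.113.10", now=241))   # False: removed by the refresh
```

The point of the model is the hand-off: the temporary pass only has to last until the next refresh, after which the refresh keeps the IP whitelisted for as long as it stays registered.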

This process had three problems. Firstly, the monitoring service would only add a given IP to the whitelist once every hour. Therefore, if a device was set to a shorter registration timeout, or if the device dropped offline due to a lost QUALIFY packet, the re-registration attempt would not be granted the whitelist pass. If, for some reason, the device couldn’t register within 10 packets (say there’s a BLF subscription, an MWI subscription, or multiple devices that all dropped off the network for a second), the rate limit would kick in, the devices would keep sending packets, and they would get locked out.

The second problem is that trunk IP addresses would only be determined by their DNS A record. But many service providers use DNS SRV records to indicate valid IP addresses that will send traffic. These IP addresses would never be added to the RFW whitelist, and once they had sent 50 packets or so, they’d end up blocked as well.
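The gap is easy to see with a toy model. The sketch below uses hard-coded, hypothetical DNS data (example.com names and documentation-range addresses) instead of real lookups, and the function name is invented for this example. It compares an A-record-only lookup with one that also follows the `_sip._udp` SRV record for the trunk host.

```python
# Hypothetical DNS data standing in for real lookups; not real provider records.
A_RECORDS = {
    "sip.example.com": ["192.0.2.10"],
    "edge1.example.com": ["192.0.2.21"],
    "edge2.example.com": ["192.0.2.22"],
}
SRV_RECORDS = {
    # SRV name -> list of target hostnames
    "_sip._udp.sip.example.com": ["edge1.example.com", "edge2.example.com"],
}


def trunk_ips(host, use_srv):
    """Collect the IPs that would be whitelisted for a trunk host."""
    ips = set(A_RECORDS.get(host, []))
    if use_srv:
        # Follow the UDP SRV record and resolve each target's A record too.
        for target in SRV_RECORDS.get(f"_sip._udp.{host}", []):
            ips.update(A_RECORDS.get(target, []))
    return sorted(ips)


print(trunk_ips("sip.example.com", use_srv=False))  # ['192.0.2.10']
print(trunk_ips("sip.example.com", use_srv=True))   # ['192.0.2.10', '192.0.2.21', '192.0.2.22']
```

With the A record alone, traffic arriving from 192.0.2.21 or 192.0.2.22 in this example would never match the whitelist, which is the failure described above.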

The third problem was that there were times that I would misconfigure a device, and it would send bad registration packets. Fail2Ban would catch these packets and block the IP sending the traffic. Unfortunately for me, there were over 75 endpoints at the site and everyone lost phone service until I unblocked the IP from Fail2Ban.

The fixes that were pushed into Edge do 5 things:

1. Every IP gets a 90 second whitelist regardless of whether they send a valid registration packet. We can safely rely on Fail2Ban to catch any serious malicious traffic, and we’re not exposing ourselves to any significant increased risk, considering the DoS that RFW was causing.
2. Once an IP has successfully registered, it will be entitled to the whitelist again immediately upon deregistering. There’s no waiting period to get another 90 seconds if you have just come off a registration.
3. A DNS query will be made for first-level UDP SRV records of trunks, and the resulting addresses will be added to the whitelist.
4. IP addresses added to PJSIP’s ‘match’ field will be added to the whitelist.
5. There is a new option in the GUI to have registered IP addresses be ignored by Fail2Ban. The likelihood of malicious traffic coming from an IP that has registered devices seems so low that this seems to be a wise choice to me. However, see the thread mentioned by @lgaetz above for why there may be good reason to leave this turned off. My compromise was to make this a choice in the GUI, so you can all decide for yourselves.

I know some of you are requesting additional logging for Responsive Firewall. Based on my limited understanding, I don’t think it’s possible. Since the underlying mechanism of Responsive Firewall is xt_recent, all that is done is just to count packets. There is no other useful information, other than that you have sent too many packets. Basically, if you’re being blocked, you’ve been dumped off the whitelist, and you need to figure out why.
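To show why there is so little to log, here is a small Python sketch of a per-source sliding-window hit counter, loosely modelled on how a "seen N times in the last M seconds" match behaves. It is not the firewall's code: the class name is made up, the 10-packet threshold echoes the figure mentioned earlier, and the 60 second window is purely an assumption for the example.

```python
from collections import defaultdict, deque


class RecentCounter:
    """Per-source sliding-window hit counter (illustrative only)."""

    def __init__(self, seconds, hitcount):
        self.seconds = seconds            # window length (assumed value)
        self.hitcount = hitcount          # packets allowed within the window
        self.hits = defaultdict(deque)    # source IP -> recent packet timestamps

    def packet(self, src, now):
        """Record one packet; return True if src has reached the limit."""
        q = self.hits[src]
        q.append(now)
        # Discard timestamps that have fallen out of the window.
        while q and q[0] <= now - self.seconds:
            q.popleft()
        return len(q) >= self.hitcount


limiter = RecentCounter(seconds=60, hitcount=10)
results = [limiter.packet("198.51.100.7", now=t) for t in range(10)]
print(results.index(True))                       # 9: the tenth packet trips the limit
print(limiter.packet("198.51.100.7", now=100))   # False: the window has emptied
```

All the counter ever holds is a source address and a handful of timestamps. It has no idea whether the packets were registrations, subscriptions or something else, so "too many packets" is the only thing it could report.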

If anyone can test this and post back on whether you’re getting fewer false positives, it will be greatly appreciated.

Please note that I’m not a Sangoma employee. Buy me a beer!

ashcortech · June 4, 2021, 11:32am

+1 on the notifications fix. Logged in this morning and it was like Armageddon had hit the forums…

Not sure if it’s relevant to this or not, but on the last 3 installs I did, moving from the Digium Phones Module with no FreePBX Firewall to EPM with DPMA and the FreePBX firewall, I’ve had a hell of a lot of weird problems with phones intermittently not being able to use the REST Apps. Voicemail and Parking are the two main ones, but honestly that’s 99% of what my users use.

It’s intermittent, and sometimes a phone reboot fixes it, sometimes it doesn’t. With parking, when it works I see an entry for the request in the httpd access_log. When it doesn’t, there is no request. So one of my thoughts was that it’s a firewall issue.

Is it possible the issues you found could be blocking requests from phones for the REST apps like Parking?

ashcortech · June 4, 2021, 11:43am

Another thought…

I know you mentioned this above but is there honestly no way to produce any meaningful logging from the FreePBX firewall? Having a firewall with no logging is a nightmare. At least a record of what it blocked so that you don’t spend hours/days trying to figure out where the disconnect is.

yois · June 4, 2021, 1:45pm

Addressing both of those points in one post:
Both Fail2Ban and Responsive Firewall will block all traffic on all ports. It’s easy to at least see if it’s one of those things blocking you, either from the blocked hosts tab for RFW, from Intrusion Prevention for Fail2Ban, or from the shell for Fail2Ban (and I have a PR to bring Fail2Ban management to the GUI without Intrusion Prevention).
What you’re describing doesn’t seem related to anything that was changed; test and LMK.

msteinwinter · June 13, 2021, 9:36pm

I’m seeing far fewer invalid rate-limit blocks with the updated firewall.

Before this update, my mobile extensions were usually rate-limited when making their first outbound call. Careful inspection and time-correlation of firewall.log, the full log, a tcpdump, an sngrep display, and the “blocked hosts” tab in the GUI showed exactly when the problem occurred and cleared, but offered no explanation as to why. Specifically, there was no evidence of a sudden batch of requests from the extension trying to call, yet it was rate-limited most of the time for anywhere from 2 to 40 seconds. The only clue I saw was in tcpdump:
“vps1617465316.freepbxhosting.net > 165.sub-174-255-64.myvzw.com: ICMP vps1617465316.freepbxhosting.net udp port sip unreachable, length 556”
apparently after the firewall starts the block.

To yois: if this proves to be reliable, I’ll buy you a case of beer!

TheWebMachine · June 15, 2021, 11:32am

That makes two of us on the case of beer!

We’ve been awaiting exactly this type of improvement, especially (5). We’ve had to keep firewall disabled on AWS and rely only on f2b/AWS SG to prevent rate-limit blocks since everyone is “external.” We’ll be putting it into testing immediately. (Yes, we are that behind on notifications haha)