Wow, I’m not sure who just fixed the forum notifications, but I just received a flood of notifications of @ mentions and replies from the past 6 months… Sorry @lgaetz for not answering this sooner! I simply didn’t see it.
I think many of us end users of FreePBX have known for some time that the Responsive Firewall module is broken. Like many of you, I would frequently see legitimate traffic blocked by the module, with no logging to give any clue as to why. This affected both SIP endpoints and trunks. The only way around it was to whitelist the IPs I knew traffic was coming from, or to whitelist entire ISP subnets when my devices were behind dynamic IPs. This is time-consuming and doesn’t scale.
After digging into the code, I determined what I believe to be the root cause of the problem. The firewall code refreshes itself every 60 seconds. During that refresh, a list of all registered devices and SIP trunks is collected and added to a whitelist, and devices that are no longer registered are removed. For new device connections (which can happen between refresh cycles), there is a monitoring service that watches Asterisk for successful registration packets. Once a successful registration packet is detected, the IP is given a 90-second pass through the Responsive Firewall. During those 90 seconds, the Firewall service will refresh and the IP will end up on a “permanent” whitelist.
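To make the timing concrete, here is a toy model of that two-tier whitelist. It is only a sketch of the behaviour as I described it, not the module’s actual code: the class and method names are made up for illustration, and only the 60 and 90 second values come from the description above.

```python
from dataclasses import dataclass, field

REFRESH_INTERVAL = 60   # seconds between firewall refreshes (from the post)
TEMP_PASS_SECONDS = 90  # length of the temporary pass (from the post)


@dataclass
class WhitelistModel:
    """Hypothetical model of the two-tier whitelist, for illustration only."""
    permanent: set = field(default_factory=set)
    temp_expiry: dict = field(default_factory=dict)  # ip -> expiry timestamp

    def on_registration(self, ip, now):
        # The monitoring service saw a successful registration from this IP.
        self.temp_expiry[ip] = now + TEMP_PASS_SECONDS

    def refresh(self, registered_ips, now):
        # Periodic refresh: rebuild the "permanent" list from what is
        # currently registered, and forget expired temporary passes.
        self.permanent = set(registered_ips)
        self.temp_expiry = {
            ip: expiry for ip, expiry in self.temp_expiry.items() if expiry > now
        }

    def is_allowed(self, ip, now):
        return ip in self.permanent or self.temp_expiry.get(ip, 0) > now


model = WhitelistModel()
model.on_registration("203.0.113.5", now=10)   # registers between refreshes
print(model.is_allowed("203.0.113.5", now=50)) # covered by the temporary pass
model.refresh({"203.0.113.5"}, now=60)         # next refresh picks it up
print(model.is_allowed("203.0.113.5", now=200))
```

Because the pass (90 seconds) is longer than the refresh interval (60 seconds), a refresh always lands inside the pass window, which is what lets the IP move to the “permanent” list without a gap.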
This process had three problems. First, the monitoring service would only add a given IP to the whitelist once every hour. Therefore, if a device was set to a shorter registration timeout, or if the device dropped offline due to a lost QUALIFY packet, the reregistration attempt would not be granted the temporary whitelist pass. If, for some reason, the device couldn’t register within 10 packets (say there’s a BLF subscription, an MWI subscription, or multiple devices that all dropped off the network for a second), the rate limit would kick in, and the devices would keep sending packets and get locked out.
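A minimal sketch of that first problem, assuming the once-per-hour limit works as a simple per-IP cooldown. The class name and structure are hypothetical and not taken from the module’s code.

```python
GRANT_COOLDOWN = 3600  # one temporary pass per IP per hour, as described above


class HourlyGrantModel:
    """Hypothetical model of the old once-per-hour grant behaviour."""

    def __init__(self, cooldown=GRANT_COOLDOWN):
        self.cooldown = cooldown
        self.last_grant = {}  # ip -> time of the last granted pass

    def on_registration(self, ip, now):
        """Return True if this registration earns a new temporary pass."""
        last = self.last_grant.get(ip)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown: no pass this time
        self.last_grant[ip] = now
        return True


old = HourlyGrantModel()
print(old.on_registration("203.0.113.5", now=0))     # first registration
print(old.on_registration("203.0.113.5", now=600))   # re-register 10 min later
print(old.on_registration("203.0.113.5", now=3600))  # an hour after the first
```

In this model, a phone that re-registers ten minutes after its first registration gets no pass, so every packet it sends counts toward the rate limit.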
The second problem is that trunk IP addresses would only be determined by their DNS A record. But many service providers use DNS SRV records to indicate the valid IP addresses that will send traffic. These IP addresses would never be added to the RFW whitelist, and once they had sent 50 packets or so, they’d end up blocked as well.
The third problem was that there were times that I would misconfigure a device, and it would send bad registration packets. Fail2Ban would catch these packets and block the IP sending the traffic. Unfortunately for me, there were over 75 endpoints at the site and everyone lost phone service until I unblocked the IP from Fail2Ban.
The fixes that were pushed into Edge do 5 things:
- Every IP gets a 90-second whitelist pass regardless of whether it sends a valid registration packet. We can safely rely on Fail2Ban to catch any serious malicious traffic, and we’re not exposing ourselves to any significant increased risk, considering the DoS that RFW was causing.
- Once an IP has successfully registered, it will be entitled to the whitelist again immediately upon deregistering. There’s no waiting period to get another 90 seconds if you have just come off a registration.
- A DNS query will be made for the first-level UDP SRV records of trunks, and the IP addresses they resolve to will be added to the whitelist.
- IP addresses added to PJSIP’s ‘match’ field will be added to the whitelist.
- There is a new option in the GUI to have registered IP addresses be ignored by Fail2Ban. The likelihood of malicious traffic coming from an IP that has registered devices seems so remote that this looks like a wise choice to me. However, see the thread mentioned by @lgaetz above for reasons why you may want to leave this turned off. My compromise was to make this a choice in the GUI, so you can all decide for yourselves.
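For the SRV item in the list above, here is a rough sketch of how a trunk’s whitelist could be built from both its A record and one level of `_sip._udp` SRV records. This is my illustration rather than the code that was merged: `SrvRecord`, `srv_query_name` and `trunk_whitelist` are made-up names, and the two lookup functions are passed in so the example runs without touching real DNS.

```python
from typing import Callable, Iterable, NamedTuple, Set


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(trunk_host: str) -> str:
    # SRV names for SIP over UDP follow the _service._proto.name pattern.
    return f"_sip._udp.{trunk_host.rstrip('.')}"


def trunk_whitelist(
    trunk_host: str,
    lookup_srv: Callable[[str], Iterable[SrvRecord]],
    lookup_a: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Collect IPs from the host's A record plus one level of SRV targets."""
    ips = set(lookup_a(trunk_host.rstrip(".")))
    for record in lookup_srv(srv_query_name(trunk_host)):
        ips.update(lookup_a(record.target.rstrip(".")))
    return ips


# Fake DNS data (documentation addresses), standing in for a real resolver.
fake_srv = {
    "_sip._udp.sip.example.com": [
        SrvRecord(10, 50, 5060, "edge1.example.com."),
        SrvRecord(20, 50, 5060, "edge2.example.com."),
    ]
}
fake_a = {
    "sip.example.com": ["192.0.2.10"],
    "edge1.example.com": ["192.0.2.21"],
    "edge2.example.com": ["192.0.2.22", "192.0.2.23"],
}

ips = trunk_whitelist(
    "sip.example.com",
    lookup_srv=lambda name: fake_srv.get(name, []),
    lookup_a=lambda host: fake_a.get(host, []),
)
print(sorted(ips))
```

With an A-record-only lookup, the three edge addresses in this made-up example would never reach the whitelist, which is the situation described in the second problem.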
I know some of you are requesting additional logging for Responsive Firewall. Based on my limited understanding, I don’t think it’s possible. Since the underlying mechanism of Responsive Firewall is xt_recent, all that is done is just to count packets. There is no other useful information, other than that you have sent too many packets. Basically, if you’re being blocked, you’ve been dumped off the whitelist, and you need to figure out why.
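To illustrate why there isn’t much to log, here is a toy model of a per-source packet counter in the spirit of xt_recent. It is not the kernel module’s implementation. The 10-packet threshold echoes the figure I mentioned earlier, and the 60-second window is a placeholder I picked for the example.

```python
from collections import defaultdict, deque


class RecentCounter:
    """Toy per-source packet counter; an illustration, not xt_recent itself."""

    def __init__(self, seconds, hitcount):
        self.seconds = seconds    # sliding window length (placeholder value)
        self.hitcount = hitcount  # packets allowed inside the window
        self.hits = defaultdict(deque)  # src ip -> recent packet timestamps

    def packet(self, src_ip, now):
        """Record a packet and return True if it would be dropped."""
        stamps = self.hits[src_ip]
        stamps.append(now)
        # Forget timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.seconds:
            stamps.popleft()
        return len(stamps) > self.hitcount


counter = RecentCounter(seconds=60, hitcount=10)
dropped = [counter.packet("203.0.113.5", now=t) for t in range(12)]
print(dropped)  # one packet per second for 12 seconds
print(counter.packet("203.0.113.5", now=200))  # after going quiet for a while
```

The only state in this model is a list of timestamps per source address. There is no record of what the packets were or why they arrived, so “too many packets” is all the counter can ever report, and a device that keeps retrying keeps its own window full.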
If anyone can test this and post back whether you’re getting fewer false positives, it would be greatly appreciated.
Please note that I’m not a Sangoma employee. Buy me a beer!