With the surge in work from home, we’re seeing more and more people who have taken their Sangoma phones home get banned by the Sangoma Firewall after a few days (days during which the phone works perfectly fine).
The interesting thing is that it happens across the board, seemingly at random, after a few days go by. The phones otherwise work and provision perfectly fine up until this random “event”. If we unban the IP of the particular phone, it’s fine for another few days. It’s not clear what triggers it. I’ve looked at the files in /var/log/asterisk and I can see the phones get banned, but not why.
Can anyone point me in the right direction of how to sleuth WHAT is causing them to be banned so I can then resolve that? The only thing I have seen is they get RATE LIMITED (as shown in the Sangoma Firewall interface) prior to being banned.
We have whitelisted them as a stopgap. I would like to troubleshoot and figure out the actual problem causing this, though. It could be, and likely is, a misconfiguration somewhere, but the logs are not providing clues.
This is not fail2ban, this is the FreePBX firewall. I have encountered the same issue with many clients. Seems like any disconnect/reconnect is causing them to get blocked by the FreePBX firewall.
This is with pjsip endpoints and Responsive Firewall enabled for pjsip. Since there is no real firewall logging, this is difficult to troubleshoot.
Adding their IP to the firewall prevents the issue, but these users have dynamic IPs, which is what the Responsive Firewall function is supposed to accommodate. I wonder whether it could be something like BLF subscriptions prior to registration being interpreted as failed logins, or something like that…
Dynamic DNS may be a good workaround for some users, but not others. In many cases we are going to be asking people to put a DDNS client on their personal devices because their work device uses a VPN and would therefore report the wrong IP address.
I suspect the firewall is being indiscriminate about the kind of traffic it uses to rate limit and block users, and is not taking into account whether or not the user has successfully authenticated in the past. These devices are not failing to authenticate; I suspect they are just sending some miscellaneous traffic at the server during the time between when the phone goes offline and when it re-registers.
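To make that suspicion concrete, here is a toy sketch contrasting a limiter that counts every packet with one that trusts an IP for a while after a successful registration. This is NOT the FreePBX firewall's actual logic; the class name, the thresholds (10 packets per 60 seconds, 24 hours of trust) and the IP address are all made up for illustration.

```python
from collections import defaultdict, deque


class SketchRateLimiter:
    """Toy sliding-window limiter. Hypothetical, not FreePBX's real code."""

    def __init__(self, max_packets, window_seconds, trust_seconds=0):
        self.max_packets = max_packets
        self.window_seconds = window_seconds
        self.trust_seconds = trust_seconds  # 0 means "never trust after auth"
        self.recent = defaultdict(deque)    # source IP -> recent packet times
        self.trusted_until = {}             # source IP -> trust expiry time

    def note_successful_registration(self, ip, now):
        # Only the auth-aware variant remembers successful registrations.
        if self.trust_seconds > 0:
            self.trusted_until[ip] = now + self.trust_seconds

    def allow(self, ip, now):
        """Return True if a packet from ip at time now is accepted."""
        if self.trusted_until.get(ip, 0) > now:
            return True
        window = self.recent[ip]
        window.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        return len(window) <= self.max_packets


def replay(limiter):
    ip = "203.0.113.10"  # documentation address standing in for a home IP
    limiter.note_successful_registration(ip, now=0)
    # An hour later the phone reconnects and sends 20 packets in 2 seconds.
    return [limiter.allow(ip, 3600 + i * 0.1) for i in range(20)]


indiscriminate = replay(SketchRateLimiter(max_packets=10, window_seconds=60))
auth_aware = replay(
    SketchRateLimiter(max_packets=10, window_seconds=60, trust_seconds=86400)
)
print(indiscriminate.count(False), auth_aware.count(False))
```

With these invented numbers, the first limiter rejects the second half of the burst even though the phone registered fine an hour earlier, while the auth-aware one lets it all through. If the real firewall behaves like the first one, that would match what we are seeing.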
This is not, it seems, a new issue; here are a few similar threads:
No, it likely CAN’T be improved, at least not easily, due to a fundamental networking issue. Here is the most likely problem:
When the phone is running at your employee’s site, its traffic is going out on to the Internet through an address translator in their router. That translator has a set of “connection track timeouts”. That is, when the phone registers in, it sets up a translation entry in the memory of the user’s router at their house.
If the TCP registration sends keepalives periodically, then the user’s router will keep the TCP connection slot open. If it does not, then eventually the router will expire the connection slot, and when the phone then tries to send something it will discover it’s unregistered (and maybe try to register again). I suspect the excessive registration attempts (when FreePBX thinks the phone is already registered) are triggering the firewall. If someone wants to plug a Linux system into a monitoring port on the Ethernet switch their FreePBX system is on, then set up packet filters and run a trace under Wireshark, they can probably get a more exact description of what is happening.
Different models of phones handle this differently. I have experimented with a number of different models; on some of them you can set an option that turns on keepalives, and on some you can’t. Some router models will pay attention to keepalives and some won’t.
And then there are UDP connection tracking timeouts. Yes, UDP is connectionless, but a “slot” has to be set up in the router’s memory to handle the translation, and that can expire also.
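To illustrate the timeout behaviour described above, here is a minimal simulation of a single translation slot. The 30-second UDP timeout, the hourly re-registration and the keepalive intervals are assumptions picked for the example; real routers and phones vary widely, and the function name is hypothetical.

```python
def first_expiry(nat_timeout_s, send_times_s, end_s):
    """Return when the translation slot first expires, or None if it survives.

    The slot is refreshed each time the phone sends a packet and is dropped
    once it has been idle for longer than nat_timeout_s seconds.
    """
    last_seen = None
    for t in sorted(send_times_s):
        if t > end_s:
            break
        if last_seen is not None and t - last_seen > nat_timeout_s:
            return last_seen + nat_timeout_s
        last_seen = t
    # Check the idle gap between the last packet and the end of the run.
    if last_seen is not None and end_s - last_seen > nat_timeout_s:
        return last_seen + nat_timeout_s
    return None


TWO_HOURS = 7200
UDP_TIMEOUT = 30  # assumed value; check the actual router

# Phone only re-registers once an hour and sends nothing in between.
no_keepalive = first_expiry(UDP_TIMEOUT, [0, 3600], TWO_HOURS)

# Phone sends a keepalive every 20 seconds (shorter than the timeout).
fast_keepalive = first_expiry(UDP_TIMEOUT, range(0, TWO_HOURS + 1, 20), TWO_HOURS)

# Phone sends a keepalive every 45 seconds (longer than the timeout).
slow_keepalive = first_expiry(UDP_TIMEOUT, range(0, TWO_HOURS + 1, 45), TWO_HOURS)

print(no_keepalive, fast_keepalive, slow_keepalive)
```

The point is simply that the slot only survives when the phone's keepalive interval is shorter than the router's timeout. A keepalive that is configured but too slow is no better than none at all.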
Here are some “fixes”/hacks you can try:
For the Netgear R7000, put the inside interface into the DMZ zone (the end user must have an R7000, obviously). This is mentioned here:
For dd-wrt the timeouts can be modified in the web interface as discussed here:
Assign an available private subnet consistent with the office (and with all other remote worker subnets) for the user
Turn off the translator in the user’s cable modem, or DSL modem.
Install a router capable of running dd-wrt (or reflash the user’s router if they already own one capable of running dd-wrt)
Set up an OpenVPN connection from the user’s router to an Untangle server at the main office (the dd-wrt router is set up as an OpenVPN client, the Untangle box is set up as an OpenVPN server)
Configure the DHCP server in the user’s router to pass the address of a TFTP server located at the office.
Configure dynamic dns on the dd-wrt router
Set up an access rule in the OpenVPN server to only permit the remotes access to the FreePBX server and TFTP server
Renumber the user’s home network to fit in to the office subnet (renumber to 192.168.101.x etc.)
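For the subnet assignment steps above, a quick way to check that no two sites collide before renumbering anyone is Python's standard `ipaddress` module. This is only a sketch: the site names, the office subnet 192.168.100.0/24 and the 192.168.100.0/22 pool are assumptions for the example (only 192.168.101.x comes from my setup), and both helper functions are hypothetical.

```python
import ipaddress


def find_overlaps(subnets):
    """Return sorted pairs of site names whose subnets overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


def next_free_subnet(pool, used, new_prefix=24):
    """Return the first subnet of the pool that overlaps nothing in use."""
    used_nets = [ipaddress.ip_network(cidr) for cidr in used]
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used_nets):
            return str(candidate)
    return None


# Hypothetical address plan: two remote workers were given the same subnet.
sites = {
    "office": "192.168.100.0/24",
    "remote-alice": "192.168.101.0/24",
    "remote-bob": "192.168.101.0/24",
}
conflicts = find_overlaps(sites)
suggestion = next_free_subnet(
    "192.168.100.0/22", ["192.168.100.0/24", "192.168.101.0/24"]
)
print(conflicts, suggestion)
```

Here the check flags the two remote workers that share a subnet and suggests the next unused /24 in the pool for one of them.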
This solution has no software licensing costs, it does NOT require that VoIP be passed through ANY translators, and I can reprogram the phones easily by changing the configuration files on the TFTP server. It is rock solid, does not require a DDNS client to be run on a PC (if you want to do that you can do it on the dd-wrt), and I use a free dynamic DNS service on the Internet which is predefined in modern dd-wrt configurations.
Routers already loaded with dd-wrt can be purchased from dd-wrt or you can download the firmware and reflash your own devices.
Untangle is also free and runs fine on any old Core i3 with 8 GB of RAM you have lying around (just install another Ethernet card in it).
And 99 times out of 100 your dd-wrt router is a superior wifi access point for the end user over their cable company’s POS wifi-enabled cable modem.
UDP timeouts were increased long ago to mitigate those related issues. I get the other guy’s thought on the ports from NAT changing, but it’s hard to say. I’m going to try provisioning via an OpenVPN client to see the results.
My understanding is that IPs from which devices successfully register (the PUBLIC IP, not the internal LAN IP) are supposed to be whitelisted for some period in the Responsive Firewall. This is what seems to be not working properly; if that was happening, re-registrations wouldn’t be a problem.
We had this issue this morning at a client with one remote phone connected via router-to-router VPN. The remote handset has been in place almost a year without issue, and they just informed us that it’s disconnecting. It was blocked in the Responsive Firewall.
Sangoma folks, I hope that this can get bumped up the list; it’s something that has suddenly become critical with so many people working remotely.