This is going to be a very long shot since Lorne wasn’t able to resolve it & he seemed very, very knowledgeable.
In short, I have two PBXact 13 appliances configured in HA mode. Due to our users’ familiarity and comfort level with our existing Cisco phones and the overall cost of replacing 70+ phones, it was decided to keep the old Cisco 7960 phones, flash them to SIP, and use them with our new phone system for the majority of our phones; basically, managers & our call center got Sangoma phones - everyone else has a 7960. After I figured out how to do this, setting them up went relatively smoothly. (Figuring out how to flash them & the exact config file structure/content the phones needed was the hard part. On a side note, would anyone be interested in a how-to style guide on how to do this?) Being old, they provision using TFTP, and I have option 66 enabled on our DHCP server to point them to the cluster IP address. This went well and the phones work as expected with PBXact.
However, while configuring the server, I noticed something very odd.
Whenever I factory reset a 7960 to remove all settings from it, it will not pull its config from PBXact. I can see the TFTP request coming into the server using a packet sniffer, but the server does not respond and there’s nothing in /var/log/messages from the TFTP daemon, even though verbose logging is turned on. If I disable the firewall temporarily from the command line (service iptables stop), the phone receives its configuration and registers perfectly. The firewall restarts after a 5-7 second delay, but this is more than enough time for the phone to pull its config. After a given phone has registered once, it can then boot up normally without my needing to disable the firewall to let it through. If a functional phone is factory reset, though, it is again not allowed through the firewall and cannot pull its configuration until I manually disable the firewall.
If I change eth0 to ‘Local (trusted traffic)’ instead of ‘Internet (default firewall)’ under Firewall->Main->Interfaces, this problem does not present itself. Obviously, this is not something I wish to run with long-term. I have already set the phone subnet (a /24) to be ‘Local (trusted traffic)’ under Firewall->Main->Networks, but this does not seem to have any effect - the ENTIRE interface needs to be set to ‘Local (trusted traffic)’ in order for the phones to provision.
I tried disabling the responsive firewall in Firewall->Main->Responsive Firewall, but this had no effect either - the entire firewall needs to be disabled or have eth0 be set to Local in order for them to provision. I also set TFTP to be allowed in the Internet zone in Firewall->Services->Extra Services, but alas this did not change the symptoms either. There are 0 ‘Rate Limited’ or ‘Blocked Attackers’ shown in Firewall->Status->Blocked Hosts. I also added the entire subnet to the whitelisted clients under System Admin->Intrusion Detection on the off-chance that this is what was causing it.
The last thing I tried was to upgrade the firewall module from 13.0.57.1 that was on the system when installed to edge release 13.0.60.2, then re-save & apply, but again this had no impact.
After opening a support ticket, Lorne found out that apparently, I’m affected by a bug that’s supposed to be fixed already:
https://issues.freepbx.org/browse/FREEPBX-14483
Unfortunately, nobody seems to know how it was fixed in the past. I tried loading the TFTP connection tracking module as the bug report suggests, but this had 0 impact on the issue.
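For reference, a quick way to check whether the helper is actually loaded is sketched below. I am assuming the module the bug report means is nf_conntrack_tftp; module_loaded is just a helper name made up for this post, and its optional second argument exists only so the check can be pointed at a file other than /proc/modules.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME appears in the kernel's
# module list. FILE defaults to /proc/modules (hypothetical helper).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# nf_conntrack_tftp is an assumed module name for the TFTP helper.
if module_loaded nf_conntrack_tftp; then
    echo "nf_conntrack_tftp is loaded"
else
    echo "nf_conntrack_tftp is not loaded; as root, try: modprobe nf_conntrack_tftp"
fi
```

This only confirms that the module is present in the kernel; it says nothing about whether the firewall rules actually make use of it.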
Thankfully, my PBXact system is protected by a Meraki edge device, so I can use that to shield it from the internet & leave the entire interface configured as completely trusted/local. (It was this or disable the firewall entirely.) I would much rather have the firewall on & functioning as it should to provide another layer of protection, but I simply do not have the knowledge/experience to fix this myself; the fact that I set TFTP to be allowed in the Internet zone and it still won’t work makes me think there’s a flaw in the GUI or firewall rule generation. Lorne went back & forth with R&D and a decision was made to not spend any more time on it.
I was wondering if anyone here has experienced this, has a solution, or is a wizard at iptables and could perhaps provide an auxiliary rule that might alleviate this issue. (Frankly, it’s been years since I used iptables; I use BSD for most of our servers & so know/like pf MUCH more; it’s so much simpler and easier IMO. Looking at the iptables output on the PBX makes me slightly dizzy… )
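To make the request concrete, the kind of auxiliary rule I have in mind would look something like the sketch below. It is untested on PBXact and rests on several assumptions: 192.0.2.0/24 is a placeholder rather than my real phone subnet, tftp_allow_rule is a made-up helper name, and I am guessing that inserting at the top of the INPUT chain is early enough to get ahead of the rules the firewall module generates. The script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints an iptables command, it does not run it.
# PHONE_NET is a placeholder (documentation range), not a real subnet.
PHONE_NET="${PHONE_NET:-192.0.2.0/24}"

# tftp_allow_rule SUBNET: print a rule that accepts inbound TFTP
# requests (UDP port 69) from SUBNET, inserted as rule 1 of INPUT.
tftp_allow_rule() {
    printf 'iptables -I INPUT 1 -s %s -p udp --dport 69 -j ACCEPT\n' "$1"
}

# Print the rule for review; run the printed line as root to apply it.
tftp_allow_rule "$PHONE_NET"
```

Two caveats I am aware of: a TFTP server answers from a new ephemeral port rather than from port 69, so a port 69 rule on its own may not cover the whole transfer, and a rule added by hand may be lost the next time the firewall module regenerates its rules.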