Inbound calls stop working, but outbound calls work

I’ve been trying to figure out the issue with our FreePBX servers for over a month and I’ve come to dead end after dead end. Eventually inbound calls stop working, although I can still make outbound calls. This used to happen once every two days; now it seems to happen once a day. Trunk provider is VoicePulse:

(obviously I changed some parameters around for security reasons)

type=peer
context=from-trunk
host=*voicepulsehost
username=*username
secret=*password
qualify=4000000
disallow=all
allow=ulaw
dtmfmode=rfc2833
rfc2833compensate=yes
insecure=port,invite
trustrpid=yes
rtptimeout=0
rtpholdtimeout=0
rtpkeepalive=20000
session-timers=refuse

I initially used context=from-pstn but still had the issue. I called VoicePulse and they think it’s a registration issue, but debugging the trunk IP shows that packets are flowing just fine. The office setup has two WAN ports for redundancy, one for each fiber box that connects our pfSense firewall to the outside world. We have two identical FreePBX servers, each one pointing to one of the WANs. For example, Server A points to WAN port A, while Server B points to WAN port B. Shutting off one of the servers doesn’t seem to help, as the issue pops up again a couple of days later. What usually happens is that one server stops taking inbound calls and the other starts handling them. For example, Server A stops taking inbound calls, so Server B starts handling them. Eventually Server A comes back and handles them again. Depending on the day, they flip-flop with one another. Whichever server is handling inbound calls also handles outbound calls. Just yesterday it got to the point where neither server was handling inbound calls at all, only outbound calls.

Can anyone at all lead me onto the right path? This issue has gotten to the point where I’ve contemplated finding another career because of how bad it’s gotten.

This clearly sounds like a NAT issue. Are your FreePBX boxes sitting behind a NAT device? If so, you can experiment with the keepalive and qualifyfreq parameters on the trunk in order to keep the NAT session open.

Yes, they’re sitting behind a NAT device. The pfSense NAT is configured to forward the ports from VoicePulse’s networks to the public IP address of our WAN ports, and then to redirect them to the IP address of the FreePBX. I’ve done this for both SIP and RTP ports. The same is configured for the Arris fiber boxes that sit in front of the pfSense in terms of how traffic is routed to us.

I have the qualify variable set to 4,000,000ms (which is an hour and 6mins 40 seconds) because the servers weren’t waiting long enough for a registration reply from the VoicePulse servers. I set it so high because I didn’t want to sit there and guess how long it should be.
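
For anyone checking the arithmetic on that qualify value, here is a quick sketch of the conversion. The helper name describe_ms is made up for illustration; it only does unit conversion and has nothing to do with Asterisk itself.

```python
def describe_ms(ms: int) -> str:
    """Convert a millisecond value into an 'Hh Mm Ss' string."""
    total_seconds = ms // 1000              # drop the sub-second remainder
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


print(describe_ms(4_000_000))  # 1h 6m 40s (the value set on the trunk)
print(describe_ms(2_000))      # 0h 0m 2s  (what qualify=yes uses)
```

So 4,000,000 ms really is 1 hour, 6 minutes and 40 seconds, compared with the 2 seconds that qualify=yes allows.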

You mentioned Keepalive. That’s currently set at 20,000ms. I tried to find out more info on what keepalive does but Googling it sent me down a rabbit hole. What does the keepalive variable do exactly?

The keepalive sends a packet every X seconds in order to keep the connection open. You would want to set the following values on your trunk to start testing:
qualify=yes
qualifyfreq=20
keepalive=20

If you hardcoded those parameters anywhere else, delete them and set them only on your trunk definition. It would also be a good idea to turn SIP ALG off on pfSense.
Check that the NAT settings are correct in the SIP settings on FreePBX, especially the external IP and local networks.

If I’m not mistaken, qualify=yes defaults to 2000ms, which is 2 seconds, so I’m assuming the qualifyfreq=20 increases that to 20 seconds? My keepalive variable was set to yes, it was NOT set to 20,000ms as I stated before. That was RTPkeepalive. SIP ALG is turned off on pfSense. The NAT settings on the FreePBX have Server A pointing to the external IP of WAN A, and Server B for WAN B. Each server knows the local networks. NAT is set to yes and Public IP was selected.

Actually, qualify and qualifyfreq are two different parameters, and as such they control two different aspects of the qualify process. I suggest you try with those 3 parameters and verify your results.
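
To make the difference concrete: as I understand chan_sip, qualify is the longest reply time (in milliseconds) that still counts as healthy, while qualifyfreq is how often (in seconds) the OPTIONS probe is sent. Below is a rough model of the first half only. It is a simplified sketch, not Asterisk code, and the function name peer_status is hypothetical.

```python
from typing import Optional


def peer_status(reply_ms: Optional[int], qualify_ms: int = 2000) -> str:
    """Simplified model of how a qualify threshold classifies a peer.

    reply_ms   -- round-trip time of the OPTIONS probe, or None if no reply
    qualify_ms -- maximum acceptable reply time (qualify=yes means 2000)
    """
    if reply_ms is None:
        return "UNREACHABLE"      # the probe was never answered
    if reply_ms > qualify_ms:
        return "LAGGED"           # answered, but slower than the threshold
    return "OK"


print(peer_status(5000))              # LAGGED with the 2000 ms default
print(peer_status(5000, 4_000_000))   # OK with a very large threshold
print(peer_status(None))              # UNREACHABLE
```

qualifyfreq does not appear in the model because it only changes how often the probe runs, not how a reply is judged.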

I can confirm that with qualify set to yes, our trunk registration will fail because 2 seconds isn’t long enough for the VoicePulse server to reply back. It jumps to 5000ms+ frequently. The highest I’ve personally seen it go is 10,000ms. The other two I’ll leave as is and verify my results within the next week or so. Unfortunately this issue takes days to troubleshoot and see if a change fixes things or not.

The qualify parameter is not directly related to registration; it is more of a monitoring process.

That’s odd. Any idea why changing qualify from yes to 4,000,000 solved my registration issue?

No idea at all. You can leave that parameter out, which by default means qualify=no and it would still register.

I doubt that the qualify has anything to do with your calls failing.
It’s there to monitor the latency between your server and an endpoint.
You say that VoicePulse routinely takes more than 2000ms to respond; latency that high is a problem in itself.

If you have your ports forwarded on your firewall, you don’t need keepalives to keep them open.

Find out why and where an inbound call is failing: perform a packet capture on your pfSense while a call is not going through.

Another question I have: sometimes FreePBX will listen to a WAN port it wasn’t initially assigned to listen to. We have two fiber boxes that route traffic in and out of our network, and the pfSense has a WAN port with a public address assigned to it for each of the fiber boxes. Upon doing a debug on the trunk’s IP address, FreePBX is now listening for traffic on WAN B instead of WAN A. I’m not aware of any way to configure FreePBX to listen to two public IP addresses at once. Is there a way to do that, or should I dive deeper into the business’s network configuration and try to essentially force the traffic through one WAN port?

That is probably why you lose inbound. It is highly probable that your VoIP provider is expecting your registration to come from the same IP all the time, but your FreePBX is randomly going from one IP to the other.
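
If you can export the source addresses of your outgoing REGISTER packets from a capture, a few lines of Python will show when the address flips. This is only a sketch: the function name find_ip_changes is made up, and the timestamps and addresses below are invented sample data (documentation ranges), not anything from this setup.

```python
from typing import List, Tuple


def find_ip_changes(
    observations: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_ip, new_ip) for every change in source IP.

    observations -- (timestamp, source_ip) pairs in chronological order
    """
    changes = []
    # Compare each observation with the one right before it.
    for (_, prev_ip), (ts, ip) in zip(observations, observations[1:]):
        if ip != prev_ip:
            changes.append((ts, prev_ip, ip))
    return changes


# Hypothetical sample: WAN A = 203.0.113.10, WAN B = 198.51.100.20
sample = [
    ("09:00", "203.0.113.10"),
    ("09:01", "203.0.113.10"),
    ("09:02", "198.51.100.20"),
    ("09:03", "203.0.113.10"),
]
for ts, old, new in find_ip_changes(sample):
    print(f"{ts}: source IP changed from {old} to {new}")
```

An empty result would mean the registrations always left from the same public address, which would point away from this explanation.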

Ah so it’s not a FreePBX problem it’s a pfsense problem due to how our network is set up. Funny because I thought that was the issue before hence why I pointed each server to one of the WAN ports. That makes sense why they flip-flop. I’m going to investigate the pfsense some more and report back whether or not it was solved. Since it seems like FreePBX isn’t the issue at all, this thread should probably be labeled as SOLVED.

UPDATE

Upon investigating the pfSense settings, there’s a type of sticky setting that keeps a connection alive for an hour from the last time some sort of packet was seen along that path. If the pfSense doesn’t notice anything for an hour, it’ll shut off that connection. What we think was happening was that the registration was set to send packets at intervals much, much longer than an hour, meaning the line would go down if FreePBX took too long to send another registration packet. Defaults are currently being used and there hasn’t been an issue since.
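
The reasoning boils down to comparing two numbers: how often something is sent over the connection, and how long the firewall waits before dropping an idle one. Here is a minimal sketch of that check. The one-hour timeout is the figure mentioned above, the 20-second interval is the keepalive value suggested earlier in the thread, and the 2-hour interval is an invented example; the function name is hypothetical.

```python
def connection_survives(send_interval_s: float, idle_timeout_s: float) -> bool:
    """True if packets are sent often enough that the idle timer never expires."""
    return send_interval_s < idle_timeout_s


IDLE_TIMEOUT_S = 3600  # one hour, per the pfSense behaviour described above

print(connection_survives(20, IDLE_TIMEOUT_S))    # True:  20 s keepalive
print(connection_survives(7200, IDLE_TIMEOUT_S))  # False: invented 2 h interval
```

Anything sent more often than the idle timeout keeps the connection open; anything slower leaves a window in which inbound traffic gets dropped.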

Thank you both for taking the time to help brainstorm some ideas and to lead me to the right path. I really appreciate it.

I know exactly what you’re talking about.

In pfSense
System/Advanced/Firewall & NAT
Change Firewall Optimization Options to Conservative

Wow I completely forgot about that setting…I’ll try that, too. Thanks.

That is what the keepalive parameter is used for: to keep a connection alive.

But isn’t that only for when the ports haven’t been forwarded properly via the network’s firewall?

Keepalive is a parameter to keep a connection active, that’s all. It doesn’t matter if a firewall is involved or not. But when there is a firewall, it may be necessary to set it if the firewall keeps closing the connection.