Cannot reconnect my trunks after a hard reset

Hi all,
I'm looking for a little help understanding something that has been happening recently in my production environment. Sorry for the long post; I just want to give as many details as possible.

First of all, I have a FreePBX KVM inside a large HA Proxmox cluster. The same cluster also hosts a pfSense KVM that I use for networking and as the gateway for FreePBX and all the other VMs and workstations.
Behind this cluster there is a fast 300/30 Mbit/s VDSL line terminated on a FRITZ!Box router (this is the provider's router; I cannot replace it or I will lose internet connectivity), so FreePBX is behind double NAT.

  • FreePBX and pfSense are on the latest versions.
  • I have 4 PJSIP trunks to the Italian provider EHIWEB.
  • chan_sip is disabled, so only the chan_pjsip driver is available.
  • My DNS servers are 127.0.0.1, 8.8.8.8 and 8.8.4.4.
  • I have around 100 PJSIP extensions, all either local or connected through an IPsec VPN.
  • I have a public static IP.
  • Since all extensions are local, I haven't forwarded any ports to FreePBX.

Everything worked perfectly: I didn't have any issues for 2 years (except some problems related to the VoIP provider) in heavy production use (30 to 35 simultaneous external calls plus all local traffic).

But in the last month something has been happening and I don't understand why: at random, I can no longer ping my VoIP provider's host voip.vivavox.it, so the trunks go offline, and to fix this I have to reboot. At that moment I can ping any other IP, and from the local LAN I can ping voip.vivavox.it with no problems, so the issue is limited to FreePBX. Even if it is a little annoying, rebooting FreePBX now and then is not so bad, but this Friday things went really wrong:

The node hosting FreePBX and pfSense shut down due to a hardware failure, but my Proxmox cluster did its job and migrated the FreePBX and pfSense KVMs from the faulty node to an online node. Obviously this wasn't a live migration, so both VMs received a hard reset and then came back online. HERE COME THE PROBLEMS again: I could ping everything except voip.vivavox.it, and all my trunks were unregistered with the error "no response from voip.vivavox.it". So I rebooted pfSense and FreePBX again, cleanly this time, and ping worked and the trunks came back online, but I had no two-way audio. I rebooted, disabled the firewall and disabled fail2ban, with no success, so production went down. I tried registering to my trunks with a softphone over the same pfSense connection and it worked fine. I found no way to fix it; after about 1 hour I rebooted FreePBX again and then everything worked fine. I really can't understand why this happens, in particular why at a certain point I can no longer ping voip.vivavox.it. I called the VoIP provider (just in case) and they told me there wasn't any particular issue at that moment.
I think that for some reason my FreePBX gets stuck on this particular DNS name.
If it helps, I noticed lately that when I run netcat against this domain from my FreePBX I get this:

[email protected]:~# netcat -u -vv voip.vivavox.it 5060
DNS fwd/rev mismatch: voip.vivavox.it != voip.eutelia.it
voip.vivavox.it [83.211.227.21] 5060 (sip) open

or with ping:

[[email protected] ~]# ping voip.vivavox.it
PING voip.vivavox.it (83.211.227.21) 56(84) bytes of data.
64 bytes from voip.eutelia.it (83.211.227.21): icmp_seq=1 ttl=55 time=22.3 ms
64 bytes from voip.eutelia.it (83.211.227.21): icmp_seq=2 ttl=55 time=21.1 ms
64 bytes from voip.eutelia.it (83.211.227.21): icmp_seq=3 ttl=55 time=21.2 ms
64 bytes from voip.eutelia.it (83.211.227.21): icmp_seq=4 ttl=55 time=21.1 ms
^X^C
--- voip.vivavox.it ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3280ms
rtt min/avg/max/mdev = 21.153/21.459/22.319/0.496 ms

I'M PRETTY SURE THAT THIS MISMATCH WASN'T HAPPENING A MONTH AGO

Any suggestions on what I can try if this happens again?
many thanks
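Since UDP has no handshake, the `open`/`succeeded` lines that `netcat -u` prints do not show that the far end answered anything; at most they mean no ICMP error came back. To check, at the moment the trunks drop, whether the provider still answers SIP from the PBX, one option is a single SIP OPTIONS request (RFC 3261, section 11). This is a minimal sketch under stated assumptions: the `probe` From user and the random tag, branch and Call-ID values are hypothetical choices, it sends one UDP packet to port 5060, and some providers ignore OPTIONS from unregistered sources, so silence alone is not conclusive.

```python
import socket
import uuid
from typing import Optional


def build_options(target: str, local_ip: str, local_port: int) -> bytes:
    """Build a minimal RFC 3261 OPTIONS request addressed to sip:<target>."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic cookie prefix
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{target} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={tag}",  # 'probe' is a hypothetical user
        f"To: <sip:{target}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Accept: application/sdp",
        "Content-Length: 0",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def sip_ping(target: str, port: int = 5060, timeout: float = 3.0) -> Optional[str]:
    """Send one OPTIONS over UDP and return the reply's status line.

    Returns None on timeout or ICMP refusal; raises socket.gaierror
    if the name does not resolve.
    """
    ip = socket.gethostbyname(target)  # fresh lookup through the system resolver
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((ip, port))  # lets the kernel choose source address and port
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(target, local_ip, local_port))
        try:
            data = sock.recv(65535)
        except (socket.timeout, ConnectionRefusedError):
            return None
    return data.decode("utf-8", errors="replace").split("\r\n", 1)[0]
```

From the PBX, `sip_ping("voip.vivavox.it")` returns the status line of whatever reply arrives (a `SIP/2.0 200 OK` or an error status), `None` if nothing comes back before the timeout, and raises `socket.gaierror` if the name does not resolve. Running it from the PBX and from another LAN host while the trunks are down would help separate a DNS failure from a routing or SIP-level one.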

Your post makes it sound like you are using FQDNs instead of IP addresses. Is that the case?

Using DNS names gives your firewall and PBX another failure vector: if the services at the address fail, you fail, but if DNS fails for any of your addresses, you lose both the server and its services.

Sorry, I didn't understand where you think I'm using FQDNs; maybe in the PJSIP trunk setup? If so, yes, I'm using voip.vivavox.it as the SIP server in my trunks, but if I use the IP instead of the FQDN the trunks definitely won't register, don't ask me why.

OK, but that seems like a real problem. You’re experiencing firewall lockouts because the forward and reverse names don’t match, so I’d think that going to straight IP addresses would make a lot of sense.

I’m old school, but I don’t want to have DNS in the middle of any of this trunk negotiation. All it takes is one DNS hack or a failure in your local DNS settings to screw you over untraceably.

What happens if you try using the voip.eutelia.it for your connections? It might help if you post your trunk config (redacting passwords, of course) so we can look at it.

Are you sure about that? So do you think my point was right?

One of my four trunks uses the eutelia domain (don't ask me why). Even though both names point to the same IP, if I change these domains to eutelia or vice versa, the trunks won't register; maybe the provider checks this on their side.

OK, I'm attaching some pictures of my trunk (they are all the same).


many thanks

This is 100% you. Running the same netcat command on my PBX/proxy systems shows no issue with this whatsoever, on either domain.

[email protected]:~# netcat -u -vv voip.vivavox.it 5060
Connection to voip.vivavox.it 5060 port [udp/sip] succeeded!

[email protected]:~# netcat -u -vv voip.eutelia.it 5060
Connection to voip.eutelia.it 5060 port [udp/sip] succeeded!

So it sounds like your setup is not doing something right after the failover.

Oooh, this is really strange: I just tried the same command from another server in Italy and it doesn't give me that mismatch error:

[email protected]:~# netcat -u -vv voip.vivavox.it 5060
netcat: voip.eutelia.it (83.211.227.21) 5060 [sip] open
netcat: using datagram socket
netcat: using buffer size of 131072
netcat: using remote receive nru of 65536
netcat: using remote send mtu of 8192

What is really strange is that this doesn't happen only on the FreePBX install but across my entire network, which is odd because my DNS servers are all 8.8.8.8, even at the remote location. Honestly, I don't have a starting point.

So only things behind your PFSense firewall are being screwed up? Crazy.

(Note: That was completely sarcastic)

Behind this network I have 2 physical MikroTik routers and 1 pfSense KVM. If I run a Linux CLI on a machine connected directly to the MikroTik, I get the same results. This is really crazy.

And what model of MikroTik do you have? And how can you run this command on the MikroTik appliance?

Also, this is the first time you've mentioned them. The original layout you posted made zero mention of two other routers being in the mix alongside your pfSense VM. That is rather important information in this type of scenario.

Yes, you are right, but these MikroTiks are for other things in a completely separate network; I used them now just for testing. But it's pointless: I called a colleague in a different area about 400 km from here and he received the exact same error. So maybe the netcat output is somehow related to the distro or the geographical area. For sure this is an error I wasn't receiving a few months ago. I think it is related to the provider making some errors in their DNS setup. It's impossible that this is related to my local setup.

And how are they making errors in their DNS? I ran that command, plus other tools from places like mxtoolbox.com that do A record lookups, DNS checks, SRV lookups, etc. Everything passed just fine as far as standard A record/DNS lookups go. They don't have any SRV records for these domains, and both domains resolve to the same IP address.

Since an IP normally has only one PTR record (reverse lookup), an rDNS lookup on the IP returns voip.eutelia.it. I highly doubt this is their issue, since it happened after your "HA failover" kicked in.
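The "DNS fwd/rev mismatch" line in the first post is a warning rather than a connection error: in verbose mode, traditional netcat resolves the name to an address, resolves that address back through its PTR record, and complains when the two names differ. The output formats quoted in this thread suggest different netcat builds (the "Connection to ... succeeded!" lines look like the OpenBSD variant, which does not appear to print this warning), which may explain why only some machines show it. Below is a minimal Python sketch of the same comparison using the system resolver through the standard `socket` module; `fwd_rev_report` and `is_mismatch` are hypothetical helper names, and the resolver functions are parameters so the logic can be checked without network access.

```python
import socket
from typing import Callable, Dict, List


def forward_lookup(name: str) -> List[str]:
    """Return the IPv4 addresses the system resolver gives for name (A records)."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    return addresses


def reverse_lookup(ip: str) -> str:
    """Return the primary PTR name the system resolver gives for ip."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def fwd_rev_report(
    name: str,
    forward: Callable[[str], List[str]] = forward_lookup,
    reverse: Callable[[str], str] = reverse_lookup,
) -> Dict[str, str]:
    """Map each address that name resolves to onto that address's reverse name."""
    return {ip: reverse(ip) for ip in forward(name)}


def is_mismatch(name: str, report: Dict[str, str]) -> bool:
    """True if any reverse name differs from name (case-insensitive, trailing dot ignored)."""
    wanted = name.rstrip(".").lower()
    return any(ptr.rstrip(".").lower() != wanted for ptr in report.values())
```

With the records described in this thread (voip.vivavox.it resolving to 83.211.227.21, whose PTR is voip.eutelia.it), `is_mismatch("voip.vivavox.it", fwd_rev_report("voip.vivavox.it"))` would be True, while the same check for voip.eutelia.it would be False.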

So now these MikroTiks aren't even involved in this network, so they are pointless to mention. The only thing involved is pfSense, which is a horrible solution for SIP to begin with, and I'm sure it is at the root of this.

I highly doubt this is the provider.

Did you read this? As for pfSense, I've been running for 2 years without any problems; this started only in the last month, without any changes to my setup. I don't think this is related to the HA failure; as I wrote before, I randomly cannot ping this domain. And what about the Linux console attached directly to the MikroTiks that gives me exactly the same error? This is strange for sure, but it cannot be a pfSense problem if my colleague has the same issue and the MikroTiks do too. I don't know why you don't get this; maybe it's related to the geographical area.

And sorry for the noob question, but don't you think having the PTR for 83.211.227.21 point to voip.eutelia.it instead of voip.vivavox.it is a wrong setup?

And what is your geographical location in all this? I'm doing this from the USA to a place in Italy. That is probably a much greater distance from the provider than yours.

Are you actually doing something that makes this an ordered lookup? Because DNS doesn't do ordered lookups by default. It queries all the DNS servers and returns the first reply that it is given. This is why, when people make DNS changes (like an IP change) on the primary DNS server and the secondaries don't update, some people end up on the old IP and some on the new IP: their DNS queries were answered by different DNS servers.

So what happens when you take 127.0.0.1 out of the DNS queries and you fully rely on Google DNS?
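Before removing 127.0.0.1, it may help to confirm which resolvers the PBX actually consults. A minimal sketch, assuming a Linux host whose stub resolver is glibc reading /etc/resolv.conf: per resolv.conf(5), at most three `nameserver` lines are used, and they are tried in the listed order unless `options rotate` is set, so whatever resolver is listening on 127.0.0.1 is asked first for every lookup. `parse_resolv_conf` and `loopback_first` are hypothetical helpers; the parser only looks at `nameserver` and `options` lines and ignores everything else.

```python
import ipaddress
from typing import List, Tuple

MAXNS = 3  # glibc uses at most this many nameserver lines (resolv.conf(5))


def parse_resolv_conf(text: str) -> Tuple[List[str], bool]:
    """Return (nameservers glibc would use, in file order; True if 'options rotate' is set)."""
    servers: List[str] = []
    rotate = False
    for raw in text.splitlines():
        # resolv.conf treats lines starting with '#' or ';' as comments.
        if raw.startswith(("#", ";")):
            continue
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
        elif fields[0] == "options" and "rotate" in fields[1:]:
            rotate = True
    return servers[:MAXNS], rotate


def loopback_first(servers: List[str]) -> bool:
    """True if the first server queried is a local resolver (127.0.0.0/8 or ::1)."""
    return bool(servers) and ipaddress.ip_address(servers[0]).is_loopback
```

Assuming the file lists the servers in the order given in the first post, `parse_resolv_conf(open("/etc/resolv.conf").read())` would return `["127.0.0.1", "8.8.8.8", "8.8.4.4"]` with `rotate` False, and `loopback_first` would be True.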

So to recap, you are saying this problem started a month ago: your system would just go offline and not connect to the provider. You reboot the server (which clears memory, cache, etc.) and things are fine until, after some time, they go offline again. Repeat the process and it works just fine.

Also, why do you need four trunks to the same provider? What is that all about?

And did you leave your softphone connected and in use for at least an hour or so to see if the issue happens there?

Thank you for your patience, @BlazeStudios.
Did you read this?

As for your questions…

Ordered lookup: no.

Removing 127.0.0.1: honestly, I didn't try this because it was the suggested setup in the FreePBX docs.

Recap: exactly.

Four trunks: unfortunately my provider doesn't give me DID numbers. I have already asked for this, but the only solution was to give me 4 different accounts, one for each number I need, so I have to create 4 trunks to the same provider.

Softphone: yes Blaze, during debugging I had the softphone connected and fully working with no problems; then FreePBX started working again after one hour without my touching anything in the setup.

So this isn't a trunking provider? Do they have any rules about using PBX systems? It would seem very odd for a voice provider like this not to supply trunking options, unless they aren't set up for trunking services like this and treat every account individually.

I have asked for this. I know it's ridiculous, but they give me only a username and password for each number and nothing more, and I can't ask for any advanced technical support. But… it is a very, very cheap provider, and for a large call center like this we are talking about a difference of 15,000 euro/year compared with other local providers, and as I told you, it worked fine for 2 years.

At least I want to understand whether this is related to my local setup or to the provider.

Well it could be a combination of both. Your provider is not the right type of provider that is needed for this type of setup. It sounds like they are a SOHO/Residential type company expecting everyone to have ATA devices or simple dumb SIP devices to plug their analog phones into. They aren’t going to be able to provide proper support because you are an out-of-scope customer. This isn’t the type of setup you want for SIP Trunking and especially if you are running a call center.

And while you can keep harping on "It's been working fine for two years," it doesn't really matter. Perhaps your provider has finally said after two years, "When we see a PBX connected to us, especially on multiple accounts, we dump it." Something could have happened in your setup. Who knows, but if your provider cannot or will not work with you on why your SIP trunks keep going offline and require a reload to reconnect, then you're kinda stuck.