OpenVPN LAN2LAN VPNs break Cisco and Linksys SPA phones but not others, any suggestions?

So, I have a remote network I use for testing. I’ve discussed it before, here:

Issues with Cisco 7940G phone extension registration over an OpenVPN vpn - FreePBX / Endpoints - FreePBX Community Forums

The original problem I brought up 4 years ago has been mostly fixed by a combination of things: newer OpenVPN code, faster OpenVPN gateways, and changes to the configuration.

The subnets involved are: 172.16.1.0/24 (the FreePBX 17 server is on this net), 192.168.1.0/24 (this is connected to the 172.16.1.0/24 subnet via a 2-port Ethernet router), and 172.16.100.0/24.

The 172.16.100.0/24 subnet is 100 miles away from the 172.16.1.0/24 subnet, connected by 2 Netgear Nighthawk routers running DD-WRT, both with OpenVPN configured in a LAN2LAN configuration, similar to the overview of how to do this located here:

RoutedLans – OpenVPN Community

NAT is disabled on the client OpenVPN router.

I have verified NAT is not involved on EITHER of the DD-WRT routers with the

iptables -t nat -L -n -v

command run on each, as the tun interface is not listed as being masqueraded to.
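For anyone who wants to repeat that check without eyeballing the listing, here is a minimal Python sketch that scans saved `iptables -t nat -L -n -v` output for NAT-type targets on a tun interface. The sample listing, the interface names (`vlan2`, `tun1`) and the function name are made up for illustration and are not taken from the routers in this thread. It only looks at built-in NAT targets and would miss rules that jump to a user-defined chain.

```python
# Hypothetical sample of `iptables -t nat -L -n -v` output (not from the real routers).
SAMPLE = """\
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  120  8400 MASQUERADE  all  --  *      vlan2   0.0.0.0/0            0.0.0.0/0
    0     0 MASQUERADE  all  --  *      tun1    0.0.0.0/0            0.0.0.0/0
"""

# Built-in targets that rewrite addresses or ports.
NAT_TARGETS = {"MASQUERADE", "SNAT", "DNAT", "NETMAP", "REDIRECT"}


def nat_rules_on_interface(listing, prefix="tun"):
    """Return (chain, target, in_if, out_if) for NAT rules touching prefix* interfaces."""
    hits = []
    chain = None
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Chain":
            chain = fields[1]
            continue
        # Skip the column header and anything too short to be a rule line.
        if fields[0] == "pkts" or len(fields) < 9:
            continue
        # Verbose columns: pkts bytes target prot opt in out source destination
        target, in_if, out_if = fields[2], fields[5], fields[6]
        if target in NAT_TARGETS and (in_if.startswith(prefix) or out_if.startswith(prefix)):
            hits.append((chain, target, in_if, out_if))
    return hits


if __name__ == "__main__":
    for hit in nat_rules_on_interface(SAMPLE):
        print(hit)
```

An empty result on both routers would be consistent with "the tun interface is not being masqueraded", at least as far as the nat table is concerned.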

All apps work normally. If I ssh into a server on the 172.16.100.0/24 subnet, from the 192.168.1.0/24 subnet, and run the w command, it shows me logged in from a 192.168.1.x IP address, not from a masqueraded public address.

I have a variety of Cisco and Polycom phones located on 172.16.100.0/24, 172.16.1.0/24 and 192.168.1.0/24

Calls between phones on each subnet work. All phones register into the FreePBX server no problem.

Calls to and from all phones (Cisco and Polycom) from the 172.16.1.0/24 and 192.168.1.0/24 subnets, as well as to the outside PSTN, work fine.

Calls from the Polycom phones on 172.16.100.0/24 to the PSTN and other phones, and vice versa, all work fine.

Calls from the Cisco phones on all networks, to other phones and to the PSTN work fine.

Calls from the PSTN to the Cisco phones on the 172.16.100.0/24 network have 1 way audio. The caller on the PSTN cannot hear the talker on a Cisco phone on the 172.16.100.0/24 network.

I have read and understand:

Easy Guide: How To Configure NAT For PJSIP Endpoints
Configuring res_pjsip to work through NAT - Asterisk Documentation

and have tried playing with all of the different options and no setting makes a difference. I’ve tried both chan_pjsip and chan_sip, tried setting and disabling NAT and no difference.

I have Wireshark captures of the successful outgoing calls from the Polycom and Cisco phones, and Wireshark captures of the successful incoming calls to the Polycom from the PSTN and of the failed, one-way-audio incoming calls from the PSTN to the Cisco phone.

The same problem exists on both pjsip and chan_sip with the Cisco phones, but not with the Polycoms.

I noticed this post:

OpenVPN + OpenWRT - General Help - FreePBX Community Forums

It’s the same problem with the Cisco SPA phones. (since Cisco developed their phones from what they bought from Linksys, this makes sense)

I can only conclude that even though OpenVPN is being told to NOT nat - it’s actually still rewriting the packets.

I know it’s a border case but I’d sure love to know what openVPN is doing behind the scenes with SIP traffic and why it works with 1 model phone and not others.

Even though NAT is not required, try enabling NAT for the extensions that are behind the OpenVPN connection

“…tried playing with all of the different options and no setting makes a difference…tried setting and disabling NAT and no difference…”

Unfortunately, I tried picking the low-hanging fruit on this one but there wasn’t any. :slight_smile: My decision now to fix this is to pick one:

  1. Replace the OpenVPN with Wireguard. Of course, I don’t know if Wireguard will do the same baloney behind the scenes that OpenVPN is doing. Or, replace OpenVPN with a regular IPSec VPN.

  2. Sink a ton of time into the Wireshark captures and try to more carefully document what’s going on - the problem with this however is that BOTH the phone AND FreePBX/Asterisk are working the way they are supposed to be working - otherwise I would be having the same problem with the 192.168.1.0/24 subnet as on the 172.16.100.0/24 subnet - since those subnets are all configured the same in Asterisk and the phone config is the same no matter what subnet it’s on. So, basically, I’d be saying “OpenVPN is broken is there a way I can break FreePBX to hack around OpenVPN”

  3. Try to figure out what the Polycom firmware is doing different in the Polycom phone vs the Cisco Enterprise firmware and document that - but that of course, is just documenting oddities not documenting a fix - useful in an academic sense but not a practical one.

  4. Collect and test more different models of different phones on the 172.16.100.0/24 subnet and see if I can find others that also duplicate the one-way-audio bug that OpenVPN has created - create lots more wireshark captures and study and report the findings from them - then write it all up and dump it in the laps of the OpenVPN developers and tell them to fix it. The problem with this is that the OpenVPN devs clearly have just hacked the hell out of OpenVPN and it only really works as a VPN because of constant pluckery and hacking on the OpenVPN codebase. (come on - this is a VPN software that is constantly flipping packets in and out of userspace and kernelspace) NAT is clearly baked into it and NAT is, and has always been, a horrible disgusting network hack.

  5. Just use Polycom and other “known, working” phones on the 172.16.100.0/24 subnet.

I’m mainly reporting this now because I’ve spent a ton of time chasing what I thought was a bug in the Cisco phone firmware - but what I can now pretty much conclusively state is a bona fide bug in OpenVPN - which may also be biting someone else. It is a very very subtle bug - but it’s a bug, nevertheless. The simple fact is that if OpenVPN is going to support LAN2LAN VPNs - and it does with all of the push subnet configuration commands it has - it needs to be COMPLETELY transparent to the packets going through the VPN. It needs to ROUTE them not double translate them secretly at both ends and pretend that it’s routing them.

The fact that the Polycoms can, apparently, detect what’s going on and work around it is of no consequence. Maybe it’s possible that the Cisco phones running Enterprise firmware aren’t paying attention to the force symmetrical port and other hacks used in the Asterisk configuration to get past NATs while the Polycoms are - and maybe if the Cisco phones were running 3PCC firmware they could do it also - I need to test that BTW - but the fact is that this shouldn’t even be an issue in the first place - because the OpenVPN LAN2LAN VPN should be ROUTING not double-translating behind the scenes and making it look like it’s routing.

I use OpenVPN regularly and I understand your frustration. I’m not sure whose fault it is, but some brands just work and some don’t. The ones that don’t, I made work by enabling NAT for the extension that is connected through OpenVPN. That always did the trick for me. Just make sure the OpenVPN subnet is listed in the trusted networks section of the SIP settings.

You start with option 2. What does the SDP look like? Is it the proper details? This is clearly a networking configuration issue with the two routers and OpenVPN. Sounds like the router in front of the PBX either isn’t receiving the audio or it’s screwing with it before it hits the PBX.
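To make "what does the SDP look like" concrete, here is a minimal Python sketch that pulls the audio address and port out of an SDP body and checks whether that address sits in the subnet you expect. The SDP body, port number and function names are invented for illustration; only the subnet numbering follows the thread. The idea is that if the `c=` address in an INVITE or 200 OK is not the phone's real 172.16.100.x address, something rewrote it or the endpoint advertised the wrong one.

```python
import ipaddress

# Hypothetical SDP body; the values are invented, not copied from the captures.
SAMPLE_SDP = """\
v=0
o=- 1234 1234 IN IP4 172.16.100.222
s=call
c=IN IP4 172.16.100.222
t=0 0
m=audio 16384 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=sendrecv
"""


def audio_target(sdp):
    """Return (address, port) the SDP asks the far end to send the first audio stream to."""
    session_addr = media_addr = port = None
    section = None  # None while in the session-level part, else the current media type
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if port is not None:
                break  # the first audio section has ended
            fields = line[2:].split()
            section = fields[0]
            if section == "audio":
                port = int(fields[1].split("/")[0])
        elif line.startswith("c="):
            # c=<nettype> <addrtype> <address>
            addr = line[2:].split()[2].split("/")[0]
            if section is None:
                session_addr = addr
            elif section == "audio":
                media_addr = addr
    # A media-level c= line overrides the session-level one.
    return (media_addr or session_addr, port)


def address_in_subnet(addr, subnet):
    """True if addr lies inside subnet (for example '172.16.100.0/24')."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(subnet)


if __name__ == "__main__":
    addr, port = audio_target(SAMPLE_SDP)
    print(addr, port, address_in_subnet(addr, "172.16.100.0/24"))
```

Running this against the SDP from both the working Polycom call and the failing Cisco call and comparing the results is one quick way to see whether the two phones are advertising their media the same way.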

Get some packet captures.

I have captures already. What’s the preferred method of posting them - pastebin?

Sure. That works.

https://paste.c-net.org/TorturedPacing call-from-outside-to-polycom-working
https://paste.c-net.org/XanderPrunes call-from-polycom-to-outside-working
https://paste.c-net.org/FlunkingEnters call-from-7940-to-outside-working
https://paste.c-net.org/MorrieStranded call-from-outside-to-7940-1-way-audio

I should probably mention that the PSTN is reached via an IP gateway located on the inside network that FreePBX knows about. And any phone numbers that appear are actual published phone numbers.

It is.

Where were these packet captures taken at? The PBX?

A mirror port on the ethernet switch on the remote 172.16.100.0/24 network

Well there’s two way audio in the capture you have labeled “1-way audio”. So at the remote location before it goes out the tunnel, there’s 2-way audio. Now you need a capture of that same type of call at the PBX level or on the side of the PBX before it hits the PBX. The first one should be enough.
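A rough way to check "is there two-way audio in this capture" without listening to it is to count RTP packets per direction. The Python sketch below assumes you have already exported one (source IP, destination IP) pair per RTP packet from the capture by whatever means you prefer; the sample list and function names are hypothetical.

```python
from collections import Counter


def direction_counts(packets):
    """Count packets per (src_ip, dst_ip) direction."""
    return Counter((src, dst) for src, dst in packets)


def one_way_pairs(counts):
    """Return directions that carry packets while the reverse direction carries none."""
    return sorted(
        (src, dst) for (src, dst) in counts if counts.get((dst, src), 0) == 0
    )


if __name__ == "__main__":
    # Hypothetical export: 3 packets PBX -> phone, 2 packets phone -> PBX.
    sample = [("172.16.1.16", "172.16.100.222")] * 3 + [
        ("172.16.100.222", "172.16.1.16")
    ] * 2
    counts = direction_counts(sample)
    print(dict(counts))
    print(one_way_pairs(counts))
```

An empty list from `one_way_pairs` means both directions were present at that capture point, which says nothing yet about what arrives at the other end of the tunnel.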

So make another test call and get a capture at the PBX level.

I won’t be out there for another couple weeks, I’ll make the capture at that time from wireshark running on the pbx. I should have thought about that issue. :frowning:

Is there anything of any significance about the audio in the file labeled call-from-outside-to-polycom-working? Is that going to a symmetrical RTP port while the other isn’t, for example?

Here are those captures:

https://paste.c-net.org/DriftingCrawley call-from-pstn-to-phone-one-way-audio-only-to-pstn.pcap

https://paste.c-net.org/AcquireShaving call-from-phone-to-pstn-ok.pcap

Same phones, call works when the deskphone initiates it to the PSTN, call has one way audio when an incoming call from the PSTN is transferred to the desk phone. Desk phone is a Cisco CP8941 running 9-3-2SR1-2 firmware

Capture made with tshark running on the FreePBX system

And this is where the two endpoints are on different networks going over a VPN? Because once again, the call you said has one way audio has two way audio. So wherever the first capture was taken, there was two way audio streaming. Capturing it at the PBX level shows two way audio on the call.

So you have A Phone <—> PBX <—> B Phone with one way audio. Which side has the one way audio? Is it the caller or callee? Which side did you take the original PCAP from?

The desk phones are on the 172.16.100.0 network. One is 172.16.100.222, the other is .226. The PBX is on the 172.16.1.0 network at 172.16.1.16. The PSTN phones are coming through POTS lines to a POTS-to-SIP gateway at 172.16.1.6.

172.16.100.0 and 172.16.1.0 are connected by the LAN2LAN OpenVPN
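As a quick reference for which media paths actually traverse the tunnel, here is a small Python sketch of the topology as described in this thread. Grouping 192.168.1.0/24 with the PBX site reflects the plain router between it and 172.16.1.0/24; the site labels and function names are my own.

```python
import ipaddress

# Subnets on the PBX side (joined by an ordinary router) and on the remote side.
PBX_SITE = [ipaddress.ip_network("172.16.1.0/24"), ipaddress.ip_network("192.168.1.0/24")]
REMOTE_SITE = [ipaddress.ip_network("172.16.100.0/24")]


def site_of(addr):
    """Classify an address as 'pbx-site', 'remote-site' or 'unknown'."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in PBX_SITE):
        return "pbx-site"
    if any(ip in net for net in REMOTE_SITE):
        return "remote-site"
    return "unknown"


def crosses_vpn(a, b):
    """True if traffic between a and b has to go through the LAN2LAN OpenVPN link."""
    return {site_of(a), site_of(b)} == {"pbx-site", "remote-site"}


if __name__ == "__main__":
    print(crosses_vpn("172.16.100.222", "172.16.1.16"))     # phone <-> PBX
    print(crosses_vpn("172.16.100.222", "172.16.100.226"))  # phone <-> phone, same LAN
    print(crosses_vpn("192.168.1.10", "172.16.1.16"))       # hypothetical host on 192.168.1.0/24
```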

The first set of 4 packet captures was taken from a monitoring switch port on the 172.16.100 network. 2 were monitoring a working Polycom phone which has no problems; those are for reference. 2 were from the Cisco desk phone which did have the problem.

The second set of captures was taken from the PBX at 172.16.1.16. There are only 2, of a different Cisco deskphone on the 172.16.100.0 network. All the Cisco phones, no matter what model, have the issue.

The one-way audio is happening when a call originates from the PSTN, is answered by the IVR in the PBX, and then once established the extension # is dialed in the IVR causing the PBX to transfer the call to the deskphone on the 172.16.100 network. Once that phone answers the call, you get no audio received by the 172.16.100 phone, however it has no problem sending audio which is heard by the PSTN phone.

Calls from the Cisco deskphones to each other on the 172.16.100 network have no problem. So I know that the setup and call monitoring SIP traffic from phones to the PBX all works. Once a call is established from deskphone to deskphone, I believe the RTP goes directly between phones, thus it’s not going over the OpenVPN VPN. It’s only calls going to the PSTN where the RTP goes over the OpenVPN that are affected - and only calls that originate from the network the PBX is on and are answered by a phone on the 172.16.100 network. Calls originating from 172.16.100 and answered from the PSTN (the gateway) don’t have a problem.

So you have a capture of the .226 phone? This is showing audio only between the PBX and one phone. Did you define this capture only to get traffic between the PBX and one phone?

The filter used for the .226 phone for the capture from the monitoring port on the 172.16.100 network was

host 172.16.100.226 && ip net 172.16.1.0/24

That was 2 of the first captures 21 days ago

The filter used for the capture at the PBX I just took was just

host 172.16.100.222

I probably should also mention there is one other network reachable from 172.16.1: it is 192.168.1. A Cisco deskphone on 192.168.1 has no problems calling, but only a regular router (not a VPN) is used to connect 172.16.1 and 192.168.1.

And, routing is correct for all, since for example I can ping 172.16.1.6 from 172.16.100.x and 172.16.100.222 from 172.16.1.6

Well it’s hard to tell if things are flowing from the PBX to each phone if you only capture one phone. So between .222 and the PBX there’s two way audio. What about between the PBX and .226?

But that’s what I was saying, in the trace labeled DriftingCrawley, the one from the PBX to .222, there’s NOT 2 way audio. Meaning, I cannot hear audio at the deskphone.

What I think you are telling me is, that in that trace, it is showing audio being sent from the PBX to the phone. That’s the trace run on the PBX itself.

What I am wondering is, in the trace MorrieStranded, is that ALSO showing audio from the PBX to the .226 phone? Because in that one, I’m also not hearing audio from the phone. That trace was taken at the switch that the phone is plugged in on, on network 172.16.100 - at a point AFTER the traffic had passed through OpenVPN

Because, if DriftingCrawley is showing audio going from the PBX to the .222 phone - and MorrieStranded is not showing audio from the PBX to the .226 phone, then since those are both the same direction kind of calls (just different phones) - OpenVPN is somehow destroying the audio packets when the call is originating from the PBX/PSTN. This tracks with observed behavior of the phones.
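That reasoning boils down to comparing the PBX-to-phone RTP packet count at the two capture points. Here it is as a tiny Python sketch. The function name and the wording of the verdicts are mine, and it assumes both counts come from the same kind of call. Ideally they would come from the same call captured at both ends at once, which the existing traces are not.

```python
def locate_loss(sent_at_pbx, seen_at_remote):
    """Compare PBX-to-phone RTP packet counts seen at the PBX and on the remote LAN."""
    if sent_at_pbx == 0:
        return "PBX never sent audio toward the phone"
    if seen_at_remote == 0:
        return "audio left the PBX but never reached the remote LAN"
    if seen_at_remote < sent_at_pbx:
        return "some audio was lost between the PBX and the remote LAN"
    return "audio reached the remote LAN; look at the phone itself"


if __name__ == "__main__":
    # Hypothetical counts, not measured from the posted captures.
    print(locate_loss(sent_at_pbx=500, seen_at_remote=0))
    print(locate_loss(sent_at_pbx=500, seen_at_remote=500))
```

Only the second verdict points at the path between the two sites (routers plus tunnel); the last one would put the problem back on the phone or on what the PBX is sending it.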

So then the next question is - if that is the case - what is different about the audio packets in the trace TorturedPacing, from the audio packets in the trace DriftingCrawley? It’s the same kind of call, same origination direction, just terminating on a different model of phone. Yet it works, while the other one does not.
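One way to answer "what is different about the audio packets" is to decode the fixed 12-byte RTP header from a packet in each trace and diff the fields that should normally match between two calls of the same kind. A minimal Python sketch follows. It assumes you can get at the raw UDP payload bytes of an RTP packet; the sample packets are fabricated and the function names are hypothetical.

```python
import struct


def parse_rtp_header(payload):
    """Decode the fixed 12-byte RTP header from a UDP payload."""
    if len(payload) < 12:
        raise ValueError("too short to be an RTP packet")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", payload[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "total_len": len(payload),  # whole UDP payload, header included
    }


# Fields that differ between any two streams and so tell us nothing.
IGNORED = {"sequence", "timestamp", "ssrc", "marker"}


def diff_headers(a, b):
    """Return {field: (a_value, b_value)} for fields that differ, skipping IGNORED."""
    return {k: (a[k], b[k]) for k in a if k not in IGNORED and a[k] != b[k]}


if __name__ == "__main__":
    # Fabricated packets: payload type 0 with 160 bytes vs payload type 8 with 160 bytes.
    pkt_a = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1111) + b"\x00" * 160
    pkt_b = struct.pack("!BBHII", 0x80, 8, 7, 320, 0x2222) + b"\x00" * 160
    print(diff_headers(parse_rtp_header(pkt_a), parse_rtp_header(pkt_b)))
```

If the working Polycom call and the failing Cisco call differ in payload type or packet size in the PBX-to-phone direction, that narrows down where to look next; if they are identical, the difference is more likely in the ports and addresses than in the RTP itself.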