Remote Phones dropping off, NAT issue or PBX Config?

Have FreePBX 14 set up on a cloud server
phones set up as PJSip TLS/SRTP from 3 different locations, two with sonicwall one with zyxel routers.

All the phones at every location keep randomly dropping off and then reconnecting to the PBX. example for extension 232 over the last 24hrs:

[2019-07-29 13:17:54] VERBOSE[626] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 14:48:31] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 14:48:32] VERBOSE[22139] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 14:48:35] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 14:49:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 14:50:35] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 15:14:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 15:15:35] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 15:36:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 15:37:35] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 16:30:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 16:31:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 17:44:38] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 17:45:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 19:24:38] VERBOSE[14269] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 19:25:35] VERBOSE[14269] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 20:40:37] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 20:41:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 21:05:27] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 21:05:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 21:08:38] VERBOSE[14269] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 21:09:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 21:50:38] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 21:51:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 22:01:38] VERBOSE[16748] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 22:02:35] VERBOSE[7284] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 22:33:38] VERBOSE[7284] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 22:34:35] VERBOSE[16748] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 23:16:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 23:17:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-29 23:38:38] VERBOSE[16748] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 23:39:35] VERBOSE[14269] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-30 00:19:38] VERBOSE[7284] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-30 00:20:35] VERBOSE[7284] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-30 01:43:38] VERBOSE[1811] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-30 01:45:35] VERBOSE[7284] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-30 02:49:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-30 02:50:35] VERBOSE[7284] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-30 03:50:38] VERBOSE[16748] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-30 03:51:35] VERBOSE[16748] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-31 05:50:38] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-31 05:51:35] VERBOSE[30515] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-31 06:19:38] VERBOSE[30515] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-31 06:20:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-31 06:27:38] VERBOSE[26784] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-31 06:28:35] VERBOSE[14269] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

[2019-07-31 06:51:35] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-31 08:05:49] VERBOSE[20611] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

Is likely a nat issue with the routers or setting issues on the pbx, extension etc? Since it happens to all phones at any location behind different routers I suspect is a configuration issue on the PBX side.

Thoughts?

I’d suggest getting a packet capture to see what is actually happening. What that is saying is that Asterisk sent an OPTIONS request to the remote side, but did not receive a response in a reasonable amount of time. It then tried again after a period of time and it was reachable again.

would that be run against UDP being that it’s TLS/SRTP?

for instance: tcpdump -w siptrace.pcap -s 0 udp

Doing it from the Asterisk CLI using “pjsip set logger on” will show things in plaintext. A packet capture would show you the state of the TLS connection itself - if it’s disconnecting, why, that kind of stuff.

duh, over thinking…

this what we’re looking for?

<— Received SIP response (653 bytes) from TLS:xxx.xxx.95.229:42630 —>
SIP/2.0 200 OK
Via: SIP/2.0/TLS xxx.xxx.42.145:5161;rport=5161;received=149.28.42.145;branch=z9hG4bKPj59a3a422-e6b1-448f-8872-ef03947fd522;alias
Call-ID: PIc1sYLUcdkVi-1pirg6buJUGAW.zCWL
From: sip:[email protected];tag=72e96ac1-3f36-45c7-998d-5a1b54da5ff1
To: ““name” <232>” sip:[email protected];tag=d8r1kx.GW3ct3RsXapZPe5rKr6GZyP6A
CSeq: 26588 NOTIFY
Contact: sip:[email protected]:42630;transport=TLS;ob
Allow: PRACK, INVITE, ACK, BYE, CANCEL, UPDATE, INFO, SUBSCRIBE, NOTIFY, REFER, MESSAGE, OPTIONS
Supported: replaces, 100rel, timer, norefersub
Content-Length: 0

That’s a SIP message, yes. Specifically a 200 OK to a NOTIFY request - which would be showing presence or MWI. If an OPTIONS goes out it’ll appear in there, along with any response.

so then more like this?

[2019-07-31 09:16:49] VERBOSE[22014] res_pjsip_logger.c: <— Transmitting SIP request (465 bytes) to TLS:xxx.xxx.95.229:42630 —>
OPTIONS sip:[email protected]:42630;transport=TLS;ob SIP/2.0
Via: SIP/2.0/TLS xxx.xxx.42.145 :5161;rport;branch=z9hG4bKPj8eda3ff6-19ff-458f-b75e-dbc4d56e22cf;alias
From: <sip:[email protected] >;tag=ce255dc6-40ff-4357-baed-464ddc1994b2
To: sip:[email protected];ob
Contact: <sip:[email protected] :5161;transport=TLS>
Call-ID: 81550b74-18e5-4026-802a-d3f34ff6b929
CSeq: 61406 OPTIONS
Max-Forwards: 70
User-Agent: FPBX-14.0.13.4(15.5.0)
Content-Length: 0

Yes, that’s a SIP request to make sure that the given endpoint is reachable. If it’s not then no response will be received and the endpoint will be marked as unreachable.

You can also use this to see if the TLS connection is dropping by looking at the IP address and port. If the port is changing between going unreachable and reachable that means the TLS connection is dropping for some reason (a router for example dropping it).

would this be an indication of the port changing due to TLS dropping?

[2019-07-29 21:05:27] VERBOSE[19542] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Unreachable
[2019-07-29 21:05:35] VERBOSE[9898] res_pjsip/pjsip_configuration.c: Endpoint 232 is now Reachable

No, that number is not the port. It’s the thread identifier for where it is executing.

this is in the log just before one of the unreachable events:

[2019-07-31 09:18:24] VERBOSE[22014] res_pjsip_logger.c: <— Transmitting SIP response (645 bytes) to TLS:xxx.xxx.95.229:55527 —>
SIP/2.0 401 Unauthorized^M
Via: SIP/2.0/TLS xxx.xxx.95.229:55527;rport=55527;received=xxx.xxx.95.229;branch=z9hG4bKPj.h1X5doXLo9U.Ht2xc14gIFH7dFV9qTc;alias^M
Call-ID: pCvDRYv-NuZRxucDGMLLaQlVGFqFl6tx^M
From: ““name” <232>” sip:[email protected];tag=4fPM-oB22tt3Sd8ekEeUM.qQXYLGNsDI^M
To: sip:[email protected];tag=z9hG4bKPj.h1X5doXLo9U.Ht2xc14gIFH7dFV9qTc^M
CSeq: 31660 SUBSCRIBE^M
WWW-Authenticate: Digest realm=“asterisk”,nonce=“1564579104/dbe34f8a7f9abc4bc7c714c38907d2ec”,opaque=“203f63a600989d16”,algorithm=md5,qop=“auth”^M
Server: FPBX-14.0.13.4(15.5.0)^M
Content-Length: 0

That’s an authentication request for a subscription. It’s a normal thing to occur.

this is in the logs just after the extension becomes unreachable at 09:18:24:

[2019-07-31 09:18:26] VERBOSE[30264] res_pjsip_logger.c: <— Transmitting SIP request (465 bytes) to TLS:xxx.xxx.95.229:64414 —>
OPTIONS sip:[email protected]:64414;transport=TLS;ob SIP/2.0^M
Via: SIP/2.0/TLS xxx.xxx.42.145:5161;rport;branch=z9hG4bKPjc264f324-55d2-4133-93d3-ade511d7b0a3;alias^M
From: sip:[email protected];tag=531ef714-5558-4a82-8765-1a206a54fb79^M
To: sip:[email protected];ob^M
Contact: sip:[email protected]:5161;transport=TLS^M
Call-ID: b8c0a6e2-d324-4678-bb38-a4403d55c368^M
CSeq: 41888 OPTIONS^M
Max-Forwards: 70^M
User-Agent: FPBX-14.0.13.4(15.5.0)^M
Content-Length: 0^M
^M

[2019-07-31 09:18:26] VERBOSE[32351] res_pjsip_endpoint_identifier_dpma.c: Updating DPMA user ‘232’ uri=‘pjsip:xxx.xxx.95.229:64414;transport=TLS’ ua=‘Digium D62 2_8_5’
[2019-07-31 09:18:26] VERBOSE[30264] res_pjsip_logger.c: <— Transmitting SIP request (676 bytes) to TLS:xxx.xxx.95.229:64414 —>
NOTIFY sip:[email protected]:64414;transport=TLS;ob SIP/2.0^M
Via: SIP/2.0/TLS xxx.xxx.42.145:5161;rport;branch=z9hG4bKPj8d59a1b9-351a-442a-b107-77a25cd2f5c9;alias^M
From: sip:[email protected];tag=b20f8099-fd19-4551-8d6f-16d8445c6547^M
To: sip:[email protected];ob^M
Contact: sip:[email protected]:5161;transport=TLS^M
Call-ID: d46f22c2-835f-4c58-9269-481c909f00f2^M
CSeq: 18612 NOTIFY^M
Subscription-State: terminated^M
Event: message-summary^M
Allow-Events: presence, dialog, message-summary, refer^M
Max-Forwards: 70^M
User-Agent: FPBX-14.0.13.4(15.5.0)^M
Content-Type: application/simple-message-summary^M
Content-Length: 48^M

There’s not enough to really say anything. Like I said before, check the IP address and port to see if it has changed between when it was reachable, went unreachable, and then went reachable again. That will tell you if the connection dropped for some reason.

before drop: OPTIONS sip:[email protected]:42630
after recovery: Via: SIP/2.0/TLS xxx.xxx.95.229:64414

Then it seems as though the TLS connection dropped for some reason. You’d need to look at a packet capture to examine the TLS level traffic and what caused it to drop.

and for that use wireshark or tcpdump?

Either. Tcpdump will produce a file that Wireshark can then examine.

you wouldn’t happen to know the proper tcdump command to capture the TLS traffic would you? :slight_smile:

I don’t know what your TLS port is configured to, but this would act as a base[1] with tcp instead of udp. I’m sure there’s also guides and such on the internet.

[1] https://wiki.freepbx.org/display/PPS/Capture+TCPDumps