PJSIP Trunk DNS SRV unstable, only IP address working

We are facing a particular problem which we don’t seem to be able to resolve. Our VOIP provider (cm.com) has a SRV record for its trunks. When we put the DNS SRV record in the SIP-server settings (PJSIP-Settings > General) (in this case nl.voip.cm.com - as provided by the provider) and leave the port empty (as we found out you have to do with a SRV record) - there is not outbound calls possible (inbound is working). The “only” message in the logs are:

24819[2021-03-18 18:02:33] ERROR[19470] res_pjsip.c: Endpoint 'CM.com_Federation': Could not create dialog to invalid URI 'CM.com_Federation'. Is endpoint registered and reachable?

24820[2021-03-18 18:02:33] ERROR[19470] chan_pjsip.c: Failed to create outgoing session to endpoint 'CM.com_Federation'

24821[2021-03-18 18:02:33] WARNING[425][C-00000172] app_dial.c: Unable to create channel of type 'PJSIP' (cause 3 - No route to destination)

So the PJSIP cannot register the endpoint. However suddenly after several minutes of inactivity we get the following lines in the Log (note the time difference between the previous log entries and and this one, only 3 minutes passes and it takes 1 minute to loose connection again).

24850[2021-03-18 18:05:13] VERBOSE[167680] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Reachable

24851[2021-03-18 18:05:13] VERBOSE[167680] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Reachable. RTT: 82.420 msec

24852[2021-03-18 18:06:16] VERBOSE[10827] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Unreachable

24853[2021-03-18 18:06:16] VERBOSE[10827] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Unreachable. RTT: 0.000 msec

No further details why it was reachable and why suddenly it wasn’t anymore. When we change the SIP servers settings the Trunk to one of the IP addresses provided by the VOIP service (they were giving for whitelisting purposes - it all seems to work.

After exactly another 5 minutes the following repeats in the logs:

24854[2021-03-18 18:10:24] NOTICE[8195] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
    
24855[2021-03-18 18:10:24] NOTICE[167680] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
    
24856[2021-03-18 18:10:24] NOTICE[19470] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
   
 24857[2021-03-18 18:10:24] NOTICE[225329] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - No matching endpoint found after 5 tries in 0.675 ms
   
 24858[2021-03-18 18:10:24] NOTICE[225329] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
    
24859[2021-03-18 18:11:13] VERBOSE[167680] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Reachable
    
24860[2021-03-18 18:11:13] VERBOSE[167680] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Reachable. RTT: 69.807 msec
    
24861[2021-03-18 18:12:16] VERBOSE[55683] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Unreachable
    
24862[2021-03-18 18:12:16] VERBOSE[55683] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Unreachable. RTT: 0.000 msec

There is not much help from cm.com - they simply provide and you have to solve your own problems.

Details of our system:
PBX Version: 15.0.17.24
PBX Distro: 12.7.8-2012-1.sng7
Asterisk Version: 18.1.1

Could someone help us where we can find more details error logs for the SIP trunk, to see what is going wrong?

According to the SRV records:

|_sip._udp.nl.voip.cm.com. 120|IN|SRV|0 25 5060 gw03.nl.voip.cm.com.|
|_sip._udp.nl.voip.cm.com. 120|IN|SRV|0 25 5060 gw02.nl.voip.cm.com.|
|_sip._udp.nl.voip.cm.com. 120|IN|SRV|0 25 5060 gw04.nl.voip.cm.com.|
|_sip._udp.nl.voip.cm.com. 120|IN|SRV|0 25 5060 gw01.nl.voip.cm.com.|

They are doing load balancing. This means that Asterisk will distribute requests, so it is possible that one doesn’t respond but another does and as the SIP OPTIONS changes targets it goes reachable/unreachable. You can see the SIP traffic using “pjsip set logger on” and this would include any retransmissions.

Thank you for your response. Setting “pjsip set logger on” makes watching the log almost impossible as every second there are numerous lines added. I managed to extract some lines but I cannot make any sense out of it:

[2021-03-18 18:46:14] VERBOSE[2116] res_pjsip_logger.c: <--- Transmitting SIP request (454 bytes) to UDP:31.169.63.1:5060 --->
33542OPTIONS sip:[email protected] SIP/2.0
33543Via: SIP/2.0/UDP 81.82.227.130:5099;rport;branch=z9hG4bKPj6edd7d0e-0339-481b-8516-dd04dd63aec2
33544From: <sip:[email protected]>;tag=d556502e-8a2f-47d8-a65f-ee7a50c9add1
33545To: <sip:[email protected]>
33546Contact: <sip:[email protected]:5099>
33547Call-ID: 9084e2fc-e143-4bfa-8e64-0c8233579c17
33548CSeq: 38223 OPTIONS
33549Max-Forwards: 70
33550User-Agent: FPBX-15.0.17.24(18.1.1)
33551Content-Length: 0
33552
33553
33554[2021-03-18 18:46:15] VERBOSE[2116] res_pjsip_logger.c: <--- Transmitting SIP request (454 bytes) to UDP:31.169.63.1:5060 --->
33555OPTIONS sip:[email protected] SIP/2.0
33556Via: SIP/2.0/UDP 81.82.227.130:5099;rport;branch=z9hG4bKPj6edd7d0e-0339-481b-8516-dd04dd63aec2
33557From: <sip:[email protected]>;tag=d556502e-8a2f-47d8-a65f-ee7a50c9add1
33558To: <sip:[email protected]>
33559Contact: <sip:[email protected]:5099>
33560Call-ID: 9084e2fc-e143-4bfa-8e64-0c8233579c17
33561CSeq: 38223 OPTIONS
33562Max-Forwards: 70
33563User-Agent: FPBX-15.0.17.24(18.1.1)
33564Content-Length: 0
33565
33566
33567[2021-03-18 18:46:17] VERBOSE[2116] res_pjsip_logger.c: <--- Transmitting SIP request (454 bytes) to UDP:31.169.63.1:5060 --->
33568OPTIONS sip:[email protected] SIP/2.0
33569Via: SIP/2.0/UDP 81.82.227.130:5099;rport;branch=z9hG4bKPj6edd7d0e-0339-481b-8516-dd04dd63aec2
33570From: <sip:[email protected]>;tag=d556502e-8a2f-47d8-a65f-ee7a50c9add1
33571To: <sip:[email protected]>
33572Contact: <sip:[email protected]:5099>
33573Call-ID: 9084e2fc-e143-4bfa-8e64-0c8233579c17
33574CSeq: 38223 OPTIONS
33575Max-Forwards: 70
33576User-Agent: FPBX-15.0.17.24(18.1.1)
33577Content-Length: 0

As you can see every second there are these lines appearing, but no real error messages that I can see. It would be strange that 80% of their loadbalancing servers are unreachable, as the phones basically cannot make phone calls 80% of the time. Luckily we are in a test phase, but tomorrow we are going live.

That would make it seem as though the server at that address is not responding to Asterisk, or something is firewalling it.

When we enter the IP address in the SIP Server with port 5060, it works as a charm. Just DNS SRV seems to create this, as if Asterisk cannot deal properly with the SRV record?

I’ve seen no other reports of SRV issues, so I don’t think that it can’t handle it. I can only comment based on the logging presented. If you showed logging using explicitly the IP address then it may show the difference. There could be a difference in the SIP packet itself that the provider doesn’t like for some reason.

We changed to the IP address now and this is the result from the log:

52576[2021-03-18 19:00:13] VERBOSE[10827] res_pjsip_logger.c: <--- Transmitting SIP request (453 bytes) to UDP:31.169.63.1:5060 --->
52577OPTIONS sip:[email protected]:5060 SIP/2.0
52578Via: SIP/2.0/UDP 81.82.227.130:5099;rport;branch=z9hG4bKPj3c428030-d90c-4d7c-ac67-a229fbe7fa68
52579From: <sip:[email protected]>;tag=50107e86-65c9-4900-bbb3-5edfa859a4d7
52580To: <sip:[email protected]>
52581Contact: <sip:[email protected]:5099>
52582Call-ID: 1a2d3c11-9c3f-4569-a0fb-a8910353945e
52583CSeq: 52638 OPTIONS
52584Max-Forwards: 70
52585User-Agent: FPBX-15.0.17.24(18.1.1)
52586Content-Length: 0
52587
52588
52589[2021-03-18 19:00:13] VERBOSE[2116] res_pjsip_logger.c: <--- Received SIP response (386 bytes) from UDP:31.169.63.1:5060 --->
52590SIP/2.0 200 OK
52591Via: SIP/2.0/UDP 81.82.227.130:5099;rport=5099;received=81.82.227.130;branch=z9hG4bKPj3c428030-d90c-4d7c-ac67-a229fbe7fa68
52592From: <sip:[email protected]>;tag=50107e86-65c9-4900-bbb3-5edfa859a4d7
52593To: <sip:[email protected]>;tag=4e662dd9a46adf76e3e2c207b4b77dae.f18a
52594Call-ID: 1a2d3c11-9c3f-4569-a0fb-a8910353945e
52595CSeq: 52638 OPTIONS
52596Content-Length: 0

It is very difficult to capture the log, as it gives much information (i guess that’s why it is switched off normally ;-)).

Would it be bad to keep it to the IP address and port and not using the SRV record?

That changes the SIP signaling, specifically the contents of it switch from referring to the hostname to IP address. I would expect them to accept either, and I can’t think of a way to force the IP address to be used when using hostname.

The consequence of not using a hostname is no load balancing on their side, and if they change IPs then you have to be aware of it.

Can you see from the log entry that there is a difference between using SRV record and the IP address? Because now it works with IP, but with the SRV record there is not outbound calls possible.

Yes, as I stated the difference is that in the SIP OPTIONS it refers to the hostname instead of the IP address in the SRV case. The remote side seems to ignore us in that case, I don’t know why.

Thanks … at least I know a little bit more what is going. Will investigate further.

BTW:
You get much better readable SIP traces by writing a pcap file and inspecting it afterwards with wireshark. You can do this with:

asterisk -x "pjsip set logger on"
asterisk -x "pjsip set logger verbose off"
asterisk -x "pjsip set logger pcap trace.pcap"

The trace.pcap could be found in /var/lib/asterisk/
You can stop it with

asterisk -x "pjsip set logger off"

EDIT:
Some more thoughts:
There are providers out there, which rely on always using the same destination server which has been used for registration. Asterisk can’t handle this.

I would recommend sngrep with it’s simple but comprehensive filtering.

You have to do “pjsip set logger pcap trace.pcap” before starting logging.

Hmm, I’m using this 3 line script as written above since months w/o any problems. Don’t know where you see a problem :slight_smile:

Just thinking OOTB , does nl.voip.cm.com allow you to send/accept calls using the result of your own SRV records ? (given that you have control over the DNS of adventist.be, i.e. pbx.adventist.be and not the IP address)

So it seems that it was a temporary issue - today everything seems to work. There are 200 OK responses on the SRV DNS nl.voip.cm.com. Thanks for the insight into the logging. This helped greatly.

It turned out to be a firewall issue. For anyone facing similar problems. When we configured the Trunk to connect to an IP address and not the DNS-SRV record, the 200 OK response would happily come back. When changed do DNS-SRV record, no response would come back.

Even though we thought it was a temporary issue, sometimes the trunk would get a 200 OK response with the DNS-SRV record. It turned out that we configured a second trunk to connect via IP, and whenever the first trunk, configured to use the DNS-SRV record, would hit the same IP address as the second trunk, the 200 OK response would get through. Whenever the DNS-SRV record would hit another IP address the 200 OK response would not get through.

We contacted our provider, thinking it would be the issue as a probable cause mentioned before by @dirk2358:

There are providers out there, which rely on always using the same destination server which has been used for registration. Asterisk can’t handle this.

The response of our provider was that they were sending out the 200 OK responses and that they registered the SIP trunk at all their gateways, so the issue mentioned before was none issue.

So it had to be on our side and the first thing that I thought was that the firewall was playing up. We switched it off and suddenly the trunk remained up.

To cut a long story short - we had to add the domain nl.voip.cm.com to firewall exception list of freepbx and all 200 OK responses came through.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.