We are facing a problem that we don’t seem to be able to resolve. Our VoIP provider (cm.com) has a SRV record for its trunks. When we put the DNS SRV name in the SIP server settings (PJSIP-Settings > General) (in this case nl.voip.cm.com, as provided by the provider) and leave the port empty (as we found out you have to do with a SRV record), no outbound calls are possible (inbound is working). The only messages in the logs are:
[2021-03-18 18:02:33] ERROR[19470] res_pjsip.c: Endpoint 'CM.com_Federation': Could not create dialog to invalid URI 'CM.com_Federation'. Is endpoint registered and reachable?
[2021-03-18 18:02:33] ERROR[19470] chan_pjsip.c: Failed to create outgoing session to endpoint 'CM.com_Federation'
[2021-03-18 18:02:33] WARNING[425][C-00000172] app_dial.c: Unable to create channel of type 'PJSIP' (cause 3 - No route to destination)
So it seems PJSIP cannot register the endpoint. However, after several minutes of inactivity, the following lines suddenly appear in the log (note the time difference between the previous log entries and these: only about 3 minutes pass, and it takes about 1 minute to lose the connection again).
[2021-03-18 18:05:13] VERBOSE[167680] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Reachable
[2021-03-18 18:05:13] VERBOSE[167680] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Reachable. RTT: 82.420 msec
[2021-03-18 18:06:16] VERBOSE[10827] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Unreachable
[2021-03-18 18:06:16] VERBOSE[10827] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Unreachable. RTT: 0.000 msec
There are no further details on why it was reachable and why it suddenly wasn’t anymore. When we change the trunk’s SIP server setting to one of the IP addresses provided by the VoIP service (they were given for whitelisting purposes), it all seems to work.
After exactly another 5 minutes the following repeats in the logs:
[2021-03-18 18:10:24] NOTICE[8195] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
[2021-03-18 18:10:24] NOTICE[167680] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
[2021-03-18 18:10:24] NOTICE[19470] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
[2021-03-18 18:10:24] NOTICE[225329] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - No matching endpoint found after 5 tries in 0.675 ms
[2021-03-18 18:10:24] NOTICE[225329] res_pjsip/pjsip_distributor.c: Request 'INVITE' from '<sip:[email protected]>' failed for '103.145.13.73:60152' (callid: 1350377613-635327929-1644842666) - Failed to authenticate
[2021-03-18 18:11:13] VERBOSE[167680] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Reachable
[2021-03-18 18:11:13] VERBOSE[167680] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Reachable. RTT: 69.807 msec
[2021-03-18 18:12:16] VERBOSE[55683] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Unreachable
[2021-03-18 18:12:16] VERBOSE[55683] res_pjsip/pjsip_options.c: Contact CM.com_Federation/sip:[email protected] is now Unreachable. RTT: 0.000 msec
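To put numbers on the flapping, here is a minimal Python sketch that pairs each "is now Reachable" entry with the next "is now Unreachable" entry and reports how long the endpoint stayed up. The four log lines are copied from the excerpts above; the function name reachable_windows and the regular expression are illustrative only and assume the log format shown here.

```python
import re
from datetime import datetime

# Endpoint state lines copied from the log excerpts in this post.
LOG = """\
[2021-03-18 18:05:13] VERBOSE[167680] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Reachable
[2021-03-18 18:06:16] VERBOSE[10827] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Unreachable
[2021-03-18 18:11:13] VERBOSE[167680] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Reachable
[2021-03-18 18:12:16] VERBOSE[55683] res_pjsip/pjsip_configuration.c: Endpoint CM.com_Federation is now Unreachable
"""

# Timestamp in brackets, then the endpoint name and its new state.
PATTERN = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*"
    r"Endpoint (\S+) is now (Reachable|Unreachable)"
)


def reachable_windows(log_text):
    """Return (start, end, seconds) for each Reachable -> Unreachable span."""
    windows = []
    start = None
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue  # not an endpoint state change
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if match.group(3) == "Reachable":
            start = stamp
        elif start is not None:
            windows.append((start, stamp, int((stamp - start).total_seconds())))
            start = None
    return windows


for start, end, seconds in reachable_windows(LOG):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: reachable for {seconds} s")
```

Both windows come out at 63 seconds, which suggests a periodic check rather than a random network glitch.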
There is not much help from cm.com - they simply provide the service and you have to solve your own problems.
They are doing load balancing. This means that Asterisk will distribute requests, so it is possible that one doesn’t respond but another does and as the SIP OPTIONS changes targets it goes reachable/unreachable. You can see the SIP traffic using “pjsip set logger on” and this would include any retransmissions.
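The effect described above can be sketched in Python. This is only an illustration under stated assumptions: the five addresses are hypothetical (taken from the documentation range), only one of them is assumed to answer, and the plain round-robin rotation stands in for the real SRV priority/weight selection, which this sketch does not reproduce.

```python
from itertools import cycle, islice

# Hypothetical addresses from the documentation range (RFC 5737); the real
# targets behind the provider's SRV record are not listed in this thread.
TARGETS = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4", "192.0.2.5"]

# Assumption for illustration: replies from only this target get back to us.
ANSWERING = {"192.0.2.3"}


def qualify_states(targets, answering, rounds):
    """Yield (target, state) for successive OPTIONS requests, round-robin."""
    for target in islice(cycle(targets), rounds):
        yield target, "Reachable" if target in answering else "Unreachable"


states = list(qualify_states(TARGETS, ANSWERING, 10))
unanswered = sum(1 for _, state in states if state == "Unreachable") / len(states)

for target, state in states:
    print(f"OPTIONS -> {target}: {state}")
print(f"unanswered: {unanswered:.0%}")
```

With one answering target out of five, the endpoint shows as Reachable only when the rotation lands on that target and as Unreachable the rest of the time.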
Thank you for your response. Setting “pjsip set logger on” makes watching the log almost impossible, as numerous lines are added every second. I managed to extract some lines but I cannot make any sense of them:
As you can see, these lines appear every second, but there are no real error messages that I can see. It would be strange if 80% of their load-balancing servers were unreachable, as the phones basically cannot make calls 80% of the time. Luckily we are in a test phase, but tomorrow we are going live.
When we enter the IP address in the SIP Server field with port 5060, it works like a charm. Only DNS SRV seems to cause this, as if Asterisk cannot deal properly with the SRV record?
I’ve seen no other reports of SRV issues, so I don’t think the problem is that it can’t handle it. I can only comment based on the logging presented. If you showed logging with the IP address configured explicitly, it may show the difference. There could be a difference in the SIP packet itself that the provider doesn’t like for some reason.
That changes the SIP signaling: specifically, its contents switch from referring to the hostname to referring to the IP address. I would expect them to accept either, and I can’t think of a way to force the IP address to be used when a hostname is configured.
The consequence of not using a hostname is no load balancing on their side, and if they change IPs then you have to be aware of it.
Can you see from the log entries that there is a difference between using the SRV record and the IP address? Because now it works with the IP address, but with the SRV record no outbound calls are possible.
Yes, as I stated, the difference is that in the SRV case the SIP OPTIONS refers to the hostname instead of the IP address. The remote side seems to ignore us in that case; I don’t know why.
BTW:
You get much more readable SIP traces by writing a pcap file and inspecting it afterwards with Wireshark. You can do this with:
asterisk -x "pjsip set logger on"
asterisk -x "pjsip set logger verbose off"
asterisk -x "pjsip set logger pcap trace.pcap"
The trace.pcap file can be found in /var/lib/asterisk/
You can stop it with:
asterisk -x "pjsip set logger off"
EDIT:
Some more thoughts:
There are providers out there, which rely on always using the same destination server which has been used for registration. Asterisk can’t handle this.
Just thinking OOTB: does nl.voip.cm.com allow you to send/accept calls using the result of your own SRV records? (Given that you have control over the DNS of adventist.be, i.e. pbx.adventist.be and not the IP address.)
So it seems that it was a temporary issue - today everything seems to work. We now get 200 OK responses when using the DNS SRV name nl.voip.cm.com. Thanks for the insight into the logging. This helped greatly.
It turned out to be a firewall issue. For anyone facing similar problems: when we configured the trunk to connect to an IP address and not the DNS-SRV record, the 200 OK response would happily come back. When we changed it to the DNS-SRV record, no response would come back.
Even though we thought it was a temporary issue, the trunk would only sometimes get a 200 OK response with the DNS-SRV record. It turned out that we had configured a second trunk to connect via IP, and whenever the first trunk, configured to use the DNS-SRV record, hit the same IP address as the second trunk, the 200 OK response would get through. Whenever the DNS-SRV record resolved to another IP address, the 200 OK response would not get through.
We contacted our provider, thinking the cause might be the one suggested earlier by @dirk2358:
There are providers out there, which rely on always using the same destination server which has been used for registration. Asterisk can’t handle this.
Our provider responded that they were sending out the 200 OK responses and that they register the SIP trunk at all their gateways, so the issue mentioned above was not an issue.
So it had to be on our side and the first thing that I thought was that the firewall was playing up. We switched it off and suddenly the trunk remained up.
To cut a long story short - we had to add the domain nl.voip.cm.com to the firewall exception list of FreePBX, and then all 200 OK responses came through.
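For anyone who wants to check for the same situation, here is a minimal Python sketch that compares the addresses a SRV record can resolve to against the networks the firewall trusts. All addresses are hypothetical placeholders from the documentation ranges, and uncovered_targets is an illustrative helper, not part of FreePBX or Asterisk; the real target list would come from resolving the provider's SRV record and the trusted list from your own firewall configuration.

```python
import ipaddress


def uncovered_targets(target_ips, trusted_networks):
    """Return the target IPs that fall outside every trusted network."""
    networks = [ipaddress.ip_network(net, strict=False) for net in trusted_networks]
    return [
        ip
        for ip in target_ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]


# Hypothetical values: three addresses the SRV name might resolve to, and a
# firewall that only trusts the single IP used by the second trunk.
srv_targets = ["192.0.2.10", "192.0.2.11", "198.51.100.7"]
trusted = ["192.0.2.10/32"]

print(uncovered_targets(srv_targets, trusted))
```

Any address in the printed list is a target whose replies the firewall may drop, which would match the pattern we saw: responses only came through when the SRV lookup happened to land on the already-trusted IP.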