We have a FreePBX system with mostly S series phones connected. No issues with any S series phones. However, we have one Digium D65 phone that keeps unregistering every week or two. This happens to be the owner's phone. Worst possible phone to have the issue with.
After a PoE port cycle the phone connects right away. I believe we can ping the phone prior to the reboot as well. So the D65 just randomly unregisters. The phone has the latest firmware, and FreePBX is on the latest version.
Endpoint Manager is set up to use the external PBX DNS URL as the provisioning URL. We have Let's Encrypt set up. The DHCP DNS server for the voice VLAN is the local firewall. We have a local DNS resolver record set up pointing to the local IP of the PBX.
The Preferred PJSIP Transport is currently set to UDP.
Is this a UDP timeout issue? Maybe changing to TLS will fix the issue? I noticed the post below:
Any help would be appreciated.
I read this article:
It mentions that if you are experiencing issues where either internal or external Digium phones become unreachable over time, the issue may be caused by ports being closed on networking equipment between the Switchvox and the phones. Many routers or firewalls that deal with NAT will close ports if there is no activity after a certain period of time. By enabling UDP Persist-Connection using the steps below, you can force the phone to periodically send a SIP OPTIONS message to Switchvox, keeping the port open:
- Navigate to Server → Networking → Phone Networks
- Select the network the phones are using and click Modify
- Click Advanced Settings
- Set Enable UDP Persist-Connection to Yes
- Set UDP Persist-Connection Interval to 30
- Click Save Phone Network
I’m just here to offer some thoughts, not to carry this across the finish line. Good luck.
I thought EPM enabled the UDP keepalive interval in its defaults, but I don’t know for sure. Maybe @lgaetz does…
Anyway, the phone, by default, does not enable a UDP keep alive timer. It has to be turned on. If there’s no UDP keep alive timer, and the phone is separated from the SIP server with which it wishes to communicate by a NAT router, then it’s best to enable the keep alive timer. That’ll try to keep the NAT port mapping hole open on the router (by sending periodic bits of data - CRLF), so that Asterisk still has an active path back to the phone.
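To make the keepalive idea concrete, here is a minimal Python sketch of what such a timer does: it sends a bare CRLF datagram to the SIP server at a fixed interval so the NAT router keeps seeing traffic on the mapping. This is not the phone's firmware logic; the function name `send_keepalives` and its parameters are made up for illustration, and the server address and interval are whatever you pass in.

```python
import socket
import time

KEEPALIVE_PAYLOAD = b"\r\n"  # a bare CRLF, as described above


def send_keepalives(sock, server_addr, interval_s, count):
    """Send `count` CRLF datagrams to server_addr, spaced interval_s apart.

    Hypothetical helper for illustration. Reusing the same socket matters:
    the NAT mapping is tied to the source port the socket is bound to.
    Returns the number of datagrams sent.
    """
    sent = 0
    for i in range(count):
        if i:
            time.sleep(interval_s)  # wait between keepalives, not before the first
        sock.sendto(KEEPALIVE_PAYLOAD, server_addr)
        sent += 1
    return sent
```

A real phone would send these from the same local port it uses for SIP signaling, for as long as it is registered.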
That said, the phone should be re-registering periodically, even if a NAT hole was closed. When that happens, you’d have a new mapping created, and all would be well again. You’d have to see what the config says: what did Asterisk suggest as the re-registration time in its response to the REGISTER? The phone should re-register at half that, plus or minus 10 seconds.
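As a quick worked example of that timing rule, the Python sketch below computes the window in which the phone would be expected to re-register, given the expiry value Asterisk returned. The function name `reregister_window` is hypothetical, and the 3600 second expiry in the usage note is only an example value, not something taken from this system's config.

```python
def reregister_window(expires_s, jitter_s=10):
    """Return (earliest, latest) seconds after a successful REGISTER at which
    the phone is expected to re-register: half the granted expiry, plus or
    minus jitter_s seconds. Hypothetical helper for illustration."""
    half = expires_s / 2
    return (max(0, half - jitter_s), half + jitter_s)
```

For an expiry of 3600 seconds, `reregister_window(3600)` gives a window of 1790 to 1810 seconds. If the firewall's UDP idle timeout is shorter than that and no keepalive is running, inbound calls can fail until the next re-registration creates a fresh mapping.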
You could look at TCP as a transport option. If the phone sees the connection torn down, it’ll immediately try to recreate it. TLS is also okay (it’s SSL on top of TCP), but you’ll need to deal with certs then - the phone will reject self-signed certs by default, so you’ll either have to tell the phone to “allow dangerous unsigned SSL” or load the self-signed cert in-line in the phone’s config (see the configuration element in the phones’ XML config guides on wiki.sangoma.com / wiki.freepbx.org)
If none of that’s working, then it’s probably time to look at logs. If this is happening every one to two weeks, the phone won’t maintain internal logs for that long. So you’d need to set up a syslog server, turn the phone’s remote logging on, and send that logging to the syslog server. Then you’d need to watch that to see what the phone says is going on. You’d want to set the log level to debug.
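If a full syslog daemon isn’t handy, a throwaway UDP listener is enough to capture what the phone sends. Below is a minimal sketch using Python’s standard `socketserver` module. It assumes the phone sends its remote logs as plain syslog over UDP, which you should confirm in the phone’s settings; the names `SyslogHandler` and `make_syslog_server` and the log file path are hypothetical.

```python
import socketserver


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append each received datagram to a log file, prefixed by the sender IP."""

    def handle(self):
        data, _sock = self.request  # for UDP servers, request is (data, socket)
        line = data.decode("utf-8", errors="replace").rstrip()
        with open(self.server.log_path, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {line}\n")


def make_syslog_server(host, port, log_path):
    """Build a UDP server that writes incoming messages to log_path."""
    server = socketserver.UDPServer((host, port), SyslogHandler)
    server.log_path = log_path  # read by the handler on every message
    return server
```

Something like `make_syslog_server("0.0.0.0", 514, "d65.log").serve_forever()` would run it until interrupted. Port 514 is the standard syslog UDP port, and binding to it needs elevated privileges on most systems.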
Also, make sure to use latest available firmwares, etc., etc.
Thanks for the info and helpful suggestions.
The phone template had UDP keep alives enabled. The setting was at 60 seconds, and I lowered it to 25 seconds. The default is zero. I’m not sure what zero would do? I also increased the UDP timeouts on the firewall. Hopefully that will fix it. If not, I guess it’s syslogging…
60 seconds is good for lots of things, but there are other things out there, cough Adtran cough, that need something about twice that rate, i.e. 30 seconds.
Zero is equivalent to off.
You could go TCP for signaling pretty easily without syslogging for weeks on-end.
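For the keepalive numbers discussed in this thread, a rough sanity check is that the keepalive has to fire more often than the firewall’s UDP idle timeout, and that zero means the keepalive is off. A small Python sketch of that comparison follows; the function name `keepalive_holds_mapping` is hypothetical, and the firewall timeout is something you’d have to look up on your own equipment.

```python
def keepalive_holds_mapping(keepalive_s, nat_udp_timeout_s):
    """Return True if a keepalive sent every keepalive_s seconds should keep a
    NAT mapping with the given idle timeout open. Hypothetical helper."""
    if keepalive_s <= 0:
        return False  # zero is equivalent to off: nothing is ever sent
    return keepalive_s < nat_udp_timeout_s
```

With a 30 second idle timeout, `keepalive_holds_mapping(25, 30)` is true while `keepalive_holds_mapping(60, 30)` is false, which matches lowering the template from 60 to 25 seconds.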