Dear all, I have a huge problem that I cannot seem to resolve, and I hope someone can at least guide me to find a starting point.
I have a production FreePBX 16 system (Asterisk 20.0) with 30 PJSIP extensions connected on the same LAN. The extensions are Fanvil X4G phones on the same network as FreePBX, so there is no NAT or unusual configuration involved. The system also has 4 PJSIP trunks to a local VoIP operator. FreePBX runs as a Proxmox VM with 16 GB of RAM, 8 processors, a virtio network card, CPU type set to host, and a 10G link to the switch.

The issue is that, seemingly at random, some extensions disconnect and start to lag, as if they were crashing or running out of memory, and the message “direct dialing only” appears on the display. For about 5 minutes the phone is unstable: I cannot make calls and the buttons respond slowly. After a while, everything seems to return to normal. The problem is always present, but it gets much worse when call traffic is heavier, as happened today because of a commercial offer. It’s as if packets were being introduced into the network that cause some phones to malfunction.

I cannot reproduce the anomaly on my own, but I can provide the following information:
- None of the PCs on the same network and switches show any issues. The anomaly only affects the phones.
- During the phone disconnection, I can ping from the FreePBX system to the phones, so there don’t seem to be connectivity issues.
- The problem started this summer, when I upgraded my server and network infrastructure from HPE Gen8 servers and Quanta switches to HPE Gen10 Plus servers and Mikrotik CRS354-48P-4S+2Q+RM switches. At that time FreePBX was at version 15, but I did a fresh installation of version 16 and restored the backups from version 15. Before this migration I did not have any problems.
- I tried switching to Asterisk version 18, but the problem persists.
- During the anomaly, CPU, RAM, and network traffic are normal, and FreePBX responds perfectly to all commands.
- I completely disabled the firewall.
I tried to log during the anomaly phase on extension 235 with IP 192.168.25.146. What I got on the FreePBX side is just lines like these at the moment the phone stops working:
[2024-01-12 19:00:26] VERBOSE[31039] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Unreachable
[2024-01-12 19:00:26] VERBOSE[31039] res_pjsip/pjsip_options.c: Contact 235/sip:[email protected]:5580 is now Unreachable. RTT: 0.000 msec
...
[2024-01-12 19:01:17] VERBOSE[14209] res_pjsip_registrar.c: Added contact 'sip:[email protected]:5528' to AOR '235' with expiration of 3600 seconds
[2024-01-12 19:01:17] VERBOSE[20295] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Reachable
[2024-01-12 19:01:17] VERBOSE[20295] res_pjsip/pjsip_options.c: Contact 235/sip:[email protected]:5528 is now Reachable. RTT: 31.397 msec
...
[2024-01-12 19:11:52] VERBOSE[2327] res_pjsip/pjsip_options.c: Contact 235/sip:[email protected]:5710 has been deleted
[2024-01-12 19:14:08] VERBOSE[8103] res_pjsip/pjsip_configuration.c: Endpoint 239 is now Unreachable
[2024-01-12 19:14:08] VERBOSE[8103] res_pjsip/pjsip_options.c: Contact 239/sip:[email protected]:5991 is now Unreachable. RTT: 0.000 msec
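To see how long each extension stays down and whether the drops cluster around busy periods, the "Endpoint ... is now Unreachable/Reachable" lines can be paired up per endpoint. The following is only a minimal sketch: it assumes the log format shown above, and the function name `outage_windows` and the embedded sample are purely illustrative. In practice the lines would come from the Asterisk log file.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2024-01-12 19:00:26] VERBOSE[31039] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Unreachable
ENDPOINT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*"
    r"Endpoint (?P<ep>\S+) is now (?P<state>Unreachable|Reachable)"
)


def outage_windows(lines):
    """Pair each Unreachable event with the next Reachable event of the same endpoint.

    Returns (windows, still_down):
      windows    -> list of (endpoint, start, end, seconds)
      still_down -> dict of endpoint -> time it went Unreachable (no recovery seen)
    """
    down_since = {}
    windows = []
    for line in lines:
        m = ENDPOINT_RE.search(line)
        if not m:
            continue  # Contact/registrar lines and anything else are skipped
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        ep, state = m.group("ep"), m.group("state")
        if state == "Unreachable":
            down_since.setdefault(ep, ts)  # keep the first drop time
        elif ep in down_since:
            start = down_since.pop(ep)
            windows.append((ep, start, ts, (ts - start).total_seconds()))
    return windows, down_since


# Sample taken from the excerpt above (Endpoint lines only).
SAMPLE_LOG = """\
[2024-01-12 19:00:26] VERBOSE[31039] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Unreachable
[2024-01-12 19:01:17] VERBOSE[20295] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Reachable
[2024-01-12 19:14:08] VERBOSE[8103] res_pjsip/pjsip_configuration.c: Endpoint 239 is now Unreachable
"""

windows, still_down = outage_windows(SAMPLE_LOG.splitlines())
for ep, start, end, secs in windows:
    print(f"endpoint {ep}: down from {start} to {end} ({secs:.0f} s)")
for ep, start in still_down.items():
    print(f"endpoint {ep}: down since {start}, no recovery in this excerpt")
```

Run over a full day of logs, the same pairing would show whether the outages hit many extensions at once or one at a time, which may help separate a network-wide cause from a per-phone one.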
And this is what I got on the phone side, logging to a remote log server:
2024-01-12 18:59:57 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1371888
2024-01-12 19:00:17 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1373424
2024-01-12 19:00:19 User.Warning 192.168.25.146 [SIP] | WARNING| Transaction 16746 killed.
2024-01-12 19:00:19 User.Info 192.168.25.146 [MGR] | INFO | We receive a failure message,err:-60
2024-01-12 19:00:19 User.Notice 192.168.25.146 [SIP] | NOTICE | free transaction ressource 16746 59922744328159-248701028212249
2024-01-12 19:00:19 User.Notice 192.168.25.146 [SIP] | NOTICE | free nict ressource
2024-01-12 19:00:37 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1379600
2024-01-12 19:00:50 User.Info 192.168.25.146 [SIP] | INFO | registration_cancel_all ,osip_dialog_release:80a823c0
2024-01-12 19:00:50 User.Warning 192.168.25.146 [SIP] | WARNING| Call leg is removed.
2024-01-12 19:00:50 User.Notice 192.168.25.146 [SIP] | NOTICE | allocating transaction ressource 16747 16082300349246-2558923023118
2024-01-12 19:00:50 User.Notice 192.168.25.146 [SIP] | NOTICE | allocating NICT context
2024-01-12 19:00:57 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1374576
2024-01-12 19:01:15 User.Warning 192.168.25.146 [SIP] | WARNING| OnEvent_New_Incoming4xxResponse!
2024-01-12 19:01:15 User.Warning 192.168.25.146 [SIP] | WARNING| User need to authenticate to REGISTER!
2024-01-12 19:01:15 User.Notice 192.168.25.146 [SIP] | NOTICE | allocating transaction ressource 16748 16082300349246-2558923023118
2024-01-12 19:01:15 User.Notice 192.168.25.146 [SIP] | NOTICE | allocating NICT context
2024-01-12 19:01:15 User.Warning 192.168.25.146 [SIP] | WARNING| Call leg is removed.
2024-01-12 19:01:15 User.Notice 192.168.25.146 [SIP] | NOTICE | allocating transaction ressource 16749 8da6c9b5-e9d7-45d9-8390-b33f4ad01320
2024-01-12 19:01:15 User.Notice 192.168.25.146 [SIP] | NOTICE | allocating NIST context
2024-01-12 19:01:15 User.Warning 192.168.25.146 [SIP] | WARNING| nist_options_received(): not fully implemented.
This line probably means something:
2024-01-12 19:00:19 User.Info 192.168.25.146 [MGR] | INFO | We receive a failure message,err:-60
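To line the phone-side events up against the FreePBX log, the syslog lines can be split into fields, with the `memShow` values and the `err:` messages pulled out separately. This is a rough sketch that assumes the exact line layout shown above. The helper names (`parse`, `mem_samples`, `error_events`) are hypothetical. I do not know what the `memShow` number actually measures on the Fanvil firmware, so the code only collects the values and does not interpret them.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt:
# <date> <time> <facility.severity> <host> [<module>] | <level> | <message>
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<prio>\S+)\s+(?P<host>\S+)\s+"
    r"\[(?P<module>[^\]]+)\]\s*\|\s*(?P<level>[^|]+?)\s*\|\s*(?P<msg>.*)$"
)


def parse(lines):
    """Turn raw syslog lines into dicts; lines that do not match are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line.strip())
        if m:
            rec = m.groupdict()
            rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S")
            records.append(rec)
    return records


def mem_samples(records):
    """Collect (timestamp, value) pairs from 'memShow <number>' messages."""
    samples = []
    for rec in records:
        m = re.fullmatch(r"memShow (\d+)", rec["msg"])
        if m:
            samples.append((rec["ts"], int(m.group(1))))
    return samples


def error_events(records):
    """Return the records whose message carries an 'err:' code."""
    return [rec for rec in records if "err:" in rec["msg"]]


# Sample taken from the phone log above.
SAMPLE_SYSLOG = """\
2024-01-12 18:59:57 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1371888
2024-01-12 19:00:17 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1373424
2024-01-12 19:00:19 User.Warning 192.168.25.146 [SIP] | WARNING| Transaction 16746 killed.
2024-01-12 19:00:19 User.Info 192.168.25.146 [MGR] | INFO | We receive a failure message,err:-60
2024-01-12 19:00:37 User.Emerg 192.168.25.146 [MGR] | FATAL | memShow 1379600
"""

records = parse(SAMPLE_SYSLOG.splitlines())
for ts, value in mem_samples(records):
    print(ts, "memShow", value)
for rec in error_events(records):
    print(rec["ts"], rec["host"], rec["module"], rec["msg"])
```

Comparing the timestamps of the `err:` events with the moments FreePBX marks the endpoint Unreachable would show which side notices the failure first, assuming the phone and server clocks are in sync.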
I initially suspected a network issue, but honestly, at this point I would have expected problems on the PCs too, or connectivity issues with FreePBX. Instead, it seems like connectivity is randomly blocked on a few individual extensions and nothing more.
This is driving me crazy, and I cannot find a way out.