Instability issue on pjsip extensions

Dear all, I have a huge problem that I cannot seem to resolve, and I hope someone can at least guide me to find a starting point.

I have a production FreePBX 16 (Asterisk 20.0) to which 30 PJSIP extensions are connected on the same LAN. These extensions are Fanvil X4G phones, so there is no NAT or strange configurations; they are all in the same FreePBX network. The system also has 4 PJSIP trunks connected to a local VoIP operator. The FreePBX system is virtualized in Proxmox with 16GB of RAM, 8 processors, a virtio network card, CPU type set to host, and a 10G link to the switch. The issue is that seemingly at random, some extensions disconnect and start to lag, as if they are crashing or running out of memory. Additionally, the message “direct dialing only” appears on the display. For about 5 minutes, the phone is unstable; I cannot make calls, and the buttons feel slow. After a while, everything seems to return to normal. This problem is consistently present but increases exponentially when the phone traffic becomes heavier, as happened today due to a commercial offer. It’s as if there are packets being introduced into the network that cause some phones to malfunction. I cannot reproduce the anomaly alone, but I can provide the following information:

  • All PCs on the same network and switches do not show any issues. The anomaly only affects the phones.
  • During the phone disconnection, I can ping from the FreePBX system to the phones, so there don’t seem to be connectivity issues.
  • The problem started to occur this summer when I upgraded my server and network infrastructure from HPE Gen8 and Qanta switches to HPE Gen10 Plus and Mikrotik CRS354-48P-4S+2Q+RM switches. At that time, FreePBX was at version 15, but I did a fresh installation to upgrade to version 16, restoring backups from version 15. Before this technological transition, I did not have any problems.
  • I tried switching to Asterisk version 18, but the problem persists.
  • During the anomaly, CPU, RAM, and network traffic are normal, and FreePBX responds perfectly to all commands.
  • I disabled completely the firewall

I tried to log during the anomaly phase on extension 235 with IP 192.168.25.146. What I got on freepbx sides is just rows like this in the moment the phones stops to work

[2024-01-12 19:00:26] VERBOSE[31039] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Unreachable
[2024-01-12 19:00:26] VERBOSE[31039] res_pjsip/pjsip_options.c: Contact 235/sip:[email protected]:5580 is now Unreachable.  RTT: 0.000 msec
...
[2024-01-12 19:01:17] VERBOSE[14209] res_pjsip_registrar.c: Added contact 'sip:[email protected]:5528' to AOR '235' with expiration of 3600 seconds
[2024-01-12 19:01:17] VERBOSE[20295] res_pjsip/pjsip_configuration.c: Endpoint 235 is now Reachable
[2024-01-12 19:01:17] VERBOSE[20295] res_pjsip/pjsip_options.c: Contact 235/sip:[email protected]:5528 is now Reachable.  RTT: 31.397 msec
...
[2024-01-12 19:11:52] VERBOSE[2327] res_pjsip/pjsip_options.c: Contact 235/sip:[email protected]:5710 has been deleted
[2024-01-12 19:14:08] VERBOSE[8103] res_pjsip/pjsip_configuration.c: Endpoint 239 is now Unreachable
[2024-01-12 19:14:08] VERBOSE[8103] res_pjsip/pjsip_options.c: Contact 239/sip:[email protected]:5991 is now Unreachable.  RTT: 0.000 msec

and this i what I got on the phone side logging to a remote logserver

2024-01-12 18:59:57	User.Emerg	192.168.25.146	[MGR]  | FATAL  | memShow 1371888
2024-01-12 19:00:17	User.Emerg	192.168.25.146	[MGR]  | FATAL  | memShow 1373424
2024-01-12 19:00:19	User.Warning	192.168.25.146	[SIP]  | WARNING| Transaction 16746 killed.
2024-01-12 19:00:19	User.Info	192.168.25.146	[MGR]  | INFO   | We receive a failure message,err:-60
2024-01-12 19:00:19	User.Notice	192.168.25.146	[SIP]  | NOTICE | free transaction ressource 16746 59922744328159-248701028212249
2024-01-12 19:00:19	User.Notice	192.168.25.146	[SIP]  | NOTICE | free nict ressource
2024-01-12 19:00:37	User.Emerg	192.168.25.146	[MGR]  | FATAL  | memShow 1379600
2024-01-12 19:00:50	User.Info	192.168.25.146	[SIP]  | INFO   | registration_cancel_all ,osip_dialog_release:80a823c0
2024-01-12 19:00:50	User.Warning	192.168.25.146	[SIP]  | WARNING| Call leg is removed. 
2024-01-12 19:00:50	User.Notice	192.168.25.146	[SIP]  | NOTICE | allocating transaction ressource 16747 16082300349246-2558923023118
2024-01-12 19:00:50	User.Notice	192.168.25.146	[SIP]  | NOTICE | allocating NICT context
2024-01-12 19:00:57	User.Emerg	192.168.25.146	[MGR]  | FATAL  | memShow 1374576
2024-01-12 19:01:15	User.Warning	192.168.25.146	[SIP]  | WARNING| OnEvent_New_Incoming4xxResponse!
2024-01-12 19:01:15	User.Warning	192.168.25.146	[SIP]  | WARNING| User need to authenticate to REGISTER!
2024-01-12 19:01:15	User.Notice	192.168.25.146	[SIP]  | NOTICE | allocating transaction ressource 16748 16082300349246-2558923023118
2024-01-12 19:01:15	User.Notice	192.168.25.146	[SIP]  | NOTICE | allocating NICT context
2024-01-12 19:01:15	User.Warning	192.168.25.146	[SIP]  | WARNING| Call leg is removed. 
2024-01-12 19:01:15	User.Notice	192.168.25.146	[SIP]  | NOTICE | allocating transaction ressource 16749 8da6c9b5-e9d7-45d9-8390-b33f4ad01320
2024-01-12 19:01:15	User.Notice	192.168.25.146	[SIP]  | NOTICE | allocating NIST context
2024-01-12 19:01:15	User.Warning	192.168.25.146	[SIP]  | WARNING| nist_options_received(): not fully implemented.

Probably this means something

2024-01-12 19:00:19	User.Info	192.168.25.146	[MGR]  | INFO   | We receive a failure message,err:-60

I initially suspect a network issue, but honestly, I would have expected problems on PCs too at this point or connectivity issues with freepbx. Instead, it seems like connectivity is randomly blocked on some individual random extensions and nothing more.

This thing is making me crazy I cannot find a way out.

I want to add a little more of log information, I found that this warning row

WARNING[19481][C-0000003b] taskprocessor.c: The 'stasis/p:channel:all-000007a0' task processor queue reached 500 scheduled tasks again.

is present over 20.000 thousand times in full.log of an entire day of work that is around 7milion rows, there are no other warnings around except this one

Does this happen with sip extensions too?

all my extensions are pjsip and the trunks too(they go down too randomly) but I don’t know if the issue is related to the taskprocessor error that I catch before, I tried to call the queue with id 2000 from 4 extensions and I had immediately this error, but trunks and extensions seem to go down randomly or probably at a specific period of timr

grep task /var/log/asterisk/full
[2024-01-13 21:36:22] WARNING[22103][C-00000008] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:22] WARNING[22111][C-00000008] taskprocessor.c: The 'stasis/m:channel:all-000000b9' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:23] WARNING[22116][C-00000008] taskprocessor.c: The 'stasis/m:cel:aggregator-00000006' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:26] WARNING[22160][C-00000009] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:30] WARNING[22197][C-0000000a] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:30] WARNING[22204][C-0000000a] taskprocessor.c: The 'stasis/m:channel:all-000000b9' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:39] WARNING[22230][C-00000009] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:39] WARNING[22230][C-00000009] taskprocessor.c: The 'stasis/m:channel:all-000000ba' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:39] WARNING[22232][C-00000009] taskprocessor.c: The 'stasis/m:channel:all-000000b9' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:43] WARNING[22238][C-0000000a] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:43] WARNING[22237][C-0000000a] taskprocessor.c: The 'stasis/m:channel:all-000000ba' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:49] WARNING[22370][C-00000008] taskprocessor.c: The 'stasis/m:channel:all-000000ba' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:49] WARNING[22377][C-00000008] taskprocessor.c: The 'stasis/m:channel:all-000000ba' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:49] WARNING[22365][C-00000008] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:49] WARNING[22367][C-00000008] taskprocessor.c: The 'stasis/m:channel:all-000000b9' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:50] WARNING[22374][C-00000008] taskprocessor.c: The 'stasis/m:channel:all-000000b9' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:50] WARNING[22378][C-00000008] taskprocessor.c: The 'stasis/m:cel:aggregator-00000006' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:52] WARNING[22399][C-00000009] taskprocessor.c: The 'stasis/m:channel:all-000000ba' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:52] WARNING[22391][C-00000009] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:56] WARNING[22406][C-0000000a] taskprocessor.c: The 'stasis/m:manager:core-00000007' task processor queue reached 3000 scheduled tasks again.
[2024-01-13 21:36:56] WARNING[22418][C-0000000a] taskprocessor.c: The 'stasis/m:channel:all-000000ba' task processor queue reached 500 scheduled tasks again.
[2024-01-13 21:36:56] WARNING[22413][C-0000000a] taskprocessor.c: The 'stasis/m:channel:all-000000b9' task processor queue reached 500 scheduled tasks again.

grep "ANCONA_UFFICIO is" /var/log/asterisk/full
[2024-01-13 13:13:39] VERBOSE[11669] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 13:14:01] VERBOSE[4905] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 14:14:54] VERBOSE[4905] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 14:15:16] VERBOSE[30399] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 14:42:24] VERBOSE[32020] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 14:42:46] VERBOSE[11669] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 14:43:14] VERBOSE[15917] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 14:43:36] VERBOSE[19701] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 16:14:29] VERBOSE[9334] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 16:14:51] VERBOSE[27243] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 16:41:59] VERBOSE[32020] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 16:42:21] VERBOSE[15917] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 16:42:49] VERBOSE[27243] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 16:43:11] VERBOSE[9334] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 17:45:19] VERBOSE[30399] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 17:45:41] VERBOSE[30399] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 18:41:34] VERBOSE[22473] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 18:41:56] VERBOSE[11669] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 18:42:24] VERBOSE[28919] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 18:42:46] VERBOSE[22473] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 18:44:04] VERBOSE[15917] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 18:44:26] VERBOSE[9334] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 19:12:24] VERBOSE[14774] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 19:12:46] VERBOSE[27243] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 19:44:54] VERBOSE[2309] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 19:45:16] VERBOSE[11669] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 20:25:18] VERBOSE[3897] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 20:41:11] VERBOSE[8920] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 20:41:33] VERBOSE[8920] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 21:12:01] VERBOSE[8920] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 21:12:23] VERBOSE[8920] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable
[2024-01-13 22:13:16] VERBOSE[24246] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Unreachable
[2024-01-13 22:13:38] VERBOSE[24246] res_pjsip/pjsip_configuration.c: Endpoint ANCONA_UFFICIO is now Reachable


I found the queue with id 2000 that I’m direct calling from an extension to replicate the issue has 18 static agents inside with ringall strategy, agent timeout of 22 seconds and retry every 5, so if I try to call the 2000 queue I will receive the task processor error every 22 seconds, if I reduce from 18 agents to 5 it will not give errors.

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.