Performance issue when Queue Autofill is enabled

We are running a server with Asterisk 18.20.0 and FreePBX 16 with the following configuration:

  • There are 6 Queues and 2 out of the 6 queues get around 90% of the calls.
  • 160 extensions total. Around 80 - 110 allocated to each queue as Static agents.
  • Maximum SIP channel count : 160
  • Queue log, CDR, and a custom AMI application capture call events to the database. There are no slow queries, and all log tables are archived and flushed daily.
  • HW Spec: 45 vcores, 25GB RAM & SSD Storage (VMware Cloud)

When the Autofill option is enabled on the queues and the concurrent call count is above 100, the CPU load average climbs above 50 and the system starts slowing down.
The stasis/m:manager:core task processor starts queuing, reaching 2,000,000+ entries, and memory usage reaches 90% as that queue grows.
With Autofill off, the system runs without any issues at a load average of ~15.
We need the Autofill behavior because we need to distribute multiple calls to all free extensions at the same time.

Any suggestions on what to look at to resolve this issue?
Thanks in advance for the help.

Some ideas, not all of them may be for you.

Change your skilling strategy to include penalties. Grouping users at different penalty levels (2-4 levels) could still leave you with a large pool to answer right away, without trying to call every single extension at the same time.
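
As a rough illustration of the penalty idea, here is a minimal Python sketch that splits a list of extensions into penalty tiers and prints queue member lines. The function name tiered_members and the even split are my own assumptions, and the Local/<ext>@from-queue/n,<penalty> form is the one FreePBX is described as writing later in this thread. In FreePBX you would normally set penalties per agent in the GUI rather than generate lines by hand.

```python
def tiered_members(extensions, tiers=3, context="from-queue"):
    """Spread extensions across `tiers` penalty levels (0 = tried first).

    Hypothetical helper: the even split is only one way to group agents.
    """
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    # Ceiling division so every tier except possibly the last is full.
    size = -(-len(extensions) // tiers)
    lines = []
    for index, ext in enumerate(extensions):
        penalty = index // size
        lines.append(f"member=Local/{ext}@{context}/n,{penalty}")
    return lines


if __name__ == "__main__":
    for line in tiered_members(["401", "402", "403", "404"], tiers=2):
        print(line)
```

With 100 agents and 3 tiers this would put roughly 34 agents at penalty 0, so the first attempt rings about a third of the pool instead of all of it.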

Skip Busy Agents = Yes + Ringinuse = no. This would stop calling extensions that are not available, reducing the total extensions called at the same time.

You could also consider changing queue members from local to SIP tech.

/var/www/html/admin/modules/queues/page.queues.php

Changed from: (just search for "Local")
$members[$key] = "Local/".$members[$key]."@from-internal/n,".$penalty_val;
to:
$members[$key] = "SIP/".$members[$key].",".$penalty_val;

This would bypass a lot of the AGIs/code written for local extensions, but also speed things up. There may be side effects that impact your system, depending on unique workflows. You'll have to watch it.

After changing to member=PJSIP/… there is no task processor queuing.

This would bypass a lot of the AGIs/code written for local extensions

What features might be lost when utilizing this method?

Hard to say what will be problematic for you, if anything. You’ll need to test.

Device & User Mode - PBX GUI - Sangoma Documentation (atlassian.net)

I did some testing with this method, and the major change we identified is that the queue log now captures transfer events.
ATTENDEDTRANSFER is captured, which is not causing any issue.

BLINDTRANSFER calls, however, do not end with COMPLETEAGENT / COMPLETECALLER, which is breaking the existing AMI app.

Is that supposed to be normal behavior?

Likely. Any AMI application that references the "Local" channel will likely have some issue (partial or full break).

First, this is wrong. The page.queues.php file has none of this in it. This is found in Queues.class.php for writing out the configs and members in the config.

Second, it doesn't use from-internal as the context. The queue config is written out as Local/<extnumber>@from-queue/n,$penalty_val so changing it to from-internal will break how the call flows.

There are no AMI actions referencing Local for the queues. However, every other function in the dialplan such as logging an agent on, logging off, pausing an agent and others are based on Local/<extnumber>@from-queue/n being how agents are called.

When you use PJSIP/<extnumber> as the dial string for the agent, the call isn't routed back into the dialplan, so all the options and features for calling an extension are skipped (ignoring call waiting, following or not following Call Forward or FollowMe settings, adding queue wait times or other data to the call). It just dials straight out to the endpoint, ignoring everything else.

So the AMI event fires with these details when you use Local/<extnumber>@from-queue?

What I did was edit the queues_additional.conf directly without changing anything within FreePBX. This is just a temp workaround for testing purposes.

ex:
member=PJSIP/440,0,"Test",hint:440@ext-local
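
To check which members in a config like this bypass the dialplan, a small Python sketch along these lines could help. The function name parse_member and the returned keys are hypothetical. It assumes the field order shown in the example above (interface, penalty, member name, state interface) and treats any member whose tech is not Local as dialing the endpoint directly.

```python
import csv


def parse_member(line):
    """Parse one 'member=' line from a queue config into a dict.

    Hypothetical helper; assumes the field order
    interface,penalty,"name",state_interface.
    """
    key, _, value = line.partition("=")
    if key.strip() != "member":
        raise ValueError(f"not a member line: {line!r}")
    # Also accept the 'member => ...' spelling.
    value = value.strip().lstrip(">").strip()
    # csv handles the quoted member name.
    fields = next(csv.reader([value], skipinitialspace=True))
    interface = fields[0]
    tech = interface.split("/", 1)[0]
    return {
        "interface": interface,
        "tech": tech,
        "penalty": int(fields[1]) if len(fields) > 1 and fields[1] else 0,
        "name": fields[2] if len(fields) > 2 else "",
        "state_interface": fields[3] if len(fields) > 3 else "",
        # Only Local channels route the agent call back through the dialplan.
        "via_dialplan": tech == "Local",
    }


if __name__ == "__main__":
    print(parse_member('member=PJSIP/440,0,"Test",hint:440@ext-local'))
```

Running it over every member line in the generated config would give a quick list of agents that skip the from-queue handling.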

So the AMI event fires with these details when you use Local/<extnumber>@from-queue?

The AMI app is written so that it also reads queue log events from the database, so when a call ends at BLINDTRANSFER it affects the process.
What I need to know is whether this is a normal queue log record. From past experience, all queue log records end at COMPLETEAGENT or COMPLETECALLER.
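
If the transfer records turn out to be expected, one way to make the app tolerant is to treat them as call-ending events too. Below is a minimal Python sketch of that idea. It assumes the pipe-delimited queue_log file layout (time|callid|queue|agent|event|data...), and the set of "terminal" events is my own guess for illustration, not a confirmed list. The function name final_events and the sample records are hypothetical.

```python
# Events assumed to close a call's queue_log record set (an assumption).
TERMINAL_EVENTS = {
    "COMPLETEAGENT",
    "COMPLETECALLER",
    "BLINDTRANSFER",
    "ATTENDEDTRANSFER",
    "ABANDON",
    "EXITWITHTIMEOUT",
    "EXITWITHKEY",
    "EXITEMPTY",
}


def final_events(lines):
    """Map each call id to the last terminal event seen for it."""
    last = {}
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 5:
            continue  # skip malformed or empty lines
        callid, event = parts[1], parts[4]
        if event in TERMINAL_EVENTS:
            last[callid] = event
    return last


if __name__ == "__main__":
    sample = [
        "1700000000|1700000000.1|600|PJSIP/440|CONNECT|5|1700000000.2|3",
        "1700000030|1700000000.1|600|PJSIP/440|BLINDTRANSFER|500|from-internal|5|30|1",
    ]
    print(final_events(sample))
```

The same check could be applied to rows read from the database instead of the flat file, as long as the call id and event columns are available.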

I also raised this concern in the Asterisk community but didn't get any response.