Queues - dialparties.agi & high CPU load

Fortel · June 10, 2009, 3:46pm

A friend of mine has an inbound call center, which now has 30 agents. I set him up with an older HP DL360, which has been working well for a couple of years. The only issue has been with call distribution: there’s an IVR that times out to a Queue.

Despite the “Ring All” setting, the phones will ring at different times, and without any predictability. What caught my eye is that the CPU starts working hard and is pretty quickly maxed out, with multiple dialparties.agi processes consuming the vast majority of it. With the high CPU load, the sound breaks down…

Anytime a Queue is invoked, the CPU is taxed heavily. Phones will ring sporadically, or not at all. Any Queue configuration I’ve tested has the problem. Ring Groups do not have the problem, so for now, our “solution” is to use groups, although we lose a lot of functionality.

I’ve done everything I could read about to fix this. There are no loops. All the phones are registered and set as static agents. I’ve cloned the setup to a much higher performance machine, and find the same problem. With fewer extensions, the problem diminishes.

Wondering if anyone that uses queues has luck with all phones ringing simultaneously (in a “ringall” config) and a more normal load on the CPU…

This is on Asterisk 1.4.21.2, PIAF 1.3, FreePBX 2.5.

Any ideas?

Thanks,

Peter

plindheimer · June 10, 2009, 4:12pm

ringall is going to be very taxing on your system as dialparties.agi will be called for everyone. There are some workarounds at different levels that you can use.

For starters, you probably want to get a hold of the 2.6 version of queues and use a setting which includes ringinuse as that will avoid attempting to call agents (members) that the queue knows are already taking a queue call. This does have some drawbacks as you will see in the tooltip when setting that setting.

This will still try to ring every agent that is not on a call from the queue, even if you have Skip Busy Agent set (dialparties will be called but will not ring them). If you have this situation (a lot of agents on non-queue calls), then you can have a look at this patch:

https://issues.asterisk.org/view.php?id=15168

If you go that route, you will need some additional patches to the queue code, which I have but which have not been checked into svn yet because the Asterisk patch above is still being discussed. You may alternatively want to have a look at #3562 and #3496, which can help if you are in standard FreePBX “extension” mode.

Alternatively, you may want to set up queue penalties to keep so many phones from ringing unless the lower penalties are all busy/unavailable. Just be aware that if you do this, and any single queue agent is available but does not answer their phone (it rings), then the higher penalties will not be called.
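As a rough illustration of the penalty behaviour described above, here is a simplified Python model. It is not Asterisk's actual code: the `Member` type, the `members_to_ring` function, and the sample queue are all hypothetical names made up for this sketch. It only captures the rule as stated in this post, that the lowest penalty tier with at least one available member is the one that rings.

```python
from dataclasses import dataclass


@dataclass
class Member:
    interface: str   # e.g. "Local/102@from-internal/n"
    penalty: int     # lower penalties are tried first
    available: bool  # False when the member is busy/unavailable


def members_to_ring(members):
    """Return the available members in the lowest penalty tier that has any.

    Simplified model (assumption): higher tiers are only considered when
    every member of the lower tiers is busy/unavailable.
    """
    candidates = [m for m in members if m.available]
    if not candidates:
        return []
    lowest = min(m.penalty for m in candidates)
    return [m for m in candidates if m.penalty == lowest]


# Hypothetical queue: 101 is on a call, 102 is free, 103 is a higher tier.
queue = [
    Member("Local/101@from-internal/n", 0, available=False),
    Member("Local/102@from-internal/n", 0, available=True),
    Member("Local/103@from-internal/n", 1, available=True),
]
ringing = [m.interface for m in members_to_ring(queue)]
print(ringing)
```

In this model only 102 rings. If 102 is available but simply does not pick up, 103 is never tried, which is the drawback mentioned above.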

Fortel · June 10, 2009, 4:30pm

Thanks for the quick response, Philippe!

I will study the information- thanks for posting the links.

Not sure where to get Queues 2.6 though…

Peter

escape2mtns · July 7, 2009, 7:21pm

I hate to start arbitrarily swapping out software… but I’ll need a bulletproof vest to get past the call center director if I can’t come up with something.

Calls are being dropped and then callers call back and curse out our operators. It’s not pretty. Any idea/guess?

Thanks

escape2mtns · July 7, 2009, 3:04pm

We’re having a very similar issue.

We’re using Asterisk version 1.4.25.1 with FreePBX version 2.5.1.5. We have less than 10 queues, each with static agents. Three of the queues have 12 static members each, with all members using PolyCom SIP phones, set in ringall strategy.

Everything works great except that several times a day Asterisk crashes and restarts itself. Afterward I notice there are dialparties.agi processes stuck out there (this morning I noticed several 10 hours old) that consume cpu up to 99%.

The asterisk system ended with exit status 139 on signal 11.

I’m sure they’re probably related. Any idea if one is causing the other? Would the above fixes address my issues or is there something more fundamental going on?
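For anyone wanting to spot stuck dialparties.agi processes like these, below is a minimal Python sketch. It assumes the process list comes from `ps -eo pid,etime,pcpu,args`; the helper names `etime_to_seconds` and `find_stuck`, the one-hour threshold, and the sample output are hypothetical and made up for illustration. It only reports candidates and does not kill anything.

```python
import re


def etime_to_seconds(etime):
    """Convert ps elapsed time in [[dd-]hh:]mm:ss format to seconds."""
    m = re.fullmatch(r"(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)", etime.strip())
    if not m:
        raise ValueError(f"unrecognised elapsed time: {etime!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stuck(ps_output, name="dialparties.agi", min_age_seconds=3600):
    """Return (pid, age_seconds, cpu_percent) for old processes matching name."""
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)
        # Skip the header row and anything that does not have four columns.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        pid, etime, pcpu, args = parts
        age = etime_to_seconds(etime)
        if name in args and age >= min_age_seconds:
            stuck.append((int(pid), age, float(pcpu)))
    return stuck


# Made-up sample of `ps -eo pid,etime,pcpu,args` output, for illustration only.
sample = """\
  PID     ELAPSED %CPU COMMAND
 4321    10:02:33 99.0 /usr/bin/php -q /var/lib/asterisk/agi-bin/dialparties.agi
 4400       00:05  1.2 /usr/bin/php -q /var/lib/asterisk/agi-bin/dialparties.agi
 1234  1-02:00:00  0.3 /usr/sbin/asterisk -f
"""
stuck = find_stuck(sample)
print(stuck)
```

On a live system the text could be obtained with `subprocess.run(["ps", "-eo", "pid,etime,pcpu,args"], capture_output=True, text=True).stdout`.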

Thanks

plindheimer · July 7, 2009, 4:32pm

I haven’t heard that one before, something is hanging those processes - how much memory do you have on your system?

escape2mtns · July 7, 2009, 4:40pm

We have 4GB of ram on that box.

escape2mtns · July 7, 2009, 5:11pm

BTW, I submitted a paid ticket with log files if that would help… ZPV-610725

Thanks

plindheimer · July 7, 2009, 7:32pm

escape2mtns,

If you submitted a paid ticket, then that team will have a look and work with you as needed. The fact that dialparties is hanging is troublesome; there may be something with your php.ini, but it is best to leave that discussion to the engineer who is going to help you.

escape2mtns · July 7, 2009, 7:33pm

I just noticed the following lines in /var/log/messages at 9:45:49 this morning…

Jul 7 09:45:49 freepbx php: /var/lib/asterisk/agi-bin/recordingcheck[114]: Undefined offset: 1
Jul 7 10:01:49 freepbx last message repeated 2 times

Could this be related?

plindheimer · July 7, 2009, 7:34pm

No. It’s probably an undefined array offset in recordingcheck that may need to be fixed, but it would not be related to what you are seeing.

escape2mtns · July 7, 2009, 9:13pm

Is it possible that we’re using a version that’s too new? Should we downgrade?

escape2mtns · July 7, 2009, 9:39pm

Looking through the log files I noticed the following on multiple occasions:

app_queue.c: The device state of this queue member, Local/200@from-internal/n, is still ‘Not in Use’ when it probably should not be! Please check UPGRADE.txt for correct configuration settings.

[Jul 7 10:39:29] VERBOSE[27346] logger.c: dialparties.agi: Methodology of ring is ‘none’
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – dialparties.agi: Added extension 200 to extension map
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – dialparties.agi: Extension 200 cf is disabled
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – dialparties.agi: Extension 200 do not disturb is disabled
[Jul 7 10:39:29] VERBOSE[27346] logger.c: dialparties.agi: Extension 200 has ExtensionState: 0
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – dialparties.agi: Checking CW and CFB status for extension 200
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – dialparties.agi: dbset CALLTRACE/200 to 6514427031
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – dialparties.agi: Filtered ARG3: 200
[Jul 7 10:39:29] VERBOSE[27353] logger.c: == Manager ‘admin’ logged off from 127.0.0.1
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – AGI Script dialparties.agi completed, returning 0
[Jul 7 10:39:29] DEBUG[27346] app_macro.c: Executed application: AGI
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – Executing [s@macro-dial:7] Dial(“Local/200@from-internal-1eb8,2”, “SIP/200||trM(auto-blkvm)”) in new stack
[Jul 7 10:39:29] VERBOSE[27346] logger.c: – Called 200
[Jul 7 10:39:29] VERBOSE[27343] logger.c: – Local/200@from-internal-1eb8,1 is ringing
[Jul 7 10:39:30] VERBOSE[27346] logger.c: – SIP/200-082f0f98 is ringing
[Jul 7 10:39:31] VERBOSE[27346] logger.c: – SIP/200-082f0f98 answered Local/200@from-internal-1eb8,2
[Jul 7 10:39:31] VERBOSE[27346] logger.c: – Executing [s@macro-auto-blkvm:1] Set(“SIP/200-082f0f98”, “__MACRO_RESULT=”) in new stack
[Jul 7 10:39:31] DEBUG[27346] app_macro.c: Executed application: Set
[Jul 7 10:39:31] VERBOSE[27346] logger.c: – Executing [s@macro-auto-blkvm:2] DBdel(“SIP/200-082f0f98”, “BLKVM/191/SIP/SIP1-b6d083a8”) in new stack
[Jul 7 10:39:31] VERBOSE[27346] logger.c: – DBdel: family=BLKVM, key=191/SIP/SIP1-b6d083a8
[Jul 7 10:39:31] DEBUG[27346] app_macro.c: Executed application: dbDel
[Jul 7 10:39:31] DEBUG[27346] app_dial.c: Macro exited with status 0
[Jul 7 10:39:31] DEBUG[27343] app_queue.c: Dunno what to do with control type -1
[Jul 7 10:39:31] VERBOSE[27343] logger.c: – Local/200@from-internal-1eb8,1 answered SIP/SIP1-b6d083a8
[Jul 7 10:39:31] VERBOSE[27343] logger.c: – Stopped music on hold on SIP/SIP1-b6d083a8
[Jul 7 10:39:31] WARNING[27343] app_queue.c: The device state of this queue member, Local/200@from-internal/n, is still ‘Not in Use’ when it probably should not be! Please check UPGRADE.txt for correct configuration settings.

Could this be related? I looked in the UPGRADE.txt file and I don’t see anything.

In the UPGRADE.txt file I did see this:

The exit behavior of the AGI applications has changed. Previously, when
a connection to an AGI server failed, the application would cause the channel
to immediately stop dialplan execution and hangup. Now, the only time that
the AGI applications will cause the channel to stop dialplan execution is
when the channel itself requests hangup. The AGI applications now set an
AGISTATUS variable which will allow you to find out whether running the AGI
was successful or not.

Previously, there was no way to handle the case where Asterisk was unable to
locally execute an AGI script for some reason. In this case, dialplan
execution will continue as it did before, but the AGISTATUS variable will be
set to “FAILURE”.

A locally executed AGI script can now exit with a non-zero exit code and this
failure will be detected by Asterisk. If an AGI script exits with a non-zero
exit code, the AGISTATUS variable will be set to “FAILURE” as opposed to
"SUCCESS".

Could the new handling for AGI scripts be causing the dialparties to hang? I did see many log entries where the scripts exited with non-zero.

Thanks

warlock67 · July 8, 2009, 2:54am

I am no guru, but this is what is happening to me here. I do a reboot and find the CPU load on a dual core 3.04GHz system up to 30% in the morning, and at some point during the morning the queues receive calls but no longer ring the agents. After a reload it starts working again, but the CPU load goes up to 187% (dual core) with various instances of dialparties.agi active and consuming everything there is. At night I can once again reset the box and everything returns to normal until the next day. I have the following setup:
Ast 1.4.25.1
FPBX: 2.5.1.5 100 Ext, 5 Queues, 2 SIP trunks
Server HP DL380, Dual 3.04GHz, 2GB mem, SCSI Raid 5
Traffic: 1000 calls a day incoming, 15000 calls a day outgoing

Any suggestions are very very welcome.

escape2mtns · July 8, 2009, 12:41pm

It’s probably a coincidence but we’re all 3 running on HP DL360/DL380’s.

We’ve dropped all our queues down to 5 or fewer static agents to see how it goes today.

escape2mtns · July 8, 2009, 3:45pm

I just spoke with one of the Digium Select distributors that installs many large (100+ seat) call centers. He said that if you’re doing a lot of calls or static queue members then you want to use the SIP device as a queue member instead of the Local channel. He explained that would avoid use of dialparties.agi which really doesn’t come into play much in these environments.

Fortel · July 8, 2009, 3:56pm

This is good information. I’d be interested to see how you implement the workaround, and what the result is. With the inbound call center we work with, we’ve tried more powerful hardware, different brand hardware, etc. Clearly, that’s not the right path.

Thanks,

Peter

escape2mtns · July 8, 2009, 4:11pm

He told me to edit the queues_additional.conf and change the member= lines from:

member=Local/102@from-internal/n,0

to

member=SIP/102,0

(using extension 102 as an example)

We tried this on a test queue and confirmed that it works but unfortunately it breaks assumptions in a Queue Monitoring app we developed. We’ve got to fix this monitoring app for the new style of members before we can fully test.

I guess we’ll need to put a sed search/replace script in /etc/amportal.conf as a POST_RELOAD script to set these members after FreePBX queue changes.
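A minimal Python sketch of that search/replace is below, as an alternative to sed. It assumes the member lines look exactly like the example above; `rewrite_members` and the sample queue section are hypothetical, and how the script gets hooked in as a POST_RELOAD script is not shown. Keep in mind that members rewritten this way no longer pass through dialparties.agi, so whatever it handles for an extension (the log earlier in the thread shows it checking call forward and do-not-disturb state) is skipped.

```python
import re

# Matches lines such as: member=Local/102@from-internal/n,0
MEMBER_RE = re.compile(
    r"^(member\s*=\s*)Local/(\d+)@from-internal/n(,\d+)?\s*$"
)


def rewrite_members(conf_text, tech="SIP"):
    """Rewrite Local-channel queue members to direct device members.

    Lines that do not match the expected member format are left untouched.
    """
    out = []
    for line in conf_text.splitlines():
        m = MEMBER_RE.match(line)
        if m:
            prefix, ext, penalty = m.group(1), m.group(2), m.group(3) or ""
            line = f"{prefix}{tech}/{ext}{penalty}"
        out.append(line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Made-up queue section for illustration only.
sample = """\
[600]
strategy=ringall
member=Local/102@from-internal/n,0
member=Local/103@from-internal/n,0
"""
rewritten = rewrite_members(sample)
print(rewritten)
```

Since FreePBX regenerates queues_additional.conf when queue changes are applied, the rewrite would have to run again after every such change.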

He said that they absolutely have these issues when using the default local channel members and running them through dialparties.agi and that they absolutely don’t have an issue when they set members as the direct sip device.

warlock67 · July 8, 2009, 10:24pm

Sounds extremely logical to me. The problem I have is that we have changes to the Queues on a regular basis from FPBX. Where could we “fix” this in the scripts that make the modification in the queues_additional.conf file?

Is FPBX development considering this change for call center solutions like ours?

plindheimer · July 8, 2009, 10:43pm

There are a LOT of implications if you just go and change those to SIP devices that the Digium Select distributor likely has no idea about.

He is correct that a heavy call center environment is not at all what FreePBX is tuned for. However, the numbers you are mentioning here are not heavy.

There are also new abilities in queues in 1.6 that have found their way into 1.4.26 (I think) that can pass device state information on to the queue. (The issue is that there are shortcomings in what they implemented. I have a patch in the Asterisk bug tracker to do it ‘right’ using hints, which works against 1.4. I have tested it but not stress tested it. I also have patches against the queues module in 2.6 to take advantage of these changes. The problem is that none of it is checked into svn, because unless that patch finds its way into 1.4, or at least a version of it into 1.6, there is no point in getting the changes into FreePBX. There are a couple of bugs in our tracker with information about all these patches as well.)