Asterisk dies after odd error messages related to voice queue length

I have a production system that was installed a few weeks ago. It's running the latest versions with all patches and upgrades applied. The customer has purchased and is using the call queue modules. There are about 44 physical Polycom phones and 144 extensions. It's a Dell physical server with 16 GB of RAM and is usually idle. They have 40 channels on a SIP trunk.

We noticed a few days ago that the web interface was VERY slow: 15-20 seconds per page, change, or tab. The system was still processing calls fine. Then, all at once, the system stopped taking any calls, and all the phones in the office began to show as not registered. The web interface and CLI were responsive and seemed fine, and top showed no abnormal memory usage or other issue at the time. A restart of the box brought everything back to normal, and the web interface is fast again as well.

Going back through the logs, the problem starts at one point, and after that there are several pages of the same voice queue length error, with each phone becoming unreachable and being “deleted”. A web search turns up some reports of this sort of behavior, but they are from several years ago and none identify a definitive cause. What should I look at or log to try to capture what is happening? None of my other customers are exhibiting this issue, and many have much larger systems than this, but this is the only customer using call queues to this extent.
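Since the logs show a long run of the same voice queue length error, one way to pin down exactly when it started and how quickly it ramped up is to count matching lines per minute. Below is a minimal Python sketch, not a definitive tool. It assumes each log line begins with a bracketed timestamp such as `[2024-01-15 09:42:07]` (if your logger.conf uses a different date format, adjust `TIMESTAMP`), and it matches only the fragment `voice queue length` as described in this post, not the exact Asterisk message text. The log path in the usage note is also an assumption.

```python
import re
from collections import Counter

# Assumption: lines start with "[YYYY-MM-DD HH:MM:SS]". Group 1 captures the
# timestamp truncated to the minute so matches can be bucketed per minute.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def summarize(lines, needle="voice queue length"):
    """Return (first matching line, Counter of matching lines per minute).

    Lines that contain `needle` but have no recognizable timestamp are
    counted under the key "unknown".
    """
    first = None
    per_minute = Counter()
    for line in lines:
        if needle not in line:
            continue
        if first is None:
            first = line.rstrip("\n")
        m = TIMESTAMP.match(line)
        per_minute[m.group(1) if m else "unknown"] += 1
    return first, per_minute


def report(path, needle="voice queue length"):
    """Print the first occurrence and a per-minute count for a log file."""
    with open(path, errors="replace") as fh:
        first, per_minute = summarize(fh, needle)
    print("first occurrence:", first)
    for minute, count in sorted(per_minute.items()):
        print(f"{minute}  {count}")
```

Calling `report("/var/log/asterisk/full")` (assumed path) prints the first matching line and a per-minute tally. A sharp jump in the counts can then be lined up against the time the phones started dropping their registrations.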

Note: it has been almost 24 hours since the reboot and there have been no issues… so far.

From the bash prompt, the top and htop utilities will show you what’s eating CPU and memory resources. With luck, you can narrow it down to something other than just Asterisk.
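top and htop are interactive, so they only help if someone is watching when the problem begins. Because this failure took days to show up, a small sampler that appends the heaviest processes to a file at a fixed interval leaves a record to review afterwards. Below is a minimal Python sketch, assuming a Linux host with the procps `ps` (which accepts `-eo` and `--sort`). The log file name, the 60-second interval, and sorting by RSS are arbitrary choices for illustration. Note that `ps` reports %CPU as an average over each process's lifetime rather than the instantaneous value top shows, which is why this sketch sorts by resident memory (RSS, in KiB) by default.

```python
import subprocess
import time


def parse_ps(output, limit=5):
    """Parse `ps -eo pid,%cpu,%mem,rss,comm` output into tuples.

    COMMAND is requested last so a process name containing spaces
    stays intact when the line is split into at most five fields.
    """
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        pid, cpu, mem, rss, comm = parts
        rows.append((int(pid), float(cpu), float(mem), int(rss), comm))
    return rows[:limit]


def sample(limit=5, sort="-rss"):
    """Return the top `limit` processes; by default, largest RSS first."""
    out = subprocess.run(
        ["ps", "-eo", "pid,%cpu,%mem,rss,comm", f"--sort={sort}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, limit)


def watch(logfile, interval=60, limit=5):
    """Append a timestamped snapshot every `interval` seconds until interrupted."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as fh:
            for pid, cpu, mem, rss, comm in sample(limit):
                fh.write(f"{stamp} pid={pid} cpu={cpu}% mem={mem}% rss_kb={rss} {comm}\n")
        time.sleep(interval)
```

Left running under nohup or screen, something like `watch("/tmp/proc-samples.log")` (hypothetical path) would show whether the asterisk process, the web server, or something else grew steadily in the hours before the web interface slowed down.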

