High load average and call quality issue

We are running a server with Asterisk 18.17.0 and FreePBX 16. The system randomly shows a high load average (25 - 50), and call quality degrades as a result (IVR, connected calls, etc.). Running fwconsole restart resolves it temporarily, but the problem recurs after 3-4 days.

There are 40 queues and 120 extensions in total, with around 90 extensions allocated to each queue.
Maximum SIP channel count: 70
Queue log and CDR are captured to the database. There are no slow queries in the database.
Around 446 task processors, with nothing queued.

HW Spec.
32 vcores, 30GB RAM & SSD Storage
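
As a rough sanity check on those numbers, load average is easier to judge once it is divided by the core count. The sketch below is only illustrative arithmetic on the figures quoted above (25 to 50 on 32 vcores, so roughly 0.78 to 1.56 per core); the helper name load_per_core is made up for this example. Keep in mind that on Linux the load average also counts tasks blocked in uninterruptible I/O wait, so a high value does not by itself prove the CPUs are saturated.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average normalised by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


# Figures reported in this thread: load average 25 to 50 on 32 vcores.
for load in (25.0, 50.0):
    print(f"load {load:.0f} on 32 cores -> {load_per_core(load, 32):.2f} per core")

# On a live box, the same ratio for the current 1-minute load average.
try:
    one_minute, _, _ = os.getloadavg()
    print(f"current: {load_per_core(one_minute, os.cpu_count() or 1):.2f} per core")
except (OSError, AttributeError):
    # getloadavg() is not available on every platform.
    pass
```

A ratio that stays above 1.0 means more runnable or blocked tasks than cores, which would fit with audio starting to suffer.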

Any suggestions on what to look at to resolve this issue?
Thanks in advance for the help.

Edit: There are about 450 task processors. Is that normal?

  • Htop to look at processes
  • Are you having to transcode?
  • How is the memory?
  • Any lagging on the DB read/writes?
  • Are you running any custom dialplan?
  • Are you using any additional AMI applications that could be stalling dialplan?
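
If htop shows the asterisk process itself burning CPU, the next question is usually which thread. Here is a minimal sketch, assuming a Linux /proc filesystem, that reads per-thread CPU ticks from /proc/<pid>/task/*/stat. The function names are invented for this example and the sample stat line is made up. The thread IDs it reports can usually be matched against the LWP numbers in a gdb backtrace.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def parse_stat_line(line: str) -> Tuple[str, int]:
    """Return (thread name, utime + stime in clock ticks) from a /proc stat line."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    name = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    return name, int(fields[11]) + int(fields[12])


def thread_cpu_ticks(pid: int) -> Dict[int, Tuple[str, int]]:
    """Snapshot CPU ticks for every thread of the given process."""
    snapshot = {}
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            snapshot[int(stat.parent.name)] = parse_stat_line(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # thread exited between listing and reading
    return snapshot


def busiest(before, after, top: int = 5) -> List[Tuple[int, int, str]]:
    """Return (ticks used, thread id, name) for the busiest threads between two snapshots."""
    deltas = [
        (ticks - before[tid][1], tid, name)
        for tid, (name, ticks) in after.items()
        if tid in before
    ]
    return sorted(deltas, reverse=True)[:top]


# Made-up stat line, only to show the parsing.
SAMPLE_STAT = "1234 (asterisk) S 1 1234 1234 0 -1 4194560 100 0 0 0 250 75 0 0 20 0 40 0 1000 0 0"
print(parse_stat_line(SAMPLE_STAT))
```

On the server, one would call thread_cpu_ticks(pid) twice a few seconds apart during a high-load period and pass both snapshots to busiest().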

I had a similar issue a little while ago and had to do an Asterisk backtrace to figure out what was causing it (a custom AMI application).

I had a problem with chan_sip a long time ago where call quality would fall apart after some time running.

I would do a module reload chan_sip.so to address it, but that was prior to migrating to chan_pjsip.
If you're still on chan_sip, give it a try.

  • Htop to look at processes

  • Are you having to transcode?

Both endpoints and trunks are using g711.

  • How is the memory?

Total of 30GB, usage is around 50%.

  • Any lagging on the DB read/writes?

We optimized the database for slow queries and cleared any large log tables. Nothing of concern.

  • Are you running any custom dialplan?

We used a custom AGI to randomize the outbound CID; removing it didn’t have any effect.

  • Are you using any additional AMI applications that could be stalling dialplan?

We are using a PHP script to capture AMI events and insert them into the database. We are using the same script across multiple servers and none of them have this issue.

We are running PJSIP.

On the task processors, is the max depth anywhere close to the low water mark? I know you aren’t seeing anything queued.
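
With roughly 450 task processors, eyeballing the list is tedious, so a small filter can help. This is a minimal sketch that assumes the output of core show taskprocessors ends each row with five numeric columns (Processed, In Queue, Max Depth, Low water, High water); that layout is an assumption and should be checked against the actual output on your version. The sample rows and processor names below are invented.

```python
from typing import List, NamedTuple


class TaskProcessor(NamedTuple):
    name: str
    processed: int
    in_queue: int
    max_depth: int
    low_water: int
    high_water: int


def parse_taskprocessors(output: str) -> List[TaskProcessor]:
    """Parse rows whose last five columns are integers; skip header and footer lines."""
    rows = []
    for line in output.splitlines():
        parts = line.rsplit(None, 5)
        if len(parts) != 6:
            continue
        name, *numbers = parts
        if not all(n.isdigit() for n in numbers):
            continue  # header line or other non-data text
        rows.append(TaskProcessor(name.strip(), *map(int, numbers)))
    return rows


def near_high_water(rows: List[TaskProcessor], ratio: float = 0.8) -> List[TaskProcessor]:
    """Return processors whose max depth reached `ratio` of their high water mark."""
    return [r for r in rows if r.high_water > 0 and r.max_depth >= ratio * r.high_water]


# Invented sample in the assumed column layout; names are hypothetical.
SAMPLE = """\
Processor                      Processed   In Queue  Max Depth  Low water High water
example-fast-00000001             120000          0         12        450        500
example-slow-00000002            9876543          0        480        450        500

2 taskprocessors
"""

for row in near_high_water(parse_taskprocessors(SAMPLE)):
    print(f"{row.name}: max depth {row.max_depth} of high water {row.high_water}")
```

On the server, the text could come from asterisk -rx "core show taskprocessors" saved to a file or piped in. A processor that shows nothing queued right now but a max depth near its water marks has still been backed up at some point since the last restart.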

You might need to do an Asterisk backtrace to see where the dialplan is stalling.

Getting a Backtrace - Asterisk Project - Asterisk Project Wiki

What's the timeout on the AMI account you are using for your script?
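
The reason the timeout matters: if an AMI client reads slowly (for example because it does a database insert inside its read loop), Asterisk can end up waiting on that connection, up to the writetimeout set for the user in manager.conf. One way to avoid that, sketched here in Python rather than the PHP used in this thread, is to keep the socket reader as fast as possible and hand events to a separate worker through a queue. This is only an illustration under those assumptions, not the script discussed above; the function names are made up, the sample events are invented, and the socket pair stands in for a real AMI connection (no login is shown).

```python
import queue
import socket
import threading
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]


def split_events(buffer: bytes) -> Tuple[List[Event], bytes]:
    """Split complete AMI messages (terminated by a blank line) off the buffer."""
    events = []
    while b"\r\n\r\n" in buffer:
        raw, buffer = buffer.split(b"\r\n\r\n", 1)
        event = {}
        for line in raw.decode("utf-8", "replace").split("\r\n"):
            key, sep, value = line.partition(":")
            if sep:
                event[key.strip()] = value.strip()
        if event:
            events.append(event)
    return events, buffer  # remainder is an incomplete message, if any


def reader(sock: socket.socket, events: "queue.Queue") -> None:
    """Drain the socket quickly; never do slow work in this loop."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        parsed, buffer = split_events(buffer + chunk)
        for event in parsed:
            events.put(event)
    events.put(None)  # tell the worker the connection is closed


def worker(events: "queue.Queue", store: Callable[[Event], None]) -> None:
    """Do the slow part (e.g. the database insert) away from the socket."""
    while True:
        event = events.get()
        if event is None:
            break
        store(event)


# Demo with a local socket pair and invented sample events.
fake_asterisk, client = socket.socketpair()
fake_asterisk.sendall(
    b"Event: QueueMemberStatus\r\nQueue: 100\r\n\r\n"
    b"Event: Hangup\r\nChannel: PJSIP/101-00000001\r\n\r\n"
)
fake_asterisk.close()

pending: "queue.Queue" = queue.Queue()
stored: List[Event] = []
thread = threading.Thread(target=reader, args=(client, pending))
thread.start()
worker(pending, stored.append)  # stored.append stands in for the DB insert
thread.join()
client.close()
print(stored)
```

If the existing script already reads and inserts in one loop, timing how long each insert takes during a high-load period would show whether it could be holding up the AMI connection.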
