Disk or Database Bottleneck

I have a Virtual Server (FreePBX distro 16/Asterisk 18.9), 15 cores 16 GB ram 90GB Free. Looks very good resource-wise.

I am seeing the performance bottom (delays in call processing, can answer phone calls, apply config takes 10 minutes, etc.) out when I get to 30 active channels (call recording on), but I am not close to maxing out the system.

Is there a way to see if the MySQL or SQLite databases, or the disk might be the problem?

Clearly it is choaking on something, but I don’t know what it could be.

Install sysstat for a long term overview of system resources.

1 Like

You can also look more at mysql,
show processlist
Will tell you the queries running which can help identify if you have something long running. Most queries are short lived.
Is he mqsql local or remote?
At the command line

also iotop will help to identify disk usage.
vmstat 1 will help show you wait time on the cpu along with cpu time. wait time is frequently wait on io time.

Mysql can have a lot of different issues,
insall mytop and run to get some additional insight.

Some things that go wrong can be sometimes resolved with rebuilding indexes, like doing an optimize table on tables.
Some things might be able to be resloved with allocating more memory to mysql. but the first steps are to identify what is he consumption of resources.

1 Like

I installed sysstat and will try mytop now as well. Post reboot things are working well, but it seems to happen every Thursday. At least I’ll be ready this time. I did upgrade from Asterisk 16.9 to 18 LTS certified. I’ll report back when the next incident hits.

Does any of that look problematic (doesn’t look like it to me)? I’ll be sure to run it if the system starts grinding again.

MariaDB [(none)]> show processlist;
+---------+-------------+-----------------+---------------+---------+------+-------+------------------                                                +----------+
| Id      | User        | Host            | db            | Command | Time | State | Info                                                             | Progress |
+---------+-------------+-----------------+---------------+---------+------+-------+------------------                                                +----------+
|      16 | qstatsUser  | localhost       | qstats        | Sleep   |  140 |       | NULL                                                             |    0.000 |
|     115 | freepbxuser | localhost       | asterisk      | Sleep   |    0 |       | NULL                                                             |    0.000 |
|     238 | freepbxuser | localhost:50016 | asterisk      | Sleep   |  609 |       | NULL                                                             |    0.000 |
|     239 | freepbxuser | localhost       | asterisk      | Sleep   |    0 |       | NULL                                                             |    0.000 |
|    1248 | freepbxuser | localhost       | asterisk      | Sleep   |   10 |       | NULL                                                             |    0.000 |
|  957695 | freepbxuser | localhost       | asteriskcdrdb | Sleep   |  170 |       | NULL                                                             |    0.000 |
| 1091714 | freepbxuser | localhost       | asterisk      | Sleep   |  103 |       | NULL                                                             |    0.000 |
| 1091906 | lsuser      | localhost       | callback      | Sleep   |    2 |       | NULL                                                             |    0.000 |
| 1092005 | root        | localhost       | NULL          | Query   |    0 | NULL  | show processlist                                                 |    0.000 |
| 1092014 | qstatsUser  | localhost       | qstats        | Sleep   |    0 |       | NULL                                                             |    0.000 |
| 1092015 | qstatsUser  | localhost       | qstats        | Sleep   |    0 |       | NULL                                                             |    0.000 |
+---------+-------------+-----------------+---------------+---------+------+-------+------------------                                                +----------+
11 rows in set (0.00 sec)

Local

I don’t see a problem in that screen shot. Nohing is running. you may look at enablly the slow query log as well in mysql but right now its not know if the issue is disk subsystem or the database engine.

From running mysql and asterisk realtime, I can tell you that the fastest way to make asterisk fall over is to couple it to the database and have 400 phones continually registering there by generating a lot of small queries. the query time isn’t what kills you its the backup in processing queries. Phones will drop off and on and is because mysql can’t keep up with the writes.

Good to know, but as fate would have it this is just a bunch (70 queues, 1-2 agents per, about 40 channels max at one time, Asternic Call Center Stats installed) of custom extensions, that are all tied to mobile phones (extension 555.555.5555, FMFM is 1.555.555.5555#). These extensions take queue calls 99% of the time, and use DISA 1% of the time. Virtual 14 cores, 16gb ram, 1Gbps connection, 80GB free disk, the box doesn’t look stressed at all, but something is happening that causes it to gridlock. Thanks to you guys, hopefully I’ll be able to see it this time.

I doubt the call load is the issue. Its likely something more. Any cron jobs running Thursdays? Is the vm inside of environment where something else happens? ie it runs great until the accounting dept. runs billing job on thursdays…

Started seeing long pauses when I call a number and when the IVR greeting starts playing. Looks like it is happening earlier this week. I notice this in the logs:
WARNING[26840][C-00000fd3]: channel.c:1080 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/1XXXXXXXXXX@from-internal-00006749;1

The resources look okay:


Here is process list:

MariaDB [(none)]> show processlist;
+---------+-------------+-----------------+---------------+---------+------+-------+------------------+----------+
| Id      | User        | Host            | db            | Command | Time | State | Info             | Progress |
+---------+-------------+-----------------+---------------+---------+------+-------+------------------+----------+
|      16 | qstatsUser  | localhost       | qstats        | Sleep   |   21 |       | NULL             |    0.000 |
|     115 | freepbxuser | localhost       | asterisk      | Sleep   |    2 |       | NULL             |    0.000 |
|     238 | freepbxuser | localhost:50016 | asterisk      | Sleep   | 8328 |       | NULL             |    0.000 |
|     239 | freepbxuser | localhost       | asterisk      | Sleep   |    4 |       | NULL             |    0.000 |
|    1248 | freepbxuser | localhost       | asterisk      | Sleep   |   27 |       | NULL             |    0.000 |
| 1170495 | freepbxuser | localhost:55690 | asterisk      | Sleep   |    8 |       | NULL             |    0.000 |
| 1172432 | freepbxuser | localhost:35508 | asterisk      | Sleep   |   12 |       | NULL             |    0.000 |
| 1174552 | freepbxuser | localhost       | asteriskcdrdb | Sleep   |   47 |       | NULL             |    0.000 |
| 1175873 | freepbxuser | localhost:49792 | asterisk      | Sleep   |    5 |       | NULL             |    0.000 |
| 1176432 | freepbxuser | localhost:52110 | asterisk      | Sleep   |    4 |       | NULL             |    0.000 |
| 1181775 | freepbxuser | localhost       | asterisk      | Sleep   |  109 |       | NULL             |    0.000 |
| 1182169 | lsuser      | localhost       | callback      | Sleep   |    2 |       | NULL             |    0.000 |
| 1182182 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182185 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182186 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182187 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182188 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182189 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182190 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182191 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182192 | qstatsUser  | localhost       | qstats        | Sleep   |    6 |       | NULL             |    0.000 |
| 1182193 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182194 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182195 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182196 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182197 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182198 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182199 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182200 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182201 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182202 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182203 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182204 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182205 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182206 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182207 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182208 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182209 | qstatsUser  | localhost       | qstats        | Sleep   |    5 |       | NULL             |    0.000 |
| 1182210 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182211 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182212 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182213 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182214 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182215 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182216 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182217 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182218 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182219 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182220 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182221 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182222 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182223 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182224 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182225 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182226 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182227 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182228 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182229 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182230 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182231 | qstatsUser  | localhost       | qstats        | Sleep   |    4 |       | NULL             |    0.000 |
| 1182232 | root        | localhost       | NULL          | Query   |    0 | NULL  | show processlist |    0.000 |
| 1182233 | qstatsUser  | localhost       | qstats        | Sleep   |    3 |       | NULL             |    0.000 |
| 1182234 | qstatsUser  | localhost       | qstats        | Sleep   |    3 |       | NULL             |    0.000 |
| 1182235 | freepbxuser | localhost       | asterisk      | Sleep   |    2 |       | NULL             |    0.000 |
| 1182237 | qstatsUser  | localhost       | qstats        | Sleep   |    2 |       | NULL             |    0.000 |
| 1182238 | qstatsUser  | localhost       | qstats        | Sleep   |    1 |       | NULL             |    0.000 |
| 1182239 | qstatsUser  | localhost       | qstats        | Sleep   |    1 |       | NULL             |    0.000 |
| 1182240 | qstatsUser  | localhost       | qstats        | Sleep   |    1 |       | NULL             |    0.000 |
| 1182241 | qstatsUser  | localhost       | qstats        | Sleep   |    1 |       | NULL             |    0.000 |
| 1182243 | qstatsUser  | localhost       | qstats        | Sleep   |    0 |       | NULL             |    0.000 |
+---------+-------------+-----------------+---------------+---------+------+-------+------------------+----------+
70 rows in set (0.00 sec)

iotop:
image

Any ideas on what else I can look at?

This general means a channel blocking issue (system overload) or a deadlock. You will need to get a backtrace for this.

Get Backtrace

I only see about 30 channels up (that is as close to as high as it gets):

*CLI> core show channels
Channel              Location             State   Application(Data)
Local/1#########1264@fr s@macro-dialout-trun Up      Dial(SIP/AVAYA-1/#########1264,30
Local/1#########1264@fr s@macro-dial:1       Up      AppDial((Outgoing Line))
Local/1#########1349@fr s@macro-dialout-trun Up      Dial(SIP/AVAYA-1/#########1349,30
Local/1#########1349@fr s@macro-dial:1       Up      AppDial((Outgoing Line))
SIP/AVAYA-1-000028d3 s@macro-dialout-trun Up      AppDial((Outgoing Line))
SIP/AVAYA-1-000028d5 (None)               Up      AppDial((Outgoing Line))
SIP/AVAYA-1-000028d4 (None)               Up      AppDial((Outgoing Line))
SIP/AVAYA-1-000028a6 2010@ext-queues:67   Up      Queue(2010,t,,custom/health-fa
SIP/AVAYA-1-000028c7 2012@ext-queues:67   Up      Queue(2012,t,,custom/health-fa
SIP/AVAYA-1-000028c2 2014@ext-queues:67   Up      Queue(2014,t,,,,,,,0,)
SIP/AVAYA-1-000028c3 5033@ext-queues:67   Up      Queue(5033,t,,,,,,,0,)
SIP/AVAYA-1-000028ba 2018@ext-queues:67   Up      Queue(2018,t,,custom/#######
SIP/AVAYA-1-000028cf 2027@ext-queues:67   Up      Queue(2027,t,,custom/MLT_CT,,,
SIP/AVAYA-1-000028cb s@macro-dialout-trun Up      AppDial((Outgoing Line))
SIP/AVAYA-1-000028b5 s@macro-dialout-trun Up      Dial(SIP/AVAYA-1/7001018831,30
SIP/AVAYA-1-000028b3 2014@ext-queues:67   Up      Queue(2014,t,,custom/health-fa
SIP/AVAYA-1-000028b9 (None)               Up      AppDial((Outgoing Line))
Local/#########3245@fro 2018@from-queue:1    Up      AppQueue((Outgoing Line))
Local/#########3245@fro s@macro-dial:29      Up      Dial(Local/FMPR-#########3245@fro
Local/#########3277@fro 2012@from-queue:1    Up      AppQueue((Outgoing Line))
Local/#########3277@fro s@macro-dial:29      Up      Dial(Local/FMPR-#########3277@fro
Local/1#########1484@fr s@macro-dial:1       Up      AppDial((Outgoing Line))
Local/1#########1484@fr s@macro-dialout-trun Up      Dial(SIP/AVAYA-1/#########1484,30
Local/1#########1832@fr s@macro-dial:1       Up      AppDial((Outgoing Line))
Local/1#########1832@fr s@macro-dialout-trun Up      Dial(SIP/AVAYA-1/#########1832,30
Local/#########3220@fro s@macro-dial:29      Up      Dial(Local/FMPR-#########3220@fro
Local/#########3220@fro 2014@from-queue:1    Up      AppQueue((Outgoing Line))
Local/#########1349@fro s@macro-dial:29      Up      Dial(Local/FMPR-#########1349@fro
Local/#########1349@fro 5033@from-queue:1    Up      AppQueue((Outgoing Line))
Local/#########3228@fro #########3228@followme- Ring    GotoIf(1?DIALGRP)
Local/#########3228@fro #########3228@from-queu Down    AppQueue((Outgoing Line))
31 active channels
17 active calls

The system is still processing some calls, so it would be ideal to not to suspend/restart asterisk. IS there anything else I can look at prior to the backtrace that might help me identify the root cause?

When I do a test call, I have about 10 seconds of silence. On the call log I can see it going step by step, really slowly. Until the 10th second, then it blasts though (and the IVR audio begins). Nothing exceptional is happening at this time, it’s just setting variables. Almost bogged down.

Logs with timestamps from ‘full’ ?

Is there a way to tell if it is ulimit?

 asterisk]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63426
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63426
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Ill pastebin

Here is me making a call to a queue. It rings once, there is deadair for over 10 seconds, then the queue message begins playing.

The names have been changed to protect the innocent.

@dicko I notice it doesn’t create the file until the next day sar06 so I can provide that too, but the output is 50ish pages long, is there a subset to put in the pastebin, or the whole thing?

This would mean the audio frames are queueing up and the thread isn’t reading them fast enough. Still requires a backtrace.

The only other thing you can do is update Asterisk. You are on 18.9 and the current release is 18.18. Even if the distro is at 18.17 or something more current you can try that and see if it resolves the issue.

A few days ago, I tried asterisk-version-switch and selected Press 8 for Asterisk 18 Certified (LTS) I also ran yum update. How do I get to 18.17?