SNG7-FPBX-64bit-1710-1 Load and slowdown issues

Hi all,
I downloaded SNG7-FPBX-64bit-1710-1 (Release Date: October 2017 • FreePBX 14.0.1.20 • Linux 7.4 • Asterisk 13, 14 or 15) and have had it running for around a week, but I’m noticing that it gets slower and slower by the day.
A few days ago I noticed that logging into the web interface was slow, taking around 60 seconds. Checking the system with top, I could see httpd and php processes hitting 100% CPU utilisation for a while. System load had previously been around 0.10; after a few minutes of trying to log in it went up and got as high as 10.0, which then started to affect calls (jitter, etc.).
I rebooted the box and it was immediately fine once again: load stayed low, web interface responsive, etc.

Move forward to now (after the weekend) and it is again very slow. Logging into the web interface took so long that I got a DHTML warning banner saying it couldn’t connect to the backend. Checking the server, load was climbing rapidly and calls were timing out (I tried dialling voicemail and my handset just hung waiting for a reply from the server).
Again I rebooted and all was fine once again: low load, web interface fine and calls fine. Right now, around 1 hour after the reboot, the system still looks good, with load at 0.02, 0.04, 0.05 and calls processing happily.
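Since the slowdown only shows up after hours or days, one option is to let cron record what is busy at the moment load climbs, rather than relying on catching it in top. The following is a minimal sketch, assuming the procps `ps` shipped with a CentOS 7 based system; the function names, the threshold of 4 and the log path are illustrative choices, not anything provided by FreePBX.

```shell
#!/bin/sh
# Succeed when the load value ($1) is strictly above the limit ($2).
# awk is used because the shell itself cannot compare decimals.
load_above() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# Append a timestamped list of the busiest processes to a log file,
# but only when the 1-minute load average is above the limit.
# $1 = load limit (default 4), $2 = log file (hypothetical default path).
log_if_busy() {
    limit=${1:-4}
    logfile=${2:-/var/tmp/load-snapshots.log}
    load=$(cut -d' ' -f1 /proc/loadavg)
    if load_above "$load" "$limit"; then
        {
            date '+%F %T'
            echo "load: $load"
            # Highest CPU consumers first, with state and full command line.
            ps -eo pcpu,pmem,pid,stat,args --sort=-pcpu | head -n 15
        } >> "$logfile"
    fi
}
```

Saved as a script that ends with a call such as `log_if_busy 4` and run from cron once a minute, this leaves a record of which processes were at the top when the load started to rise, which can then be lined up against the Apache and Asterisk logs for the same minute.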

This system replaced a much older trixbox/FreePBX system, so the call load hasn’t gone up and the resources are almost identical.
I also suspected hardware issues. This runs under VMware ESXi (5.5.0, 2068190) on a Dell PowerEdge R310 with 4 x 2.393 GHz Xeon CPUs and 16 GB RAM.
The VM running SNG7 has all 4 CPUs and 10 GB RAM allocated to it. There is only one other VM on the host, running with fewer resources, and it does NOT have the same issues, so I think hardware can be ruled out (disk arrays fine, all disks fine, etc.).
I’ve allocated 2 x 200 GB disks to the VM, and the SNG7 installer automatically put them into a RAID array:
md0 : active raid1 sda1[0] sdb1[1]
      2046976 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      207535104 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/SangomaVG-root  193G  5.5G  188G   3% /
devtmpfs                    4.8G     0  4.8G   0% /dev
tmpfs                       4.9G   16K  4.9G   1% /dev/shm
tmpfs                       4.9G  8.6M  4.8G   1% /run
tmpfs                       4.9G     0  4.9G   0% /sys/fs/cgroup
/dev/md0                    1.9G   62M  1.8G   4% /boot
tmpfs                       984M     0  984M   0% /run/user/0

BUT when the high CPU usage occurs, the processes aren’t in “D” state (uninterruptible sleep, usually waiting on I/O), so it doesn’t appear to be the disk array/RAID slowing it down.
To be honest, I’m a bit stumped as to what is causing the slowdown. It doesn’t appear to be the physical hardware or a lack of resources on the machine, but it just keeps slowing down.
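The “D” state check can be repeated during the next slowdown with something like the sketch below, which prints the load averages followed by any processes currently in uninterruptible sleep. It assumes a Linux system with `/proc/loadavg` and a procps `ps`; the function names are made up for illustration.

```shell
#!/bin/sh
# Keep only lines whose first field (the ps STAT column) starts with "D",
# i.e. processes in uninterruptible sleep.
d_state_only() {
    awk '$1 ~ /^D/'
}

# Print a timestamp, the 1/5/15-minute load averages, and any D-state processes.
snapshot_blocked() {
    printf '%s load: %s\n' "$(date '+%F %T')" "$(cut -d' ' -f1-3 /proc/loadavg)"
    # Separate -o options with empty headers suppress the header line.
    ps -e -o stat= -o pid= -o comm= | d_state_only
}
```

Running `snapshot_blocked` a few times in a row while the web interface is hanging should show whether anything is stuck waiting on disk. If the list stays empty while load is high, that supports the view that the RAID array is not the bottleneck.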

I’ve checked for updates and it is reporting these updates as available:

sangoma-pbx.noarch 1710-2.sng7 1710.1.sng7
sangoma-release.x86_64 7-4.1710.02.el7.sangoma 7.4.1710.01.el7.sangoma

But I can’t find any release info on 1710-2.sng7 describing what has changed or which bugs were fixed, so I’m reluctant to do the upgrade without more info (BTW this box is now live, unfortunately, so I need to figure out the fix quickly!)

Can anyone suggest other diagnostic steps, how to resolve, etc please?

Thank you!

I’ve been doing more debugging. I’m finding I have to reboot the server each morning to keep it usable; otherwise the problems start straight away (even when the phones aren’t being used).

I’ve done a comparison of processes and memory usage, one from yesterday afternoon, one from this morning.
14:45

ps -e -o pid,vsz,comm= | sort -n -k 2

  PID    VSZ 
12717 922884 PM2 v2.6.1: God
12757 1089452 node /var/www/h
1400 1628656 mysqld
12382 3029848 asterisk

ps aux  | awk '{print $6/1024 " MB\t\t" $11}'  | sort -n

36.7031 MB		/usr/sbin/httpd
39.7031 MB		/usr/sbin/httpd
44.6211 MB		/usr/sbin/httpd
46.3438 MB		node
50.5664 MB		/usr/sbin/httpd
50.668 MB		/usr/sbin/httpd
53.3516 MB		/usr/bin/mongod
115.828 MB		/usr/local/fop2/fop2_server
130.246 MB		/usr/sbin/asterisk
246.16 MB		/usr/libexec/mysqld

Then again this morning:
08:04

13075 743936 httpd
12717 923056 PM2 v2.6.1: God
26984 1083820 node /var/www/h
 1400 2023624 mysqld
12382 3029848 asterisk

29.6172 MB		PM2
36.7383 MB		/usr/sbin/httpd
38.9531 MB		/usr/bin/mongod
39.7383 MB		/usr/sbin/httpd
40.7734 MB		node
44.6211 MB		/usr/sbin/httpd
50.5664 MB		/usr/sbin/httpd
50.6914 MB		/usr/sbin/httpd
117.008 MB		/usr/local/fop2/fop2_server
131.391 MB		/usr/sbin/asterisk
259.293 MB		/usr/libexec/mysqld


 1523 657992 httpd
16583 725252 httpd
13075 743936 httpd
12717 923056 PM2 v2.6.1: God
26984 1084332 node /var/www/h
 1400 2023624 mysqld
12382 3029848 asterisk


40.6094 MB		node
44.6211 MB		/usr/sbin/httpd
50.6914 MB		/usr/sbin/httpd
59.3828 MB		/usr/sbin/httpd
117.023 MB		/usr/local/fop2/fop2_server
131.434 MB		/usr/sbin/asterisk
259.293 MB		/usr/libexec/mysqld

Nothing of huge consequence there. I then accessed the web interface, which took a few minutes to populate the extensions list. Whilst it did that, CPU usage of httpd and PHP went to 100%, with asterisk closely following at around 80%. I made a test call to voicemail; the call was choppy and in parts you couldn’t hear anything.
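The manual before/after comparison above could be made repeatable by writing a per-command memory summary to a timestamped file and diffing two of them. This is only a sketch under the same procps assumption; the function names and the snapshot directory are hypothetical.

```shell
#!/bin/sh
# Read "RSS_in_KiB command" lines on stdin, total the resident memory per
# command name, and print it in MB, smallest first.
# Note: only the first word of the command name is used as the key, so a
# name containing spaces (e.g. "PM2 v2.6.1: God") is grouped under "PM2".
sum_rss_by_command() {
    awk '{ rss[$2] += $1 }
         END { for (c in rss) printf "%.1f MB\t%s\n", rss[c] / 1024, c }' | sort -n
}

# Write the current summary to a timestamped file in $1
# (default: /var/tmp/rss-snapshots, a made-up location).
snapshot_rss() {
    dir=${1:-/var/tmp/rss-snapshots}
    mkdir -p "$dir"
    ps -e -o rss= -o comm= | sum_rss_by_command > "$dir/rss-$(date '+%Y%m%d-%H%M').txt"
}
```

Calling `snapshot_rss` in the afternoon and again the next morning, then running `diff` on the two files, shows at a glance which commands have grown, with all httpd children added together instead of listed one by one.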

I then rebooted the machine.
My suspicion? The node /var/www/html/admin/modules/ucp/node/index.js or PM2 code.
I’ve therefore uninstalled UCP now, so node isn’t running. I’ve also noticed that since the reboot the “PM2 v2.6.1: God Daemon (/home/asterisk/.pm2)” process isn’t running either. Can anyone identify that one?

So I’ll wait and test again to see if that resolves the load problems.

Nobody else is reporting any issues like this. I would start with disabling all services not included in FreePBX like FOP and see