FreePBX 16 with Sangoma Connect is saturated

For the past few days we have been having a problem where Sangoma Connect takes a long time to respond and the clients get disconnected frequently. What I notice is that all HTTP-related traffic is slowed down considerably; even the web interface on the server is very slow.

We have around 300 users and most of them use the Sangoma Connect desktop client. I can see there are hundreds of httpd processes running on the server (over 250). Since Sangoma Connect and Zulu also use the Apache server to communicate with Asterisk, I suspect they may be causing the slowdown. This is a very robust server with 16 cores, 32 GB of memory and an SSD as the main drive; CPU load rarely goes over 2 and memory usage is around 18 GB.

How can I tell whether Sangoma Connect or Zulu (we are migrating from Zulu to SC at the moment) is saturating Apache? When SC refuses to connect I have to stop Apache and restart it, and after that I restart all the other services like Sangoma Connect, Zulu, UCP, etc. That clears the problem for a while, but it comes back randomly.

This server has been in operation for months without any trouble with roughly the same number of users. Why are we seeing this now? Is there a way to optimize Apache for the higher load? Any pointers on where to look for the bottleneck? Thanks.

I would try an fwconsole restart, or possibly pkill node if something has gone rogue. It may just be an edge case, but I would start the daemons over clean.
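If it helps, a minimal restart sequence, assuming a stock FreePBX 16 install (adjust to whichever daemons are actually running on your box):

> fwconsole restart      # restarts Asterisk and all FreePBX-managed services
> pgrep -af node         # check for any node process that survived the restart
> pkill node             # only if one is clearly stuck; then run fwconsole restart again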

If you run top, do you see a bunch of httpd processes running? We noticed this running Sangoma Connect on a busy server. fwconsole restart resolved it until it happened again a few days later. It looks like old httpd processes were getting stuck and not shutting down for some reason, so Apache keeps forking more until it hits the server limit. It may be related to SMS messages, which seem to generate a lot of HTTP activity, but I'm not sure.
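A quick way to put a number on it, assuming the stock FreePBX/CentOS layout where the Apache binary is named httpd:

> ps -C httpd --no-headers | wc -l        # count the running Apache (httpd) processes
> ps -C httpd -o pid,etime,rss,args | head    # PID, elapsed time and resident memory for a sample of them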

There are 260 Apache processes running at peak times. I will try to do a full shutdown today when there are no calls (we are a 24-hour operation). I do think we may be hitting a resource limit somewhere, but we have been running with the same number of users for some time now and the problem only started showing up this week.

Hi @cursor, can you please check whether you have HAProxy running on your system?

We have added HAProxy to handle the desktop client connections. If it is not running, I would suggest raising a support ticket so our team can help you configure and run it.

ps fax|grep haproxy

It is not running at the moment

How do I start it?

I started HAProxy using “systemctl start haproxy” and it is now running, but Sangoma Phone still takes a long time to connect (and the web page to reload). I have had a ticket open with Sangoma since last Friday but no answer yet.
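Since HAProxy was not running at all, it may also be worth making sure it comes back after a reboot; a minimal sketch, assuming the stock systemd unit:

> systemctl enable haproxy    # start automatically at boot
> systemctl status haproxy    # confirm it is active and check recent log lines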

I can see that there are 258 httpd processes running when we have the slowdown. The number only goes down after office hours when there are fewer users. Are we hitting a limit on Apache? How can we increase the number of users it can handle at once?

Only HTTP-related services are slow; you can SSH into the server and everything else is responsive. Since Sangoma Phone, Zulu and the web interface all depend on Apache, that is why I am focused there.
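One quick check that should confirm or rule this out: when Apache runs out of worker slots it logs a warning. Something like the following should surface it (the log path below is the usual default on FreePBX/CentOS builds; adjust if yours differs):

> grep -i "MaxRequestWorkers" /var/log/httpd/error_log
> # the warning looks like: "AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting"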

I think I finally found the solution to this problem. It seems we were hitting the MaxRequestWorkers limit in Apache (256 by default), which is why during heavy-traffic periods anything that depends on Apache was very slow or not connecting at all. I have increased the limit to 450 for the moment and I can see that we are using around 350 processes during peak times.

This is the modification I made to Apache:

> <IfModule mpm_prefork_module>
>     StartServers             5
>     MinSpareServers          5
>     MaxSpareServers         10
>     ServerLimit            450
>     MaxRequestWorkers      450
>     MaxConnectionsPerChild   0
> </IfModule>
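For anyone else hitting this: after editing the mpm_prefork settings, validate the config and restart Apache for the new limits to take effect (the exact config file location varies by install; this assumes the service is named httpd as on the stock FreePBX distro):

> apachectl configtest      # syntax check before restarting
> systemctl restart httpd   # reload Apache with the new limits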

I do not know why this only started happening a few days ago, but the only thing I can think of is that we are moving users from Zulu to Sangoma Phone. Both use Apache on the back end, but I guess maybe Sangoma Connect makes extra connections? We probably reached the number of Sangoma Phone users that pushed us over the limit.

Sangoma support has been useless during this. They simply logged into the server via SSH, ran htop, and told me they did not see any spikes in CPU. They ignored everything I wrote about the CPU and memory behavior, HAProxy not running, and my focus on Apache.

Thank you for sharing the solution.

What does your Apache memory utilization look like now with MaxRequestWorkers=450?

Average memory per Apache process is … ?

  • HttpdRealAvg : 17.81 MB [excludes shared]
  • HttpdSharedAvg : 5.37 MB
  • HttpdRealTot : 7007.41 MB [excludes shared]
  • HttpdRunning : 332

Server Memory

  • Cached : 6074.35 MB
  • MemFree : 358.94 MB
  • MemTotal : 31815.11 MB
  • SwapFree : 15968.75 MB
  • SwapTotal : 16000.00 MB

Calculations Summary

  • OtherProcsMem : 18369.04 MB (MemTotal - Cached - MemFree - HttpdRealTot - HttpdSharedAvg)
  • FreeMemNoHttpd : 13446.07 MB (MemFree + Cached + HttpdRealTot + HttpdSharedAvg)
  • MaxLimitHttpdMem : 4564.73 MB (HttpdRealAvg * MaxRequestWorkers + HttpdSharedAvg)
  • AllProcsTotalMem : 22933.77 MB (OtherProcsMem + MaxLimitHttpdMem)

Maximum Values for MemTotal (31815.11 MB)
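For what it's worth, a rough projection assuming the 17.81 MB per-process average above holds under heavier load: 450 workers would need about 17.81 MB × 450 + 5.37 MB ≈ 8,020 MB, versus the 4,564.73 MB the summary computed at the old 256-worker limit. That still fits alongside the ~18 GB used by other processes on this 32 GB box, though with less left over for cache.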
