How fast does it return to read the next event?
I don't know what app or part of Asterisk isn't keeping up; I can only tell from the name that it is for channel events. It could certainly be your usage of ARI in combination with the number of channels and what their activities are.
That's all I've got.
To elaborate on why I can't go further: I would have to completely replicate your environment and usage patterns, and then spend days to understand precisely how you're placing load on the system and where the bottlenecks are.
I should also add that since one of the messages is for ARI, presumably it is your ARI subscription that is triggering it, which is why no one else sees it. It could be the result of your overall system usage: what is going on, the usage patterns, and what is producing the channel events.
To respond to myself, this may even be in combination with FreePBX - its dialplan and other things may produce a ton of events.
@david55 To add clarity: this stasis app registers with the server, subscribes to all channels, and filters on only the ChannelHold and ChannelUnhold events. Calls are never placed into the stasis app; we're simply monitoring for Hold and Unhold events so we can log them, since they are not naturally captured in the CEL logs. So this stasis app never sends any commands back to the PBX in response to what it's receiving. All that said, the latency between the PBX and the ARI app is <1ms, so I have to imagine that the entire time it could possibly take for a TCP packet to make the round trip, with processing, is <10ms. Does that answer your question?
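To make the filtering step concrete, here is a minimal sketch of what I mean, assuming each websocket message is one JSON-encoded ARI event with a "type" field and, for channel events, a "channel" object. The function name `extract_hold_event` is made up for illustration, and the websocket read loop itself is left out.

```python
import json

# The only two ARI event types this app cares about.
WANTED_TYPES = {"ChannelHold", "ChannelUnhold"}


def extract_hold_event(raw_message):
    """Return (type, channel_id, timestamp) for hold/unhold events, else None.

    Hypothetical helper: everything that is not a hold or unhold event is
    dropped as cheaply as possible so the reader can go straight back to
    the socket for the next message.
    """
    try:
        event = json.loads(raw_message)
    except json.JSONDecodeError:
        return None  # not valid JSON, ignore it
    if not isinstance(event, dict) or event.get("type") not in WANTED_TYPES:
        return None
    channel = event.get("channel") or {}
    return event["type"], channel.get("id"), event.get("timestamp")


if __name__ == "__main__":
    messages = [
        '{"type": "ChannelDtmfReceived", "channel": {"id": "1645.1"}}',
        '{"type": "ChannelHold", "channel": {"id": "1645.2"}, "timestamp": "t1"}',
    ]
    for msg in messages:
        hit = extract_hold_event(msg)
        if hit is not None:
            print(hit)
```

The point of keeping the filter this small is that nothing between one read and the next should block; the actual logging could be handed off to a queue if it ever became slow.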
New information found! So today there is a clear "stand out" in the backed up taskprocessors:
Processor                        Processed   In Queue  Max Depth  Low water  High water
stasis/m:manager:core-00000007      595760     391831     391847       2700        3000
Almost 400K queued tasks. No other taskprocessors have anything queued. I'm not sure if that actually refers to the AMI, but operating on the theory that it did, I took a look at the manager event queue with manager show eventq
which revealed that there are over 30K messages in that queue. The oldest one was from this morning, 1 minute after this server was rebooted. I don't use the AMI TCP interface but I think FreePBX does. @jcolp does this new information trigger any thoughts for you that might help me? Would seeing the messages that are in the event queue be helpful? Does anyone know what the possible reasons are that messages could be "stuck" in this queue?
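In case it helps anyone spot this kind of backlog automatically, here is a minimal sketch that parses rows in the column layout pasted above and flags any taskprocessor whose queue is deeper than its high-water mark. The function names are made up for illustration, and it assumes exactly the six columns shown in this post; other Asterisk versions may print a different layout.

```python
def parse_taskprocessor_line(line):
    """Parse one data row into a dict, or return None for headers and noise.

    Assumes six whitespace-separated columns:
    name, processed, in queue, max depth, low water, high water.
    """
    parts = line.split()
    if len(parts) != 6:
        return None  # the header row splits into more tokens than this
    try:
        processed, in_queue, max_depth, low, high = (int(p) for p in parts[1:])
    except ValueError:
        return None
    return {
        "name": parts[0],
        "processed": processed,
        "in_queue": in_queue,
        "max_depth": max_depth,
        "low_water": low,
        "high_water": high,
    }


def backed_up(lines):
    """Return the rows whose current queue depth exceeds the high-water mark."""
    rows = (parse_taskprocessor_line(line) for line in lines)
    return [r for r in rows if r is not None and r["in_queue"] > r["high_water"]]


if __name__ == "__main__":
    output = [
        "Processor Processed In Queue Max Depth Low water High water",
        "stasis/m:manager:core-00000007 595760 391831 391847 2700 3000",
    ]
    for row in backed_up(output):
        print(row["name"], row["in_queue"])
```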
Not really, I've given as much as I can to this post.
Not directly related... Do you have FastAGI enabled on your FreePBX? FastAGI makes a significant difference in performance.
Something is subscribed to them but is failing to read them.
@PitzKey In the Advanced Settings I see that Launch local AGIs through FastAGI Server is set to NO. I know nothing about FastAGI. When would you want to enable it and when wouldn't you want to? Are there additional steps to "enable" it, or is it as simple as setting that to YES?
I'm definitely learning things here... but I still don't have my head around this yet. I do not use the traditional Asterisk manager interface at all. So, if there is something connected to that, it must be FreePBX. HOWEVER, I do use the Asterisk HTTP XML Interface, which got me thinking: could that be an issue? So I ran manager show connected
and got this back:
Username        IP Address    Start       Elapsed  FileDes  HttpCnt  Read        Write
admin           127.0.0.1     1645522103  37848    46       0        1073750015  1073750015
firewall        127.0.0.1     1645528089  31862    83       0        2147483647  00064
my_custom_user  xxx.xx.x.xxx  1645528749  31202    -1       0        08191       08191
[...thousands of similar lines omitted...]
my_custom_user  xxx.xx.x.xxx  1645554560  5391     -1       0        08191       08191
admin           127.0.0.1     1645559951  0        82       0        1073750015  1073750015
67425 users connected.
I do recognize "my_custom_user" (not real) as something I created and that we use to connect to the aforementioned HTTP Manager.
My understanding of how that manager works is that you send it a simple command and it gives you the response... there is no mechanism to get a stream of events that I'm aware of. So are these messages stacking up because you make a single request to that interface and the server keeps that connection around forever?!? Or am I missing something that I need to be doing to "close" that connection?
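To get a quick picture of who is holding all those sessions, here is a rough sketch that tallies the rows of the listing above per username and separately counts the rows whose FileDes is negative, as it is on every my_custom_user row. The function name `count_sessions` is made up, and the eight-column layout is assumed from the output pasted in this post.

```python
from collections import Counter


def count_sessions(lines):
    """Tally 'manager show connected' rows per username.

    Returns (per_user, no_fd): total sessions per user, and sessions per
    user whose FileDes column is negative. Assumes eight columns per data
    row, as in the listing above.
    """
    per_user = Counter()
    no_fd = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 8:
            continue  # header, summary line, or omitted-lines marker
        user, filedes = parts[0], parts[4]
        try:
            fd = int(filedes)
        except ValueError:
            continue
        per_user[user] += 1
        if fd < 0:
            no_fd[user] += 1
    return per_user, no_fd


if __name__ == "__main__":
    listing = [
        "Username IP Address Start Elapsed FileDes HttpCnt Read Write",
        "admin 127.0.0.1 1645522103 37848 46 0 1073750015 1073750015",
        "my_custom_user xxx.xx.x.xxx 1645528749 31202 -1 0 08191 08191",
        "67425 users connected.",
    ]
    per_user, no_fd = count_sessions(listing)
    print(per_user.most_common(), dict(no_fd))
```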
This is a good read: Performance Improvements in FreePBX
I enable it on all systems, IIRC it's enabled by default in FreePBX 15+
You'll need to reboot your PBX afterwards.
I have a feeling that after enabling FastAGI the task processor will be able to handle all of your requests without any issues.
I would suggest eliminating usage of the HTTP interface and seeing if it changes. That interface is ancient, used by few, barely maintained, and likely not understood by many people. If eliminating its usage results in manager working as expected then it would point to that.
I see you're using the default port. I know it worked fine for a long time, but I would suggest using a different, non-standard port. I didn't see in the question whether the phones are co-located with the PBX. Is this going through a firewall? Not sure if it applies, but I have had horrendous problems with the standard port 5060 (pjsip) and some routers/ISPs: it works fine for weeks and then just stops (especially FiOS Quantum and Spectrum routers). Rebooting the router didn't help; it felt like the MAC was stuck in an ARP cache, or SIP ALG, I don't know, but changing to a new non-standard port cleared it up and I haven't had an issue in months. Additionally, if the PBX is on a public network, it's possible hackers are constantly trying to connect or process calls (even though they fail). If it's public facing, turn on intrusion detection and set the limits fairly low so bad actors get blocked for a significant amount of time. That could reduce the workload on the server if it's processing lots of calls. Also, what's the CPU, and how much RAM?
The port shouldn't matter with the firewall configured the way it is; we only allow traffic from branch offices.
The server has 8 cores and 16 GB memory, both of which are way underutilized as far as I can see.
@PitzKey This system is a fresh FreePBX 15 distro install and it's not enabled, so I don't think it's enabled by default. We'll try it in our lab, and if it seems to work well we'll give it a try, but I still think that this is related to the HTTP manager.
Did you restart Asterisk without your AMI client running yet?
Does your client conclude with a proper Action: Logoff\n\n?
@dicko Again, we don't have an AMI client; we only use the HTTP XML Manager Interface, so there is no opportunity to send subsequent messages such as "Logoff". On Monday we didn't have our services running that make those HTTP calls and we were still seeing taskprocessor warnings. However, I can't say for certain if the same exact taskprocessor was backing up or if there were manager events backed up like we're seeing now.
Right now we're going to try to replicate this in our lab by making a huge number of Manager HTTP requests.
Unless I'm missing something, is there a way to force a logoff when using the HTTP interface?
How Asterisk returns the results is not the same as how the request is made. With AMI, I believe that if you don't have two CRs at the end of a transaction (either over TCP or wrapped in HTML), the connection is considered incomplete.
I think you may not understand how the Manager HTTP XML Interface works. You make a single HTTP GET request such as https://YOURSERVER:8089/amxml?action=QueueStatus
and it returns an XML document such as:
<ajax-response>
<response type='object' id='unknown'>
<generic response='Success' eventlist='start' message='Queue status will follow' />
</response>
<response type='object' id='unknown'>
<generic event='QueueParams' queue='default' max='0' strategy='ringall' calls='0' holdtime='0' talktime='0' completed='0' abandoned='0' servicelevel='0' servicelevelperf='0.0' servicelevelperf2='0.0' weight='0' />
</response>
<response type='object' id='unknown'>
<generic event='QueueParams' queue='5000' max='0' strategy='ringall' calls='0' holdtime='0' talktime='0' completed='0' abandoned='0' servicelevel='60' servicelevelperf='0.0' servicelevelperf2='0.0' weight='0' />
</response>
<response type='object' id='unknown'>
<generic event='QueueMember' queue='5000' name='Test Agent 1' location='PJSIP/510' stateinterface='PJSIP/510' membership='static' penalty='0' callstaken='0' lastcall='0' lastpause='0' incall='0' status='5' paused='0' pausedreason='' wrapuptime='0' />
</response>
<response type='object' id='unknown'>
<generic event='QueueStatusComplete' eventlist='Complete' listitems='21' />
</response>
</ajax-response>
Unless I'm mistaken, I'm not sure where the double \r\n, which is required in the standard AMI connection, comes into play here.
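For completeness, this is roughly how a client consumes that document: one request, one XML body, parsed in a single pass. A minimal sketch using Python's standard library, with a trimmed copy of the response above as sample input; the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above (attributes shortened).
SAMPLE = """<ajax-response>
<response type='object' id='unknown'>
<generic response='Success' eventlist='start' message='Queue status will follow' />
</response>
<response type='object' id='unknown'>
<generic event='QueueParams' queue='5000' strategy='ringall' calls='0' />
</response>
<response type='object' id='unknown'>
<generic event='QueueMember' queue='5000' name='Test Agent 1' location='PJSIP/510' paused='0' />
</response>
<response type='object' id='unknown'>
<generic event='QueueStatusComplete' eventlist='Complete' listitems='21' />
</response>
</ajax-response>"""


def parse_manager_xml(xml_text):
    """Return one dict of attributes per <generic> element, in document order."""
    root = ET.fromstring(xml_text)
    return [dict(node.attrib) for node in root.iter("generic")]


def events_named(rows, name):
    """Filter parsed rows down to those whose 'event' attribute equals name."""
    return [row for row in rows if row.get("event") == name]


if __name__ == "__main__":
    rows = parse_manager_xml(SAMPLE)
    for member in events_named(rows, "QueueMember"):
        print(member["queue"], member["name"], member["location"])
```

Nothing in this flow involves the client writing a terminator of its own; the whole exchange is the GET and the returned document.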
Quite likely you are right, but you are probably an outlier here. Let us know how it works out for you after closing the connections.