[SOLVED] PJSIP Service Unavailable

How fast does it return to read the next event?

I don't know what app or part of Asterisk isn't keeping up, I can only tell based on the name that it is for channel events. It could certainly be your usage of ARI in combination with the number of channels and what their activities are.

That's all I've got.

To further elaborate on why I can't go further - I would have to completely replicate your environment and usage patterns, and then spend days to understand precisely how you're placing load on the system and where the bottlenecks are.

I should also add that, since one of the messages is for ARI, it would presumably be your ARI subscription that is triggering it, which is why no one else sees it. It could be the result of your overall system usage: what is going on, the usage patterns, and what is producing the channel events.

To respond to myself, this may even be in combination with FreePBX - its dialplan and other things may produce a ton of events.

@david55 To add clarity: this stasis app registers with the server, subscribes to all channels, and filters on only the ChannelHold and ChannelUnhold events. Calls are never placed into the stasis app; we're simply monitoring for Hold and Unhold events so we can log them, since they are not naturally captured in the CEL logs. So this stasis app never sends any commands back to the PBX in response to what it's receiving. However, all that said, the latency between the PBX and the ARI app is under 1 ms, so I have to imagine that the entire time it could possibly take for a TCP packet to go round trip with processing is under 10 ms. Does that answer your question?
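For readers following along, the filtering described above might look roughly like the sketch below. It only assumes that each ARI event arrives as a JSON object with a "type" field; the function name and the sample messages are made up for illustration and are not the poster's code. Note that a filter like this runs in the client, so every channel event still has to be produced and delivered before it can be discarded.

```python
import json

# The two event types the post says the app keeps.
WANTED_TYPES = {"ChannelHold", "ChannelUnhold"}


def filter_hold_events(raw_messages):
    """Decode JSON messages and keep only ChannelHold/ChannelUnhold events."""
    kept = []
    for raw in raw_messages:
        event = json.loads(raw)
        if event.get("type") in WANTED_TYPES:
            kept.append(event)
    return kept


# Hypothetical sample messages; real ARI events carry many more fields.
sample = [
    '{"type": "ChannelHold", "channel": {"name": "PJSIP/510-00000001"}}',
    '{"type": "ChannelVarset", "channel": {"name": "PJSIP/510-00000001"}}',
    '{"type": "ChannelUnhold", "channel": {"name": "PJSIP/510-00000001"}}',
]
hold_events = filter_hold_events(sample)
print([event["type"] for event in hold_events])
```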

New information found! So today there is a clear "stand out" in the backed up taskprocessors:

Processor                               Processed   In Queue  Max Depth  Low water High water
stasis/m:manager:core-00000007          595760      391831    391847     2700      3000

Almost 400K queued tasks. No other taskprocessors have anything queued. I'm not sure if that actually refers to the AMI, but operating on the theory that it does, I took a look at the manager event queue with manager show eventq, which revealed that there are over 30K messages in that queue. The oldest one was from this morning, 1 minute after this server was rebooted. I don't use the AMI TCP interface but I think FreePBX does. @jcolp does this new information trigger any thoughts for you that might help me? Would seeing the messages that are in the event queue be helpful? Does anyone know what the possible reasons are that messages could be "stuck" in this queue?
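If you want to spot a row like this automatically, a small parser is enough. The sketch below assumes the six-column layout shown in the listing above (name, Processed, In Queue, Max Depth, Low water, High water); the function names are hypothetical helpers, not anything Asterisk provides.

```python
def parse_taskprocessor_line(line):
    """Split one data row of the taskprocessor listing into named fields."""
    name, processed, in_queue, max_depth, low_water, high_water = line.split()
    return {
        "name": name,
        "processed": int(processed),
        "in_queue": int(in_queue),
        "max_depth": int(max_depth),
        "low_water": int(low_water),
        "high_water": int(high_water),
    }


def is_backed_up(row):
    """True when the current queue depth is above the high-water mark."""
    return row["in_queue"] > row["high_water"]


# The row quoted in the post.
row = parse_taskprocessor_line(
    "stasis/m:manager:core-00000007          595760      391831    391847     2700      3000"
)
print(row["name"], row["in_queue"], is_backed_up(row))
```

For the row in the post, the queue depth (391831) is far above the high-water mark (3000), which is what makes it stand out.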

Not really, I've given as much as I can to this post.

Not directly related... Do you have FastAGI enabled on your FreePBX? FastAGI makes a significant difference in performance.

Something is subscribed to them but is failing to read them.

@PitzKey In the Advanced Settings I see Launch local AGIs through FastAGI Server is set to NO. I know nothing about FastAGI. When would you want to enable it and when wouldn't you want to? Are there additional steps to "enable" it or is it as simple as setting that to YES?

I'm definitely learning things here... but still don't have my head around this yet. I do not use the traditional Asterisk manager interface at all. So, if there is something connected to that, it must be FreePBX. HOWEVER, I do use the Asterisk HTTP XML Interface, which got me thinking: could that be an issue? So I ran manager show connected and got this back:

Username         IP Address                                               Start       Elapsed     FileDes   HttpCnt   Read   Write
admin            127.0.0.1                                                1645522103  37848       46        0         1073750015  1073750015
firewall         127.0.0.1                                                1645528089  31862       83        0         2147483647  00064
my_custom_user   xxx.xx.x.xxx                                             1645528749  31202       -1        0         08191  08191
[...thousands of similar lines omitted...]
my_custom_user   xxx.xx.x.xxx                                             1645554560  5391        -1        0         08191  08191
admin            127.0.0.1                                                1645559951  0           82        0         1073750015  1073750015
67425 users connected.

I do recognize "my_custom_user" (not the real name) as something I created and that we use to connect to the aforementioned HTTP Manager.
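With tens of thousands of rows, it helps to tally the sessions per username rather than read them. The sketch below assumes the column layout of the manager show connected output quoted above and treats any 8-column row whose third field is numeric as a session row; the function name is a hypothetical helper.

```python
from collections import Counter


def count_sessions_by_user(output):
    """Count session rows per username, skipping header and summary lines."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Session rows have 8 columns and a numeric Start timestamp in column 3.
        if len(fields) == 8 and fields[2].isdigit():
            counts[fields[0]] += 1
    return counts


# A shortened copy of the output from the post.
sample = """\
Username         IP Address      Start       Elapsed     FileDes   HttpCnt   Read   Write
admin            127.0.0.1       1645522103  37848       46        0         1073750015  1073750015
firewall         127.0.0.1       1645528089  31862       83        0         2147483647  00064
my_custom_user   xxx.xx.x.xxx    1645528749  31202       -1        0         08191  08191
my_custom_user   xxx.xx.x.xxx    1645554560  5391        -1        0         08191  08191
admin            127.0.0.1       1645559951  0           82        0         1073750015  1073750015
67425 users connected.
"""
counts = count_sessions_by_user(sample)
for user, total in sorted(counts.items()):
    print(user, total)
```

Run against the full output, a tally like this would show at a glance which username accounts for the bulk of the 67425 sessions.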

My understanding of how that manager works is that you send it a simple command and it gives you the response... there is no mechanism to get a stream of events that I'm aware of. So are these messages stacking up because you make a single request to that interface and the server is keeping that connection around forever?!? Or am I missing something that I need to be doing to "close" that connection?

This is a good read: Performance Improvements in FreePBX

I enable it on all systems. IIRC it's enabled by default in FreePBX 15+.

You'll need to reboot your PBX afterwards.

I have a feeling that after enabling FastAGI the task processor will be able to handle all of your requests without any issues.

I would suggest eliminating usage of the HTTP interface and seeing if it changes. That interface is ancient, used by few, barely maintained, and likely not understood by many people. If eliminating its usage results in manager working as expected then it would point to that.

I see you're using the default port. I know it worked fine for a long time, but I would suggest using a different, non-standard port. I didn't see in the question whether the phones are co-located with the PBX. Is this going through a firewall? Not sure if it applies, but I have had horrendous problems with the standard port 5060 (PJSIP) and some routers/ISPs: it works fine for weeks and then just stops (especially FiOS Quantum and Spectrum routers). Rebooting the router didn't help. It felt like the MAC was stuck in an ARP cache, or SIP ALG, I don't know, but changing to a new non-standard port cleared it up, and I haven't had an issue in months. Additionally, if the PBX is on a public network, it's possible hackers are constantly trying to connect or process calls (even though they fail). If it's public facing, turn on intrusion detection and set the limits fairly low so bad actors get blocked for a significant amount of time. That could reduce the workload on the server if it's processing lots of calls. Also, what's the CPU, and how much RAM?

The port shouldn't matter with the firewall configured the way it is; we only allow traffic from branch offices.

The server has 8 cores and 16 GB of memory, both of which are way underutilized as far as I can see.

@PitzKey This system is a fresh FreePBX 15 distro install and it's not enabled, so I don't think it's enabled by default. We'll try it in our lab and if it seems to work well we'll give it a try, but I still think that this is related to the HTTP manager.

Did you restart Asterisk without your AMI client running yet?

Does your client conclude with a proper Action: Logoff\n\n?

@dicko Again, we don't have an AMI client; we only use the HTTP XML Manager Interface, so there is no opportunity to send subsequent messages such as "Logoff". On Monday we didn't have our services running that make those HTTP calls and we were still seeing taskprocessor warnings. However, I can't say for certain if the same exact taskprocessor was backing up or if there were manager events backed up like we're seeing now.

Right now we're going to try and replicate this in our lab by making a huge number of Manager HTTP requests.

Unless I'm missing something, is there a way to force a logoff when using the HTTP interface?

How Asterisk returns the results is not the same as how the request is made. With AMI, I believe that if you don't have two CRs at the end of a transaction (either over TCP or wrapped in HTML), the connection is considered incomplete.
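For reference, this is roughly what that framing looks like on a plain TCP AMI connection: each header line ends in CRLF and an empty line ends the action. The sketch below is a hypothetical helper for illustration only; whether the same framing has any bearing on the HTTP interface is exactly the open question in this thread.

```python
def format_ami_action(action, **headers):
    """Serialize an AMI action: CRLF after each line, blank line to finish."""
    lines = [f"Action: {action}"]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# A Logoff action as it would be written to a TCP AMI socket.
logoff = format_ami_action("Logoff")
print(repr(logoff))
```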

I think you may not understand how the Manager HTTP XML Interface works. You make a single HTTP GET request such as https://YOURSERVER:8089/amxml?action=QueueStatus and it returns an XML document such as:

<ajax-response>
    <response type='object' id='unknown'>
        <generic response='Success' eventlist='start' message='Queue status will follow' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueParams' queue='default' max='0' strategy='ringall' calls='0' holdtime='0' talktime='0' completed='0' abandoned='0' servicelevel='0' servicelevelperf='0.0' servicelevelperf2='0.0' weight='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueParams' queue='5000' max='0' strategy='ringall' calls='0' holdtime='0' talktime='0' completed='0' abandoned='0' servicelevel='60' servicelevelperf='0.0' servicelevelperf2='0.0' weight='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueMember' queue='5000' name='Test Agent 1' location='PJSIP/510' stateinterface='PJSIP/510' membership='static' penalty='0' callstaken='0' lastcall='0' lastpause='0' incall='0' status='5' paused='0' pausedreason='' wrapuptime='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueStatusComplete' eventlist='Complete' listitems='21' />
    </response>
</ajax-response>
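To make the request/response shape concrete, here is a minimal sketch of pulling events out of a response like the one above. It assumes only the structure shown in the post (an ajax-response root with generic elements carrying an event attribute); the function name is hypothetical and the sample is a trimmed copy of that response.

```python
import xml.etree.ElementTree as ET


def extract_events(xml_text, event_name):
    """Return attribute dicts of <generic> elements with the given event name."""
    root = ET.fromstring(xml_text)
    return [
        dict(node.attrib)
        for node in root.iter("generic")
        if node.get("event") == event_name
    ]


# Trimmed copy of the response shown in the post.
sample = """<ajax-response>
    <response type='object' id='unknown'>
        <generic response='Success' eventlist='start' message='Queue status will follow' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueMember' queue='5000' name='Test Agent 1' location='PJSIP/510' paused='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueStatusComplete' eventlist='Complete' listitems='21' />
    </response>
</ajax-response>"""

members = extract_events(sample, "QueueMember")
print(members[0]["name"], members[0]["location"])
```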

Unless I'm mistaken, I'm not sure where the double \r\n which is required in the standard AMI connection comes into play here.

Quite likely you are right, but you are probably an outlier here. Let us know how it works out for you after closing the connections.
