[SOLVED] PJSIP Service Unavailable

david55 · February 21, 2022, 8:20pm

How fast does it return to read the next event?

jcolp · February 21, 2022, 8:22pm

I don’t know what app or part of Asterisk isn’t keeping up; I can only tell from the name that it is for channel events. It could certainly be your usage of ARI in combination with the number of channels and what their activities are.

That’s all I’ve got.

To further elaborate on why I can’t go further - I would have to completely replicate your environment and usage patterns, and then spend days to understand precisely how you’re placing load on the system and where the bottlenecks are.

I should also add that, since one of the messages is for ARI, it is presumably your ARI subscription that is triggering it, which would be why no one else sees it. It could be the result of your overall system usage, your usage patterns, and whatever is producing the channel events.

jcolp · February 21, 2022, 8:27pm

To respond to myself, this may even be in combination with FreePBX - its dialplan and other things may produce a ton of events.

darrenhollick · February 21, 2022, 9:25pm

@david55 To add clarity: this stasis app registers with the server, subscribes to all channels, and filters on only the ChannelHold and ChannelUnhold events. Calls are never placed into the stasis app; we’re simply monitoring for Hold and Unhold events so we can log them, since they are not naturally captured in the CEL logs. So this stasis app never sends any commands back to the PBX in response to what it’s receiving. All that said, the latency between the PBX and the ARI app is <1ms, so I have to imagine that the entire time it could possibly take for a TCP packet to go round trip, with processing, is <10ms. Does that answer your question?
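The client-side filtering described here can be sketched roughly as below. This is only an illustration in Python, not the poster's actual app: the function name `parse_hold_event` is hypothetical, and it assumes each ARI event arrives as a JSON object with a `type` field and, for channel events, a `channel` object that carries a `name`.

```python
import json

# The only two event types the monitoring app cares about.
HOLD_EVENT_TYPES = {"ChannelHold", "ChannelUnhold"}


def parse_hold_event(raw):
    """Return (event type, channel name) for hold/unhold events, else None.

    `raw` is assumed to be one JSON-encoded ARI event as received
    over the app's event WebSocket.
    """
    event = json.loads(raw)
    if event.get("type") not in HOLD_EVENT_TYPES:
        return None  # every other channel event is discarded
    channel = event.get("channel", {})
    return event["type"], channel.get("name", "")
```

Filtering like this happens only after an event has been received, so with a subscription to all channels Asterisk presumably still has to generate and deliver every channel event to the app.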

darrenhollick · February 22, 2022, 6:33pm

New information found! So today there is a clear “stand out” in the backed up taskprocessors:

Processor                               Processed   In Queue  Max Depth  Low water High water
stasis/m:manager:core-00000007          595760      391831    391847     2700      3000

Almost 400K queued tasks. No other taskprocessors have anything queued. I’m not sure if that actually refers to the AMI, but operating on the theory that it does, I took a look at the manager event queue with manager show eventq, which revealed that there are some 30K messages in that queue. The oldest one was from this morning, 1 minute after this server was rebooted. I don’t use the AMI TCP interface, but I think FreePBX does. @jcolp does this new information trigger any thoughts for you that might help me? Would seeing the messages that are in the event queue be helpful? Does anyone know what the possible reasons are that messages could be “stuck” in this queue?
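For anyone wanting to watch for this automatically, here is a rough Python sketch that picks out backed-up rows from taskprocessor output shaped like the table above. The function name `backed_up_taskprocessors` is hypothetical, and the column layout (name followed by five integer columns) is assumed from the sample shown in this post rather than from any documented format.

```python
def backed_up_taskprocessors(cli_output):
    """Return [(name, in_queue, high_water)] for rows above their high-water mark."""
    backed_up = []
    for line in cli_output.splitlines():
        fields = line.split()
        # Data rows: a name plus five integer columns. The header row splits
        # into more fields ("In Queue", "Max Depth", ...) and is skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name = fields[0]
        in_queue = int(fields[2])    # "In Queue" column
        high_water = int(fields[5])  # "High water" column
        if in_queue > high_water:
            backed_up.append((name, in_queue, high_water))
    return backed_up
```

Fed the two lines quoted above, this would flag only the stasis/m:manager:core row.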

jcolp · February 22, 2022, 6:36pm

Not really, I’ve given as much as I can to this post.

PitzKey · February 22, 2022, 7:28pm

Not directly related… Do you have FastAGI enabled on your FreePBX? FastAGI makes a significant difference in performance.

david55 · February 22, 2022, 7:46pm

Something is subscribed to them but is failing to read them.

darrenhollick · February 22, 2022, 8:05pm

@PitzKey In the Advanced Settings I see Launch local AGIs through FastAGI Server is set to NO. I know nothing about FastAGI. When would you want to enable it and when wouldn’t you want to? Are there additional steps to “enable” it or is it as simple as setting that to YES?

darrenhollick · February 22, 2022, 8:13pm

I’m definitely learning things here… but I still don’t have my head around this yet. I do not use the traditional Asterisk manager interface at all. So, if there is something connected to that, it must be FreePBX. HOWEVER, I do use the Asterisk HTTP XML Interface, which got me thinking: could that be an issue? So I ran manager show connected and got this back:

Username         IP Address                                               Start       Elapsed     FileDes   HttpCnt   Read   Write
admin            127.0.0.1                                                1645522103  37848       46        0         1073750015  1073750015
firewall         127.0.0.1                                                1645528089  31862       83        0         2147483647  00064
my_custom_user   xxx.xx.x.xxx                                             1645528749  31202       -1        0         08191  08191
[...thousands of similar lines omitted...]
my_custom_user   xxx.xx.x.xxx                                             1645554560  5391        -1        0         08191  08191
admin            127.0.0.1                                                1645559951  0           82        0         1073750015  1073750015
67425 users connected.

I do recognize “my_custom_user” (not real) as something I created and that we use to connect to the aforementioned HTTP Manager.

My understanding of how that manager works is that you send it a simple command and it gives you the response… there is no mechanism to get a stream of events that I’m aware of. So are these messages stacking up because you make a single request to that interface and the server is keeping that connection around forever?!? Or am I missing something that I need to be doing to “close” that connection?
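To see which user owns the lingering sessions, output like the listing above could be summarized with a short Python sketch such as this one. The name `summarize_manager_sessions` is hypothetical, and the eight-column row layout is assumed from the sample in this post.

```python
def summarize_manager_sessions(cli_output):
    """Map username -> (session count, largest Elapsed value in seconds)."""
    summary = {}
    for line in cli_output.splitlines():
        fields = line.split()
        # Session rows have exactly 8 columns with numeric Start and Elapsed.
        # The header ("IP Address" is two words), the "N users connected."
        # footer and any elision marker do not match and are skipped.
        if len(fields) != 8 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        user, elapsed = fields[0], int(fields[3])
        count, oldest = summary.get(user, (0, 0))
        summary[user] = (count + 1, max(oldest, elapsed))
    return summary
```

Run against the full listing, the per-user counts would show how many of the 67425 sessions belong to each user and how long the oldest one has been around.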

PitzKey · February 22, 2022, 8:17pm

This is a good read: Performance Improvements in FreePBX

I enable it on all systems, IIRC it’s enabled by default in FreePBX 15+

You’ll need to reboot your PBX afterwards.

I have a feeling that after enabling FastAGI the task processor will be able to handle all of your requests without any issues.

jcolp · February 22, 2022, 8:55pm

I would suggest eliminating usage of the HTTP interface and seeing if it changes. That interface is ancient, used by few, barely maintained, and likely not understood by many people. If eliminating its usage results in manager working as expected then it would point to that.

dvsatech · February 23, 2022, 1:20am

I see you’re using the default port. I know it worked fine for a long time, but I would suggest using a different, non-standard port. I didn’t see in the question whether the phones are co-located with the PBX. Is this going through a firewall? Not sure if it applies, but I have had horrendous problems with the standard port 5060 (pjsip) and some routers/ISPs: it works fine for weeks and then just stops (especially FiOS Quantum and Spectrum routers). Rebooting the router didn’t help. It felt like the MAC was stuck in an ARP cache, or SIP ALG, I don’t know, but changing to a new non-standard port cleared it up and I haven’t had an issue in months. Additionally, if the PBX is on a public network, it’s possible hackers are constantly trying to connect or process calls (even though they fail). If it’s public facing, turn on intrusion detection and set the limits fairly low so bad actors get blocked for a significant amount of time. That could reduce the workload on the server if it’s processing lots of calls. Also, what’s the CPU, and how much RAM?

darrenhollick · February 23, 2022, 2:52am

The port shouldn’t matter with the firewall configured the way it is, we only allow traffic from branch offices.

The server has 8 cores and 16 GB memory. Which are both way under utilized as far as I can see.

darrenhollick · February 23, 2022, 2:58am

@PitzKey This system is a fresh FreePBX 15 distro install and it’s not enabled, so I don’t think it’s enabled by default. We’ll try it in our lab, and if it seems to work well we’ll give it a try, but I still think that this is related to the HTTP manager.

dicko · February 23, 2022, 3:08am

Did you restart Asterisk without your AMI client running yet?

Does your client conclude with a proper Action: Logoff\n\n ?

darrenhollick · February 23, 2022, 3:29am

@dicko Again, we don’t have an AMI client; we only use the HTTP XML Manager Interface, so there is no opportunity to send subsequent messages such as “Logoff”. On Monday we didn’t have our services running that make those HTTP calls and we were still seeing taskprocessor warnings. However, I can’t say for certain if the same exact taskprocessor was backing up or if there were manager events backed up like we’re seeing now.

Right now we’re going to try and replicate this in our lab by making a huge amount of Manager HTTP requests.

Unless I’m missing something, is there a way to force a logoff when using the HTTP interface?

dicko · February 23, 2022, 3:41am

How Asterisk returns the results is not the same as how the request is made. With AMI, I believe that if you don’t have two CRs at the end of a transaction (either over TCP or wrapped in HTML), the connection is considered incomplete.
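For reference, the framing being described for the plain TCP manager interface looks roughly like this in Python: header lines separated by CRLF, with a blank line marking the end of the message. The helper name `ami_action` is hypothetical, and this concerns the TCP interface only; whether the same rule matters for requests made over HTTP is exactly what is in question here.

```python
def ami_action(action, **headers):
    """Build one AMI message for the TCP interface as bytes.

    Each header is "Key: Value" terminated by CRLF, and an empty line
    (a second CRLF) ends the message.
    """
    lines = [f"Action: {action}"]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

A TCP client would send `ami_action("Logoff")` as its final message before closing the socket.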

darrenhollick · February 23, 2022, 4:51am

I think you may not understand how the Manager HTTP XML Interface works. You make a single HTTP GET request such as https://YOURSERVER:8089/amxml?action=QueueStatus and it returns an XML document such as:

<ajax-response>
    <response type='object' id='unknown'>
        <generic response='Success' eventlist='start' message='Queue status will follow' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueParams' queue='default' max='0' strategy='ringall' calls='0' holdtime='0' talktime='0' completed='0' abandoned='0' servicelevel='0' servicelevelperf='0.0' servicelevelperf2='0.0' weight='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueParams' queue='5000' max='0' strategy='ringall' calls='0' holdtime='0' talktime='0' completed='0' abandoned='0' servicelevel='60' servicelevelperf='0.0' servicelevelperf2='0.0' weight='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueMember' queue='5000' name='Test Agent 1' location='PJSIP/510' stateinterface='PJSIP/510' membership='static' penalty='0' callstaken='0' lastcall='0' lastpause='0' incall='0' status='5' paused='0' pausedreason='' wrapuptime='0' />
    </response>
    <response type='object' id='unknown'>
        <generic event='QueueStatusComplete' eventlist='Complete' listitems='21' />
    </response>
</ajax-response>

Unless I’m mistaken I’m not sure where the double \r\n which is required in the standard AMI connection comes into play here.
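To illustrate the request/response nature of this interface, a response document like the one above can be consumed in one pass. This is a minimal Python sketch, assuming the structure shown in the sample (`generic` elements whose attributes hold the fields); `extract_events` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def extract_events(xml_text):
    """Return the attribute dicts of every <generic> element that has an 'event'."""
    root = ET.fromstring(xml_text)
    events = []
    for generic in root.iter("generic"):
        attrs = dict(generic.attrib)
        # The first <generic> is the Success response and has no 'event'
        # attribute, so it is skipped; the rest are the listed events.
        if "event" in attrs:
            events.append(attrs)
    return events
```

With the QueueStatus sample above, the result would contain the QueueParams and QueueMember entries followed by the QueueStatusComplete marker.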

dicko · February 23, 2022, 5:31am

Quite likely you are right, but you are probably an outlier here. Let us know how it works out for you after closing the connections.