FreePBX goes down once a week

Our business uses the FreePBX distro (12.0.43) for our phone system, and about once per week it goes “down”, meaning that it can no longer accept incoming calls or make outgoing calls.

When this happens, I can still log in to the web gui and it shows active extensions and SIP trunks.

I can still SSH in and run the Asterisk CLI.

The error in the Asterisk console is:

channel.c:4874 ast_prod: Prodding channel 'SIP/400-00000cc8' failed

…for each extension, but not all at once. They come periodically.

The error on the phones when trying to dial out is:

Calling (out INV)

It starts working again if I log in to the web gui and reboot the machine. Restarting phones, routers, or anything else has no effect. I have to restart the server.

I tried “amportal restart” but that seemed to hang forever “waiting for Asterisk to gracefully exit”.

Restarting the server fixes the problem immediately.

Does anyone have any suggestions?

That does not sound to me like anything directly to do with Asterisk/FreePBX, but more like an underlying OS problem. I would install sysstat and review your sar logs daily as time goes on. There is no single way to identify your problem from what you have described, but concentrate on excessive memory usage or leaks over time. Other parts of the sar logs might also give clues as to where to start, as sar logs usage by all the underlying subsystems that your Linux system is composed of.

More pragmatically, install the same software on alternate hardware and “compare and contrast”
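For anyone who wants a second view of memory over time alongside sar, here is a minimal sketch in Python. It does not use sysstat at all; it simply samples /proc/meminfo. The function names, the 300-second interval, and the choice to count Buffers and Cached as free are my own assumptions, not anything from the FreePBX distro.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def used_percent(info):
    """Memory in use as a percentage, treating buffers and page cache as free."""
    free = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * (info["MemTotal"] - free) / info["MemTotal"]


def log_forever(path="/proc/meminfo", interval=300):
    """Print one timestamped sample every `interval` seconds (assumed default)."""
    while True:
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "%.1f%% used" % used_percent(info), flush=True)
        time.sleep(interval)
```

Calling log_forever() and redirecting the output to a file gives a simple trend line. A figure that climbs steadily between reboots would point toward a leak; a flat one would suggest looking elsewhere.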

Thanks @dicko,

I’m happy to try that.

This is a vanilla, unmodified FreePBX distro, not something I’ve cobbled together, though.

To be clear, it’s not the whole computer that locks up; I log in to the FreePBX web GUI to reboot. In fact Asterisk doesn’t even lock up, as I can log in to the CLI.

-Wes

Exactly. A vanilla FreePBX distro, un-cobbled (or many other distros, VoIP or otherwise: Debian, CentOS, etc.), just does NOT behave that way. They work for years, I can assure you. YOUR hardware system, however, does CRASH. Does that not add some logic to your thinking?

OK, I’ve got sysstat collecting entries in SAR logs, so we’ll see how it goes.

Keep an eye on the network stuff . . . My guess is it just “throws a wobbly” at about the same time.

Probably the cleaner unplugging the server to plug the hoover in :laughing:

“Probably the cleaner unplugging the server to plug the hoover in”

Normally that’s what I’d think, too!

We actually had a restaurant client who kept complaining that the network switch we installed was taking their Point-of-sale system offline. They had it plugged in to a power strip, and they were using the strip to turn their beer sign on and off. Beer sign off → switch off → POS down.

The SAR logs all look fine; nothing over 3% or so for the past few days.

This is on an AWS c3 instance, so no one’s unplugging it. (Making an AMI from the FreePBX distro was an adventure, but I wanted to make sure I had something I could get support for if I needed it). I think the Amazon cleaner may unplug the free instances from time to time, but she only unplugs the paid ones when you’re at an important plot twist in your favourite Netflix show.

It’s such a strange situation. The phones stop working, but nothing “goes down”. Web interface is up, Asterisk is up, I can SSH in and poke around.

Is there any specific test I can run while it’s down that may shed some light on the subject?

-Wes
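One test that can be run from another machine while calls are failing is to send a single SIP OPTIONS request to the server and see whether anything comes back. The sketch below is in Python and makes several assumptions: SIP over UDP on port 5060, a 3-second timeout, and a made-up "probe" user in the From and Contact headers. It only shows whether the SIP listener answers at all; it does not explain why it might not.

```python
import socket
import uuid


def build_options(target_host, target_port, local_host, local_port):
    """Build a minimal SIP OPTIONS request (RFC 3261) as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        "OPTIONS sip:%s:%d SIP/2.0" % (target_host, target_port),
        "Via: SIP/2.0/UDP %s:%d;branch=%s" % (local_host, local_port, branch),
        "Max-Forwards: 70",
        "From: <sip:probe@%s>;tag=%s" % (local_host, uuid.uuid4().hex[:8]),
        "To: <sip:%s:%d>" % (target_host, target_port),
        "Call-ID: %s@%s" % (uuid.uuid4().hex, local_host),
        "CSeq: 1 OPTIONS",
        "Contact: <sip:probe@%s:%d>" % (local_host, local_port),
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(target_host, target_port=5060, timeout=3.0):
    """Send one OPTIONS request; return the reply's first line, or None if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((target_host, target_port))
        local_host, local_port = sock.getsockname()
        sock.send(build_options(target_host, target_port, local_host, local_port))
        try:
            reply = sock.recv(4096)
        except (socket.timeout, ConnectionRefusedError):
            return None
    reply_lines = reply.decode("ascii", "replace").splitlines()
    return reply_lines[0] if reply_lines else ""
```

Any status line at all (even a 4xx) suggests the SIP side of Asterisk is still reading packets. A None result while the web GUI and CLI are still up would fit the picture of Asterisk's SIP handling being stuck rather than the whole machine.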

I do see a slight contradiction here :sunglasses:

I don’t want to delve into this, because SOMETHING is breaking Asterisk, and it’s unlikely to be Asterisk itself.

There needs to be more debugging provided before anyone will be able to give you much help. But things like network captures, process dumps, and a pile of other things are going to be useful.

Or, you could just hit up some of our hosting partners, which care deeply about making this stuff work reliably and stably. Also, they help pay our wages, which is nice.

I was expecting that comment, which is why I didn’t mention it in the first place.

I took the FreePBX full distro ISO from the Schmooze website, installed it on a PC, and used the CentOS tools to turn it into an AMI image.

I’m paying Schmooze for support, but they can’t seem to figure it out, either, which is why I’m posting here.

I was just wondering if there was some command I can run from the Asterisk CLI while the phone calls aren’t going through to give better diagnostic information.
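As a rough illustration of what could be collected from the CLI during an outage, here is a Python sketch that runs a handful of commands through `asterisk -rx` and saves the output in one report. The command list is an assumption on my part and should be adjusted to what the installed Asterisk version supports; in particular, "core show locks" is only expected to be available when Asterisk was built with DEBUG_THREADS. The sketch also assumes Python 3.7 or newer is available on the box, which may not be true of the stock distro.

```python
import subprocess

# Assumed command list; adjust to what your Asterisk build actually supports.
# "core show locks" is only expected to work on builds with DEBUG_THREADS.
COMMANDS = [
    "core show channels",
    "sip show channels",
    "sip show peers",
    "core show threads",
    "core show locks",
]


def build_argv(cli_command, binary="asterisk"):
    """Argument list that runs one CLI command on the running Asterisk and exits."""
    return [binary, "-rx", cli_command]


def snapshot(commands=COMMANDS, timeout=15, runner=subprocess.run):
    """Run each command with a timeout and return one combined text report."""
    sections = []
    for command in commands:
        try:
            result = runner(build_argv(command), capture_output=True,
                            text=True, timeout=timeout)
            output = result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            # A command that never returns is itself a useful data point.
            output = "(no answer within %d seconds)\n" % timeout
        except OSError as error:
            output = "(could not run asterisk: %s)\n" % error
        sections.append("=== %s ===\n%s" % (command, output))
    return "\n".join(sections)
```

Running print(snapshot()) once while things are healthy and once while calls are failing gives two reports to compare. The timeout matters because a hung Asterisk may never answer some commands, and noting which ones stall can help narrow down where it is stuck.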

Can you message me your support ticket number and I’ll investigate it further, please?

Edit: If it’s the one ending in 233, then that looks about right. Asterisk is locking up for some reason. That could be hardware related, but I don’t know. It’s hard.

Yes, the one ending in 233.

Thanks for taking a look. I’m happy to run whatever tests.

I’m quite comfortable with the *nix CLI, but I don’t know the first thing about Asterisk.

(that’s why I’m running FreePBX :smile: )

-Wes

This is a deep system level issue. Asterisk is hanging, and we don’t know why. All we know is that it works fine on normal hardware, and we also know that AWS has all sorts of strange and unexplainable errors with Asterisk. Again, we don’t know why.

This is (one of the reasons) why we don’t have a proper FreePBX AMI. It’s broken, somehow, and we don’t want to spend the time trying to fix it when it’s going to turn out to be some kernel change or bizarre networking issue that’s not fixable anyway.

Seriously, use one of the hosting partners. We get to tell THEM what hardware and software to use, so we know it’s going to work properly.

Well caught Rob. . .

@waldo22, you can hide, but you can’t then run :wink:

How is it helpful to anyone to state one thing, having done another? (No mind readers here . . . )

As a JFWIW thing, using an Amazon instance as a host puts you on the same “network” as many really “bad guys”, with no reasonably controllable firewall between you and them, and Amazon really doesn’t GAF. You have added a level of insecurity you are perhaps not even aware of.

OK, thank you both for the advice.

Agree with Rob.

I’ve seen this before in early XEN VM platforms…

To help the community, can you provide more details about the platform?

Thanks for all the posts.

It’s just so strange that nothing actually crashes.

Both the Asterisk CLI and the FreePBX gui are still up and I can log in and run commands. I just can’t place or receive calls. How is PIAF not having these problems with their AMI? Or are they?

Anyway, how scalable is the Cyberlynk stuff? It’s definitely big enough for us now, but can we stay with them and expand to hundreds of simultaneous calls? Even the Diamond VPS is very reasonably priced, especially compared to AWS.

-Wes

I will summarize by saying that unless you have total control over your kernel and unimpeded access to the internet, you will likely continue to have problems. Many “cloud” services just don’t give you that, so you will continue to be working in the dark. If your vendor satisfies those two needs, however, then you should not have a problem. Choose a vendor that has a proven record with FreePBX/Asterisk. Rob gave you a lead . . . it’s as simple as that.

Yes, and Cyberlynk (www.freepbxhosting.com) is one of the vendor partners on the FreePBX home page.

Well, I have to say the pricing, although “simple and straightforward”, is a little beyond my comprehension :wink:

http://thundercloudpbx.com/packages-pricing/

Perhaps the iPod nano version is for you :slight_smile:
