Asterisk and/or FreePBX will crash several times a week

Every day for the past three or four days, we’ve had to restart Asterisk manually from the command line (using /etc/init.d/asterisk), because it had stopped taking calls. This has also been a problem in the past, but this is the last straw.

Restarting Asterisk restores normal service, but this is extremely disruptive, as calls get dropped, or new calls get rejected until someone takes the time to fix this.

The only logs I’ve seen that occur around this time are certain PJSIP users repeatedly attempting to log in and failing. Like so:

[2018-07-05 09:52:26] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1050” sip:[email protected]’ failed for ‘192.168.0.117:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:26] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1050” sip:[email protected]’ failed for ‘192.168.0.117:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:26] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1050” sip:[email protected]’ failed for ‘192.168.0.117:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:26] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1050” sip:[email protected]’ failed for ‘192.168.0.117:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:29] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1049” sip:[email protected]’ failed for ‘192.168.0.77:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:29] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1049” sip:[email protected]’ failed for ‘192.168.0.77:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:29] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1049” sip:[email protected]’ failed for ‘192.168.0.77:5060’ (callid: [email protected]) - Failed to authenticate
[2018-07-05 09:52:29] NOTICE[5173][C-00000000]: func_audiohookinherit.c:64 func_inheritance_write: AUDIOHOOK_INHERIT is deprecated and now does nothing.
[2018-07-05 09:52:29] NOTICE[4984]: res_pjsip/pjsip_distributor.c:649 log_failed_request: Request ‘REGISTER’ from ‘“1049” sip:[email protected]’ failed for ‘192.168.0.77:5060’ (callid: [email protected]) - Failed to authenticate
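To see which phones are behind the failed registrations, the log can be summarized per source address. This is only a rough sketch: the function name count_failed_registers is made up for illustration, and the default path /var/log/asterisk/full is an assumption about where the full log lives on this install (pass a different path as the first argument if needed).

```shell
# Count "Failed to authenticate" entries per source IP, busiest first.
# Assumption: the Asterisk full log is at /var/log/asterisk/full.
count_failed_registers() {
    log_file="${1:-/var/log/asterisk/full}"
    grep "Failed to authenticate" "$log_file" \
        | grep -oE "failed for [^0-9]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" \
        | grep -oE "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" \
        | sort | uniq -c | sort -rn
}
```

Running count_failed_registers with no arguments prints one line per address with its count in front. A handful of addresses with very high counts would point to phones holding stale credentials rather than to a problem in Asterisk itself.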

On a possibly related note, when starting Asterisk, I also see several errors about certain Asterisk modules failing to load. Like this:

[2018-07-05 09:52:25] ERROR[4812]: cdr_syslog.c:145 load_config: Unable to load cdr_syslog.conf. Not logging custom CSV CDRs to syslog.

[2018-07-05 09:52:25] ERROR[4812]: app_amd.c:445 load_config: Configuration file amd.conf missing.
[2018-07-05 09:52:25] ERROR[4812]: chan_unistim.c:6757 reload_config: Unable to load config unistim.conf

This should be routine, but in the past I’ve resolved some Asterisk stability issues on other Asterisk servers by setting most modules to not load in /etc/asterisk/modules.conf. Since our FreePBX server is close to stock, I expect this is something that needs to be fixed for the benefit of other FreePBX users.

Other things to note: We currently have 32 hotdesk users and 21 SIP phones on our network. I expect this isn’t an unreasonable strain on a single FreePBX server.

What is your asterisk version?

Asterisk version:

Asterisk 13.17.2 built by mockbuild @ jenkins7

There are newer Asterisk versions; try upgrading.
Also, restart Asterisk by running

fwconsole stop
fwconsole start

I did not have that issue with that version. If you are using the distro, try doing a “yum update” just to see if it’s a problem that was already fixed.

These aren’t modules failing to load - they are config files that aren’t available and are (therefore) disabling the loaded modules. If you want to not load the modules, you need to set them to noload.
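For anyone who does want to go the noload route, here is a rough sketch that turns the config-load errors into candidate noload lines. The function name suggest_noload is made up, the default log path /var/log/asterisk/full is an assumption, and so is the guess that the source file named in the log (cdr_syslog.c) matches the module name (cdr_syslog.so). That holds for the three modules quoted above but not for every module, so treat the output as a list to review, not something to paste blindly.

```shell
# Print candidate "noload" lines for modules whose config file failed to load.
# Assumptions: log path, and that <name>.c in the log maps to <name>.so.
# Nothing is edited; review the output before adding it under [modules]
# in /etc/asterisk/modules.conf.
suggest_noload() {
    log_file="${1:-/var/log/asterisk/full}"
    grep -E "ERROR\[[0-9]+\]: [a-z_0-9]+\.c:[0-9]+ (re)?load_config:" "$log_file" \
        | sed -E 's/.*ERROR\[[0-9]+\]: ([a-z_0-9]+)\.c:.*/noload => \1.so/' \
        | sort -u
}
```

Check that nothing in your dialplan uses a module before disabling it (app_amd, for example, provides answering machine detection).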

I’ve got servers running in call centers that have multi-hundred day uptimes, so the stability of the FreePBX system isn’t really an everyday issue. To me, what you are describing doesn’t sound like a FreePBX problem - it sounds more like hardware to me.

I’ve seen this problem before on other systems. On one, it was a typo in a config file that wasn’t caught during boot-up (which isn’t a problem any more) and the others were all bad hardware (a bad CPU in one machine and bad RAM in the rest). Could you run a low-level system test (like a stand-alone RAM test) and see if you are getting a stuck bit or something is losing its mind when it gets hot?

Another place to look would be in /var/log/messages and make sure you aren’t getting any hardware errors (CRC errors on a drive, for example) that might contribute to the machine losing its mind.
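A quick way to scan for that kind of thing is a case-insensitive grep over the system log. This is a minimal sketch: the function name scan_hw_errors is made up, and the pattern list is just a guess at common kernel messages (machine checks, EDAC memory reports, disk I/O errors, segfaults, the OOM killer), not a complete list.

```shell
# Show system log lines that commonly accompany failing hardware.
# Assumption: the pattern list below is illustrative, not exhaustive.
scan_hw_errors() {
    log_file="${1:-/var/log/messages}"
    grep -iE "machine check|hardware error|EDAC|I/O error|segfault|out of memory" "$log_file"
}
```

No output means none of these patterns matched, which does not rule out bad hardware; a stand-alone RAM test is still worth running.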

Also, as noted: ‘fwconsole reload’ should be your first choice for refreshing the system. The second is ‘fwconsole restart’, which will stop and start the server. The first shouldn’t dump your calls but might not solve your problem. The second one will dump your calls and probably solve the problem temporarily.

Perhaps it’s possible that Asterisk or FreePBX don’t play well with KVM virtual machines?

I have several other servers on this one physical machine, and they don’t have trouble like this.

Just to clarify: is it Asterisk that’s been running for many days, so that ‘core show uptime’ on its command line shows it’s been up that long? Or is it the machine that’s been up many days, so that the Linux “uptime” command shows 100+ days?
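To make the distinction concrete, the Asterisk figure can be pulled from the shell and compared against the plain “uptime” output. A small sketch, assuming the asterisk binary is on the PATH and that ‘core show uptime seconds’ prints a line of the form “System uptime: N” (that output format is my assumption about this version); the function name asterisk_uptime_seconds is made up.

```shell
# Print how long the Asterisk process has been up, in seconds.
# Assumption: "core show uptime seconds" prints "System uptime: <seconds>".
# $1 optionally names the CLI binary (defaults to "asterisk").
asterisk_uptime_seconds() {
    asterisk_bin="${1:-asterisk}"
    "$asterisk_bin" -rx "core show uptime seconds" \
        | awk -F': ' '/^System uptime/ {print $2}'
}
```

If this number is small while the Linux “uptime” command shows weeks, it is Asterisk that keeps getting restarted, not the machine.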

We haven’t had much need to reboot the machine. It’s only Asterisk that has had this trouble.

That would be Asterisk uptime

Are these running on any kind of virtual machines, or is it bare metal?

All of my installations are on servers. I resist using VMs at all cost, mostly because my experience with VMs runs long and disappointing.

My development machine has been up for 37 weeks, with my Asterisk uptime at almost 6 days. It’s my most volatile machine since it’s the one I mess with the most.

What hardware are you using then?

In the meantime, we’re going to look into doing our BIOS updates and a RAM/CPU test just to see if this is the issue.

There are plenty of users running FreePBX on VMs. Did you try upgrading Asterisk?

We have done the FreePBX upgrade procedure, and so far we haven’t had any more show-stopping failures.

But it’s only been a week. I suppose that’s still better than deadlocking/crashing/just stopping taking calls 3 times in that time?
