FreePBX HA will not start - unknown error

Hi,

We have a FreePBX HA setup that has been running well for a while. Today we had a power failure, and now the HA cluster will not start.

This is the output of pcs status:

Cluster name: freepbx-ha
Last updated: Fri May 18 18:05:19 2018
Last change: Fri May 18 18:03:44 2018
Stack: cman
Current DC: freepbx-a - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured
22 Resources configured


Node freepbx-b: OFFLINE (standby)
Online: [ freepbx-a ]

Full list of resources:

 spare_ip       (ocf::heartbeat:IPaddr2):       Started freepbx-a
 floating_ip    (ocf::heartbeat:IPaddr2):       Started freepbx-a
 Master/Slave Set: ms-asterisk [drbd_asterisk]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-mysql [drbd_mysql]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-httpd [drbd_httpd]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 Master/Slave Set: ms-spare [drbd_spare]
     Masters: [ freepbx-a ]
     Stopped: [ freepbx-b ]
 spare_fs       (ocf::heartbeat:Filesystem):    Started freepbx-a
 Resource Group: mysql
     mysql_fs   (ocf::heartbeat:Filesystem):    Started freepbx-a
     mysql_ip   (ocf::heartbeat:IPaddr2):       Started freepbx-a
     mysql_service      (ocf::heartbeat:mysql): Started freepbx-a
 Resource Group: asterisk
     asterisk_fs        (ocf::heartbeat:Filesystem):    Started freepbx-a
     asterisk_ip        (ocf::heartbeat:IPaddr2):       Started freepbx-a
     asterisk_service   (ocf::heartbeat:freepbx):       Stopped
 Resource Group: httpd
     httpd_fs   (ocf::heartbeat:Filesystem):    Stopped
     httpd_ip   (ocf::heartbeat:IPaddr2):       Stopped
     httpd_service      (ocf::heartbeat:apache):        Stopped
 Clone Set: ClusterMon-SMTP-clone [ClusterMon-SMTP]
     Started: [ freepbx-a ]
     Stopped: [ freepbx-b ]

Failed actions:
    asterisk_service_start_0 on freepbx-a 'unknown error' (1): call=223, status=Timed Out, last-rc-change='Fri May 18 18:03:53 2018', queued=0ms, exec=30001ms

freepbx-b is currently powered off. I have also run /usr/local/asterisk/fixcluster, but it does not help.

It looks like the issue is that Asterisk cannot start. I thought this might be because /var/www is not mounted from DRBD, but the logs after a reboot suggest that httpd starts, then Asterisk attempts to start and fails, and then /var/www is unmounted.

I'm unsure where else to look, so any help is appreciated.

Try running pcs resource cleanup httpd and pcs resource cleanup asterisk.

https://wiki.freepbx.org/display/FPG/IPMI+Failure

Thanks, I tried those but with no luck.

Things appear to start, but then the Asterisk start seems to time out and fail, and httpd is unmounted again.

Here are some of the logs:

May 19 00:22:05 freepbx-a pacemakerd[1310]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
May 19 00:22:07 freepbx-a lrmd[25206]:  warning: child_timeout_callback: asterisk_service_start_0 process (PID 367) timed out
May 19 00:22:07 freepbx-a lrmd[25206]:  warning: operation_finished: asterisk_service_start_0:367 - timed out after 30000ms
May 19 00:22:07 freepbx-a crmd[25209]:    error: process_lrm_event: Operation asterisk_service_start_0: Timed Out (node=freepbx-a, call=167, timeout=30000ms)
May 19 00:22:07 freepbx-a crmd[25209]:  warning: status_from_rc: Action 145 (asterisk_service_start_0) on freepbx-a failed (target: 0 vs. rc: 1): Error
May 19 00:22:07 freepbx-a crmd[25209]:  warning: update_failcount: Updating failcount for asterisk_service on freepbx-a after failed start: rc=1 (update=INFINITY, time=1526710927)
May 19 00:22:07 freepbx-a crmd[25209]:   notice: abort_transition_graph: Transition aborted by asterisk_service_start_0 'modify' on freepbx-a: Event failed (magic=2:1;145:13:0:98e031de-471b-44f6-b803-50c7e99f5e5f, cib=0.331.15, source=match_graph_event:344, 0)
May 19 00:22:07 freepbx-a crmd[25209]:  warning: update_failcount: Updating failcount for asterisk_service on freepbx-a after failed start: rc=1 (update=INFINITY, time=1526710927)
May 19 00:22:07 freepbx-a crmd[25209]:  warning: status_from_rc: Action 145 (asterisk_service_start_0) on freepbx-a failed (target: 0 vs. rc: 1): Error
May 19 00:22:07 freepbx-a crmd[25209]:  warning: update_failcount: Updating failcount for asterisk_service on freepbx-a after failed start: rc=1 (update=INFINITY, time=1526710927)
May 19 00:22:07 freepbx-a crmd[25209]:  warning: update_failcount: Updating failcount for asterisk_service on freepbx-a after failed start: rc=1 (update=INFINITY, time=1526710927)
May 19 00:22:07 freepbx-a crmd[25209]:   notice: run_graph: Transition 13 (Complete=11, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-831.bz2): Stopped

I don’t know if this is because Asterisk takes too long to start, or something else.
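One way to check this from the logs: the lrmd line above reports "timed out after 30000ms", and the failed action in pcs status shows exec=30001ms, so the start was cut off right at the configured limit rather than failing on its own. A minimal sketch, assuming log lines in the same format as the ones posted (the function name find_timeouts is made up for illustration), that pulls the operation and timeout out of such lines:

```python
import re

# Matches lrmd lines such as:
#   "... operation_finished: asterisk_service_start_0:367 - timed out after 30000ms"
TIMEOUT_RE = re.compile(
    r"operation_finished: (?P<op>\S+?):(?P<pid>\d+) - timed out after (?P<ms>\d+)ms"
)


def find_timeouts(log_text):
    """Return (operation, timeout_seconds) for each lrmd timeout line found."""
    results = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            results.append((match.group("op"), int(match.group("ms")) / 1000))
    return results


# Sample line copied from the log excerpt in this thread.
sample = (
    "May 19 00:22:07 freepbx-a lrmd[25206]:  warning: operation_finished: "
    "asterisk_service_start_0:367 - timed out after 30000ms"
)
print(find_timeouts(sample))
```

If every failed start reports the same value as the configured timeout, the limit itself is a likely suspect, and how long Asterisk really needs to start is worth measuring separately.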

It looks like the issue was that Asterisk takes too long to start, and the timeout for starting the service was 30 seconds.

I adjusted the default timeout to 240 seconds and this solved the problem.
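The post does not say which command was used for this. As a sketch only, one way to raise the start timeout for a single resource with pcs is the "pcs resource update ... op start timeout=..." form; choosing asterisk_service as the target and the helper names below are assumptions for illustration. The helper only prints the command unless dry_run is turned off:

```python
import subprocess


def start_timeout_command(resource, seconds):
    """Build the pcs command that sets the start-operation timeout."""
    return ["pcs", "resource", "update", resource,
            "op", "start", "timeout={}s".format(seconds)]


def apply(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)


# Assumption: asterisk_service is the resource whose start timed out.
cmd = start_timeout_command("asterisk_service", 240)
apply(cmd)  # dry run: prints the command without touching the cluster
```

If the cluster-wide operation default was changed instead, the equivalent would be along the lines of "pcs resource op defaults timeout=240s", which affects every operation that does not set its own timeout.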

Should this not be a standard setting on HA setups?

Asterisk should never take 240 seconds to start! This is probably a DNS issue on your end. If your machine has no DNS, you need to make sure there’s nothing in /etc/resolv.conf. Otherwise it will hang on startup.

It doesn’t take 240 seconds to start. It is probably more like 40 to 60 seconds, which is still longer than the default timeout of 30 seconds.

DNS is also fine and can resolve addresses. There is no internet access, though, so I'm unsure whether this makes a difference to startup time (I wouldn’t think so).

Disable DNS totally, or give the machine internet access. Asterisk hates only having half of it.
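To see whether name resolution actually stalls on a box without internet access, one option is to time a few lookups. This is a rough sketch, not a FreePBX tool: the function name time_lookup is made up, and which hostnames to test is up to you (for example, whatever names appear in your trunk configuration).

```python
import socket
import sys
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (succeeded, elapsed_seconds) for a single name lookup."""
    started = time.monotonic()
    try:
        resolver(hostname, None)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return ok, time.monotonic() - started


if __name__ == "__main__":
    # Hostnames are taken from the command line; nothing is looked up by default.
    for name in sys.argv[1:]:
        ok, elapsed = time_lookup(name)
        print("{}: {} in {:.2f}s".format(name, "resolved" if ok else "failed", elapsed))
```

Lookups that fail quickly are usually harmless. Lookups that take many seconds before failing are the kind of delay that could push a service start past a 30 second timeout.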
