High Availability and GUI down after Module Upgrade

avayax · February 18, 2015, 12:53pm

I updated several modules today, including the FreePBX framework, to the newest version.
That worked, but “Apply Changes” took forever to load.
Now I can no longer log in to the GUI on my floating IP, and running amportal restart from the CLI returns:
Fetching settings from amportal.conf file…
/etc/amportal.conf: line 760: reg: command not found
/etc/amportal.conf: line 925: http://feeds.feedburner.com/InsideTheAsterisk: No such file or directory
FATAL: can not find freepbx_engine to start Asterisk

Can you help?

avayax · February 18, 2015, 1:02pm

[root@freepbx-b ~]# pcs status
Cluster name:
Last updated: Wed Feb 18 07:57:12 2015
Last change: Tue Feb 17 20:15:07 2015 via crm_attribute on freepbx-a
Stack: cman
Current DC: freepbx-b - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
22 Resources configured

Online: [ freepbx-a freepbx-b ]

Full list of resources:

spare_ip (ocf:IPaddr2): Started freepbx-b
floating_ip (ocf:IPaddr2): Started freepbx-b
Master/Slave Set: ms-asterisk [drbd_asterisk]
Masters: [ freepbx-a ]
Slaves: [ freepbx-b ]
Master/Slave Set: ms-mysql [drbd_mysql]
Masters: [ freepbx-a ]
Slaves: [ freepbx-b ]
Master/Slave Set: ms-httpd [drbd_httpd]
Masters: [ freepbx-a ]
Slaves: [ freepbx-b ]
Master/Slave Set: ms-spare [drbd_spare]
Masters: [ freepbx-a ]
Slaves: [ freepbx-b ]
spare_fs (ocf:Filesystem): Started freepbx-a
Resource Group: mysql
mysql_fs (ocf:Filesystem): Started freepbx-a
mysql_ip (ocf:IPaddr2): Started freepbx-a
mysql_service (ocf:mysql): FAILED freepbx-a (unmanaged)
Resource Group: asterisk
asterisk_fs (ocf:Filesystem): Started freepbx-a
asterisk_ip (ocf:IPaddr2): Started freepbx-a
asterisk_service (ocf:freepbx): FAILED freepbx-a (unmanaged)
Resource Group: httpd
httpd_fs (ocf:Filesystem): Started freepbx-a
httpd_ip (ocf:IPaddr2): Started freepbx-a
httpd_service (ocf:apache): FAILED freepbx-a (unmanaged)
Clone Set: ClusterMon-SMTP-clone [ClusterMon-SMTP]
Started: [ freepbx-a freepbx-b ]

Failed actions:
mysql_service_stop_0 on freepbx-a ‘unknown error’ (1): call=1481, status=Timed Out, last-rc-change=‘Wed Feb 18 07:32:06 2015’, queued=41397ms, exec=0ms
httpd_service_stop_0 on freepbx-a ‘unknown error’ (1): call=1487, status=Timed Out, last-rc-change=‘Wed Feb 18 07:32:06 2015’, queued=42377ms, exec=0ms
asterisk_ip_monitor_30000 on freepbx-a ‘unknown error’ (1): call=1423, status=Timed Out, last-rc-change=‘Wed Feb 18 07:31:47 2015’, queued=0ms, exec=0ms
asterisk_service_stop_0 on freepbx-a ‘unknown error’ (1): call=1484, status=Timed Out, last-rc-change=‘Wed Feb 18 07:32:06 2015’, queued=42379ms, exec=0ms
floating_ip_monitor_30000 on freepbx-a ‘unknown error’ (1): call=1298, status=Timed Out, last-rc-change=‘Wed Feb 18 07:31:47 2015’, queued=0ms, exec=0ms
mysql_ip_monitor_30000 on freepbx-a ‘unknown error’ (1): call=1410, status=Timed Out, last-rc-change=‘Wed Feb 18 07:31:47 2015’, queued=0ms, exec=0ms
spare_ip_start_0 on freepbx-a ‘unknown error’ (1): call=1494, status=Timed Out, last-rc-change=‘Wed Feb 18 07:32:17 2015’, queued=36722ms, exec=0ms
httpd_ip_monitor_30000 on freepbx-a ‘unknown error’ (1): call=1406, status=Timed Out, last-rc-change=‘Wed Feb 18 07:31:47 2015’, queued=0ms, exec=0ms

PCSD Status:
Error: no nodes found in corosync.conf
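
For readers skimming the status output above, the lines that matter are the ones marked FAILED or (unmanaged). A minimal Python sketch for picking those out of pasted `pcs status` text follows. The function name `find_problem_resources` and the regular expression are illustrative assumptions based only on the line layout shown in this post; other pcs versions may format these lines differently.

```python
import re

# Matches resource lines in the layout shown in this thread, e.g.:
#   mysql_service (ocf:mysql): FAILED freepbx-a (unmanaged)
# This pattern is an assumption derived from the pasted output only.
RESOURCE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+"
    r"(?P<state>\S+)\s+(?P<node>\S+)(?P<flags>.*)$"
)


def find_problem_resources(status_text):
    """Return (name, state, node, unmanaged) for FAILED or unmanaged resources."""
    problems = []
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if not match:
            continue  # headers, Masters/Slaves lines, failed-action lines
        unmanaged = "(unmanaged)" in match.group("flags")
        if match.group("state") == "FAILED" or unmanaged:
            problems.append(
                (match.group("name"), match.group("state"),
                 match.group("node"), unmanaged)
            )
    return problems


# A few lines copied from the status output in this post.
sample = """\
floating_ip (ocf:IPaddr2): Started freepbx-b
mysql_fs (ocf:Filesystem): Started freepbx-a
mysql_service (ocf:mysql): FAILED freepbx-a (unmanaged)
asterisk_service (ocf:freepbx): FAILED freepbx-a (unmanaged)
httpd_service (ocf:apache): FAILED freepbx-a (unmanaged)
"""

for name, state, node, unmanaged in find_problem_resources(sample):
    print(name, state, node, "unmanaged" if unmanaged else "managed")
```

On the sample lines, this reports the three service resources (mysql_service, asterisk_service, httpd_service) on freepbx-a, which are the same three whose stop actions timed out.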

avayax · February 18, 2015, 1:47pm

I had to clear the httpd and asterisk errors manually via the CLI, and now I am back up.
I’m curious why a mere module update would bring my whole cluster down and kill my system.

Any ideas?

Thank you guys for your help and advice.

xrobau · February 18, 2015, 7:33pm

I explicitly make it hard for people to set stuff to Unmanaged. I’m guessing that, at some point, you set a bunch of services to Unmanaged and never set them back.

You need to fix that, and set them back to Managed.

avayax · February 18, 2015, 7:58pm

Thanks Rob.

I don’t know what you mean by setting stuff to unmanaged. I didn’t take a deliberate action to set something to unmanaged.
In fact, before I did the module upgrade, everything was fine and nothing was unmanaged.
So I am wondering what happened.
The modules I upgraded from the GUI were framework, core and two others.

When I cleared the errors manually (the GUI was not accessible), things came back.
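
The thread does not say which commands were actually run to clear the errors. As a rough sketch only, the Python snippet below prints the stock pcs subcommands commonly used for this, `pcs resource cleanup` and `pcs resource manage`, for the three services shown as FAILED (unmanaged) in the status output. The helper name `recovery_commands` and the ordering are assumptions, and FreePBX HA may expect this to be done through its own tooling instead. The snippet only prints the commands; it does not execute anything.

```python
import shlex


def recovery_commands(resource_ids):
    """Build (but do not run) pcs commands to clear failures and re-manage resources.

    Assumption: the stock `pcs resource cleanup <id>` and
    `pcs resource manage <id>` subcommands are appropriate here.
    """
    commands = []
    for resource_id in resource_ids:
        # Clear the recorded failed actions for this resource.
        commands.append(["pcs", "resource", "cleanup", resource_id])
        # Put the resource back under cluster management.
        commands.append(["pcs", "resource", "manage", resource_id])
    return commands


# Resource names taken from the status output earlier in this thread.
failed = ["mysql_service", "asterisk_service", "httpd_service"]

for command in recovery_commands(failed):
    print(" ".join(shlex.quote(part) for part in command))
```

Printing rather than running keeps the sketch safe to try on any machine; review each command against the HA documentation before using it on a live cluster.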

avayax · February 18, 2015, 8:16pm

We’re a new HA user.
We know nothing about “managed” vs. “unmanaged” in this context, and wouldn’t even know where to change it. These are virgin FreePBX distro + HA machines with nothing else on them. Any hint as to what managed vs. unmanaged means, and where to check or change it, would be very helpful.

Thank you.

xrobau · February 18, 2015, 11:40pm

http://wiki.freepbx.org/display/FCM/FreePBX+HA-Setting+the+cluster+to+maintenance+mode