FreePBX HA status not showing correct information

avayax · March 29, 2015, 7:38pm

I upgraded the RAM of one of our HA servers to 4GB. I did a normal shutdown and restarted with the new RAM. All appears fine from the console. pcs status shows the cluster working fine as master, and the secondary down as we’d expect. However, the FreePBX GUI thinks that the master node is also down, despite the GUI working fine from the floating IP address. HA status says:

[image: screenshot of the HA status page]

And the Manage form is mostly blank; there are no status check options.

But pcs status shows:

[root@freepbx-a ~]# pcs status
Cluster name:
Last updated: Sun Mar 29 14:50:19 2015
Last change: Sun Mar 29 14:40:42 2015 via crmd on freepbx-a
Stack: cman
Current DC: freepbx-a - partition WITHOUT quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured
22 Resources configured

Online: [ freepbx-a ]
OFFLINE: [ freepbx-b ]

Full list of resources:

spare_ip (ocf:IPaddr2): Started freepbx-a
floating_ip (ocf:IPaddr2): Started freepbx-a
Master/Slave Set: ms-asterisk [drbd_asterisk]
Masters: [ freepbx-a ]
Stopped: [ freepbx-b ]
Master/Slave Set: ms-mysql [drbd_mysql]
Masters: [ freepbx-a ]
Stopped: [ freepbx-b ]
Master/Slave Set: ms-httpd [drbd_httpd]
Masters: [ freepbx-a ]
Stopped: [ freepbx-b ]
Master/Slave Set: ms-spare [drbd_spare]
Masters: [ freepbx-a ]
Stopped: [ freepbx-b ]
spare_fs (ocf:Filesystem): Started freepbx-a
Resource Group: mysql
mysql_fs (ocf:Filesystem): Started freepbx-a
mysql_ip (ocf:IPaddr2): Started freepbx-a
mysql_service (ocf:mysql): Started freepbx-a
Resource Group: asterisk
asterisk_fs (ocf:Filesystem): Started freepbx-a
asterisk_ip (ocf:IPaddr2): Started freepbx-a
asterisk_service (ocf:freepbx): Started freepbx-a
Resource Group: httpd
httpd_fs (ocf:Filesystem): Started freepbx-a
httpd_ip (ocf:IPaddr2): Started freepbx-a
httpd_service (ocf:apache): Started freepbx-a
Clone Set: ClusterMon-SMTP-clone [ClusterMon-SMTP]
Started: [ freepbx-a ]
Stopped: [ freepbx-b ]

PCSD Status:
Error: no nodes found in corosync.conf

No error indicated. I ran the fixcluster script twice, also with no errors reported. drbd-overview shows normal status (for a simplex master).

When I click on the update available on the FreePBX HA status page, nothing happens.

Can you advise?

avayax · March 29, 2015, 8:03pm

There is also a message that "Alerting is not available due to the ‘clustermon.sh’ script not being present at the time."

Yet, the script seems to be there:

[root@freepbx-a /]# find . -name "clustermon.sh"
./var/backup.www20150216/html/admin/modules/freepbx_ha/files/clustermon.sh
./drbd/httpd/www/html/admin/modules/freepbx_ha/files/clustermon.sh
./usr/local/asterisk/clustermon.sh

And /var/log/cluster/corosync.log reports:

Mar 29 16:10:43 [1998] freepbx-a pengine: info: determine_online_status: Node freepbx-a is online

In /var/log/asterisk/freepbx.log, when viewing the HA Status page:

[2015-Mar-29 16:32:16] [PHP-WARNING] (/drbd/httpd/www/html/admin/libraries/BMO/GuiHooks.class.php:270) - in_array() expects parameter 2 to be array, boolean given
[2015-Mar-29 16:32:16] [SECURITY] (BMO/Notifications.class.php:463) - [NOTIFICATION]-[freepbx]-[FW_UNSIGNED] - You have 1 unsigned modules (Module “Config Editor (Advanced)” is unsigned and should be re-downloaded)
[2015-Mar-29 16:32:16] [PHP-WARNING] (/drbd/httpd/www/html/admin/libraries/BMO/GuiHooks.class.php:40) - Invalid argument supplied for foreach()
[2015-Mar-29 16:32:17] [PHP-WARNING] (/drbd/httpd/www/html/admin/libraries/BMO/GuiHooks.class.php:40) - Invalid argument supplied for foreach()
[2015-Mar-29 16:32:17] [WARNING] (libraries/modulefunctions.legacy.php:7) - Depreciated Function module_getinfo detected in /drbd/httpd/www/html/admin/libraries/featurecodes.functions.php on line 42

xrobau · March 30, 2015, 12:23am

The ‘DOWN’ message means that when it tried to ask for status, it received an error. This is also why you’re not getting any other information on that page.

There may be additional errors in /var/log/httpd/error.log?

The clustermon.sh check does this:

Does the file /usr/local/asterisk/clustermon.sh exist? No? Well, put it in its place.
Are the file permissions right? Fix them. If you can’t, error.

One of those two things is failing. I’d start by checking the ownership and permissions of that file. You should get something similar when you run this (the uid, gid, size and mode should match; don’t worry about anything else).

[root@freepbx-a ~]# php -r "print_r(stat('/usr/local/asterisk/clustermon.sh'));"
Array
(
[0] => 64768
[1] => 1835811
[2] => 33261
[3] => 1
[4] => 499
[5] => 498
[6] => 0
[7] => 5334
[8] => 1427067548
[9] => 1427077946
[10] => 1427077946
[11] => 4096
[12] => 16
[dev] => 64768
[ino] => 1835811
[mode] => 33261
[nlink] => 1
[uid] => 499
[gid] => 498
[rdev] => 0
[size] => 5334
[atime] => 1427067548
[mtime] => 1427077946
[ctime] => 1427077946
[blksize] => 4096
[blocks] => 16
)
[root@freepbx-a ~]#
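
For anyone who would rather check this from the shell than through PHP, here is a minimal sketch of the same two questions (does the file exist, and is the mode what we expect). The function name check_script_perms is made up for illustration, it assumes GNU stat as shipped on the distro, and it is not the HA module's actual code; it only reports, it does not fix anything.

```shell
#!/bin/sh
# Hypothetical helper, NOT the FreePBX HA module's real check.
# Assumes GNU coreutils stat (the -c option); reports only, changes nothing.
check_script_perms() {
    path=$1        # file to inspect
    want_mode=$2   # expected octal mode, e.g. 755

    # Step 1: does the file exist at all?
    if [ ! -f "$path" ]; then
        echo "missing: $path"
        return 1
    fi

    # Step 2: do the permission bits match what we expect?
    have_mode=$(stat -c '%a' "$path")
    if [ "$have_mode" != "$want_mode" ]; then
        echo "mode is $have_mode, expected $want_mode: $path"
        return 1
    fi

    echo "ok: $path"
}
```

Something like check_script_perms /usr/local/asterisk/clustermon.sh 755 would then print one line saying which of the two conditions, if any, is not met.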

avayax · March 30, 2015, 12:43am

This is what I got:

[root@freepbx-a ~]# php -r "print_r(stat('/usr/local/asterisk/clustermon.sh'));"
Array
(
[0] => 64768
[1] => 2371010
[2] => 33277
[3] => 1
[4] => 499
[5] => 498
[6] => 0
[7] => 5334
[8] => 1427663577
[9] => 1424998007
[10] => 1427662744
[11] => 4096
[12] => 16
[dev] => 64768
[ino] => 2371010
[mode] => 33277
[nlink] => 1
[uid] => 499
[gid] => 498
[rdev] => 0
[size] => 5334
[atime] => 1427663577
[mtime] => 1424998007
[ctime] => 1427662744
[blksize] => 4096
[blocks] => 16
)

xrobau · March 30, 2015, 12:48am

~~Well there’s nothing wrong there… Has your machine randomly had selinux re-enabled?~~

Edit. Ooh. Wait. The mode is wrong. It’s set to 775, instead of 755. Someone’s been fiddling with file permissions on that machine.

You can fix the error with ‘chmod 755 /usr/local/asterisk/clustermon.sh’.
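
In case it is not obvious how 775 and 755 come out of the stat output above: the mode field is printed in decimal, and the permission bits are its low 12 bits written in octal. A small sketch (the function name mode_to_octal is just for illustration):

```shell
#!/bin/sh
# Convert the decimal [mode] value from PHP's stat() into the familiar
# octal permission string by masking off the file-type bits.
mode_to_octal() {
    printf '%o\n' $(( $1 & 07777 ))
}

mode_to_octal 33261   # reference output above  -> 755
mode_to_octal 33277   # output from this server -> 775
```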

Edit: I still want to know what’s happening in /var/log/httpd/error.log with the status failing.

avayax · March 30, 2015, 2:34pm

Thanks Rob.

It’s a month-old virgin out-of-the-box FreePBX distro HA install. Only two people have logins, and neither mucked with file perms. Perms on /usr/local/asterisk/clustermon.sh are 775, because that’s how the 6.12.65-25 distro first installed them. Changing it from 775->755 is only a security measure to prevent members of gid asterisk from being able to write the file, not that any should be trying. It can’t be causing this problem, unless something’s specifically aborting the rest of the status check because the perms aren’t as expected.

Anyway, I changed the perms and reloaded amportal, and the HA status page still reports the master node as down.

The apache error log (/var/log/httpd/error_log) says:

[Mon Mar 30 10:20:43 2015] [error] [client 10.1.7.121] PHP Warning: Duplicate license for product PBXact[Schmooze Communications] (license file: /etc/schmooze/license-xxxxxxxxxx.zl). Since valid license for this product has already been loaded, this license file will be ignored. in /drbd/httpd/www/html/admin/modules/freepbx_ha/license.php on line 22, referer: http://10.1.1.182/admin/config.php?display=freepbx_ha
[Mon Mar 30 10:20:43 2015] [error] [client 10.1.7.121] PHP Fatal error: Class ‘FreePBX’ not found in /drbd/httpd/www/html/admin/modules/sysadmin/functions.inc/Schmooze.class.php on line 89, referer: http://10.1.1.182/admin/config.php?display=freepbx_ha
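
Since the warnings and the fatal error are interleaved in that log, a quick filter for just the fatals can help. This is only a sketch: the function name last_php_fatals is made up, and the log path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent "PHP Fatal error" lines
# from an Apache error log. $1 = log file, $2 = how many lines (default 5).
last_php_fatals() {
    log=$1
    count=${2:-5}
    grep 'PHP Fatal error' "$log" | tail -n "$count"
}
```

On this box that would be run as last_php_fatals /var/log/httpd/error_log, right after reloading the HA status page.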

Maybe Zend ID is now upset by the RAM upgrade, but the GUI is happy with the deployment ID and license?!?

xrobau · March 30, 2015, 6:24pm

That’s correct. You have two problems. One was clustermon, which is now fixed. The other is the status stuff.

THAT is what’s causing status to not work. Ooooh. OK, yep. Found the bug. I’ll release a fix shortly.

Edit 1 - Sysadmin 12.0.20.1 fixes the causes-it-to-crash problem. I also need to fix the Zending stuff in HA, which will remove that warning. Go upgrade sysadmin to 12.0.20.1.

Edit 2 - FreePBX HA release 12.0.5.1 removes that warning. Sorry about that!

–Rob

avayax · March 30, 2015, 7:35pm

We got it working now, thanks.

avayax · March 30, 2015, 8:08pm

That was a little premature.
We still have a problem:

We want to join our new second machine to the cluster, and have installed updates on both machines.
When trying to join the new server to the existing cluster, the join screen is blank.
I don’t know if this is related to the module updates, but this worked before.

We had an old slave die and we replaced it with a new secondary node. We successfully moved the deployment ID and license over. The process, as far as I know, is that while the master thinks the old slave has died, you just add the new secondary to the existing cluster, and it should magically replace the dead secondary. At least, that’s what this link implies.

Looks like the cluster-join pre-check (shown below) is what’s missing…

There’s a gnarly error in the new secondary node’s /var/log/httpd/error_log:

[Mon Mar 30 16:02:24 2015] [error] [client 10.1.7.121] PHP Fatal error: Call to undefined function zend_loaded_install_license() in /var/www/html/admin/modules/freepbx_ha/license.php on line 21, referer: http://10.1.1.181/admin/config.php?display=freepbx_ha&action=join

Even though the System Admin | Activation screen shows the HA license, and the /etc/schmooze/license-.zl file is there.

xrobau · March 30, 2015, 8:45pm

I’m not having a good day, am I

HA 12.0.5.2 is being published right now. (Edit: Complete. Upgrade to 12.0.5.2 on both nodes then join them)

avayax · March 31, 2015, 12:23pm

We are good now. All working, thanks.

mattsl · April 1, 2015, 12:26am

I’m on 12.0.5.2.

B was the master and I had an error stating that clustermon.sh was missing. It wasn’t.

Before I was able to look much further into that, I found another bug (IPMI fencing killed both machines instead of one). When I brought them back online, A was the master and it says:

Warning: Clustermon.sh on the other machine is different to the version on this machine.

I scp’d its clustermon.sh over to B and did a diff and found no differences.
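
A diff only compares contents, so it will not show a difference in mode, owner or size. The thread does not say what the module actually compares, so this is just a sketch for gathering more detail on each node; the function name file_fingerprint is made up, and it assumes GNU stat and sha256sum are available.

```shell
#!/bin/sh
# Hypothetical helper: print a one-line fingerprint of a file so the
# output from node A and node B can be compared side by side.
# Format: <sha256> <octal mode> <uid>:<gid> <size in bytes>
file_fingerprint() {
    sum=$(sha256sum "$1" | cut -d' ' -f1)
    meta=$(stat -c '%a %u:%g %s' "$1")
    echo "$sum $meta"
}
```

Running file_fingerprint /usr/local/asterisk/clustermon.sh on both machines would show whether the two copies differ in anything other than their contents.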