Advanced recovery was working fine but has stopped

Struggling here.
Advanced Recovery was working fine on FreePBX 16.0.40.13 and I have had no email about a failure, but the service has stopped on the primary while the heartbeat is still active and verified on the secondary.
The adv_recovery.log on the primary reads:

2025-05-09 11:28:33 - 1746786672213 running on primary and status of standby =dead
2025-05-09 11:28:33 - checkweReachedThresholdTime ? 1
2025-05-09 11:28:33 - primary-> checking last notification time (> 1800) found entries 1
2025-05-09 11:28:43 - Self system status {"asterisk":"OK"}
2025-05-09 11:29:22 - Self system status {"asterisk":"OK"}
2025-05-09 11:30:00 - Self system status {"asterisk":"OK"}
2025-05-09 11:30:38 - Self system status {"asterisk":"OK"}
2025-05-09 11:31:17 - Self system status {"asterisk":"OK"}
2025-05-09 11:31:18 - Advance Recovery Error There is some error in Standby Server
2025-05-09 11:31:18 - Found one of the critical service not running on Pair server, going to declare pair server down
2025-05-09 11:31:18 - Pair server return status =Array
(
[mysql] => OK
[http] => OK
)

And the adv_recovery.log on the secondary reads:

2025-05-09 11:28:05 - 1739602052681 running on standby and status of primary =alive
2025-05-09 11:28:05 - Pair server is alive now 1739602052681
2025-05-09 11:28:43 - Pair server is alive
2025-05-09 11:28:43 - 1739602052681 running on standby and status of primary =alive
2025-05-09 11:28:43 - Pair server is alive now 1739602052681
2025-05-09 11:29:22 - Pair server is alive
2025-05-09 11:29:22 - 1739602052681 running on standby and status of primary =alive
2025-05-09 11:29:22 - Pair server is alive now 1739602052681

The advrecovery_out.log on the primary reads (there is no advrecovery_out.log on the secondary):

2024-01-22 10:35 +00:00: Whoops\Exception\ErrorException: Illegal string offset 'restorestatus' in file /var/www/html/admin/modules/adv_recovery/functions.inc/functions.inc.php on line 92
2024-01-22 10:35 +00:00: Stack trace:
2024-01-22 10:35 +00:00: 1. Whoops\Exception\ErrorException->() /var/www/html/admin/modules/adv_recovery/functions.inc/functions.inc.php:92
2024-01-22 10:35 +00:00: 2. Whoops\Run->handleError() /var/www/html/admin/modules/adv_recovery/functions.inc/functions.inc.php:92
2024-01-22 10:35 +00:00: 3. advr_service_daemon() /var/www/html/admin/modules/adv_recovery/adv_recovery_service.php:19
2025-05-07 14:57 +01:00: Whoops\Exception\ErrorException: Illegal string offset 'restorestatus' in file /var/www/html/admin/modules/adv_recovery/functions.inc/functions.inc.php on line 92
2025-05-07 14:57 +01:00: Stack trace:
2025-05-07 14:57 +01:00: 1. Whoops\Exception\ErrorException->() /var/www/html/admin/modules/adv_recovery/functions.inc/functions.inc.php:92
2025-05-07 14:57 +01:00: 2. Whoops\Run->handleError() /var/www/html/admin/modules/adv_recovery/functions.inc/functions.inc.php:92
2025-05-07 14:57 +01:00: 3. advr_service_daemon() /var/www/html/admin/modules/adv_recovery/adv_recovery_service.php:19

7th May is when I noticed something was wrong while in the server GUI. When I try to start the service on the primary, it eventually fails and I do get email alerts. Access to the Advanced Recovery GUI page is extremely slow.

Anyone got a pointer or two that might help out here?
Thank you.
Nathan.

If I click the button on the primary to stop the service, I get an error.

I’m also unable to ssh to the secondary from the primary with the command:
ssh -i /home/asterisk/.ssh/id_rsa root@SecondaryServerIP

Re-running the command that copies the ssh key from the primary to the secondary gives the same timeout as the plain login command above:
sudo -u asterisk ssh-copy-id -i /home/asterisk/.ssh/id_rsa.pub root@SecondaryServerIP

/bin/ssh-copy-id: ERROR: ssh: connect to host 10.42.18.96 port 22: Connection timed out

The servers are on the same subnet, with adjacent IP addresses: 10.42.18.97 (primary) and 10.42.18.96 (secondary).
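
A timeout, as opposed to "Connection refused", between two hosts on the same subnet often means the packets are being silently dropped (for example by a firewall rule) rather than sshd being down. A minimal Python sketch for telling the two cases apart from the primary; `probe_tcp` is a made-up helper name for illustration, and the address and port are the ones from the error above:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        # create_connection performs the full TCP handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing accepts on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: host down or packets dropped on the way.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"


# Example, run on the primary:
#   print(probe_tcp("10.42.18.96", 22))
```

If this reports `timeout` while the secondary is otherwise up, the firewall on the secondary is a reasonable first place to look.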

I’ve rebooted the secondary, but that had no effect.
Rebooting the primary is not something I can do during regular hours.

The default cert on the primary has expired. I let it expire because I have a CA-signed cert on there. Does the default cert need to be in date for Advanced Recovery to work?

Turns out the firewall entries in the secondary server had changed.
The entry on the secondary that allowed the primary server to connect was showing its own IP address instead of that of the primary server.
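
A quick sanity check for this kind of misconfiguration is to compare the address in the allow entry with the host's own addresses. The sketch below is only an illustration in Python: `entry_points_at_self` is a hypothetical helper, and the addresses are typed in by hand rather than read from the FreePBX firewall module, whose storage format is not assumed here.

```python
import ipaddress
from typing import Iterable


def entry_points_at_self(allowed_entry: str, own_addresses: Iterable[str]) -> bool:
    """Return True if the allowed peer address is one of this host's own addresses."""
    allowed = ipaddress.ip_address(allowed_entry.strip())
    return any(allowed == ipaddress.ip_address(addr.strip()) for addr in own_addresses)


# On the secondary (10.42.18.96), the allow entry should name the primary (10.42.18.97):
#   entry_points_at_self("10.42.18.96", ["10.42.18.96"])  # misconfigured entry
#   entry_points_at_self("10.42.18.97", ["10.42.18.96"])  # expected entry
```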

I can’t understand how that came to be the case, but it’s now sorted and the ssh login via the key is working again. I’ve used the Stop button on both the primary and the secondary and then started the service from the primary.

There seems to be a restore transaction in progress at the moment, with no status shown. I will monitor it to see how it pans out over the next hour or so.
