[FALSE SENSE OF SECURITY] Automatic FreePBX 16 RAID install /boot/efi

Hi there,

During the automatic installation, FreePBX sets up RAID 1 using two identical drives.

However, the /boot/efi partition is only created on one of them.

The main question is: what happens if the drive containing /boot/efi fails, and how should this situation be handled?


[root@freepbx ~]# df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                     32G     0   32G   0% /dev
tmpfs                        32G     0   32G   0% /dev/shm
tmpfs                        32G  234M   32G   1% /run
tmpfs                        32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/SangomaVG-root  421G  7.8G  414G   2% /
/dev/md0                    1.9G   57M  1.8G   4% /boot
/dev/nvme0n1p1              200M   12M  189M   6% /boot/efi
tmpfs                       6.3G     0  6.3G   0% /run/user/0
tmpfs                       6.3G     0  6.3G   0% /run/user/999

This looks like a v16 or earlier install?

Please try the latest v17 ISO available at Download | FreePBX - Let Freedom Ring. That version solves the problem by using some additional shell scripts to automatically synchronize the EFI partition contents between drives as needed, allowing you to disconnect either disk and boot the system on the remaining one.

This isn’t a proper answer, since v15 and v16 are still supported versions, v16 even more so. The OP needs an answer that applies to their currently supported setup.

Sangoma made a timeline for what they wanted to support, so telling them the answer is to upgrade to v17 amounts to just EOLing the other versions.


Please consider this a gentle reminder that these are the free community forums for users to help each other out and not the official paid Sangoma support system. Users on v15 and v16 continue to receive paid support from Sangoma.

Where did the OP state which version they are using?

Got me, I might have jumped the gun on the version. Then again, you also didn’t verify before offering a solution.

Are you saying that if they are on v15 or v16, this question requires paid support to get an answer?

Yes, it is a FreePBX 16 clean install.

From Official Doc:

INSTALLATION OPTIONS: 

    Full Install: This is the usual option. If the computer has two hard drives installed, both drives will be wiped and FreePBX Distro will automatically set up a RAID 1 mirror using the two drives. Having two drives is recommended, as this will allow the PBX to continue normal operation if one of the drives has a failure.

So the question I can’t understand is how the system is supposed to “continue normal operation if one of the drives has a failure.”

Which ISO did you use? Are you able to test a v16 install in non-EFI mode?

Either way, the word “normal” should probably be removed, because you would literally be running on a degraded RAID array, so not “normal”.

the official one
https://downloads.freepbxdistro.org/ISO/SNG7-PBX16-64bit-2302-1.iso

“normal” in the doc probably means that it will continue to work – but I suppose it will not.

Would you consider it “normal” if, once you boot up on the single working drive, the data is fully recovered and your PBX is taking calls again as before the RAID array failure?

If so, you might try booting into a Linux distro’s rescue mode, using a separate USB disk. Then you could recover the data from the non-failed hard disk that lacks the proper EFI partition, possibly restoring/rebuilding the partition. But caution is warranted: if one disk failed, then how soon for the other, especially if they are the identical make/model?

RAID > 0 isn’t designed to cope with old-age failures. It’s designed to cope with failures in the period when MTBF is meaningful. That’s generally true of resilience by redundancy.

Thanks for the reply.

I just want to clarify something — I’m not asking this because I currently have a RAID issue, but rather for future reference in case a disk fails.

The reason I bring this up is that I assumed that if RAID 1 is configured, and the documentation states the system should still operate even if one disk fails, then that should be true. However, as shown in my df -h output, this is not the case.

So now I’m wondering:

  • Should the documentation be updated to clarify that this isn’t a true RAID1 setup, and that losing the EFI partition on one disk can break the system?
  • Or should a proper ISO be created that truly supports full RAID1, including the /boot/efi partition?

That’s not the case — I’m simply trying to highlight a RAID EFI issue here.

I was responding to the comment that the second disk was likely to fail soon after the first, which wasn’t your comment.

I’m not sure that you can configure a true RAID 1, in that sense, at the OS level. I think there would need to be hardware or BIOS involvement. Without that, the boot process is not going to see the RAID on the EFI partition. (I suppose it is possible that, with RAID 1, cloning the EFI partition will be enough, but you would definitely have a catch-22 with higher-order RAID: you would need RAID active to boot the OS, but you would need a running OS to have the RAID available.)

I’m sure that RAID 1 + EFI can be created. Here is an example from another OS install:

df -h | grep efi
/dev/md127p4   512M  3.0M  509M   1% /boot/efi
lsblk -f | grep /boot/efi
  ├─md127p4    vfat   BOOT-ZBHRZC   61CA-5ED5   /boot/efi
  ├─md127p4    vfat   BOOT-ZBHRZC   61CA-5ED5   /boot/efi
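
For what it’s worth, that usually works because the ESP mirror is built with mdadm metadata version 1.0, which puts the RAID superblock at the end of the partition; the firmware can then read each member as a plain vfat filesystem before any RAID driver is running. A rough sketch of how such a mirror might be created (this is not what the FreePBX installer does; device names are examples):

# metadata 1.0 keeps the superblock at the end of the partition,
# so UEFI firmware sees each member as an ordinary vfat ESP
mdadm --create /dev/md/esp --level=1 --raid-devices=2 --metadata=1.0 \
    /dev/nvme0n1p1 /dev/nvme1n1p1
mkfs.vfat -F 32 /dev/md/esp

The usual caveat: anything that writes to one member outside of Linux (firmware updates, for example) can silently desync the mirror.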

You can do this live if you are feeling adventurous.

Run lsblk or fdisk -l to identify the second drive, probably something like /dev/nvme1n1 based on your df above. It’s likely part of the RAID array for /boot (/dev/md0) and other partitions.
Confirm it doesn’t already have an EFI partition. If it does, note its identifier (e.g., /dev/nvme1n1p1).

Next up you will need 200MB of space… doesn’t look like an issue.

Next up, two options: back up your partition table, or go full send. If you don’t FA, how do you FO?

sfdisk -d /dev/nvme1n1 > /root/nvme1n1-partition-backup.txt
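
If you ever need to roll back, the same dump can be fed straight back in:

sfdisk /dev/nvme1n1 < /root/nvme1n1-partition-backup.txt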

Partition time…

fdisk /dev/nvme1n1
# Press n to create a new partition.
# Select partition number (e.g., 1 if no EFI partition exists).
# Set size to +200M.
# Change type to EFI System (type t, then select 1 or ef).
# Write changes with w.
# New partition something like /dev/nvme1n1p1, adjust accordingly in commands that follow
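
If you’d rather not drive fdisk interactively, sgdisk (from the gdisk package, assuming it’s installed) does the same in one shot; 0 means “first free partition number” and ef00 is the EFI System type code:

# create a 200M partition in the first free slot and mark it EFI System
sgdisk -n 0:0:+200M -t 0:ef00 /dev/nvme1n1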

Format, mount, sync…

mkfs.vfat -F 32 /dev/nvme1n1p1
mkdir /mnt/second_esp
mount /dev/nvme1n1p1 /mnt/second_esp
rsync -av /boot/efi/ /mnt/second_esp/
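
Before unmounting, a quick sanity check that the copy matches:

diff -r /boot/efi /mnt/second_esp && echo "ESPs match"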

Now the opposite of mount…

umount /mnt/second_esp
rmdir /mnt/second_esp

Next step is on you: configure UEFI in the BIOS. There’s probably a boot menu somewhere. Add a new boot entry pointing to the second drive’s EFI partition (/dev/nvme1n1p1). Specify the bootloader path, typically /EFI/grub/grubx64.efi (match the structure in /boot/efi).
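
Alternatively, you can add the entry from the running system with efibootmgr instead of poking at firmware menus (a sketch; the loader path here is a guess, so check what actually exists under /boot/efi, and adjust -p to the partition number you created):

# -d disk, -p ESP partition number, -L label, -l loader path (note the backslashes)
efibootmgr -c -d /dev/nvme1n1 -p 1 -L "FreePBX (disk 2)" -l '\EFI\grub\grubx64.efi'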

YOLO time

Reboot and select the second boot entry from the UEFI boot menu to verify it works. If successful, the system should boot normally.
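
To confirm which entry the firmware actually used, efibootmgr can tell you (BootCurrent at the top of its output); compare it with the entry you created:

efibootmgr -v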

YOLO part 2

Power off, unplug a drive… see the comment above about backing up first, in the spirit of the scientific method…

Finally…
You will need a script/cron or some other fanciness to keep them in sync, like rsync -av --delete /boot/efi/ /mnt/second_esp/ (you need to mount it first, etc.); a rough sketch below…
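
A minimal sketch of such a script, assuming the second ESP ended up at /dev/nvme1n1p1 (adjust device and paths to your layout):

#!/bin/sh
# sync-esp.sh: mirror the live ESP onto the second drive's ESP
set -e
DEV=/dev/nvme1n1p1   # second drive's ESP (assumption, adjust)
MNT=/mnt/second_esp

mkdir -p "$MNT"
mount "$DEV" "$MNT"
rsync -av --delete /boot/efi/ "$MNT"/
umount "$MNT"

Drop it in cron (e.g., a line like 0 3 * * * root /usr/local/sbin/sync-esp.sh in /etc/cron.d/) and the copies stay reasonably fresh.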


I’d rather not experiment 🙂 Most likely, all this magic would break things, since the entire disk space was allocated by the FreePBX automatic installer.

[root@freepbx ~]# parted /dev/nvme1n1 print
Model: NVMe Device (nvme)
Disk /dev/nvme1n1: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2098MB  2097MB                     raid
 2      2098MB  512GB   510GB                      raid

[root@freepbx ~]# parted /dev/nvme0n1 print
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  211MB   210MB   fat16        EFI System Partition  boot
 2      211MB   2308MB  2097MB                                     raid
 3      2308MB  512GB   510GB                                      raid

[root@freepbx ~]#

The reason I bring this up is that I assumed that if RAID 1 is configured by the auto-install, and the documentation states the system should still operate even if one disk fails, then that should be true. However, as shown in my df -h output, this is not the case.