I am currently using FreePBX 16.0.40.8 (standard FreePBX distro). It works fine.
It is running on a VM. The VM is backed up every week by stopping the VM, taking a snapshot, and then starting it again.
I noticed that sometimes Asterisk is not running after the machine is brought back up. The quick fix was to reboot the machine again, but this is obviously not a solution.
The /var/log/messages shows:
Aug 17 02:21:51 pbx php: Unparseable output from getservices - ["Exception: Asterisk is not connected in file \/var\/www\/html\/admin\/libraries\/php-asmanager.php on line 248","Stack trace:"," 1. Exception->() \/var\/www\/html\/admin\/libraries\/php-asmanager.php:248"," 2. AGI_AsteriskManager->send_request() \/var\/www\/html\/admin\/modules\/firewall\/Smart.class.php:509"," 3. FreePBX\\modules\\Firewall\\Smart->getPjsipContacts() \/var\/www\/html\/admin\/modules\/firewall\/Smart.class.php:499"," 4. FreePBX\\modules\\Firewall\\Smart->getRegistrations() \/var\/www\/html\/admin\/modules\/firewall\/Smart.class.php:89"," 5. FreePBX\\modules\\Firewall\\Smart->getAllPorts() \/var\/www\/html\/admin\/modules\/firewall\/Firewall.class.php:2209"," 6. FreePBX\\modules\\Firewall->getSmartPorts() \/var\/www\/html\/admin\/modules\/firewall\/bin\/getservices:22"] - returned 1
I’ve got the same in firewall.log.
I guess it prevents Asterisk from being launched, but Asterisk does start if I reboot the VM, so the condition is not permanent.
I haven’t yet tried to start Asterisk manually with fwconsole.
Any clue what the root cause may be?
I think it first showed itself a few weeks ago, maybe. I have been running FreePBX 16 for a few years now (this particular machine is about 2 months old).
It is QEMU/KVM under Proxmox VE.
Well, the backup does not require stopping the VM per se.
It is just that making a machine backup in Stopped mode should be safer for the database than making a backup in Snapshot mode. At least this is what people say.
I can switch it to Snapshot, but that shouldn’t be related.
I am not sure it is related to the ssh service: the 22 in the trace above is the line number in the getservices file where getSmartPorts() is called, not a port number.
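To make the point about `getservices:22` concrete, here is a minimal Python sketch that pulls the JSON array out of the "Unparseable output from getservices" line and splits each stack frame into call, file, and line number. The function name and the regular expression are my own and are not part of FreePBX. The sketch assumes the wrapped syslog entry has first been rejoined into a single line.

```python
import json
import re


def parse_getservices_trace(log_line):
    """Return (message, frames) from an 'Unparseable output from getservices' line.

    frames is a list of (call, file_path, line_number) tuples.
    """
    # The PHP output is a JSON array between the first '[' and the last ']'.
    start = log_line.index("[")
    end = log_line.rindex("]") + 1
    entries = json.loads(log_line[start:end])

    message = entries[0]
    frames = []
    for entry in entries[1:]:
        # Frame entries look like " 1. Some->call() /path/to/file.php:123".
        # Entries such as "Stack trace:" do not match and are skipped.
        match = re.match(r"\s*\d+\.\s+(\S+)\s+(\S+):(\d+)$", entry)
        if match:
            call, path, line_number = match.groups()
            frames.append((call, path, int(line_number)))
    return message, frames
```

Run against the log line above, the last frame comes back as the `getservices` path with line number 22, and the first entry is the actual error: the firewall module could not reach Asterisk through the manager connection.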
I’ll try to do some more digging the next time it happens.
From reading numerous posts I can see that there were a few issues mentioning the getSmartHost() function in different versions of FreePBX.
You should only be backing up the VM when you make a change to the OS that FreePBX is running on, which shouldn’t be happening weekly.
And when you do back it up, shut it down and off. Then make a copy of it. Then start it up.
You should be making a regular backup from within FreePBX using the FreePBX backup. Also back up any custom system prompts you might have recorded.
A mistake that a great many people make with VMs is not understanding that a backup of the VM taken at the hypervisor level is separate from a backup of the data within the VM. These are 2 separate things, and the backup schemes need to be different for each of them.
I have some legacy VMs that literally have image backups that are 7 years old and have never been re-imaged since. However, the data in those VMs is backed up nightly.
Would you really want to restore an entire VM just to copy ONE file out of it?
I always used Proxmox with ZFS storage for that reason: copies are done read-only and finish almost instantaneously. It is never too late to add ZFS storage given the current price of SSDs. I never had a problem with many dozens of servers of various flavors.
A caveat: I no longer have a need for hardware VMs, so my experience is based on Proxmox 3.something hosting many dozen Asterisk servers of different flavors.
The FreePBX VM runs in an HA cluster, so it can be live migrated, snapshotted, etc. the same way all the other VMs are. The same goes for the backup.
When the backup is done, the VM is shut down, copied and started again.
I described this process in order to give a better picture of the environment in which the issue happens.
The data backup within the VM is a separate process that is really not relevant here, as it does not influence the behavior of the FreePBX VM instance.
I have it on ZFS, as it allows me to migrate the VM between Proxmox nodes.
As I mentioned earlier, I don’t think that the VM setup is causing the issue, as Asterisk itself is just a child of everything at the hypervisor level described above.
The relation is only that once a week the VM is stopped and then started. After it is started, sometimes Asterisk does not start as it should.
But the issue still happens, so I am looking for a way to find its root cause.
Of course, I could implement a cron script that checks whether the Asterisk process is running and starts it if not (I have a similar one that restarts FOP2 every day to make sure it works), but I would prefer to fix it rather than work around it.
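For reference, a minimal sketch of such a cron check in Python. It assumes Python 3 is available on the PBX (which may not hold on the stock distro), that `asterisk -rx` exits non-zero when it cannot reach a running Asterisk, and that `fwconsole start` is the right way to bring it up. The function names are made up for illustration, and the `run` and `log` parameters exist only so the logic can be exercised without a real PBX.

```python
import subprocess


def asterisk_is_running(run=subprocess.run):
    """Return True if the Asterisk CLI answers a trivial command."""
    try:
        result = run(
            ["asterisk", "-rx", "core show uptime"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, or the CLI hung: treat both as "not running".
        return False
    return result.returncode == 0


def ensure_asterisk(run=subprocess.run, log=print):
    """Run 'fwconsole start' if Asterisk does not answer.

    Returns True if a start was attempted, False if Asterisk was already up.
    """
    if asterisk_is_running(run):
        return False
    log("Asterisk is not answering; running 'fwconsole start'")
    run(
        ["fwconsole", "start"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=300,
    )
    return True
```

Calling `ensure_asterisk()` from a script run by cron every few minutes would cover the gap after a backup, and the log message gives a timestamp to correlate with /var/log/messages. It remains a workaround and says nothing about why Asterisk failed to start.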
Frankly speaking, at the beginning I discarded the possibility that it has anything to do with the virtualization. But since you mention it, I’ll clone a test machine to check that as well.
The point you are either missing or ignoring is that you DON’T need to make weekly backups of the VM from the hypervisor, because you are already backing up the data inside the VM (that is, you said you were, later).
You only need to make a backup of the VM if, for example, you update the operating system used in the VM or update FreePBX’s modules, etc.
And, for a Proxmox server that isn’t under a contract with Proxmox, I’d live with it and just put it down to some quirk or other of either the CentOS or Debian VM instance you are running.
Ok, I understand your point now.
But what I can’t grasp is the purpose of not having scheduled backups of the VM. It may be unneeded, but shouldn’t cause anything inside the VM.
If there’s no other option, I will have to live with it.
But at the moment it is hard for me to understand why this would be caused by snapshotting (or stopping, backing up, starting, whatever is done to it) the VM if the guest filesystem shouldn’t be touched during that operation.
I agree that in theory, it should not. The entire purpose of virtualization is isolation - the guest shouldn’t “know” it’s being virtualized.
Except, I have to wonder: if you are running FreePBX 16, then you are running the distro (the Sangoma OS/CentOS 7 platform), so is there possibly an -old- version of the QEMU guest tools loaded and running in the guest OS that isn’t talking properly to the Proxmox hypervisor?
I haven’t studied how virtual guest tools communicate with a hypervisor - the reality is that in a completely “perfect” hypervisor, not only SHOULDN’T the guest OS “know” it’s running virtualized, it should be -impossible- for it to find out. Clearly, the designers of hypervisors, be they Microsoft or VMware or QEMU or whoever, “bend the rules” and create hypervisors that are imperfect - that have a “crack” of some kind that the hypervisor guest tools can communicate through. This is understandable, since there’s useful and valuable information that can be communicated back and forth between the hypervisor and the guest OS - but it is ALSO a possible point of incompatibility.
If you had a support contract with Proxmox you could ask them - but I suspect they would not be able to assist, because the CentOS 7 that is part of the FreePBX distro is a modified distro and my guess is that it has NOT been “certified” by Proxmox, even if they have in fact certified CentOS 7.
If you don’t want to hack it like I suggested, the other option is moving to FreePBX 17 running under Debian 12, which would have more modern QEMU tools available. And that move is NOT really very difficult at all. If you have a vanilla FreePBX 16 system it’s a snap. If you have modified the system greetings then you will need to save off your modifications, but it’s still pretty easy - not a whole lot has changed in the Asterisk filesystem layout, and they didn’t radically reverse course with Asterisk 21 or FreePBX 17.
Many thanks for your comments. I always find interesting side discussions inspiring, as they give me a different angle of view.
At the moment I don’t plan to jump to a Debian-based installation (although I already have another one at home), as I already migrated FreePBX 16 → FreePBX 16 half a year ago due to some other issue that I was chasing. I prefer to wait a bit longer until FreePBX 17 becomes a bit more mature.
I am testing it right now on a copy next to the production one. After 3 days of restarts I haven’t noticed the behavior yet, which probably means that there is still something missing in the test environment. I’ll need to wait until it happens again in production and then try to investigate more.
If there are any new findings, I’ll post them here, hopefully before the topic is closed automatically.
So, after a month of restarting, stopping and backing up the test machine, the issue did not appear (at least the script checking the connection always reported that Asterisk was running).
I’ll need to wait till it happens in production then.
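For anyone wanting to run a similar connection check, here is a rough Python sketch of one way to do it. This is not the script used above. It assumes the Asterisk Manager Interface is listening on its default port 5038 on localhost and sends its usual greeting line on connect. The function name is hypothetical.

```python
import socket


def ami_banner(host="127.0.0.1", port=5038, timeout=5.0):
    """Return the AMI greeting line, or None if the manager port does not answer."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(256)
    except OSError:
        # Connection refused, timed out, or reset: the manager is not reachable.
        return None
    banner = data.decode("ascii", errors="replace").strip()
    return banner or None
```

Since the exception in the log ("Asterisk is not connected") comes from the PHP manager client, checking the manager port directly tests the same path the firewall module depends on. Logging the result with a timestamp right after each VM start would show whether Asterisk was merely slow to come up or never started at all.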