Another upgrade, another debacle

FreerPBXer · December 5, 2024, 2:05am

Needed to go from a physical PC to a VM this time, so the ‘restore from backup’ procedure was the only real option.

Issues:

Sangoma Realtime API ended up broken and could not be installed/fixed from the GUI.
SangomaConnect ended up not installed, somehow.
SIPStation trunks showed as registered with SIP Ping “OK”, but inbound calls dropped without ringing.
The backup config is reset to nothing, even though it was configured and working.
“FreePBX Statistics” on the dashboard is just blank; the graph is entirely missing.

At this point, hours later, only the last issue remains.

Every time we go into one of these, we have to allow three times as much time as the formal procedure takes, with 2/3 of that being to fix stuff that the upgrade process broke. And while there are some common issues, there is also something new that breaks every time.

This is a frustrating, frankly frightening process. And it’s been broken like this for years; literally the ‘restore from backup’ option is years old now and has never worked right once for us. It’s so consistently bad that we usually opt for the upgrade process instead. I don’t know how we are going to get systems to v17.

kgupta · December 5, 2024, 4:18am

Hello @FreerPBXer

Is it possible for you to raise a support ticket against any commercial module and provide SSH access to your system, so I can log in and check/understand what is going on?

Also, since B&R is the only way to upgrade to v17, I'm interested to know more details about this, like what kind of issues you are facing.

FreerPBXer · December 5, 2024, 4:26pm

No. We do not allow unattended remote access to systems we manage. The last time we did this for Sangoma Support, changes were made to the system despite an agreement that this would not happen. That trust is broken for the foreseeable future.

On top of the issues noted above, the handsets cannot place calls: “Please try your call again later”.

They can receive calls, audio works, etc. But outbound calls don’t work.

We’ve rebuilt configs, updated firmware, reset the phones to defaults. No success.

You have to be better than this.

FreerPBXer · December 5, 2024, 4:46pm

After the restore, inbound calls didn’t work either. I removed the SIPStation trunks and re-added them using the Keycode. That fixed inbound calling.

What I just found is that while the standard outbound route was configured for the SIPStation trunks, the routes were disabled from using the SIPStation gateways. All were set to “No”.

I’ve enabled them and calls seem to be going out. We are remote and no one is in the office, so I’m not 100% certain it’s working; I have no way to confirm audio, just that Zoiper can make outbound calls now.

This puts us in a very bad situation with customers to whom we’ve sold this product; we tell them an upgrade takes an hour or so, then it takes more (four hours on this now), plus they’ve had an outage. I’d feel differently if this was a rare thing, but it is literally EVERY time. I cannot fathom how you’ve tested this and found it to entirely work.

jtbayly · December 11, 2024, 1:47pm

It’s scary because I’ve not been around a long time, but I’ve tried twice to upgrade from 16 to 17. Both attempts failed utterly, but I don’t want to remain on 16, as I understand the OS it is running on is EOL.

BlazeStudios · December 11, 2024, 2:41pm

Have you opened any topics on your updates failing? Provide any details on it?

franckdanard · December 11, 2024, 3:00pm

Well, version 16 has been end of life for a few months, maybe, but version 16 works fine.
I know version 17 looks better, but it’s new, and like most people, you’re eager to upgrade to version 17 because it’s new.

When it’s new, that means it may also have bugs. So if you find any, open a ticket to fix them.
If the support team and development team don’t know there’s a bug, they certainly can’t fix it.

It’s the same everywhere else. Looking at Windows, I imagine that when Windows 12 comes out, there will be some bugs too.

For my part, I’m still on 16 and testing 17 on another server; I’ll wait a while before migrating to it.

jtbayly · December 11, 2024, 3:11pm

Is the OS still receiving security updates, though? I don’t have any desire really to move to 17. My system is up and stable. My concern is the OS potentially being compromised.

As for whether I have reported my issues, yes, here and then my second attempt here.

The advice I got was to manually reconfigure the whole system on 17, rather than attempting to restore from backup, presumably because backup and migrate is not reliable.

BlazeStudios · December 11, 2024, 3:15pm

No, v15 is EOL. The rule is that the last two current releases are supported, therefore v16 and v17 are both supported versions of FreePBX.

franckdanard · December 11, 2024, 3:21pm

On GitHub you can open an issue against the Backup/Restore module, and the team can take a look and check whether there is a problem or not.
Writing a post here is a good start, but sometimes you need to report your issue somewhere else, usually on GitHub.

When I was working with the development team and there was an issue like yours, I used to take a backup file to check what was going on and fix the issue if I found one.

franckdanard · December 11, 2024, 3:22pm

Yes, thanks for correcting me.

Charles_Darwin · December 11, 2024, 4:04pm

Are you asking for help? Really? @kgupta offered to help…and what was your response? Holy moly

FreerPBXer · December 11, 2024, 4:20pm

I have, literally, reported problems with the upgrade processes for a decade. I stopped flogging myself with bug reports because the treatment was consistently “if you can’t prove it, it doesn’t exist”, and the reports didn’t result in fixes. Add to that the current demand of unattended remote access to the system, and it’s a flat no-go.

My posts here are to help others dig out of the same messes and have realistic expectations of the process.

Either upgrade process has 100% resulted in issues with the upgraded system, across dozens of upgrades that we’ve done. Given that, I struggle to believe that internal testing consistently results in fully functional upgraded systems.

BlazeStudios · December 11, 2024, 5:34pm

I was asking the person that did the “me too” on this topic. I know you have submitted tickets.

BlazeStudios · December 11, 2024, 5:36pm

Out of curiosity, have you done partial backups to see if there are failures? Such as just moving extensions or voicemail, etc.

franckdanard · December 11, 2024, 5:52pm

I agree with you regarding the result. Sometimes it doesn’t work. That means something is messed up somewhere (a specific system, config, module or whatever).
Anyway, if you don’t want to allow remote access for support, how do you think the team could fix your issue?

So, OK, you don’t want to use the VPN. In that case, you can use a rebound (jump) server to see what the technician does.

Just an idea.

FreerPBXer · December 11, 2024, 6:39pm

The ongoing problem is that support uses that access to make changes without approval. Most recently I went to great pains to clearly state and get agreement that they would not do this. Then they did anyway. So now we cannot allow unattended access.

Besides that, it’s not a specific instance that is the problem. The problem is that both upgrade procedures consistently result in partially broken systems. It’s not a one-off problem and it’s not a problem that gets fixed by remoting into one system.

So either fix the upgrade scripts/processes, or clearly state what is known to break and how to fix. Either would be better than indicating that it all works great, causing admins major headaches when it doesn’t.

franckdanard · December 11, 2024, 8:03pm

Yeah.

Usually the code works like this.
The data is saved into a tarball, and everything is compressed to create an archive.
Then the reverse is done to restore the data as it was before.
But sometimes, if there is new data in a new version and that part of the code has not been updated, you can end up in this kind of situation.
It may be that there is only one module that breaks everything.
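
To make that failure mode concrete, here is a toy sketch in Python. It is not FreePBX's backup format or code; the module name `example_module`, the setting `new_option`, and both helper functions are hypothetical. It only shows how a restore routine that knows about an older set of fields can silently drop a newer one.

```python
import io
import json
import tarfile


def make_backup(modules: dict[str, dict]) -> bytes:
    """Write each module's settings as a JSON file inside a gzipped tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, settings in modules.items():
            payload = json.dumps(settings).encode("utf-8")
            info = tarfile.TarInfo(name=f"modules/{name}.json")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def restore_backup(blob: bytes, known_keys: set[str]) -> dict[str, dict]:
    """Read the tarball back, keeping only the keys the restore code knows."""
    restored = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            data = json.load(tar.extractfile(member))
            name = member.name.removeprefix("modules/").removesuffix(".json")
            # Anything outside known_keys is silently discarded here.
            restored[name] = {k: v for k, v in data.items() if k in known_keys}
    return restored


# Hypothetical settings: "new_option" was added in a newer module version.
original = {"example_module": {"trunk": "a", "enabled": "yes", "new_option": "yes"}}
blob = make_backup(original)

# Restore code that was never updated for "new_option" loses it without error.
stale = restore_backup(blob, known_keys={"trunk", "enabled"})
print(stale)
```

In this sketch the restore finishes without any error, which is what makes this kind of breakage hard to notice until something stops working.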

Regarding support. It’s not cool to apply changes without talking to a customer. I agree with you.
If you want to keep control of the situation, a bounce server is useful.

I worked on the support team at “Alcatel” and never did this kind of thing. Unthinkable.

The best way to find where the issue comes from is to back up only part of the data, maybe the first 10 modules, then the next 10, and so on, and restore those backups one by one until you replicate the issue and find it.
It’s not easy to find an issue with a global backup.
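
The batch-by-batch approach can be sketched as a small search loop. This is only an outline of the idea in Python: `find_breaking_module` and the `reproduces_issue` callback are hypothetical names, and the callback stands in for the manual work of restoring a partial backup onto a test system and checking whether the problem appears.

```python
from typing import Callable, Sequence


def find_breaking_module(
    modules: Sequence[str],
    reproduces_issue: Callable[[Sequence[str]], bool],
    batch_size: int = 10,
) -> list[str]:
    """Return the suspect module(s), or an empty list if nothing reproduces.

    Batches are tried in order; inside the first failing batch, modules are
    tried one at a time. If no single module fails on its own, the whole
    batch is returned, since the problem may need a combination of modules.
    """
    for start in range(0, len(modules), batch_size):
        batch = list(modules[start:start + batch_size])
        if not reproduces_issue(batch):
            continue  # this batch restored cleanly, move to the next one
        for module in batch:
            if reproduces_issue([module]):
                return [module]
        return batch
    return []


# Simulated check for illustration only: pretend "module23" is the culprit.
def fake_check(subset: Sequence[str]) -> bool:
    return "module23" in subset


all_modules = [f"module{i:02d}" for i in range(1, 36)]
print(find_breaking_module(all_modules, fake_check))  # ['module23']
```

Each call to the callback represents one restore-and-test cycle, so with 35 modules this takes at most 4 batch restores plus up to 10 single-module restores instead of guessing from one global backup.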

FreerPBXer · December 18, 2024, 12:18am

It’s not sometimes. It breaks something EVERY time with either procedure in our experience.

I’ve just discovered that a system we upgraded in November is now failing to send some faxes, where it was working before the ‘upgrade’.

franckdanard · December 18, 2024, 7:25am

Well. If you have enabled EDGE modules in advanced settings, you might end up in this situation. Because EDGE doesn’t mean Stable.
EDGE is used only for testing.
Check if this option is enabled or not.

Saying it breaks something every time, I think you are painting a dark picture. I’ve been using a system for many years with no major problems. Maybe one or two, that’s all.

If you really have a problem, it may be in one specific module and not all of them. If the problem is still present, you’ve got lots of solutions: create a ticket or open an issue on GitHub. Contact the support team. Contact Kapil here too… Usually, @kgupta is the right person to help us here.
But please stop saying that nothing works on FreePBX; I don’t believe that. I can believe there is one module that has problems and is not fixed yet, yes. Perhaps that is the case for the Fax module or Fax Pro.