Disappearing Modules

We have encountered a strange problem. First, we are running FreePBX 2.8.0.3, upgraded from 2.6.

Intermittently, all the modules except the core modules that are enabled by default will disappear, and FreePBX will treat them as not installed. To reinstate them we simply need to go back to the module manager and click “Check For Updates”, and all of the modules suddenly reappear, fully installed.

Now even though the fix is fairly simple, several system operators access the system at any given time. If one of the sysops applies changes while the modules are not enabled, the system will break. As these are production systems, this can be a catastrophe.

We have as of yet been unable to find what triggers this event, or why it happens. We are vigorously investigating but I thought I would throw it out to the experts to point us in the right direction.

Thanks In Advance

bcmike,

I have never heard of all the modules simply disappearing.

The one thing that raises the alarm bells from your note is that you have multiple admins accessing and making changes at the same time.

FreePBX is not really safe to be accessed by multiple admins at the same time. Although 99.99% of the time it is probably pretty safe, I am positive that there are corruptions that could occur in such circumstances.

Whether or not that is the cause, the only way to determine what is going on with your systems would be to have someone very familiar with the internals of FreePBX and the DB schema and expectations to have a look at the system at a point where it gets into one of these corrupted states.

If you want a place to start, it’s probably the module_getinfo() call, which returns a large array of module information as to what is installed and not, disabled and not, etc. This information comes from a couple of places. The modules table in SQL contains some of it, but most of the information comes from a scan of the module.xml files that each module includes. These are only scanned when you go into Module Admin, since that is the only place where changes can be made (whether the GUI or the CLI version). Once scanned, the data is serialized and cached in the SQL db. Each time you do a page load, the serialized array is used (which is a significant performance boost over rescanning every module directory for a module.xml file and then parsing it into an array).
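To make the scan-then-cache description above concrete, here is a minimal sketch of the pattern. It is not the actual FreePBX code: the names scan_module_dirs() and get_module_info(), the regex-based parsing, and the plain array standing in for the SQL cache row are all assumptions made for illustration. It also assumes each module.xml carries a rawname and a version element.

```php
<?php
// Hypothetical sketch of "scan module.xml once, then serve a serialized cache".
// None of these names are FreePBX functions; $cacheStore stands in for the
// SQL row that holds the serialized module list.

/** Scan each sub-directory of $modulesDir for a module.xml and read its version. */
function scan_module_dirs(string $modulesDir): array
{
    $modules = [];
    foreach (glob($modulesDir . '/*/module.xml') ?: [] as $xmlFile) {
        $xml = file_get_contents($xmlFile);
        // Simplification: a regex instead of a real XML parser.
        if (preg_match('#<rawname>([^<]+)</rawname>#', $xml, $name)
            && preg_match('#<version>([^<]+)</version>#', $xml, $ver)) {
            $modules[$name[1]] = ['version' => $ver[1]];
        }
    }
    ksort($modules);
    return $modules;
}

/**
 * Return module info from the cache. The directories are rescanned only when
 * forced (as Module Admin would do) or when no cache exists yet.
 */
function get_module_info(string $modulesDir, array &$cacheStore, bool $forceRescan = false): array
{
    if ($forceRescan || !isset($cacheStore['modules'])) {
        $cacheStore['modules'] = serialize(scan_module_dirs($modulesDir));
    }
    return unserialize($cacheStore['modules']);
}
```

The point of the sketch is that an ordinary page load never touches the file system; whatever sits in the cache is what every page sees until something forces a rescan.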

That’s a basic background and starting point to dig deeper into what is going on if you want. Of course you are always welcome to engage one of the more knowledgeable developers through the paid support system if you are so inclined but as mentioned, you would have to have the system in the broken state for someone to start poking around and see what might be broken internally.

The data is not stored in a session. The online data that is pulled down is cached in the database (and in fact, if you continually check for updates online, it will not keep going back to the internet until the cached data has expired, which is 10 minutes if I recall correctly).

However - that is not what is used to see if you have a newer version.

The way this is checked is to compare what is actually on your system (e.g. what we find in the module.xml files in the module directories - or more appropriately in the cached serialized copy) with what is listed in the database in the modules table, which represents both what we believe was last installed and the state of that module (disabled, etc.).

Each time you navigate in FreePBX it does this check and if it finds a discrepancy between what is in the modules table in the database and what it thinks is in the actual module.xml file (again, from the cached information in the serialized copy) then it will disable the module.
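As a rough illustration of the check described above, the sketch below compares the version recorded in the modules table with the version in the cached module.xml data and sets a status accordingly. It is a simplified stand-in, not the real function: apply_module_status() is a hypothetical name, PHP's built-in version_compare() is used in place of version_compare_freepbx(), the handling of disabled modules is left out, and the numeric value chosen for MODULE_STATUS_ENABLED is an assumption.

```php
<?php
// Hypothetical sketch of the per-page-load consistency check.
// Only the value 3 for NEEDUPGRADE is an observed value; the value for
// ENABLED is an assumption.
const MODULE_STATUS_ENABLED = 2;
const MODULE_STATUS_NEEDUPGRADE = 3;

/**
 * $dbRows: rows from the modules table,
 *          e.g. ['modulename' => 'disa', 'version' => '2.8.0.0'].
 * $cached: module info from the cached module.xml scan, keyed by module name.
 */
function apply_module_status(array $dbRows, array $cached): array
{
    foreach ($dbRows as $row) {
        $name = $row['modulename'];
        if (!isset($cached[$name])) {
            continue; // registered in the DB but absent from the cached scan
        }
        // Built-in version_compare() stands in for version_compare_freepbx().
        $same = version_compare($row['version'], $cached[$name]['version']) === 0;
        $cached[$name]['status'] = $same
            ? MODULE_STATUS_ENABLED
            : MODULE_STATUS_NEEDUPGRADE;
    }
    return $cached;
}
```

With this shape, any mismatch between the two version strings, including one caused by a stale or damaged cache entry, is enough to flag a module as needing an upgrade.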

The only way that you should be able to fool this is to go mucking around in the actual modules directory making changes or exploding tarballs with new module versions etc., bypassing Module Admin and thus bypassing the cached data being brought current. However, even in that case, this should not happen because the cached data should be consistent with what is in the modules table, since the only thing that should change the actual version numbers in that table is installing a module, whereas the only thing that should change what’s in the module.xml file is actually mucking around in there (or installing a new module).

data…

Because we can see that data not changing - even when the problem is seen.

But the cached module info that is referenced seems to get some modules set to “3” / MODULE_STATUS_NEEDUPGRADE somehow. Because it uses this cache that has been already unserialized the problem persists once it’s happened.

I’ve looked at the code and the only line I can find assigning a status like that is here:
    // check if file and registered versions are the same
    // version_compare returns 0 if no difference
    if (version_compare_freepbx($row['version'], $modules[ $row['modulename'] ]['version']) == 0) {
        $modules[ $row['modulename'] ]['status'] = MODULE_STATUS_ENABLED;
    } else {
        $modules[ $row['modulename'] ]['status'] = MODULE_STATUS_NEEDUPGRADE;
    }

I’ve seen some people reporting bugs to do with the underlying version compare itself, so I added some logging here to see whether these lines are somehow executing when the problem is triggered.

I’m not sure, but anecdotally it seems to occur when we least expect it - i.e. after the session has been idle for a while - maybe the unserialized data is stored as a session variable and we’re stumbling into a PHP bug.

We only had the problem on a single system until the systems were identical - and now that they are in sync both do the same thing…

Thanks for your continued insight.

you can always file a bug report, however if it can’t be reproduced, as I suspect is the case here, it will probably be closed.

I still think it’s unlikely that this is a bug for the sole reason that this code has been in place since some time around 2.5 if I recall correctly (the caching of the serialized data). From my experience, if this was a bug, even an odd ball bug, there would have been signs of it by now.

That of course does not mean it can’t be a bug. I’m just suggesting, from my experience tracking the forums and tickets that included ‘odd ball’ behavior, that there would have been other signs of it by now.

The $modulelist->invalidate() call simply tells the function to invalidate the cached data and go read all the module.xml files again. If you are tracing it back to possibly corrupted data, then I would instrument the part of the code where the cached data is saved in its serialized form, and where it is read back when called upon. You can log all that information to a file, and then when you run into the issue you should have some idea of where/how it got corrupted.
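One way to do the instrumentation suggested above is sketched below. These helpers are hypothetical and would need to be called by hand next to the real save and load code; the log path, the event labels, and the assumption that each cached entry has a status key are illustrative only. Logging a checksum plus a count per status keeps each log line short while still showing the moment the blob changes.

```php
<?php
// Hypothetical instrumentation helpers; call them next to the code that
// serializes the module list into the DB and unserializes it on page load.

/** One-line summary: checksum of the serialized blob plus a count per status. */
function summarize_module_cache(string $serialized): string
{
    $modules = @unserialize($serialized);
    if (!is_array($modules)) {
        return 'md5=' . md5($serialized) . ' UNSERIALIZE FAILED';
    }
    $counts = [];
    foreach ($modules as $info) {
        $status = isset($info['status']) ? (string) $info['status'] : 'unset';
        $counts[$status] = ($counts[$status] ?? 0) + 1;
    }
    ksort($counts);
    $parts = [];
    foreach ($counts as $status => $n) {
        $parts[] = sprintf('status[%s]=%d', $status, $n);
    }
    return 'md5=' . md5($serialized) . ' ' . implode(' ', $parts);
}

/** Append a timestamped line to a log file; $event is e.g. "save" or "load". */
function log_module_cache(string $logFile, string $event, string $serialized): void
{
    $line = sprintf("%s %s %s\n", date('c'), $event, summarize_module_cache($serialized));
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}
```

If a "load" line ever shows a different checksum from the preceding "save" line, the blob changed in storage; if the checksums match but the status counts on the page differ, the change is happening after the data has been unserialized.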

You may even want to go as far as turning on MySQL query logging, which will log every single query that hits the database. Then when you run into the issue, you can see what MySQL is actually seeing.

Doing the above will go a lot further to providing information in a bug report that someone might be able to act upon if you do find out there is some odd ball bug lurking in the code.

Does this qualify as a bug report? There seems to be something wrong in the code.

inside functions.inc.php
inside function module_getinfo…

We did add a lot of error_log calls, and tried to understand what this function does… it seems when we have our problems, all non-core modules seem to have status 3 (MODULE_STATUS_NEEDUPGRADE). But they do not show this status in the unserialized XML or in the modules data table.

In this function (which seems to be called every few seconds while you are viewing the status page) we see this bit of code:

            if ($forceload) {
                    $modulelist->invalidate();
            }

On a hunch I commented out the if and its closing brace so that the module list is always invalidated, forcing the code to look at the real values in the DB and on the file system. It rebuilds properly, re-enabling the modules - which I suspect is what “Check For Updates” has been doing for us.
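Rather than always invalidating, it may help to capture the cached list and a freshly rebuilt list side by side when the problem occurs and log only the differences. The helper below is a hypothetical sketch: diff_module_lists() is not a FreePBX function, and it assumes both lists are arrays keyed by module name with status and version entries.

```php
<?php
// Hypothetical helper: report every module whose cached entry differs from a
// freshly rebuilt one (i.e. what a forced invalidate would produce).

function diff_module_lists(array $cached, array $fresh, array $fields = ['status', 'version']): array
{
    $diffs = [];
    // Array union gives every module name present in either list.
    foreach (array_keys($cached + $fresh) as $name) {
        foreach ($fields as $field) {
            $old = $cached[$name][$field] ?? null;
            $new = $fresh[$name][$field] ?? null;
            if ($old !== $new) {
                $diffs[$name][$field] = ['cached' => $old, 'fresh' => $new];
            }
        }
    }
    return $diffs;
}
```

The result could be written out with something like error_log(print_r($diffs, true)), so the log shows exactly which modules carried status 3 in the cache while the rebuilt list had them enabled.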

Can you help us find how this function or another function could be mistakenly setting these modules in the cache structure to have this incorrect status?

Any more pointers greatly appreciated.

Thanks!

m/

I see:

The following modules are disabled because they need to be upgraded:
irc, findmefollow, directory, phonebook, javassh, framework, ringgroups, fw_ari, music, backup, recordings, core, featurecodeadmin, callforward, customappsreg, timeconditions, disa

You should go to the module admin page to fix these.
Added 0 minutes ago
(freepbx.modules_disabled)

Any ideas?

So here’s the thing - I started logging the contents of the database info you pointed us to - and when the problem happens, this information does NOT change.

But it’s now happened again, and I noticed when I click on “FreePBX System Status” it shows 17 disabled modules - all I did to cause this was click on the extensions menu - many of the left hand menus disappeared. My next click was the system status - and I can repeatedly click this and see that there are modules disabled.

So what next? I know how I can fix this - I can check for updates, which will cause it to read / unserialize from the DB - which will work - but where is this data stored when it’s unserialized? That seems to be where this is getting broken… I’d like to check the values there - is there a temporary file or something that stores module status?

Thanks again for the help!

As long as I can avoid applying any update to this system I can leave it broken. Hopefully we will know what to look for before then. It has started to happen to a second server now since we updated it to current status (it was a little behind before) so although it may be unique to something we are doing, it doesn’t seem to be related to the hardware of the first system any more.

Thanks!

"Maybe what we could ask is if there is a way or where we would try to intercept what it thinks should be enabled?

Or where in the db we should look for this serialized information - what if it was missing the result of the query from the DB - maybe it would default to “no modules” and keep on running?

We need to know where that code is and how to add some logging around it. I think that makes sense."

I know we are edging into paid support territory, but maybe one more nudge in the right direction might do it.

Thanks Again for your help!

“it’s complicated and messy”

config.php is the main file that orchestrates everything, though some of it indirectly happens in the freepbx_admin.php view (or maybe it is freepbx.php view, they work together).

Beyond that, you’ll have to look through the code to find where you would intercept it. Anyone tasked with this would have to do the same, as I don’t think any of us have it all memorized off the top of our heads, given it is not a place we commonly have to go these days.

I think I may have overstated the part about multiple system operators, there are three of us that make occasional changes throughout the day, but if the modules break and one person doesn’t know and applies changes… disaster!!!

Thank you for the info we will dig deeper.

One possible workaround I would like to run past you; until we get to the root of the problem could we safely add the following directives to module.xml of the modules we need enabled:

<candisable>no</candisable>
<canuninstall>no</canuninstall>

I guess the bigger question is would they get respected?

Thanks again for the help!

I was talking about these Directives:

candisable
canuninstall

The only thing those do is hide the disable and uninstall options in the GUI, purely in JavaScript - they are not even respected (by design) if you use the CLI version of module_admin.

Those certainly don’t have anything to do with your issue (unless you have some rogue colleague messing with your head).

This time only time conditions and DISA broke.

Can someone give us a brief primer on how modules are enumerated and applied? That might point us in the right direction.

Also, I’m not sure if this should make a difference, but I have “extended repository” chosen in the drop-down on the module administration page.

Thanks

I checked, and the modules are owned by asterisk:asterisk, which has been the case in the past.
We still can’t find what triggers the event. Thanks for the info, I will reply if we find a cause/solution.

Thanks

Sounds like you have some file system problem and/or a database problem.

FreePBX scans the directory where the modules are and looks for the module.xml file in each one, then it checks the database for the same module. If the two differ, then there is an upgrade or a downgrade of the module.

Are you sure that all directories for FreePBX are owned by apache?
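A quick way to answer the ownership question is to walk the tree and list anything not owned by the expected user. The sketch below is a hypothetical helper, not part of FreePBX: find_wrong_owner() is an invented name, and both the root path and the numeric uid of the web server user (for example from `id -u asterisk`) are things you would have to supply yourself.

```php
<?php
// Hypothetical check: list every path under $root whose owner uid is not
// $expectedUid. Broken symlinks are not handled in this sketch.

function find_wrong_owner(string $root, int $expectedUid): array
{
    $bad = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // include directories, not just files
    );
    foreach ($it as $info) {
        if ($info->getOwner() !== $expectedUid) {
            $bad[] = $info->getPathname();
        }
    }
    sort($bad);
    return $bad;
}
```

An empty result means everything below the root is owned by the expected uid; any paths returned are the ones worth looking at first.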

Is this a distro of some sort?