Double checking on possible hard drive crash

johnjces · July 22, 2016, 6:06pm

Good day!

Early this AM at 0100, 0200 and again at 0300, my system emailed me storage alerts advising that sda2 was 67, 86 and then 96 percent full. Filled pretty quickly! When I logged in via the web GUI, I had a whoops that stated

file_put_contents(): Only 0 of 189 bytes written, possibly out of free disk space

obviously.

I forced an fsck and it was no help.

Logging into the CLI, I see a lot of file_put_contents errors pointing at /var/www/html/admin/libraries/utility.functions.php, line 151.

I am pretty certain the drive failed. It still boots but I am not really certain where or how to check and see what actually occurred. FULL log just dies at 0326 hours.

Just asking for confirmation that my drive crashed and any hints to double check or see what file may have filled up. Would love to try and get one more real recent backup… Glad I have a fairly recent backup, however.

Thanks for any tips.

John

dicko · July 22, 2016, 6:37pm

No reason to suspect your hardware. I would look first in the /var/log/asterisk directory.

johnjces · July 22, 2016, 7:05pm

Thanks Dicko! I have rummaged around that entire directory, unmounted sda2, and tried an fsck. It says:

“The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and…” more info.

Should it not be a screwed up partition, which I believe it is, do you have a specific file or directory I should look under?

Thanks for the help.

John

johnjces · July 22, 2016, 7:07pm

Also used du and find to search for huge files and found nothing out of the ordinary or “too huge”.
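For anyone repeating this kind of search, here is a minimal sketch of the du/find approach. The function names and default thresholds are illustrative choices, not anything taken from this system. The -x and -xdev options keep each command on a single filesystem, so /proc and other mounts are skipped.

```shell
# List the largest directories under a starting point (default /),
# sizes in KiB, largest first. $2 is how many lines to show.
biggest_dirs() {
  du -xk "${1:-/}" 2>/dev/null | sort -rn | head -n "${2:-20}"
}

# List regular files larger than a threshold given in KiB
# (default 1048576 KiB, i.e. 1 GiB).
big_files() {
  find "${1:-/}" -xdev -type f -size +"${2:-1048576}"k 2>/dev/null
}
```

Running something like `biggest_dirs /var 10` and `big_files /var` as root is a reasonable first pass, since log and spool directories are the usual suspects when a disk fills overnight.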

dicko · July 22, 2016, 7:12pm

Well, that DOES suggest a hardware problem. I would replace it and restore your last good backup.

cynjut · July 22, 2016, 7:12pm

The place to check for hardware failures would be /var/log/messages.

If this isn’t a software bug or bad spot on the drive crashing something, the /var/log/messages file should tell you what’s what with your hardware.

If I was a betting man, I’d guess that you’ve got hundreds of core dumps somewhere.

johnjces · July 22, 2016, 7:20pm

Thanks you two! I overlooked the messages file and it is 69,600,468,000 bytes in size! Taking a while to view it. The prior rotation was big compared to the others and a lot of loop exceptions.

Anxious to view the current log!

JJ

cynjut · July 22, 2016, 7:23pm

As they say over in Council Bluffs - “Winner, winner, chicken dinner.”

dicko · July 22, 2016, 7:23pm

No need to wait,

tailf /var/log/messages
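A huge log is slow to page through, but its tail is cheap to read. As a rough sketch (the function name top_messages and the 100000-line default are made up for illustration), the following counts which messages repeat most often in the last part of a syslog-style file, ignoring the timestamp and hostname at the start of each line:

```shell
# Summarise the most frequent messages in the last N lines of a
# syslog-style file. $1 is the file, $2 the number of lines to read.
top_messages() {
  tail -n "${2:-100000}" "$1" |
    # Blank the "Mon DD HH:MM:SS host" fields, then trim leading spaces.
    awk '{ $1 = $2 = $3 = $4 = ""; sub(/^ +/, ""); print }' |
    sort | uniq -c | sort -rn | head -n 10
}
```

Usage would be along the lines of `top_messages /var/log/messages`. If one message dominates the count, that is most likely what filled the disk.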

johnjces · July 22, 2016, 7:52pm

Ya know, one of these days I will remember the tail commands as I did get tired of waiting. I deleted that multi-gigabyte messages file, touched it, gave it the proper permissions, did another fsck and I am back up and running.

In the messages file I found a million:

Jul 2 03:09:59 jcits kernel: ACPI Exception: AE_AML_INFINITE_LOOP, while evaluating GPE method [L00] (20090903/evgpe-568)
Jul 2 03:10:01 jcits kernel: ACPI Error (psparse-0537): Method parse/execution failed [_SB.PCI0.LPC_.SMBR] (Node f70526e0), AE_AML_INFINITE_LOOP
Jul 2 03:10:01 jcits kernel: ACPI Error (psparse-0537): Method parse/execution failed [_SB_.PCI0.LPC_.INIT] (Node f70526f8), AE_AML_INFINITE_LOOP
Jul 2 03:10:01 jcits kernel: ACPI Error (psparse-0537): Method parse/execution failed [_GPE._L00] (Node f704a938), AE_AML_INFINITE_LOOP
Jul 2 03:10:01 jcits kernel: ACPI Exception: AE_AML_INFINITE_LOOP, while evaluating GPE method [_L00] (20090903/evgpe-568)

and in my web gui I see a critical error, 9 hours ago of

Cron manager encountered 1 Errors
The following commands failed with the listed error
/var/lib/asterisk/bin/module_admin listonline > /dev/null 2>&1 (1)

So, now I am not 100% certain I have a hard drive issue, BUT, I will get one ready.

Any other thoughts?

and thanks a bunch for your help.

John

Overkill · July 22, 2016, 8:36pm

If your drive failed, you’d likely be unable to boot the OS.

What’s your output on the CLI with this command? It seems more like something overflowed and killed your disk space, like a log file or a voicemail if you don’t have time limits set and the call didn’t properly disconnect or something.

df -h
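If you want the same information in a form a script can act on, here is a small sketch built on `df -P`, which prints one line per filesystem in a fixed column layout. The function name full_mounts and the 90 percent default are assumptions for illustration, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above a percentage limit.
full_mounts() {
  df -P | awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)            # strip the trailing percent sign
      if (use + 0 >= limit) print $6, $5
    }'
}
```

For example, `full_mounts 80` lists anything 80 percent full or more, and `df -i` is worth a look as well, since a filesystem can also run out of inodes.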

dicko · July 22, 2016, 8:59pm

From what the logs suggest, you have flaky hardware. A workaround might be to add

acpi=off

to your kernel arguments in your grub configuration.

Be careful with log files. It is best to "logrotate -f /etc/logrotate.conf" them out if you can (you might need to make some space first to allow the creation of new files), THEN examine them for their intended diagnostic content and delete them. At a pinch, just truncate the file:

echo -n > /var/log/messages
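Truncating in place is generally safer than deleting: a daemon that still has the file open keeps writing to the deleted inode, and the space is not released until it closes the file. Here is a minimal sketch that saves the most recent lines before emptying the log. The function name empty_log, the 1000-line default and the /tmp destination are illustrative assumptions, and the destination needs to be on a filesystem that still has room.

```shell
# Keep the last N lines of a log somewhere else, then empty the log in
# place so it keeps the same inode, owner and permissions.
# $1 = log file, $2 = lines to keep, $3 = where to save them.
empty_log() {
  log="$1"
  keep="${2:-1000}"
  saved="${3:-/tmp/$(basename "$log").tail}"
  tail -n "$keep" "$log" > "$saved"
  : > "$log"
}
```

A call such as `empty_log /var/log/messages 5000` would leave the last 5000 lines in /tmp/messages.tail for later diagnosis.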

johnjces · July 22, 2016, 9:32pm

@Overkill, disk usage is now fine after I deleted the 69 gig messages file. I have plenty of free space. Since I deleted, did an fsck, all seems “normal”. The messages file has not grown other than 50kb after each reboot.

@dicko, I will check my grub conf and will force a logrotate and have a “look see”.

JJ

cynjut · July 22, 2016, 9:45pm

He’s dead, Jim.

New power supply and motherboard, I’m guessing, unless you have something connected to one of your comm ports. Disabling ACPI may help, but it’s only a matter of time before what broke breaks something else.

Something took a hard dump in your machine and I don’t predict good things for this mobo. You might also want to look at the power supply, just to make sure it wasn’t a transient voltage spike that just freaked the system out.

johnjces · July 22, 2016, 9:57pm

@cynjut, Dave… I will watch it very closely… A bit of history might make some sense here. I was gone for a week and we had a major power failure. UPS died and of course all my stuff died suddenly. I get home, fire up the UPSes, they say all charged and I thought I replaced the batteries in the one powering my FreePBX “not that long ago”. I push test… bee…blrtt, dead. Replace batteries and go live again. This was the beginning of July. No checking of the disk was done at that time.

I forced a log rotate and everything in the logs, all logs I can dig up seem fine. I am however seeing in freepbx.log the following at the time of failure at 0305 in the AM.

[2016-Jul-22 03:05:04] [WARNING] (libraries/modulefunctions.legacy.php:7) - Depreciated Function module_getinfo detected in /var/www/html/admin/modules/cxpanel/functions.inc.php on line 52
[2016-Jul-22 03:05:04] [WARNING] (libraries/modulefunctions.legacy.php:7) - Depreciated Function module_getinfo detected in /var/www/html/admin/modules/cxpanel/functions.inc.php on line 61
[2016-Jul-22 03:05:11] [INFO] (bin/module_admin:443) - no repos specified, using: [standard,commercial] from last GUI settings
[2016-Jul-22 03:05:11] [INFO] (bin/module_admin:444) -
[2016-Jul-22 03:05:12] [WARNING] (core/functions.inc.php:6609) - Depreciated Function core_users_[2016-Jul-22 12:33:19] [CRITICAL] (admin/bootstrap.php:258) - Connection attmempt to AMI failed

A bunch of lines above say the same thing:
[WARNING] (libraries/modulefunctions.legacy.php:7)…

and several after I got up and running state:
[CRITICAL] (admin/bootstrap.php:258) - Connection attmempt to AMI failed

(Attempt is misspelled in the log).

Captain, I’m giving her all I can!

Thanks you guys! Any other thoughts are appreciated as I actually like this stuff!

JJ

johnjces · July 22, 2016, 10:06pm

Stupid question, but do I add the acpi=off in

/boot/grub/grub.conf ?

Never played in that file before.

JJ

dicko · July 22, 2016, 10:12pm

Well, it depends on your OS {debian,redhat}, probably {/etc/default/grub,/etc/grub2.conf}, as to how to set that properly {update-grub,mkinit-blah-blah}

but generally it ends up in the definitive /boot/grub/grub.{cfg,conf}
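For a GRUB legacy setup, where /boot/grub/grub.conf is edited directly and each boot entry has a line starting with "kernel", a hedged sketch of the change could look like this. The function name add_acpi_off is made up, it relies on GNU sed's -i option, and it is worth trying on a copy of the file first. On GRUB 2 systems the file is generated, so the argument would instead go into GRUB_CMDLINE_LINUX in /etc/default/grub before regenerating the config.

```shell
# Append acpi=off to every "kernel" line of a GRUB legacy config that
# does not already have it. $1 is the path to the config file.
add_acpi_off() {
  conf="$1"
  # Keep the first backup only, so a second run cannot overwrite it.
  [ -e "$conf.bak" ] || cp -p "$conf" "$conf.bak"
  sed -i \
    -e '/^[[:space:]]*kernel[[:space:]]/{' \
    -e '/acpi=off/!s/$/ acpi=off/' \
    -e '}' "$conf"
}
```

After something like `add_acpi_off /boot/grub/grub.conf` and a reboot, `cat /proc/cmdline` shows whether the kernel actually received the argument.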

johnjces · July 22, 2016, 11:27pm

Again, thanks Dicko! I learned quite a bit today!!