Disk I/O errors on FreePBX distro

FreePBX distro 14.0.3.10.
[2018-07-29 08:24:56] WARNING[3102] db.c: Error executing SQL (COMMIT): disk I/O error
[2018-07-29 08:24:56] WARNING[3102] db.c: Error executing SQL (ROLLBACK): cannot rollback - no transaction is active

Immediately after that the call drops.

The system is on a RAID server, and I checked everything I could think of. RAID is green. The host server checks out, and the disk system reports no errors. I can’t find anything in the host hardware logs. I also ran checkdb on the MariaDB databases. No errors found.

I would like to do something like fsck on the partition, but I have not been able to figure out what to do, because FreePBX installs an LVM volume and I’m not familiar with the commands.

This is what my drive looks like.

lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME                  FSTYPE      SIZE   MOUNTPOINT LABEL
sda                               128G
├─sda1                ext4        2G     /boot      bootvol
└─sda2                LVM2_member 126G
  ├─SangomaVG-root    xfs         122.1G /
  └─SangomaVG-swaplv1 swap        3.9G   [SWAP]

I was able to run fsck on /dev/sda1, but that doesn’t tell me what I need to know.
Any suggestion on next steps?

By the way, this system has run flawlessly for years. The problem just cropped up recently.
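
To see whether the errors cluster at particular times (a backup window, a RAID patrol read, a snapshot on the VM host), one option is to count them per hour. This is only a minimal sketch: it assumes the log lines start with a timestamp in the format shown above, the function name is made up for illustration, and the log path in the usage comment is an assumption.

```shell
# Count "disk I/O error" lines per hour from an Asterisk log read on stdin.
# Assumes each line starts with "[YYYY-MM-DD HH:MM:SS]" as in the excerpt above.
io_errors_per_hour() {
    grep -F 'disk I/O error' \
        | cut -c2-14 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $3, $1 }'   # prints: date, hour, count
}

# Usage (log path is an assumption; adjust to your system):
#   io_errors_per_hour < /var/log/asterisk/full
```

If the counts pile up in the same hour every day, that points at a scheduled job rather than a failing disk.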

Hmmm… not to ask the obvious (but sometimes people overlook it)… is the disk getting full?

Disk is at 13%.

It’s a Dell R710 server, and all tests pass. It’s like there is latency in the disk section, but I haven’t been able to prove what is causing it.

OK, let me ask this a different way.
Is it possible to configure or start Asterisk so that it uses RAM for SQL, instead of reading it from disk? I have 100 gigs of RAM in that server.

We are dropping multiple calls a day due to “disk I/O error”. This is really intolerable. It worked for over 12 years with no problem, all the way back to early Trixbox. Again, there is no problem on the disks.
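
One detail that may help narrow this down: the warning comes from db.c, which is Asterisk's internal database (astdb). In current Asterisk versions that is an SQLite file, separate from the MariaDB databases FreePBX uses, so a clean MariaDB check does not cover it. A rough sketch of checking it, assuming the sqlite3 command-line tool is installed and the file is at /var/lib/asterisk/astdb.sqlite3 (the usual location, but verify on your system); the function name is made up:

```shell
# Run SQLite's built-in consistency check on a database file.
# SQLite prints "ok" when it finds no problems.
astdb_integrity_check() {
    sqlite3 "$1" 'PRAGMA integrity_check;'
}

# Usage (path is an assumption; copy the file while Asterisk is stopped):
#   cp /var/lib/asterisk/astdb.sqlite3 /tmp/astdb-copy.sqlite3
#   astdb_integrity_check /tmp/astdb-copy.sqlite3
```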

I don’t know the answer to that question off the top of my head… BUT, I do know it’s a bad idea.

Without any sort of write-back/write-through, a power flicker would cause havoc on a system running the database in RAM.

Does

lvdisplay 

… show you any errors or give any hints as to why you’re having issues?

Here is an article on how to detach, scan, and re-attach LVM volumes.

https://www.linuxtechi.com/fixing-lvm-io-errors/

And here’s one for the underlying file system

https://docs.oracle.com/cd/E37670_01/E37355/html/ol_repair_xfs.html

My disk controller has its own built-in battery. Also, the whole rack is 100% protected on dedicated power. And there’s a generator behind that. It can’t go down.

All these suggestions are great and I’m taking notes. I’ll try all these things when I can get some down-time.

Thanks.

I was excited to try xfs_check, but it does not exist.
xfs_repair is there, but no xfs_check.

Yes, it seems Red Hat removed it.

I tried xfs_repair too. It just says unable to find superblock, and quits.
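
A possible cause, assuming the command was pointed at /dev/sda2: that partition is an LVM physical volume, not a filesystem, so there is no XFS superblock for xfs_repair to find. The XFS filesystem lives on the logical volume (SangomaVG-root in the lsblk output earlier in the thread). Because that volume is mounted as /, the check has to be run from rescue media with the filesystem unmounted. A sketch, where the helper names are made up and the volume names are taken from the lsblk listing; xfs_repair -n is the usual stand-in for the removed xfs_check:

```shell
# Build the /dev/<vg>/<lv> path that LVM creates for a logical volume.
lv_device_path() {
    printf '/dev/%s/%s\n' "$1" "$2"
}

# Dry-run check of an XFS logical volume (-n reports problems, changes nothing).
# Run from a rescue environment with the filesystem unmounted.
check_xfs_lv() {
    vgchange -ay "$1" &&                              # activate the volume group
        xfs_repair -n "$(lv_device_path "$1" "$2")"   # check only, no repair
}

# Usage, with the names from the lsblk output:
#   check_xfs_lv SangomaVG root
```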

Probably time to back up your system and rebuild the disk. I would suggest moving to zfs.

I just found something new in the logs.
Calls are dropping and I missed this key error in the log:

WARNING[6716][C-0000004e] res_rtp_asterisk.c: RTP Read error: Bad file descriptor. Hanging up.

This is the FreePBX distro. I don’t remember any options for file systems. Were there?

If you need to rebuild a disk partition, then theoretically you could make that choice.

Oh my GOD.

OK, here’s where I’m at now.
I built an entirely new FreePBX 14 on a new VM.
Then, I did a full backup from my production system and restored it to the new server.
Then I shut everything down. I moved the newly created disk over to the production server and started it up. I thought that would be the end of my problems.

To my surprise, I have a ton of errors now.
One of the main problems is that the new system refuses to use my Let’s Encrypt certificate from the previous installation. I even deleted the certificate and completely regenerated it. Nope. FreePBX insists on using a local temporary cert that is not trusted. Every page is full of errors because of this.

Second, every time I try to configure anything on the new installation, I get a big red banner on the top of the screen that says UNDEFINED. And I cannot click on “Apply Config”. I just go through a cycle of Apply Config and UNDEFINED errors.

I think I’m down to this one issue now.
FreePBX is not using the correct certificate for TLS.
I’ve tried deleting the certificate and regenerating one from Let’s Encrypt. It made no difference. FreePBX still serves an invalid certificate.

If I go to /etc/asterisk/keys and run openssl x509 -in mycert.pem -text, it is perfectly valid. The problem is that FreePBX is not using it. It keeps serving an invalid temp certificate from when the system was in testing.
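
To confirm which certificate is actually being served, it can help to compare the fingerprint of the file on disk with the one the web server presents. A sketch, assuming the admin GUI answers on port 443; pbx.example.com is a placeholder host name and the function names are made up:

```shell
# SHA-256 fingerprint of a certificate file.
file_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256
}

# SHA-256 fingerprint of the certificate a TLS server presents.
# $1 is host:port, e.g. pbx.example.com:443 (placeholder).
served_fingerprint() {
    openssl s_client -connect "$1" -servername "${1%%:*}" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Usage:
#   file_fingerprint /etc/asterisk/keys/mycert.pem
#   served_fingerprint pbx.example.com:443
```

If the two fingerprints differ, the web server is still pointing at a different certificate file, and a valid file in /etc/asterisk/keys does not by itself change what is served.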

From bash:

updatedb

locate '*.pem'

They should probably be in the asterisk user’s home directory, or linked there.