
RESOLVED: HDA went offline

Posted: Tue Mar 03, 2015 6:32 am
by kikkegek
Hi guys,

I have recently been getting emails saying that one of my storage pool drives went missing, followed about 3 minutes later by another email saying it is back online again.

I paid no attention to it, because I also didn't get any SMART errors during boot.

But now my HDA has gone completely offline.

I hooked up a monitor and took some photos of the errors I found in the log.

I hope somebody can help. I seriously have no clue where to start looking:

Image

Image

Image

Re: HDA went offline

Posted: Tue Mar 03, 2015 3:28 pm
by bigfoot65
Could this be a hardware failure? I would think it's highly possible.

It might be a drive going bad. If so, you might want to disconnect it and comment it out in /etc/fstab. Then reboot and see if anything changes.

Re: HDA went offline

Posted: Wed Mar 04, 2015 12:53 am
by kikkegek
Hi Bigfoot65,

I have checked all drives with "smartctl" and found no pending sectors or reallocated sectors.
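For anyone repeating this check on several drives, here is a minimal sketch that pulls the pending and reallocated sector counts out of the attribute table that `smartctl -A /dev/sdX` prints. The sample rows are made up to show the layout and are not taken from the drives in this thread, and the function name and the list of watched attributes are just illustrative choices.

```python
import re

# Attributes that are commonly watched when a drive is suspected of failing.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def watched_raw_values(smartctl_output: str) -> dict:
    """Return {attribute_name: raw_value} for the watched attributes.

    Expects the attribute table printed by `smartctl -A /dev/sdX`.
    """
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Table rows have 10 columns: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            # The raw value may carry extra text; keep the leading integer.
            match = re.match(r"\d+", fields[9])
            if match:
                found[fields[1]] = int(match.group())
    return found


# Made-up sample rows in the layout smartctl uses (NOT from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

if __name__ == "__main__":
    for name, raw in watched_raw_values(SAMPLE).items():
        note = "  <-- non-zero, worth a closer look" if raw else ""
        print(f"{name}: {raw}{note}")
```

Keep in mind that clean SMART counters only describe the disk itself. A bad cable, port, or controller can still cause drives to drop out without changing these values.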

ata5 seems to be my OCZ Vertex 2 SSD, which I use as the system disk.

SMART INFO of SSD:

Image

and the ata5 errors during boot:

Image

Info from dmesg:
[jochen@localhost ~]$ dmesg | grep ata5
[ 2.430644] ata5: SATA max UDMA/100 mmio m512@0x90100000 tf 0x90100080 irq 21
[ 2.735088] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2.754292] ata5.00: ATA-8: OCZ-VERTEX2, 1.37, max UDMA/133
[ 2.754301] ata5.00: 78161328 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 2.776294] ata5.00: configured for UDMA/100
[ 4.614994] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 4.615031] ata5.00: BMDMA2 stat 0x686d2009
[ 4.615041] ata5.00: failed command: READ DMA
[ 4.615055] ata5.00: cmd c8/00:00:60:1a:8f/00:00:00:00:00/e0 tag 0 dma 131072 in
[ 4.615062] ata5.00: status: { DRDY ERR }
[ 4.615066] ata5.00: error: { ABRT }
[ 4.656294] ata5.00: configured for UDMA/100
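The "cmd" line in the log above can be decoded to see what the kernel was actually asking the drive for. The sketch below assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah/five "hob" bytes/device) and 512-byte logical sectors, and it only handles the 28-bit READ DMA command (0xc8) seen here. The function name is made up for illustration.

```python
import re

# libata prints the taskfile as:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:...:hob_lbah/device
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)

READ_DMA = 0xC8  # 28-bit READ DMA


def decode_read_dma(line: str) -> dict:
    """Decode start LBA and transfer size from a libata 'cmd' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    command = int(m.group(1), 16)
    if command != READ_DMA:
        raise ValueError("this sketch only handles READ DMA (0xc8)")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    device = int(m.group(4), 16)
    # 28-bit addressing: LBA bits 27..24 sit in the low nibble of the device byte.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # A sector count of 0 means 256 sectors for 28-bit commands.
    sectors = nsect or 256
    # Assumes 512-byte logical sectors.
    return {"lba": lba, "sectors": sectors, "bytes": sectors * 512}


LINE = "[ 4.615055] ata5.00: cmd c8/00:00:60:1a:8f/00:00:00:00:00/e0 tag 0 dma 131072 in"

if __name__ == "__main__":
    print(decode_read_dma(LINE))
```

For the line from this log it gives a 256-sector read starting at LBA 9378400, and 256 * 512 = 131072 bytes, which matches the "dma 131072 in" at the end of the same line. So this was an ordinary read that the drive aborted (ABRT), not a request for an out-of-range sector.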
I cannot comment this disk out, because it's the system disk, and from the SMART info it looks like there are no problems with it.

DISK2 in my Greyhole pool has dropped out and come back by itself a couple of times recently (I got root messages by email about it), but there were no errors about the SSD.

I have no clue where to start my troubleshooting.

Re: HDA went offline

Posted: Wed Mar 04, 2015 11:24 am
by kikkegek
Another day, another chance at solving this mystery.

I booted the HDA again today and it's stuck at this screen:

Image

Re: HDA went offline

Posted: Wed Mar 04, 2015 2:52 pm
by bigfoot65
So your answer is near the bottom. The drive you have mounted as drive3 is stopping the boot sequence from completing. That is not your SDA OS disk, so it must be a Greyhole drive.

Comment out that disk in /etc/fstab, disconnect that drive, and then try booting it again. You will most likely have to boot the HDA using a Live CD to edit the file unless somehow you can miraculously access the HDA via SSH.
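As a rough illustration of the "comment out that disk" step, here is a small sketch that disables the fstab entry for one mount point. The UUIDs and the mount point /var/hda/files/drives/drive3 in the sample are hypothetical placeholders, so check your own /etc/fstab for the real values, and keep a backup copy of the file before changing it.

```python
def comment_out_mount(fstab_text: str, mount_point: str) -> str:
    """Return fstab_text with the entry for mount_point commented out."""
    wanted = mount_point.rstrip("/")
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # The second field of an fstab entry is the mount point.
        if not is_comment and len(fields) >= 2 and fields[1].rstrip("/") == wanted:
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample; the UUIDs and paths are placeholders, not from this thread.
SAMPLE_FSTAB = """\
UUID=aaaa-1111 /                            ext4 defaults 1 1
UUID=bbbb-2222 /var/hda/files/drives/drive3 ext4 defaults 1 2
"""

if __name__ == "__main__":
    print(comment_out_mount(SAMPLE_FSTAB, "/var/hda/files/drives/drive3"), end="")
```

When working from a Live CD, the file to edit is the fstab on the HDA's root partition wherever you mounted it, not the Live CD's own /etc/fstab.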

Make sense?

Re: HDA went offline

Posted: Wed Mar 04, 2015 3:00 pm
by kikkegek
Did that, and then it complains about disk2.

And the READ DMA error is about ata5, which is the SSD for the OS.

Re: HDA went offline

Posted: Wed Mar 04, 2015 3:12 pm
by bigfoot65
Ah. Could be the disk controller is going bad. There are really so many variables with problems like this that it's sometimes hard to pinpoint the real problem.

I would recommend you remove the one disk and see if the OS drive still has problems.

I really do not recommend using an SSD for a server. They are designed more for a desktop. One downside of using them on servers is that Linux often performs numerous reads and writes, which can shorten the SSD's life quite rapidly. Very little benefit for the cost in my opinion.

If you have an enterprise grade SSD, it might be beneficial. However, they typically are very expensive. They are designed for all the pounding a server OS generates over time. Most of what we can afford today is not, though.

Re: HDA went offline

Posted: Wed Mar 04, 2015 3:22 pm
by kikkegek
Will try tomorrow.

My SSD is fine. Check out the ssd_life_left value: it has 100% left after one year of continuous operation. It's only got the OS on it.

Re: HDA went offline

Posted: Wed Mar 04, 2015 3:39 pm
by bigfoot65
Ok sounds good. I hope that fixes your issue.

I did not mean the SSD was bad necessarily, but it could be the hardware.

The real test will be when that other drive is eliminated. If things do not return to normal, then we can diagnose further.

Re: HDA went offline

Posted: Thu Mar 05, 2015 2:24 am
by kikkegek
Started the HDA this morning and I am again getting new errors during boot:

Image

I'm thinking this might be the SATA controller I use for the OS disk (SSD) and the newest 2TB regular HDD.

I use an extra SATA controller, because my mainboard only offers two SATA connections and I run 3 HDDs and one SSD.
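One way to check which disks hang off the add-on controller is to look at where each /sys/block/sdX entry resolves to. The sketch below assumes a kernel whose sysfs device paths include the PCI address of the controller and an ataN component. The two sample paths are hypothetical and only show the expected shape, and the function name is made up.

```python
import glob
import os
import re

PCI_RE = re.compile(r"/([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])(?=/)")
ATA_RE = re.compile(r"/(ata\d+)/")


def port_and_controller(sysfs_path: str):
    """Return (pci_address, ata_port) for a resolved /sys/block/sdX path.

    Returns None if the path does not look like a libata disk.
    """
    ata = ATA_RE.search(sysfs_path)
    pci = PCI_RE.findall(sysfs_path)
    if not ata or not pci:
        return None
    # The last PCI address in the path is the controller itself;
    # earlier ones are bridges it sits behind.
    return pci[-1], ata.group(1)


# Hypothetical example paths (NOT from the machine in this thread).
ONBOARD = "/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda"
ADDON = "/sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc"

if __name__ == "__main__":
    # On a live Linux system, list every sdX with its controller and port.
    for block in sorted(glob.glob("/sys/block/sd*")):
        info = port_and_controller(os.path.realpath(block))
        print(os.path.basename(block), info)
```

If the SSD on ata5 and the disk that keeps dropping out of the pool turn out to share the same PCI address, that would point toward the extra controller (or its cabling) rather than toward either disk.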