RESOLVED: HDA went offline

kikkegek
Posts: 341
Joined: Sun Jul 31, 2011 9:28 am

Postby kikkegek » Tue Mar 03, 2015 6:32 am

Hey guys,

Recently I have been getting emails saying that one of my storage pool drives has gone missing, followed about 3 minutes later by another email saying it is back online.

I paid no attention to it because I also didn't get any SMART errors during boot.

But now my HDA has gone completely offline.

I hooked up a monitor and took some photos of the errors I found in the log.

I hope somebody can help; I seriously have no clue where to start looking:

Image

Image

Image

bigfoot65
Project Manager
Posts: 11924
Joined: Mon May 25, 2009 4:31 pm

Re: HDA went offline

Postby bigfoot65 » Tue Mar 03, 2015 3:28 pm

Could this be a hardware failure? I would think that's quite possible.

It might be a drive going bad. If so, you might want to disconnect it and comment out its entry in /etc/fstab. Then reboot and see if anything changes.
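
For reference, commenting out the entry can be scripted. This is a minimal POSIX sh sketch, assuming the failing drive's fstab line is identified by its mount point (the second fstab field). The function name `comment_out_mount` is hypothetical, not an Amahi or Greyhole tool. It saves a `.bak` copy and prefixes `#` to the matching active line, leaving every other line untouched.

```shell
# comment_out_mount FSTAB MOUNTPOINT  (hypothetical helper)
# Prefixes '#' to the active fstab line whose second field is MOUNTPOINT.
# The original file is kept as FSTAB.bak so the change is easy to undo.
# The subshell body keeps the variables local to each call.
comment_out_mount() (
  fstab=$1
  mnt=$2
  cp -p -- "$fstab" "$fstab.bak" &&
    awk -v m="$mnt" '
      $1 !~ /^#/ && $2 == m { print "#" $0; next }  # active entry: disable it
      { print }                                       # all other lines unchanged
    ' "$fstab.bak" > "$fstab"
)
```

Run it as root, e.g. `comment_out_mount /etc/fstab /path/of/drive3`, substituting the mount point shown in your own fstab. To undo, copy `/etc/fstab.bak` back over `/etc/fstab`.
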
ßîgƒσστ65
Applications Manager

My HDA: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz on MSI board, 16GB RAM, 1TBx1+2TBx2+4TBx2

Re: HDA went offline

Postby kikkegek » Wed Mar 04, 2015 12:53 am

Hi bigfoot65,

I have checked all drives with "smartctl" and found no pending or reallocated sectors.
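
A quick way to repeat that check across every drive is to filter `smartctl -A` output down to the bad-sector attributes. This is a sketch, assuming smartmontools' usual ten-column attribute table with the raw value in the last column. The function name is hypothetical, and attribute names vary by vendor, so an SSD may simply not list some of them; an empty result is not proof of health.

```shell
# bad_sector_counts  (hypothetical helper)
# Reads `smartctl -A` output on stdin and prints the raw value of the
# attributes that count remapped or unreadable sectors.
# Table columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
bad_sector_counts() {
  awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" ||
       $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Usage (as root):
# for d in /dev/sd?; do echo "== $d"; smartctl -A "$d" | bad_sector_counts; done
```
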

ata5 seems to be my OCZ Vertex SSD, which I use as the system disk.

SMART info for the SSD:

Image

And the ata5 errors during boot:

Image

Info from dmesg:
[jochen@localhost ~]$ dmesg | grep ata5
[ 2.430644] ata5: SATA max UDMA/100 mmio m512@0x90100000 tf 0x90100080 irq 21
[ 2.735088] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2.754292] ata5.00: ATA-8: OCZ-VERTEX2, 1.37, max UDMA/133
[ 2.754301] ata5.00: 78161328 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 2.776294] ata5.00: configured for UDMA/100
[ 4.614994] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 4.615031] ata5.00: BMDMA2 stat 0x686d2009
[ 4.615041] ata5.00: failed command: READ DMA
[ 4.615055] ata5.00: cmd c8/00:00:60:1a:8f/00:00:00:00:00/e0 tag 0 dma 131072 in
[ 4.615062] ata5.00: status: { DRDY ERR }
[ 4.615066] ata5.00: error: { ABRT }
[ 4.656294] ata5.00: configured for UDMA/100
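
The numbers in that log can be decoded by hand. This is a POSIX sh sketch, assuming libata's usual `cmd` layout (command/feature:count:lba_low:lba_mid:lba_high/five HOB bytes/device) and 512-byte sectors; it only handles 28-bit commands such as READ DMA (0xC8). The second helper splits an SStatus or SControl value into its DET, SPD and IPM fields. Both function names are hypothetical.

```shell
# decode_ata_cmd CMD  (hypothetical helper, 28-bit commands only)
# CMD is the field after "cmd" in a libata error line, e.g.
#   c8/00:00:60:1a:8f/00:00:00:00:00/e0
decode_ata_cmd() (
  IFS='/:'                      # split on both separators
  set -- $1                     # cmd feat nsect lbal lbam lbah hob x5 device
  cmd=$1 nsect=$3 lbal=$4 lbam=$5 lbah=$6 dev=${12}
  # 28-bit LBA: the low nibble of the device register supplies bits 24-27.
  lba=$(( ((0x$dev & 0x0f) << 24) | (0x$lbah << 16) | (0x$lbam << 8) | 0x$lbal ))
  count=$(( 0x$nsect ))
  [ "$count" -eq 0 ] && count=256   # a 28-bit count of 0 means 256 sectors
  printf 'command=0x%s LBA=%d sectors=%d bytes=%d\n' \
    "$cmd" "$lba" "$count" "$(( count * 512 ))"
)

# decode_sstatus HEX  (hypothetical helper)
# Splits an SStatus or SControl value into its 4-bit fields:
# DET = bits 0-3, SPD = bits 4-7, IPM = bits 8-11.
decode_sstatus() (
  v=$(( 0x$1 ))
  printf 'DET=%d SPD=%d IPM=%d\n' \
    "$(( v & 0xf ))" "$(( (v >> 4) & 0xf ))" "$(( (v >> 8) & 0xf ))"
)
```

With the values above, `decode_ata_cmd c8/00:00:60:1a:8f/00:00:00:00:00/e0` gives LBA 9378400 and 256 sectors; 256 × 512 = 131072 bytes, which matches the `dma 131072` in the same log line. The disk reports 78161328 sectors × 512 = 40,018,599,936 bytes (about 40 GB), so that LBA is well within range. `decode_sstatus 113` gives DET=3 (device present, link established), SPD=1 (Gen1, 1.5 Gbps) and IPM=1 (active); for SControl, `decode_sstatus 310` gives SPD=1, which limits negotiation to Gen1, consistent with the 1.5 Gbps link.
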

I cannot comment this disk out because it's the system disk, and from the SMART info it looks like there are no problems with it.

Disk2 in my Greyhole pool has dropped out and come back by itself a couple of times recently (I got emails from root about it), but there have been no errors from the SSD.

I have no clue where to start troubleshooting.

Re: HDA went offline

Postby kikkegek » Wed Mar 04, 2015 11:24 am

Another day, another chance at solving this mystery.

I booted the HDA again today and it's stuck at this screen:

Image

Re: HDA went offline

Postby bigfoot65 » Wed Mar 04, 2015 2:52 pm

Your answer is near the bottom of the screen. The drive you have mounted as drive3 is stopping the boot sequence from completing. That is not your sda OS disk, so it must be a Greyhole drive.

Comment out that disk's entry in /etc/fstab, disconnect the drive, and then try booting again. You will most likely have to boot the HDA from a Live CD to edit the file, unless you can somehow still reach the HDA via SSH.

Make sense?

Re: HDA went offline

Postby kikkegek » Wed Mar 04, 2015 3:00 pm

Did that, and now it complains about disk2.

And the READ DMA error is on ata5, which is the SSD for the OS.

Re: HDA went offline

Postby bigfoot65 » Wed Mar 04, 2015 3:12 pm

Ah. It could be that the disk controller is going bad. There are so many variables with problems like this that it's sometimes hard to pinpoint the real cause.

I would recommend removing that one disk and seeing whether the OS drive still has problems.

I really do not recommend using an SSD in a server; consumer SSDs are designed more for desktops. One drawback of using them in servers is that Linux often performs numerous writes, which can shorten an SSD's life quite rapidly. In my opinion there is very little benefit for the cost.

An enterprise-grade SSD might be worthwhile, since those are designed for all the pounding a server OS generates over time. However, they are typically very expensive, and most of the SSDs we can afford today are not built for that.

Re: HDA went offline

Postby kikkegek » Wed Mar 04, 2015 3:22 pm

Will try tomorrow.

My SSD is fine. Check out the ssd_life_left value: it shows 100% left after one year of continuous operation. The SSD holds only the OS.

Re: HDA went offline

Postby bigfoot65 » Wed Mar 04, 2015 3:39 pm

Ok sounds good. I hope that fixes your issue.

I did not necessarily mean the SSD was bad, just that it could be a hardware problem.

The real test will be when that other drive is eliminated. If things do not return to normal, then we can diagnose further.

Re: HDA went offline

Postby kikkegek » Thu Mar 05, 2015 2:24 am

Started the HDA this morning and I'm getting new errors again during boot:

Image

I'm thinking it might be the SATA controller I use for the OS disk (the SSD) and the newest 2TB regular HDD.

I use an extra SATA controller because my motherboard only offers two SATA ports and I run three HDDs and one SSD.
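
To see which disks actually sit on the add-in card, the sysfs path of each disk can be resolved: on reasonably recent kernels it contains both the libata port (`ataN`, as in the dmesg output) and the PCI address of the controller. This is a sketch, assuming that sysfs layout and the Linux `readlink -f` and `grep -o` found on typical distributions; both function names are hypothetical.

```shell
# port_and_controller SYSFS_PATH  (hypothetical helper)
# Prints "ataN PCI-address" for a resolved block-device sysfs path; the last
# PCI address in the path is the controller the disk hangs off.
port_and_controller() (
  port=$(printf '%s\n' "$1" | grep -oE 'ata[0-9]+' | head -n 1)
  pci=$(printf '%s\n' "$1" | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' | tail -n 1)
  printf '%s %s\n' "${port:-?}" "${pci:-?}"
)

# list_disks_by_controller  (hypothetical helper)
# One line per sd* disk: name, libata port, controller PCI address.
list_disks_by_controller() (
  for dev in /sys/block/sd*; do
    [ -e "$dev" ] || continue          # no sd* disks: the glob stayed literal
    printf '%s ' "${dev##*/}"          # e.g. "sda "
    port_and_controller "$(readlink -f "$dev")"
  done
)
```

Disks that print the same PCI address share a controller, and `lspci -s ADDRESS` shows which controller that is.
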
