New Hard Drives failing

jwhirry
Posts: 3
Joined: Tue Apr 05, 2011 9:13 am

New Hard Drives failing

Postby jwhirry » Fri Apr 08, 2011 1:18 pm

I have been using AMAHI for about 2 weeks now and love it. However I am having some issues that I am suspecting may just be hardware related.

So far I have used the following or tried to use the following Hard Drives in my system.
[Attachment: Drives.JPG, list of the drives]
So to sum up as quickly as possible,

I have some older drives that have been used through several configurations of Windows and Linux without any errors. I first bought 2 WD10EALX Caviar Blue 1TB drives: 1 was DOA; the other I formatted with gparted and installed into Amahi and the Greyhole pool without issue. A week later the system warned me that failure was imminent, and the S.M.A.R.T. data showed a lot of Reallocated, Pending Reallocation, and Uncorrectable errors. I then ordered 2 WD1002FAEX Caviar Black 1TB drives and completely reinstalled; this was Wednesday evening. By yesterday afternoon the system was already warning me of failing drives.

The board has 3Gb/s SATA ports, so I looked for issues between the 6Gb/s drives and the board. I had originally read somewhere (:P) that the system or the drive is able to detect the board speed and adjust the drive's data rate to match. When the first WD10EALX gave warnings I thought maybe I should dig up some jumpers to limit these things, so the WD1002FAEX drives are jumpered to limit them to 3Gb/s. I have tried swapping SATA cables (I even bought new cables before installing the WD10EALX drives). I have also used different power cables and ports. My system is slightly underclocked, and the last time I ran memtest it came through stable. The mobo and PSU are 6 years old; the CPU and RAM are about 3 years old.

I ran e2fsck on the 2 new drives last night; it only scanned the system drive, but worked for quite a while on the other WD1002FAEX that I am using for storage.

I glanced over the S.M.A.R.T. data for all the drives. The older 3Gb/s drives have no new errors, and either no older errors or only a minute number of them, but going by the S.M.A.R.T. data the system is telling me the new drives are failing.
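For anyone who wants to pull the same three counters out of the S.M.A.R.T. data without reading the whole table by eye, here is a minimal Python sketch. It assumes the text comes from `smartctl -A` (smartmontools); the function name `failing_counters` and the sample table are made up for illustration and are not output from the drives in this thread.

```python
# Attribute names as printed by `smartctl -A` for the three counters
# mentioned above (reallocated, pending, uncorrectable).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def failing_counters(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Hypothetical sample in the `smartctl -A` layout (illustrative values only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       40
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(failing_counters(SAMPLE))
```

On a live system the input would be whatever `smartctl -A /dev/sdX` prints when run as root, with `/dev/sdX` standing in for the real device. Saving the result once a day makes it easy to see whether the counts are still climbing.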

So if anyone has any ideas of why the new drives would be failing and nothing else, I would be happy to hear them. Otherwise I will continue to research and troubleshoot.

cpg
Administrator
Posts: 2618
Joined: Wed Dec 03, 2008 7:40 am

Re: New Hard Drives failing

Postby cpg » Fri Apr 08, 2011 5:08 pm

Sounds like infant mortality?
My HDA: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz on MSI board, 8GB RAM, 1TBx2+3TBx1

jwhirry
Posts: 3
Joined: Tue Apr 05, 2011 9:13 am

Re: New Hard Drives failing

Postby jwhirry » Mon Apr 11, 2011 9:46 am

My apologies if this would be better off over in storage pooling/greyhole.

From my short browse of the net, it looks like some other people have had similar issues with new WD Caviar drives, and writing zeros followed by a full format seems to have cleared up the issue for them. So I’ve decided I would like to first pull the 1TB WD Caviar Black I use for storage, back it up to another drive on my other machine, zero the drive, and run some rigorous extended testing. As long as it checks out OK, I will copy the data back to it and reinstall it. If everything goes as planned I would also like to do the same for the system drive.

I know this is going to be a time consuming process but for my wife's sanity I want to be sure I do it right the first time and as quickly as possible.

Would the best or proper method be to:
  1. Just uncheck it from the greyhole storage pool
  2. Edit /etc/fstab to remove its UUID
  3. Shut down and pull the drive.
  4. Do my backup and testing.
  5. Put the drive back in and add it like I would for any other drive.
  6. Once back in the greyhole pool, run

    Code: Select all

    greyhole --fsck
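As a rough illustration of step 2, here is a minimal Python sketch that comments out a drive's /etc/fstab entry by UUID instead of deleting it. The function name `comment_out_uuid`, the UUIDs, and the mount point in the demo are all hypothetical; this is one way the edit could be done, not an Amahi or Greyhole procedure.

```python
def comment_out_uuid(fstab_text, uuid):
    """Return fstab_text with any active entry for UUID=<uuid> commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # An active entry has the device spec as its first field;
        # lines that are already comments start with '#' and never match.
        if fields and fields[0] == "UUID=" + uuid:
            line = "# " + line
        out.append(line)
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    # Hypothetical fstab contents, for illustration only.
    demo = (
        "UUID=aaaa / ext4 defaults 1 1\n"
        "UUID=bbbb /mnt/storage1 ext4 defaults 1 2\n"
    )
    print(comment_out_uuid(demo, "bbbb"), end="")
```

Commenting the line out keeps a record of the old mount point and options for step 5. Reformatting the drive normally gives the filesystem a new UUID, so the old line would need updating when the drive goes back in.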
Alternately would it be good to use

Code: Select all

greyhole --wait-for[=path]
and what would be the proper method to use that?

I believe the backup coming off the drive could be cloned or copied but going back I want the data placed contiguously and not in the reallocated position it currently is in. What do you think the best method for this would be? I have looked at several programs and either I cannot seem to figure out the drive copy feature or they do not support ext4.

I think I could use the

Code: Select all

greyhole --going[=path]
but I want to avoid backing up data to a drive that will possibly have more read/write errors, and I thought this might get redundant with switching drives in and out. I suppose I could get this to work if I unchecked the system drive storage from the pool while it ran.

And then for the system drive, I was thinking maybe of cloning the system drive to the other drive as long as it passes testing, but again I don’t know of a good way to clone this drive, especially since I used LVM for the system partitions.

I think I could again just use

Code: Select all

greyhole --going[=path]
and then do a complete reinstall once it passes testing, then add all the storage drives and partitions to the pool. Then run

Code: Select all

greyhole --fsck
and run

Code: Select all

greyhole --balance
once that is complete.

I know Western Digital Lifeguard Diagnostic Tool can write zeros and do extended drive testing or at least it used to have this functionality. Is there another program out there that anyone thinks would do a good job rigorously testing these drives?
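Whichever tool writes the zeros, one simple extra check is to read the whole drive back and confirm every byte really is zero. Below is a minimal Python sketch of that read-back pass. The function name `first_nonzero_offset` is made up for illustration, it does not write anything, and it is not a substitute for the vendor diagnostic or a S.M.A.R.T. extended self-test.

```python
def first_nonzero_offset(path, chunk_size=1024 * 1024):
    """Return the byte offset of the first non-zero byte in path, or None.

    A sector that cannot be read at all surfaces as an OSError from read(),
    which is itself a useful signal on a suspect drive.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached the end without finding non-zero data
            # Strip leading zero bytes; anything left starts at the first
            # non-zero byte of this chunk.
            rest = chunk.lstrip(b"\x00")
            if rest:
                return offset + len(chunk) - len(rest)
            offset += len(chunk)


if __name__ == "__main__":
    import sys

    # Usage: pass a file or device path; reading a raw device needs root.
    result = first_nonzero_offset(sys.argv[1])
    print("all zeros" if result is None else "non-zero data at byte %d" % result)
```

Pointed at the raw device (a placeholder such as `/dev/sdX`), a full pass over a 1TB drive will take hours, much like the zero write itself.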

I have not had any errors on any of the drives since Thursday. Looking at the log, the errors occurred over about a 2 hour period, from about 12:00 to 14:00 local, during which I had been copying data to the pool in a session that ran for 10+ hours. I have worked with the data on the drives since, even running a greyhole --balance, which moved quite a bit of data around. So I am a bit clueless as to what would have caused the errors; I find it hard to believe I received 4 bad drives in a row. I am hopeful that this will clear the issue.

Sorry this got so long. I had a lot of ideas and wanted to be sure that I have the best procedure. Thanks for taking the time to read this through, and thanks for any help offered.

mage182
Posts: 1
Joined: Mon Apr 11, 2011 1:05 pm

Re: New Hard Drives failing

Postby mage182 » Mon Apr 11, 2011 1:08 pm

Since I don't have much experience with the Linux side of things, I'll offer my hardware experience perspective.

How is the power you have your machine attached to? Since buying a good quality UPS for my two machines I haven't had any of the 6 drives I've used in the past 6 years go bad yet. I attribute it mostly to proper cooling and power quality. A UPS with AVR keeps the power clean and plentiful. No brown outs, no spikes.

Since it's just the drives failing and not anything else, it's unlikely that it's the power. But it can't hurt to check the ground and pick up a UPS. It is, after all, a place you put your data to keep it safe.

jwhirry
Posts: 3
Joined: Tue Apr 05, 2011 9:13 am

Re: New Hard Drives failing

Postby jwhirry » Wed Apr 13, 2011 11:32 am

Thanks for the tip on UPS. I do not currently use one. I do use what should be a good surge protector; I wouldn’t plug in any expensive or sensitive electronics around here without at least one of those, as we occasionally get some wicked power surges. I will have to read up on UPSes. I know a few people who use them, but I don’t know what they think of them other than your review here.

As for the drives, I am thinking they are simply defective and I am starting the RMA process. One of the Caviar Blacks failed Monday night; I managed to back it up on my other machine. The other drive was showing more and more reallocations and pending reallocations. I tried zeroing the Caviar Blue 1TB, and after it completed a full zero run, which took more than 22 hours, it failed S.M.A.R.T. testing within seconds.

What looks promising is the replacement for the DOA Caviar Blue 1TB drive I RMA'd, which I received Friday 4/8. I have given the replacement drive some heavy use to stress it and to try to get any errors to show up: it has been filled with files and reformatted twice now, and is currently finishing the backup of the two Caviar Blacks. It was in the storage pool for a little over 24 hours before the one Caviar Black failed, and as of this morning it still had zero pending, reallocated, or uncorrectable sectors. The same goes for the older drives I have used in the machine. It looks like it is just these three new drives, or 4 defective drives in total if you count the DOA one, that have been an issue.

As an aside, I have now wised up on RMAs. With a half-month turnaround to receive the replacement, the 30-day return period keeps ticking away from the original invoice date. It’s an open policy, but from now on I am just going to get refunds and reorder if I still want the product. Just my opinion and experience.

rkillcrazy
Posts: 44
Joined: Wed May 19, 2010 9:01 am
Location: USA

Re: New Hard Drives failing

Postby rkillcrazy » Thu Apr 14, 2011 6:42 am

Another thing to look at is the BIOS. I've had new boards fail miserably until I updated the BIOS version. Some of these newer HDDs have controller boards on them that give older motherboards a fit! Try updating the BIOS and see if these newer HDDs work better. Also, look at power. I know a UPS has been mentioned, but what about your power supply unit? An older PSU might not have the needed wattage to power the CPU, the fans, the HDDs, and anything else you have in the case. Failing PSUs were always good for giving strange, inconsistent errors when I was a road tech....
Amahi HDA:
  • MOBO: ASRock K10N78
  • CPU: AMD Athlon II 64 X2 Dual Core Processor 5600+
  • RAM: 2-GB (dual-channel)
  • HDDs: WD20EARS (qty: 4)
HTPC:
  • Boxee Box
  • Popcorn Hour A-400

rkillcrazy
Posts: 44
Joined: Wed May 19, 2010 9:01 am
Location: USA

Re: New Hard Drives failing

Postby rkillcrazy » Thu Apr 14, 2011 6:45 am

jwhirry wrote: As an aside, I have now wised up on RMAs. With a half-month turnaround to receive the replacement, the 30-day return period keeps ticking away from the original invoice date. It’s an open policy, but from now on I am just going to get refunds and reorder if I still want the product. Just my opinion and experience.
I'm sure you're dealing with the vendor on this, but you could also send them back to the manufacturer. WD used to have a cross-ship setup where they would take your CC number and hold it until they got the RMA'd HDD in their hands. In the meantime, they'd ship you a new one. It's another option...
