Sunday, February 18th 2007

Google Claims Hard Drives Don’t Fail Because of High Temperature or Usage

One common argument in the world of computing is that high temperatures make hard drives more likely to fail, and the same is said of high usage levels. However, internal research by search giant Google suggests that this is only true for hard drives in their first months of operation or once they are over five years old. According to Google, there are so many other variables at play that the biggest factor in a hard drive's lifetime is actually the model itself rather than the conditions it runs in. In fact, Google saw a trend suggesting that drives are more likely to fail at low temperatures or at extremely high ones; generally speaking, drives failed less as temperature increased, until those extremes were reached. As for usage, the research showed that heavy use only seems to affect drives in their first months of operation or after they are five years old; otherwise, the failure rate was the same as for drives in low-usage environments.

Source: TG Daily
Add your own comment

40 Comments on Google Claims Hard Drives Don’t Fail Because of High Temperature or Usage

#1
kakazza
So can Google please recommend me some drives for my RAID? ;)
Posted on Reply
#2
Mussels
Moderprator
I kinda agree with them - I have seen high temps etc. kill drives, but they're right: once they live 6 months or so, they rarely die till the 4+ year mark.

(this is based on my ~10 years spent crying over every GB of data I lost ;) )
Posted on Reply
#3
kakazza
I am at 0GB so far ;)
Posted on Reply
#4
ryzak7
Wow, the concept makes sense too. Imagine that. All mechanical devices require a "break in" time. If excess temps or speeds or whatever are achieved before the moving parts have a chance to "get used to each other" there tends to be a problem.
Posted on Reply
#5
Alcpone
I have a 74GB Seagate Barracuda that I got in 06/2002, and it's still going strong to this day. It only has a 2MB cache, but I only use it for storage of downloaded games :D
Posted on Reply
#6
Canuto
Hum.. Interesting story :)
Posted on Reply
#7
dmce
Would be good if they gave us some figures for various drives. However silence is king for me.

What is the warranty on new drives?
Posted on Reply
#8
PVTCaboose1337
Graphical Hacker
I still have a HD, 8 years old, 2GB, still works, but is slow.
Posted on Reply
#9
Easy Rhino
Linux Advocate
by: ryzak7
Wow, the concept makes sense too. Imagine that. All mechanical devices require a "break in" time. If excess temps or speeds or whatever are achieved before the moving parts have a chance to "get used to each other" there tends to be a problem.
just like a new car. It is highly recommended that most new cars stay below 50 MPH for the first 1000 miles, and then they must be brought into the shop for a quick once-over and an oil change.
Posted on Reply
#10
DRDNA
My personal opinion is that the biggest KILLER of hard drives is lack of RAM!! When there is not enough RAM, the OS makes the hard drive serve as virtual memory to make up for it, and when a mechanical device like a hard drive tries to do what non-mechanical RAM does, there will be a problem, as the mechanical device will wear out trying to compensate.

I see this all the time at the company I work for... My company would rather not spend the little extra for the correct amount of RAM, but they don't seem to mind at all replacing hard drives at an alarming rate.
Posted on Reply
#11
DRDNA
by: Easy Rhino
just like a new car. It is highly recommended that most new cars stay below 50 MPH for the first 1000 miles, and then they must be brought into the shop for a quick once-over and an oil change.
This doesn't hold true for performance cars.. I had a 455 Oldsmobile professionally built (street drag), 13.5 in the 1/4 mile... They told me to break it in the way you would drive it (with a lot of oil changing at first). Everybody at the track, for the most part, did this as well.
Posted on Reply
#12
Alec§taar
by: DRDNA
My personal opinion is that the biggest KILLER of hard drives is lack of RAM!! When there is not enough RAM, the OS makes the hard drive serve as virtual memory to make up for it, and when a mechanical device like a hard drive tries to do what non-mechanical RAM does, there will be a problem, as the mechanical device will wear out trying to compensate.
Good point, & on that note?

* Moving the pagefile.sys to ANOTHER DISK, can help!

(Also, on your logic & train-of-thought? Fragmentation will also contribute to excessive head movements, adding additional & unneeded "wear & tear" as well!)

:)

APK
Posted on Reply
#13
DRDNA
by: Alec§taar
Good point, & on that note?

* Moving the pagefile.sys to ANOTHER DISK, can help!

(Also, on your logic & train-of-thought? Fragmentation will also contribute to excessive head movements, adding additional & unneeded "wear & tear" as well!)

:)

APK
Good minds think alike!!!!! I have been telling people about the harm of a fragmented drive for what seems like forever. Most people don't listen :confused: :p :confused: I guess that's why there are guys like us around. We can fix ya right up, no problem :laugh:
Posted on Reply
#14
Mediarocker543
I had a 7GB drive. It worked until ESD got it. T_T

I've had my current rig for like.. 2-3 years or more.. I lost count.. but it's running the original drives and everything, nice and strong. Seen plenty of drives die before their prime due to ESD and dust, and overall lack of use.
Posted on Reply
#15
Alec§taar
by: Easy Rhino
just like a new car. It is highly recommended that most new cars stay below 50 MPH for the first 1000 miles, and then they must be brought into the shop for a quick once-over and an oil change.
Exactamundo... & like cars, or anything w/ moving parts? You either break down in the first few months, or have a winner that goes on for YEARS, w/ proper care.

APK
Posted on Reply
#16
Completely Bonkers
The most important pieces of information I got from this article http://216.239.37.132/papers/disk_failures.pdf were these:


1./ For long life, keep disks BELOW 45 degrees C
2./ The optimal temperature is 30-35 degrees C
3./ Cooling below 20 degrees C is not a good idea... although there is no clear understanding why
4./ The failure rate increases quickly with temperature... so a drive at 60+C would be in big trouble
5./ There were no comments on temperature variability (which IMO is the biggest problem... a HDD going from cold to hot and back to cold again, which might explain point 3 and is the typical desktop/workstation scenario compared to a server... i.e. switched on and off in regular cycles)

6./ Failure rate does not seem to increase per year according to the shown stats... y2 and y3 in fact are higher risk than y4 and y5
7./ If a HDD has survived to 5 years, there is no indication that the failure rate is getting worse... (a surprise result to me)
8./ Failure rate at 1y and less is much lower than at 2-5y, BUT this is probably due to NEW standards in drive manufacturing, or NEW brands or model numbers being used compared to older systems, or NEW drives having big caches... all of which reduce wear and tear on heads/arms

***

Summary

1./ High temperatures cause exponentially increasing errors when temps > 45C. Keep your HDD temps down. Run SMART and check you are not going over these temps. If you are, MOVE YOUR DRIVES so they are not sandwiched. Increase your HDD cooling, e.g. make sure your system temp is low... and that there is airflow to the HDD

2./ Newer drives are more reliable than older drives. BUT AGING does not seem to increase the failure rate. (Assuming no knocks or other issues that accumulate over time)

NOTE THAT THE INFO here contradicts the OP's summary. IMO we have a bad summary at the start of this thread...
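The "run SMART" advice above can be automated. Here is a minimal Python sketch that pulls the drive temperature out of `smartctl -A` output (SMART attribute 194, Temperature_Celsius); the sample line below is only illustrative of the usual column layout, which can vary by drive and smartctl version, so treat this as a best-effort parser, not a robust one:

```python
import re

def hdd_temperature(smartctl_output):
    """Best-effort parse of the drive temperature (deg C) from
    `smartctl -A /dev/sdX` output. Returns None if no temperature
    attribute is found."""
    for line in smartctl_output.splitlines():
        if "Temperature_Celsius" in line:
            # The raw value follows the lone "-" (the WHEN_FAILED column);
            # some drives append extra text like "(Min/Max 21/45)" after it.
            m = re.search(r"-\s+(\d+)", line)
            if m:
                return int(m.group(1))
    return None

# Illustrative line in the format smartctl typically prints:
sample = "194 Temperature_Celsius 0x0022 113 100 000 Old_age Always - 37 (Min/Max 21/45)"
temp = hdd_temperature(sample)
print(temp)         # 37
print(temp < 45)    # True: under the ~45 C threshold suggested above
```

In practice you would feed it live output, e.g. from `smartctl -A /dev/sda` via `subprocess` (requires smartmontools and usually root), and alert when the result creeps toward 45 C.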
Posted on Reply
#17
WarEagleAU
Bird of Prey
OK... I don't know what to believe on this. Is it good that Google did this or what?
Posted on Reply
#18
Alec§taar
by: WarEagleAU
OK... I don't know what to believe on this. Is it good that Google did this or what?
I tend to believe it, mainly because of the size of the statistical sample set... it is large, & on disks that doubtless get POUNDED, day in & day out, running DB engine queries like mad + having multiple users (1000's @ a time) pounding on them.

I do think that heat & usage DO contribute to a HDD's end of life, absolutely, because it's common sense, but what I THINK Google's trying to say is that they matter, just NOT AS MUCH AS YOU'D THINK, & IT'S MORE ABOUT THE QUALITY OF THE DRIVE ITSELF & ITS INTERNALS (in other words, how well it's made from the get-go).

Good enough analysis for me... decent 'scientific method' & statistical sample set, in the RIGHT type of conditions: heavily used disk drives.

APK

P.S.=> Is this the "definitive work" on this? No, probably not possible, but VERY close imo... Admittedly, I did NOT look @ the .pdf they supplied, but upon scrutiny of the topic here & @ SLASHDOT earlier today, it is a pretty damn good step in the RIGHT direction... especially when you compare it to other items that are mechanical, like cars for instance, as others mentioned above... apk
Posted on Reply
#19
mullered07
Well, I've never had one HDD die on me yet (touch wood :slap: ), but that's interesting reading all the same, and I would have to agree from my experiences with HDDs.

I've mainly used second-hand/old drives that have been around the block a good few (million?) times lol, and at most I've had the odd slow/noisy drive, which isn't too bad unless you're constantly moving files about or downloading.
Posted on Reply
#20
Completely Bonkers
Does anyone have any data on the "bonk" effect? That is, the cumulative effect of small bumps, shocks, and vibration? For example, has anyone got a car PC and discovered the thing died after one year? Or any other vibration environment?

I know HDDs are supposed to survive up to 300G (as long as they are OFF at the time of the bump). But what about the cumulative effect of vibration while ON?

I've had TERRIBLE experiences with second-hand HDDs... and never buy them now. All the second-hand HDDs I bought arrived DOA or went dead within a couple of weeks/months. The issues with second-hand HDDs are: you are unable to see drop or vibration damage... is the person selling due to problems?... and they do a reformat and then sell on.

Also... second-hand systems (ex-corporate desktops) tend to be handled like SH1t during pickup, storage, and repackaging. Hence while the PC is fine... the HDD usually has a short life.
Posted on Reply
#21
Zubasa
The only hard drive that ever died on me was six years old ;)
Posted on Reply
#22
Wile E
Power User
by: DRDNA
This doesnt hold true for performance cars.. I had a 455 oldsmobile professionally built (street drag) 13.5 in 1/4 mile ... They told me to break it in the way you would drive it(with alot of oil changing at first . Every buddy at the track for the most part did this as well.
I was actually gonna chime in on this, but you beat me to it. lol The change in this way of thinking comes from the advances in precision machining, lubricants and metallurgy. The tolerances have improved so greatly that the break-in cycles of old are now obsolete.
Posted on Reply
#23
SpoonMuffin
by: kakazza
So can Google please recommend me some drives for my RAID? ;)
I would personally go Seagate: 5-year warranty, and in my exp GREAT support; also, their fail rates have been lower than WD and Maxtor in my exp. Maxtor, Hitachi and Samsung are all good choices too; WD in my long-term exp have a higher fail rate.


by: Completely Bonkers
The most important pieces of information I got from this article http://216.239.37.132/papers/disk_failures.pdf were these:


1./ For long life, keep disks BELOW 45 degrees C
2./ The optimal temperature is 30-35 degrees C
3./ Cooling below 20 degrees C is not a good idea... although there is no clear understanding why
4./ The failure rate increases quickly with temperature... so a drive at 60+C would be in big trouble
5./ There were no comments on temperature variability (which IMO is the biggest problem... a HDD going from cold to hot and back to cold again, which might explain point 3 and is the typical desktop/workstation scenario compared to a server... i.e. switched on and off in regular cycles)

6./ Failure rate does not seem to increase per year according to the shown stats... y2 and y3 in fact are higher risk than y4 and y5
7./ If a HDD has survived to 5 years, there is no indication that the failure rate is getting worse... (a surprise result to me)
8./ Failure rate at 1y and less is much lower than at 2-5y, BUT this is probably due to NEW standards in drive manufacturing, or NEW brands or model numbers being used compared to older systems, or NEW drives having big caches... all of which reduce wear and tear on heads/arms

***

Summary

1./ High temperatures cause exponentially increasing errors when temps > 45C. Keep your HDD temps down. Run SMART and check you are not going over these temps. If you are, MOVE YOUR DRIVES so they are not sandwiched. Increase your HDD cooling, e.g. make sure your system temp is low... and that there is airflow to the HDD

2./ Newer drives are more reliable than older drives. BUT AGING does not seem to increase the failure rate. (Assuming no knocks or other issues that accumulate over time)

NOTE THAT THE INFO here contradicts the OP's summary. IMO we have a bad summary at the start of this thread...
I can't agree with your article here. I have a Maxtor that overheated BADLY and the data was totally scrambled; I ran 2 low-level formats on it, then used HDD Regenerator on it 2x, and it's been going over 2 years since without fail, even after it had reached over 70C in an external case whose fan died.

In my exp their results are pretty true: some drives are MADE TO TAKE THE HEAT, like 15k Seagate SCSI drives. They all run pretty warm to HOT AS HELL, but the fail rate is so low that I have only seen maybe 5 of them die since the 15k drives came out, and all of those died in the burn-in period. Seagate replaced them FAST (1 week turnaround or less), and they had a 10-year warranty on them as well. When I asked if we needed to cool them better, the tech I talked to said they were specifically designed to run at those high temps and it wasn't anything to worry about, just not to touch them after they had been on and under heavy use for a while (already learned that; blistered fingers suck).

Also got an old UW SCSI HDD here that's a Quantum, 9.4GB, 7500rpm (one of the first 7k rpm drives; it's about 2x as thick as today's drives). That sucker has warnings all over it, it's YEARS old (like late 90's, maybe 2001 at the latest), and it's still running FLAWLESS in my old P3 box, and it doesn't get any extra cooling other than case airflow (not that good, it's an old Pavilion HP case).

In my exp this article is VERY true: if a drive is going to fail, it's normally in the first 6mo-1 year depending on level of use (burn-in time); if it lasts that year and doesn't get dropped or take outside damage, it's going to last 5+ years.

Now, WD do have some quirks in my exp, mostly with VIA chipset IDE controllers: if they stop being recognized on the controller, you will never get them recognized on that board/chipset again, but they will work on nForce, SiS, Intel, ALi/ULi and ServerWorks chipsets just fine. I have only seen this once with SATA WD drives and it may have been a fluke; the drive worked on an add-in card or another chipset-based board but wouldn't even detect on VIA chipset-based controllers. I just swapped it into a comp I was setting up and gave the person the Hitachi I was using in the new build. The WD was less than a month old, so no harm done: the warranty for the system was on the shop for a year either way, and the drive had a 3-year warranty (the owner only buys drives with at least a 3-year warranty).
Posted on Reply
#24
Alec§taar
ANOTHER ARTICLE ON THIS SUBJECT (rated well by others @ SLASHDOT)

Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?

http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html

:)

================================
KEY POINTS SUMMARY:
================================
Infant mortality? Interestingly, we observe little difference in replacement rates between SCSI, FC and SATA drives, potentially an indication that disk-independent factors, such as operating conditions, affect replacement rates more than component-specific factors.
-----------------------------------------
Vendor MTBF reliability? ...failure rate is not constant with age, and, rather than a significant infant mortality effect, we see a significant early onset of wear-out degradation.
-----------------------------------------
Vendor MTBF reliability? While the datasheet AFRs are between 0.58% and 0.88%, the observed ARRs range from 0.5% to as high as 13.5%. That is, the observed ARRs by dataset and type are up to a factor of 15 higher than datasheet AFRs. Most commonly, the observed ARR values are in the 3% range.
------------------------------------------
Actual MTBFs? The weighted average ARR was 3.4 times larger than 0.88%, corresponding to a datasheet MTTF of 1,000,000 hours.
------------------------------------------
Drive reliability after burn-in? Contrary to common and proposed models, hard drive replacement rates do not enter steady state after the first year of operation. Instead replacement rates seem to steadily increase over time.
------------------------------------------
Data safety under RAID 5?. . . a key application of the exponential assumption is in estimating the time until data loss in a RAID system. This time depends on the probability of a second disk failure during reconstruction, a process which typically lasts on the order of a few hours. The . . . exponential distribution greatly underestimates the probability of a second failure . . . . the probability of seeing two drives in the cluster fail within one hour is four times larger under the real data . . . .
-------------------------------------------
Independence of drive failures in an array? The distribution of time between disk replacements exhibits decreasing hazard rates, that is, the expected remaining time until the next disk was replaced grows with the time it has been since the last disk replacement.
-------------------------------------------
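The MTTF and ARR figures in the key points above can be sanity-checked with a little arithmetic. Here is a sketch in Python, assuming exponentially distributed lifetimes (the very assumption the paper criticizes); the 1,000,000-hour MTTF, the 0.88% datasheet AFR and the 3.4x multiplier come from the quoted summary, while the array size and rebuild window are hypothetical numbers chosen only for illustration:

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760

# A datasheet MTTF of 1,000,000 hours implies a nominal annualized
# failure rate under the exponential-lifetime assumption:
mttf = 1_000_000
afr = 1 - math.exp(-HOURS_PER_YEAR / mttf)
print(f"datasheet AFR ~ {afr:.2%}")   # ~0.87%, in line with the quoted 0.88%

# The paper's weighted-average observed replacement rate was 3.4x that figure:
arr = 3.4 * 0.0088
print(f"observed ARR ~ {arr:.2%}")    # ~3%, matching "values in the 3% range"

# RAID-5 caveat: under the same exponential assumption, the chance that a
# second drive in an array fails during the rebuild window looks tiny;
# the paper found real traces put it roughly 4x higher than this estimate.
n_remaining, rebuild_hours = 10, 6    # hypothetical array size and rebuild time
p_second = 1 - math.exp(-(n_remaining * rebuild_hours) / mttf)
print(f"exponential-model second-failure estimate: {p_second:.4%}")
```

The point of the last calculation is qualitative: the exponential model makes correlated failures during a rebuild look far rarer than the field data says they are, which is exactly the paper's warning about RAID-5 data-loss estimates.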

:)

* You guys may wish to check that out... I found it @ SLASHDOT today, & it continues to expand on this topic, albeit from another set of researchers' findings.

APK

P.S.=> This is what I meant above in 1 of my posts here about GOOGLE's findings not quite being "the definitive work" on this topic... although well done & drawn from a heck of a sample set (their disks doubtless get CONSTANTLY pounded on by their search-engine servers), this other one seems to have been rated HIGHER as a good analysis @ SLASHDOT, per this quote:

http://hardware.slashdot.org/hardware/07/02/21/004233.shtml

---------------------------------------
"Google's wasn't the best storage paper at FAST '07. Another, more provocative paper looking at real-world results from 100,000 disk drives got the 'Best Paper' award. Bianca Schroeder, of CMU's Parallel Data Lab"
---------------------------------------

Are SLASHDOT readers "the final word"? No, nobody really is, & no one knows it all in this field or is the "God of Computing" etc. but, they are another point of reference for you all into this topic!

(As far as slashdotters go? Well, imo, many are TOO "Pro-Linux/Anti-Microsoft", but there are guys there that REALLY know their stuff as well... take a read, & enjoy if this topic's "YOU"... )

That said, if the actual paper's "TOO MUCH" (& it does get that way @ times)...? You can always skim thru what the readers @ slashdot stated... they offer much of what it says in more "human language"/laymen's terms, & often 'from the trenches'... apk
Posted on Reply
#25
Alec§taar
ANOTHER GOOD READ (less techno) ABOUT THIS SUBJECT FROM STORAGEREVIEW.COM

BrandMostReliable:

http://faq.storagereview.com/tiki-index.php?page=BrandMostReliable

:)

* They seem to also contradict GOOGLE's findings, as well as the other paper, lol!

(In that, yes, heat/wear & tear do matter, as do use patterns & such, but handling during shipping & also conditions of use as far as shock goes (CompletelyBonkers notes this above) matter more, which is what I drew from THIS article @ least, vs. GOOGLE's & USENIX.ORG's articles on the subject @ hand here!)

APK

P.S.=> Personally, I think it's a GOOD combination of ALL of each of the 3 article's points: How the disk is handled during shipping + manufacture & yes, conditions it's used in, as far as 'shock/bonk factor', also heat & use patterns as well during usage, & lastly, how well it's made from the 'get go' in terms of parts quality & engineering used... apk
Posted on Reply
Add your own comment