Unless the information is made public by people who examine failures, we won't really know, but from what I have researched, I think all of the following are possible causes.
1 - Early, more primitive firmware with poor wear levelling, so some cells fail way earlier than they would with proper wear levelling.
2 - The address mapping table (the flash translation layer's logical-to-physical map; sorry if I got the name wrong). As I understand it, it sits on the same physical cells, so certain workloads might wear those down before the rest of the flash, and I think DRAM-less SATA SSDs would see accelerated wear on these cells as well. As I understand it, DRAM-less NVMe SSDs can use system RAM (Host Memory Buffer) in place of an onboard DRAM buffer, so they might not have this issue to the same degree, although it's still riskier than an onboard buffer.
3 - Bad firmware, where a bug could brick an SSD.
4 - Blown capacitors or other failed circuitry, which is more likely to show up on power cycles.
Enterprise SSDs, because they have power loss protection, don't honor synchronous writes (they can safely acknowledge them from cache). This reduces the window of vulnerability to kernel panics, as the data is written quicker, and it also drastically reduces write amplification, as sync writes are very bad for that. I actually emulate enterprise SSD behavior on my consumer SSDs when possible, as I am now of the opinion it is safer. I have never blogged or posted about it before because I am aware it's a controversial opinion, but as an example: on a fio sync write test, one of my consumer SSDs in default mode took 46s to write all the data, so 46s of vulnerability to a kernel panic or power loss. With enterprise emulation (disabling sync on the SSD but not on the filesystem), the same data is written in 7s, so a much smaller window of vulnerability. In this mode the filesystem is still sync, not async, so it isn't immediately reporting success to the software.
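If anyone wants to try the emulation on Linux, one way (not necessarily the only way, and the device name below is just an example) is the block layer's write_cache knob, which controls whether the kernel sends cache-flush commands to the drive. A minimal sketch:

```python
# Sketch: inspect (and optionally change) the kernel's view of a drive's write cache.
# The sysfs path and values come from the Linux block layer docs; "sda" is just an
# example device name. Writing "write through" only changes the kernel's view -- it
# stops the kernel from issuing cache flushes, it does NOT add power-loss protection.
from pathlib import Path

DEV = "sda"  # change to your disk
knob = Path(f"/sys/block/{DEV}/queue/write_cache")

print("current setting:", knob.read_text().strip())  # "write back" or "write through"

# To emulate enterprise (PLP) behavior -- needs root, and only if you accept the risk:
# knob.write_text("write through")  # kernel stops sending flushes to this device
```

Worth stressing that this only changes what the kernel believes about the drive: on a real power cut, anything sitting in the drive's volatile cache is gone, which is exactly the gap the PLP capacitors on enterprise drives are there to cover.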
Back to the fio numbers: fully async it writes in 4s (with the OS being told it's done immediately), provided there are no other writes at the same time, as sync writes on the filesystem nearly always take priority over async writes. So I still keep sync enabled on my filesystems.
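To make the sync versus async gap concrete (a simplified Python illustration, not my actual fio job; the file name and sizes are just placeholders), the difference from the application's point of view looks roughly like this:

```python
# Sketch: time durable (fsync-per-write) writes vs plain buffered writes.
# Absolute numbers vary wildly between drives; the point is the shape of the gap.
import os
import time

PATH = "testfile.bin"   # placeholder scratch file
BLOCK = b"\0" * 4096    # 4 KiB per write
COUNT = 10_000          # ~40 MiB total

def timed_run(label, durable):
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.monotonic()
    for _ in range(COUNT):
        os.write(fd, BLOCK)
        if durable:
            os.fsync(fd)  # sync path: wait until the data is on stable media
    os.close(fd)          # buffered path: nothing here forces the data to disk
    print(f"{label}: {time.monotonic() - start:.2f}s")

timed_run("sync (fsync per write)", True)   # can take a long time on a consumer drive
timed_run("async (buffered only)", False)   # returns as soon as data is in the page cache
os.unlink(PATH)
```

The buffered run "finishes" as soon as the data is in the page cache, which is why full async looks so fast; the real window of vulnerability is decided by when the data actually reaches stable media.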