
HP Enterprise SSD Firmware Bug Causes them to Fail at 32,768 Hours of Use, Fix Released

btarunr

Editor & Senior Moderator
Staff member
HP issued a warning to its customers that some of its SAS SSDs carry a firmware bug that causes them to fail at exactly 32,768 hours of use. For an always-on or high-uptime server, that translates to 3 years, 270 days and 8 hours of usage. The affected SSD models ship in many of HP's flagship server and storage products, spanning its HPE ProLiant, Synergy, Apollo, JBOD D3xxx, D6xxx, D8xxx, MSA, StoreVirtual 4335 and StoreVirtual 3200 product lines.

HP has released an SSD firmware update that fixes this bug, and it cannot stress the importance of deploying the update enough. Once a drive hits the literal 32,768-hour deadline and breaks down, both the drive and the data on it become unrecoverable; there is no mitigation for this bug other than the firmware update. HP has released easy-to-use online firmware update tools that let admins update the firmware of their drives from within their OS. The online tools support Linux, Windows, and VMware. Below is a list of affected drives. Get the appropriate firmware update from this page.



View at TechPowerUp Main Site
 
why does that number look familiar?
 
This is the SSD manufacturer's fault, most likely. While HPE does use custom firmware, I really doubt they write it from scratch for, say, Samsung drives. It is possible that Dell and others may also come forward soon.
 
This is the SSD manufacturer's fault, most likely. While HPE does use custom firmware, I really doubt they write it from scratch for, say, Samsung drives. It is possible that Dell and others may also come forward soon.

Possibly, but HP would certainly not be the only one noticing effects from this then, I'd think?

I would be curious what controller the drives use. It sounds like it triggers an SSD controller reset; judging from the value, I can only assume it is a "value wrap" situation which the controller detects and freaks out about, triggering a drive-wide reset including the onboard encryption keys.

If so... Much dumb, very dead, WOW.
 
A fix for a bug... sure... who believes this?
There you have your proof that planned obsolescence on purpose exists.

I am sure they implemented this on purpose, they just didn't expect anyone to find out why the drives die shortly after the warranty period expires...

I hope someone sues HP, and forces all the other manufacturers, too, to stop this practice.
 
A fix for a bug... sure... who believes this?
There you have your proof that planned obsolescence on purpose exists.

I am sure they implemented this on purpose, they just didn't expect anyone to find out why the drives die shortly after the warranty period expires...

I hope someone sues HP, and forces all the other manufacturers, too, to stop this practice.

It is amazing what some people believe.
 
A fix for a bug... sure... who believes this?
There you have your proof that planned obsolescence on purpose exists.

I am sure they implemented this on purpose, they just didn't expect anyone to find out why the drives die shortly after the warranty period expires...

I hope someone sues HP, and forces all the other manufacturers, too, to stop this practice.

We are talking about enterprise drives here, Mr. Conspiracy. HPE is not dumb enough to screw with its primary clients, some of which are bigger than HPE itself. It's not like there aren't a bunch of other storage manufacturers who would be more than happy to take HPE's share in that case.
 
We are talking about enterprise drives here, Mr. Conspiracy. HPE is not dumb enough to screw with its primary clients, some of which are bigger than HPE itself. It's not like there aren't a bunch of other storage manufacturers who would be more than happy to take HPE's share in that case.

Nothing says HP made these drives. But since they're keeping the manufacturer unnamed, it probably means they did. Otherwise I'd be throwing my source under the bus to protect my brand.

To further the conspiracies though: back in my college days (early 2000s) I had Wi-Fi routers from Netgear and D-Link that would die literally days after the 1-year warranty was up. Three years straight, IIRC. I then bought a Linksys WRT54G (model?) and reflashed it with the third-party DD-WRT firmware, and it's still running to this day at my in-laws', as far as I know (10+ years).
 
From the HPE bulletin: (https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00092491en_us)

HPE was notified by a Solid State Drive (SSD) manufacturer of a firmware defect affecting certain SAS SSD models (reference the table below) used in a number of HPE server and storage products (i.e., HPE ProLiant, Synergy, Apollo, JBOD D3xxx, D6xxx, D8xxx, MSA, StoreVirtual 4335 and StoreVirtual 3200 are affected).

The issue affects SSDs with an HPE firmware version prior to HPD8 that results in SSD failure at 32,768 hours of operation (i.e., 3 years, 270 days 8 hours). After the SSD failure occurs, neither the SSD nor the data can be recovered. In addition, SSDs which were put into service at the same time will likely fail nearly simultaneously.
 
"After the SSD failure occurs, neither the SSD nor the data can be recovered. In addition, SSDs which were put into service at the same time will likely fail nearly simultaneously."

:eek::eek::eek:

Backup Restore hell awaits...

Not acceptable at all from HP...
 
Nothing says HP made these drives. But since they're keeping the manufacturer unnamed, it probably means they did. Otherwise I'd be throwing my source under the bus to protect my brand.

To further the conspiracies though: back in my college days (early 2000s) I had Wi-Fi routers from Netgear and D-Link that would die literally days after the 1-year warranty was up. Three years straight, IIRC. I then bought a Linksys WRT54G (model?) and reflashed it with the third-party DD-WRT firmware, and it's still running to this day at my in-laws', as far as I know (10+ years).
Those were caused by the plague of cheap electrolytic capacitors. More expensive, higher-quality caps existed, but almost everyone got bit by the cheap ones.
 
Those were caused by the plague of cheap electrolytic capacitors. More expensive, higher-quality caps existed, but almost everyone got bit by the cheap ones.

The other dirty little thing they did was use low-quality flash. Flashing the ROM, anything that wrote to the NVRAM...

Brickity brick...
 
why does that number look familiar?
Every decent coder probably immediately understands what's going on here: this is an integer overflow causing the firmware to crash. The maximum value for a signed 16-bit integer is 32,767; add 1 to this and you'll get -32,768, which probably causes undefined behavior in the firmware.

Those who want to see what happens can run this:
C:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char* argv[]) {
    int16_t test = 32767;   /* INT16_MAX */
    printf("Before: %d\n", test);
    test++;                 /* wraps around to INT16_MIN on typical targets */
    printf("After: %d\n", test);

    return 0;
}
This will output:
Before: 32767
After: -32768

This is a well-known rookie mistake, but there are in fact two mistakes here: 1) the small range for the integer, and 2) whatever caused the crash after the overflow, the second one being the serious one. This kind of bug is inexcusable in critical software like firmware.

So how did this mistake pass code review? Well, either the coder explicitly used a fixed-precision integer type like int16_t, which should have made the overflow pretty obvious, or used int and the compiler chose a 16-bit integer for the embedded platform. For native code, I usually recommend using fixed-precision integer types over int whenever possible, as it makes potential overflows much more obvious and forces the coder to consciously choose an appropriate range.
 
Why does HP on Windows 10 only have the standard AHCI driver?
 
A fix for a bug... sure... who believes this?
There you have your proof that planned obsolescence on purpose exists.

I am sure they implemented this on purpose, they just didn't expect anyone to find out why the drives die shortly after the warranty period expires...

I hope someone sues HP, and forces all the other manufacturers, too, to stop this practice.

Isn't planned obsolescence always on purpose?
 
Every decent coder probably immediately understands what's going on here: this is an integer overflow causing the firmware to crash. The maximum value for a signed 16-bit integer is 32,767; add 1 to this and you'll get -32,768, which probably causes undefined behavior in the firmware.

Those who want to see what happens can run this:
C:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char* argv[]) {
    int16_t test = 32767;   /* INT16_MAX */
    printf("Before: %d\n", test);
    test++;                 /* wraps around to INT16_MIN on typical targets */
    printf("After: %d\n", test);

    return 0;
}
This will output:
Before: 32767
After: -32768

This is a well-known rookie mistake, but there are in fact two mistakes here: 1) the small range for the integer, and 2) whatever caused the crash after the overflow, the second one being the serious one. This kind of bug is inexcusable in critical software like firmware.

So how did this mistake pass code review? Well, either the coder explicitly used a fixed-precision integer type like int16_t, which should have made the overflow pretty obvious, or used int and the compiler chose a 16-bit integer for the embedded platform. For native code, I usually recommend using fixed-precision integer types over int whenever possible, as it makes potential overflows much more obvious and forces the coder to consciously choose an appropriate range.

I called that earlier. This is exactly what is going on. The fact that the firmware resets afterwards is probably an anti-tampering measure biting them.
 
We are talking about enterprise drives here, Mr. Conspiracy. HPE is not dumb enough to screw with its primary clients, some of which are bigger than HPE itself. It's not like there aren't a bunch of other storage manufacturers who would be more than happy to take HPE's share in that case.
While that is a bit "tinfoil hat", what I find interesting (and simultaneously disturbing) is how easy such a scenario would be to pull off. And has it actually been done?
 
While that is a bit "tinfoil hat", what I find interesting (and simultaneously disturbing) is how easy such a scenario would be to pull off. And has it actually been done?

I mean, HP's own consumer ink division has been doing this for some time (incrementing a counter on print operations, unrelated to ink level)... so yeah.
 
I mean, HP's own consumer ink division has been doing this for some time (incrementing a counter on print operations, unrelated to ink level)... so yeah.
Yeah, but that's printer ink, not really a vital part of a system that can cause liability issues. SSDs are a critical component with legal liability potential.
 
Yeah, but that's printer ink, not really a vital part of a system that can cause liability issues. SSDs are a critical component with legal liability potential.

I thought you were asking if it was technically done before?

Not comparing the practices by any means, just saying yep, it has.
 
I am sure they implemented this on purpose, they just didn't expect anyone to find out why the drives die shortly after the warranty period expires...
Maybe my math is off, but servers running these will reach that point in roughly 1,365 days. At 365 days per year, that would be about 3.74 years until that bug failure is reached.

That’s a pretty good run. But if you want to go the conspiracy route, I’ll step out of your way. I’m not one to limit those who have a quest to tilt at windmills.
 
Reminds me of the VelociRaptor firmware bug which would spit out TLER errors after a month.
I never understood how it became worldwide advice to use identical drives in mirrored arrays.
 
Nothing says HP made these drives. But since they're keeping the manufacturer unnamed, it probably means they did. Otherwise I'd be throwing my source under the bus to protect my brand.

To further the conspiracies though: back in my college days (early 2000s) I had Wi-Fi routers from Netgear and D-Link that would die literally days after the 1-year warranty was up. Three years straight, IIRC. I then bought a Linksys WRT54G (model?) and reflashed it with the third-party DD-WRT firmware, and it's still running to this day at my in-laws', as far as I know (10+ years).

HPE literally does not manufacture SSDs. They use rebranded ones - Samsung, Intel, I think Micron too.

I mean, HP's own consumer ink division has been doing this for some time (incrementing a counter on print operations, unrelated to ink level)... so yeah.
IIRC they got punished for that, no?
P.S.
HP and HPE split 4 years ago, just a reminder.
 