
SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

While that is a reasonable point, NAND flash simply doesn't have the durability to be useful long term in such a way.

What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else. Since datacenter GPUs will basically never be turned off and the HBF isn't going to store irreplaceable data anyway (the weights will likely be first read from slower long-term storage devices), data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
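
As a minimal Python sketch of that "write once, read many" pattern (the file name is hypothetical): the weights get pulled once from slower storage, and serving only ever reads them afterwards.

```python
import mmap

# "model.weights" is a hypothetical file holding static LLM weights.
with open("model.weights", "rb") as f:
    weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Inference only reads from this mapping; no further program/erase
    # cycles hit the backing medium until the model itself is replaced.
    header = weights[:16]
    weights.close()
```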

Another reasonable point; however, that was not the claim made in the article above.

The linked original presentation from SanDisk shows one such configuration on page 99:
 
I don't understand why VRAM is so limited these days given the price we pay for video cards. They could launch cards with 512 GB or even 1 TB; even if not all of it is in use at once (reading it all would take a lot of bandwidth), cards built to hold that much for AI would be very important. AMD should focus on cards with large VRAM for personal AI use.
 
What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else.
Yes, but they have to be updated every time they are altered, and that means block erase/write cycles. This happens more frequently than you think.

HBF cannot replace HBM. Augment it, maybe. Replace? Absolutely not.
 
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
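
A quick sanity check on that arithmetic (the endurance figures below are commonly cited ballparks for conventional NAND, not SanDisk specs):

```python
# One model update per day over a 10-year service life:
pe_cycles = 1 * 365 * 10
print(pe_cycles)  # 3650

# Commonly cited program/erase endurance ballparks (assumptions):
endurance = {"TLC": 3_000, "MLC": 10_000, "SLC": 100_000}
for cell, limit in endurance.items():
    print(cell, "headroom:", limit / pe_cycles)  # >1.0 means it lasts
```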
 
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.
 
data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
 
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
That's an interesting idea. I don't think that's what SanDisk is marketing it as, though.
 
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.

I don't know where you got the idea that cloud AI model providers switch LLMs on the fly that frequently. It isn't happening at a small scale (tens to hundreds of simultaneous users), where the same models get served continuously for at least days or weeks, and at a large scale (up to hundreds of thousands of users) they will have entire GPU clusters dedicated to specific models in order to increase availability as much as possible.

One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.

I imagine this would more readily enable hardware-level support for quantized AI model weights. Every low-precision model parameter (e.g. 4- or 5-bit) could be directly mapped to raw NAND cells for potentially improved performance.
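
As a rough illustration of that mapping, here is a plain symmetric int4 quantization sketch in Python; the 16 levels correspond conceptually to one 4-bit cell per weight, and nothing here reflects SanDisk's actual encoding:

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    # Symmetric 4-bit quantization: each weight becomes one of 16 levels.
    scale = float(np.abs(weights).max()) / 7.0  # int4 range is -8..7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, s = quantize_int4(w)
print(q + 8)                  # shifted to 0..15: a raw 16-level cell value
print(dequantize_int4(q, s))  # weights reconstructed at read time
```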
 
and at a large scale (up to hundreds of thousands of users) they will have entire GPU clusters dedicated to specific models in order to increase availability as much as possible.
That's a good point. I hadn't thought about it at that wide a scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea, and I'm highly dubious of it.
 
That's a good point. I hadn't thought about it at that wide a scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea, and I'm highly dubious of it.
Also, it can be a mixed HBM+HBF setup. One of the slides at Tom's shows such a case.
 
Yeah, no. NAND flash is not RAM; it is designed for entirely different usage patterns, and the notion that it could be used as a replacement for RAM is nonsensical. Considering GPUs already have effectively direct access to storage via APIs like DirectStorage, I see no use-case for this technology.

Welp, glad you weighed in, all those researchers and engineers can go back to doing stuff that's actually good for something now. /s

My God, the ego it takes to make a statement like "I see no use-case", as if these people overlooked something you just armchaired into.

This connects like HBM, so access is much more direct and therefore far faster than going through a storage API. LLM inference is mostly reads with relatively few writes, and HBF can be paired with HBM or VRAM for the parts that do need writes. Being able to load up a huge LLM this way would be a big deal.
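
A toy sketch of that split (all names hypothetical): static weights sit in a write-once pool standing in for HBF, while frequently rewritten state like the KV cache stays in an HBM-like pool.

```python
class MemoryPool:
    # Toy model of a memory tier; writable=False marks read-mostly media.
    def __init__(self, name: str, writable: bool):
        self.name, self.writable, self.data = name, writable, {}

    def load_once(self, key, value):
        self.data[key] = value  # one-time programming at model load

    def write(self, key, value):
        if not self.writable:
            raise PermissionError(f"{self.name}: read-mostly, use load_once")
        self.data[key] = value

hbf = MemoryPool("HBF", writable=False)  # static weights, read-mostly
hbm = MemoryPool("HBM", writable=True)   # KV cache, rewritten per token

hbf.load_once("layer0.weight", b"\x00" * 8)  # programmed once from storage
hbm.write("kv_cache", b"\x01" * 8)           # frequent writes stay in HBM
```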

It's niche, but it's one hell of a niche.
 