
SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

While that is a reasonable point, NAND flash simply doesn't have the durability to be useful long term in such a way.

What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else. Since datacenter GPUs will basically never be turned off and the HBF isn't going to store irreplaceable data anyway (the weights will likely be first read from slower long-term storage devices), data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
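
As a minimal Python sketch of that "write once, read many" pattern (the file name is hypothetical): the weights get pulled once from slower storage, and serving only ever reads them afterwards.

```python
import mmap

# "model.weights" is a hypothetical file holding static LLM weights.
with open("model.weights", "rb") as f:
    weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Inference only reads from this mapping; no further program/erase
    # cycles hit the backing medium until the model itself is replaced.
    header = weights[:16]
    weights.close()
```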

Another reasonable point; however, that was not the claim made in the article above.

The linked original presentation from SanDisk shows one such configuration on page 99:
 
I don't understand why VRAM is so limited these days given the price we pay for video cards. They could launch cards with 512 GB or even 1 TB; even if not all of it is in use at once (reading it all would take a lot of bandwidth), cards built to hold that much for AI would be very important. AMD should focus on cards with large VRAM for personal AI use.
 
What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else.
Yes, but they have to be updated every time they are altered, and that means block erase/write cycles. This happens more frequently than you think.

HBF cannot replace HBM. Augment it, maybe. Replace? Absolutely not.
 
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
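
A quick sanity check on that arithmetic (the endurance figures below are commonly cited ballparks for conventional NAND, not SanDisk specs):

```python
# One model update per day over a 10-year service life:
pe_cycles = 1 * 365 * 10
print(pe_cycles)  # 3650

# Commonly cited program/erase endurance ballparks (assumptions):
endurance = {"TLC": 3_000, "MLC": 10_000, "SLC": 100_000}
for cell, limit in endurance.items():
    print(cell, "headroom:", limit / pe_cycles)  # >1.0 means it lasts
```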
 
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.
 
data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
 
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
That's an interesting idea. I don't think that's what SanDisk is marketing it as, though.
 
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.

I don't know where you got the idea that cloud AI model providers switch LLMs on the fly that frequently. It isn't happening at a small scale (tens to hundreds of simultaneous users), where the same models get served continuously for at least days or weeks, and at a large scale (up to hundreds of thousands of users) they will have entire GPU clusters dedicated to specific models in order to increase availability as much as possible.

One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.

I imagine this would more readily enable hardware-level support for quantized AI model weights. Every low-precision model parameter (e.g. 4- or 5-bit) could be directly mapped to raw NAND cells for potentially improved performance.
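
As a rough illustration of that mapping, here is a plain symmetric int4 quantization sketch in Python; the 16 levels correspond conceptually to one 4-bit cell per weight, and nothing here reflects SanDisk's actual encoding:

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    # Symmetric 4-bit quantization: each weight becomes one of 16 levels.
    scale = float(np.abs(weights).max()) / 7.0  # int4 range is -8..7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, s = quantize_int4(w)
print(q + 8)                  # shifted to 0..15: a raw 16-level cell value
print(dequantize_int4(q, s))  # weights reconstructed at read time
```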
 
and at a large scale (up to hundreds of thousands of users) they will have entire GPU clusters dedicated to specific models in order to increase availability as much as possible.
That's a good point. I hadn't thought about it at that wide a scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea, and I'm highly dubious of it.
 
That's a good point. I hadn't thought about it at that wide a scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea, and I'm highly dubious of it.
Also, it can be a mixed HBM+HBF setup. One of the slides at Tom's shows such a case.
 
Yeah, no. NAND flash is not RAM; it is designed for entirely different usage patterns, and the notion that it could be used as a replacement for RAM is nonsensical. Considering GPUs already have effectively direct access to storage via APIs like DirectStorage, I see no use-case for this technology.

Welp, glad you weighed in, all those researchers and engineers can go back to doing stuff that's actually good for something now. /s

My God, the ego it takes to make a statement like "I see no use-case", as if these people overlooked something you just armchaired into.

This connects like HBM, so access is much more direct and therefore far faster than going through a storage API. LLM inference is mostly reads with relatively few writes, and HBF can be paired with HBM or VRAM for the parts that do need writes. Being able to load up a huge LLM this way would be a big deal.
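
A toy sketch of that split (all names hypothetical): static weights sit in a write-once pool standing in for HBF, while frequently rewritten state like the KV cache stays in an HBM-like pool.

```python
class MemoryPool:
    # Toy model of a memory tier; writable=False marks read-mostly media.
    def __init__(self, name: str, writable: bool):
        self.name, self.writable, self.data = name, writable, {}

    def load_once(self, key, value):
        self.data[key] = value  # one-time programming at model load

    def write(self, key, value):
        if not self.writable:
            raise PermissionError(f"{self.name}: read-mostly, use load_once")
        self.data[key] = value

hbf = MemoryPool("HBF", writable=False)  # static weights, read-mostly
hbm = MemoryPool("HBM", writable=True)   # KV cache, rewritten per token

hbf.load_once("layer0.weight", b"\x00" * 8)  # programmed once from storage
hbm.write("kv_cache", b"\x01" * 8)           # frequent writes stay in HBM
```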

It's niche, but it's one hell of a niche.
 