
Dear AMD, NVIDIA, INTEL and others: we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host and run inference on AI/LLMs

Time will tell how silly this market will get over time, I guess.

Here you go..

Would you look at that. Nice to see it might actually come out; this Frank guy really sucks when it comes to rumors, honestly. Too early to say "I told you so" though. I'll wait for something to actually get released to do that :cool:

Agree, but I think that eventually, given the rising demands of LLMs, they'll develop a solution dedicated to training and inference. IMHO, Project DIGITS is probably a "prototype" of sorts. Remember the CMP crypto mining processor series? It'll likely be something like that, but not rushed out and derived from gaming cards as those used to be: probably a dedicated GPGPU or FPGA processor that lacks a display engine and pretty much every other area useful for most computing tasks, but is tailored specifically for inferencing. Kind of like this AMD/Xilinx card, which is a bit older now:
Eh, they'll have to do something at some point. Once CPUs get better at inference and training, people might just use that instead. I don't think devs are even doing anything for XDNA hardware, but unified memory alone is pretty huge. Strix and Medusa Halo should come cheaper than Nvidia's $3k option...
 

Unified memory is only necessary because LLMs are too large to fit in a GPU's dedicated VRAM. That enables marketing departments to make deceitful, misleading slides that fanboys of the respective companies (and this is not a problem unique to AMD or NVIDIA; it applies to both of them) will often parrot without question, such as this little gem right here:

[Image: aimax.png]


Of course it's "up to 2.2x faster" when you can actually load the model into memory (provided you have at least 96 or 128 GB of RAM in this case) and you're not at all compute-bottlenecked, which is the issue with quite literally any GPU short of NVIDIA's many-thousand-dollar, 80 GB+ HBM AI accelerators right now. Needless to say, the person who posted this slide to me as a rebuttal on X (where I told them that if they believed this product was faster than a 4090 at anything, I had a bridge to sell 'em) summarily blocked me right after posting it and calling me a "smug f**k", go figure. For context, the Ryzen AI Max+ 395 is rated at 126 AI TOPS; an RTX 4090 is, in a worst-case scenario, 10x faster.
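To put rough numbers on the capacity-versus-compute point, here's a back-of-envelope sketch (the bandwidth figures are approximate public specs I'm assuming for illustration, not anything from the slide). Single-stream LLM decoding is typically memory-bandwidth-bound, since every generated token has to stream all of the weights:

```python
# Back-of-envelope sketch: single-stream decode speed is roughly bounded by
# memory bandwidth / bytes of weights read per token. Bandwidth figures are
# approximate public specs, assumed here purely for illustration.

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (ignores KV cache and overhead)."""
    return params_billions * bytes_per_param

def decode_ceiling_tok_s(bandwidth_gb_s: float, footprint_gb: float) -> float:
    """Upper bound on tokens/s: each token streams the full weight set."""
    return bandwidth_gb_s / footprint_gb

model_70b_q4 = weights_gb(70, 0.5)  # ~35 GB at 4-bit quantization

for name, bw_gb_s in [("RTX 4090 (~1008 GB/s GDDR6X)", 1008),
                      ("Ryzen AI Max+ 395 (~256 GB/s LPDDR5X)", 256)]:
    print(f"{name}: ~{decode_ceiling_tok_s(bw_gb_s, model_70b_q4):.0f} tok/s ceiling")

# The 4090's ~29 tok/s ceiling only applies if the ~35 GB of weights fit in its
# 24 GB of VRAM -- they don't, which is the whole "up to 2.2x" sleight of hand.
```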
 
Unified memory is pretty fast in its own right, and it has an APU that can actually use it. It should be around GDDR5 performance or so. I don't expect dual- or quad-channel DDR5 to do anywhere near as well in inference as an M2 or Strix Halo, even if you offloaded as much of the same model quant as possible onto the GPU and the rest into RAM.

I don't really consider TOPS that great of a metric, either. It just comes off as another FLOPS-type number that doesn't really matter for real-world performance.
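For scale (approximate public specs, assumed here for illustration): theoretical peak bandwidth is just bus width times data rate, which is why a 256-bit LPDDR5X APU lands around last-generation GDDR5 cards and well above any dual-channel desktop DDR5 setup:

```python
# Rough peak bandwidth in GB/s = (bus width in bits / 8) * data rate in GT/s.
# Configurations and rates below are approximate public specs, for scale only.

def peak_gb_s(bus_bits: int, gt_per_s: float) -> float:
    return bus_bits / 8 * gt_per_s

configs = {
    "Dual-channel DDR5-5600 desktop":       (128, 5.6),  # ~90 GB/s
    "Strix Halo (256-bit LPDDR5X-8000)":    (256, 8.0),  # ~256 GB/s
    "Apple M2 Max (512-bit LPDDR5-6400)":   (512, 6.4),  # ~410 GB/s
    "Radeon RX 480 (256-bit GDDR5 8 Gbps)": (256, 8.0),  # ~256 GB/s
}

for name, (bits, rate) in configs.items():
    print(f"{name}: ~{peak_gb_s(bits, rate):.0f} GB/s")
```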
 

Mostly because memory is so important here. It's pretty much the only thing that matters until that requirement is satisfied.
 
The self-entitlement from the OP is exactly what I've come to expect from "AI" companies and the people who believe those companies are in any way, shape, or form useful to humanity.

"Capable of running" is not in the same solar system as "good at running". If you want the latter for a nonstandard consumer use case you're not a consumer, you're a professional, and you need to pull the stick outta your a** and pony up the cash for professional products.
"Get yourself on the Trabant waiting list."

[Image: trabant.jpg]


The borderline between consumer and professional is arbitrary when there is monopoly and duopoly — which there is. If we were in a situation with actual capitalism you might have a point.

AI models that need a lot more than 32 GB of VRAM are freely available for download right now. Yet, even though there absolutely is a consumer market for using these, the gods (uh... I mean quasi-monopolists) don't deign to even use older nodes like Samsung 8 nm to produce anything for them. Peons are not prioritized, even when they're willing to pony up reasonable amounts of cash. Chiding ordinary people for not having enterprise-level budgets is absurd.
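To make that concrete (the parameter counts below are real, openly downloadable models; the bytes-per-parameter figures are standard approximations covering weights only, not KV cache):

```python
# Illustrative sketch: approximate weight sizes for openly downloadable models
# at common quantization levels, to show how fast 24-32 GB of VRAM runs out.
# Weights only; the KV cache and runtime overhead add several more GB on top.

BYTES_PER_PARAM = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5}

models_params_b = {"Llama 3.1 70B": 70, "Qwen2.5 72B": 72, "Mixtral 8x22B": 141}

for model, params_b in models_params_b.items():
    sizes = ", ".join(f"{q}: ~{params_b * b:.0f} GB"
                      for q, b in BYTES_PER_PARAM.items())
    print(f"{model}: {sizes}")

# Even at 4-bit, a 70B-class model wants ~35 GB -- beyond any 32 GB consumer card.
```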
 

Strictly speaking, plain "capitalism" is just about the idea of pooling capital to achieve something no single entity has enough capital for and then sharing the profits. We have plenty of that going on.

Free-market capitalism implies having enough players to compete, and it is relatively rare in capital-heavy industries, especially now. Few companies make large passenger airplanes, few companies make gene sequencers, few companies make fast CPUs, few companies make fast GPUs, few companies make fast FPGAs.

This is not just because of money, but also human resources. We are stretched thin and one or two companies away from losing leadership in many industries.

As for VRAM: it really does not cost that much to add a few chips. What's more, a lot of people would gladly take VRAM that is 10-20% slower but quadruple the capacity. Limited VRAM is purely market segmentation.
 
In the 80s, Japanese DRAM firms dumped DRAM and drove innovative US companies out of business, then raised prices. DRAM prices went way up not because there weren't people capable of making good DRAM in the US. Not having enough people to produce competitor GPUs is, I think, way down on the list of reasons why there is inadequate competition. The system facilitates too much wealth concentration. It enables too much monopolization and collusion.

I agree about the VRAM. That makes it quite galling to see the level of shenanigans that get away unpunished. This "market segmentation" exists because of inadequate competition, in a manner similar to Intel's endless quad-core CPU iterations until AMD decided to compete. If AMD hadn't started selling dual-core chips, Intel would probably still be selling consumers single-core CPUs.

Capitalism is a neat idea, but it seems designed to turn itself into corporate socialism, where people sit on waiting lists (and have to jump through other absurd hoops, like joining Discord clubs to beg) to purchase overpriced products.

There is more than enough money and manpower to have GPU competition. The problem is that money is allowed to be hoarded to extremes by massive corporations like Apple and Nvidia. I posited a Potato GPU corporation a few months ago, and I have just about one potato in capital to get it going. The superior beings have the capital, so they can build flaming moats.
 

One reason that we only have AMD and Intel is that only AMD has a license to the x86 architecture. You'd think implementing it from scratch would be fair use, but that's not what we have. The "intellectual property" system focused too much on the second word and not enough on the first. One issue is that patents and copyrights encourage preventing others from using the invention. We should instead have a legal framework that encourages accessibility and time to market.
 
You'll be seeing such products with higher VRAM amounts in their datacenter/workstation offerings, with a heavier price tag, not in the consumer space.

Don't expect those to be any cheaper than a 5090. All in all, if you want a large-VRAM consumer product, the 5090 exists for this sole reason.

Most DDR5 motherboards should support 256GB of RAM. The problem is that there are no 64GB UDIMMs available for sale yet.

Maybe try these: https://www.crucial.de/memory/ddr5/...naFJYpuKSG8A70e-3SzLzTJwcDyDvFsuWlLH3Yih_jMNu

 