
Dear AMD, NVIDIA, Intel and others: we need cheap (192-bit to 384-bit bus), high-VRAM consumer GPUs to locally self-host and run inference on AI/LLMs

Needed a BIOS flash on the board too; it was in limp mode on the 9950X at 0.56 GHz and didn't work with the 64GB DIMM until I updated to the December BIOS with AGESA 1.2.0.2b.

Also, make sure you go 9000-series. I couldn't get the CUDIMMs working on a 7900X, which makes me sad because that's the bulk of our workstations. I have a nasty feeling AMD doesn't support them on the 7000-series and either doesn't plan to, or physically can't.

FYI, Raptor Lake has solid CUDIMM support. Most of the rabbit holes I dove into when hunting for 64GB DIMMs were LGA1700.
I'm more interested in an AMD platform because it's cheaper than Arrow Lake, and also because of AVX-512.
Raptor Lake is not that interesting for me. It doesn't support CUDIMM either, so didn't you mean Arrow Lake instead?

But yeah, I'm planning on a 9950X nonetheless, likely with a B650 or X670E ProArt.
 
It seems like only the 9000-series is supported, and operating in bypass mode, whatever that means.

There's a verified AMD engineer responding in this thread, so you can take it as pretty accurate (for a Reddit thread).

There are potentially faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings, and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and a subsequent OCCT certificate, only to start crashing a few weeks later - either degradation or marginal stability from the outset.
 
It seems like only the 9000-series is supported, and operating in bypass mode, whatever that means.
Means that they will work as regular UDIMMs, and the clock buffering circuitry on the DIMMs won't be used.
There are potentially faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings, and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and a subsequent OCCT certificate, only to start crashing a few weeks later - either degradation or marginal stability from the outset.
I'd be happy with 4800 MT/s already, just want to double up on quantity.
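For anyone wondering why the speed matters for the LLM use case in the thread title: CPU-side token generation is basically memory-bandwidth-bound. A rough napkin sketch (my own numbers, assuming dual-channel DDR5 at JEDEC 4800 MT/s and an illustrative ~40 GB quantised model):
```python
# Rough napkin math, not benchmarks: peak bandwidth of dual-channel DDR5-4800
# and what that bounds for CPU-side LLM token generation (the weights are streamed
# from RAM roughly once per generated token).

def ddr5_peak_gb_s(mt_per_s: float, channels: int = 2, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return mt_per_s * 1e6 * (bus_bits / 8) * channels / 1e9

def tokens_per_s_bound(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s when generation is memory-bandwidth-limited."""
    return bandwidth_gb_s / model_gb

bw = ddr5_peak_gb_s(4800)                                   # ~76.8 GB/s peak
print(f"DDR5-4800 dual channel: ~{bw:.1f} GB/s peak")
print(f"~40 GB quantised model: <= {tokens_per_s_bound(bw, 40):.1f} tokens/s")
```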
 
Call me stupid, but I think I do, because I haven't the faintest idea what someone would need LLMs for on a home PC.
That was my point..

DeepSeek hasn't proven anything. Their actually impressive model is 671B params in size, which requires at least 350GB of VRAM/RAM to run; that's not modest.
The models you are talking about that ran on a 6GB GPU and a Raspberry Pi are the distilled models, which are the ones based on existing models (Llama and Qwen).
Larger models of the same generation always have better quality than smaller ones.
Of course, as time goes on, the smaller models improve as well, but so do their larger counterparts.
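To put a number on the "at least 350GB" claim, here's the back-of-the-envelope I'm working from (weights only, ignoring KV cache and runtime overhead - my own round numbers):
```python
# Weights-only memory estimate for a 671B-parameter model at a few quantisation levels.
# These are rough figures and ignore KV cache and runtime overhead.

def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate GB needed just to hold the weights."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits, label in [(16, "FP16"), (8, "Q8"), (4, "Q4")]:
    print(f"671B @ {label}: ~{weight_memory_gb(671, bits):.0f} GB")
# FP16 ~1342 GB, Q8 ~671 GB, Q4 ~336 GB -- which is where the ~350GB floor comes from.
```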
Watch, learn.
It's just too much entitlement for something that's a hobby.
If it's not a hobby, then one should have enough money to pony up for the professional stuff.
Agreed on both points.
 
That was my point..


Watch, learn.
If you give the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:
But sensationalist headlines aren't telling you the full story.

The Raspberry Pi can technically run Deepseek R1... but it's not the same thing as Deepseek R1 671b, which is a four hundred gigabyte model.

That model (the one that actually beats ChatGPT) still requires a massive amount of GPU compute.
 
If you give the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:
How did you miss the point twice in a row? The point is, it's doable on low end machines. You don't need high end specs to do AI stuff now. Sure, it might take longer, but it's doable. And in reference to the OP, we don't need specialized hardware or GPUs to do it. General everyday hardware is all a person needs.
 
How did you miss the point twice in a row? The point is, it's doable on low end machines
You are the one missing it. For those smaller models, they used already existing models, and improved their quality a little bit. There's nothing new regarding this.
You don't need high end specs to do AI stuff now.
That's the point I'm making: this level of performance on consumer devices has been available for over a year now.
The new stuff DeepSeek brought is all related to their bigger models; that's where the innovation lies.
Don't fall into the sensationalism some outlets are spouting (as Jeff himself said), and especially try not to reinforce it, since that just gives way to misinformation.
 
Yup, that's gotta be it. See ya.
Well, in case you're open to learning something and having a proper discussion, I can heavily recommend giving the actual deepseek paper a read:
 
Well, in case you're open to learning something and having a proper discussion, I can heavily recommend giving the actual deepseek paper a read:
Oh, you mean this paper?

Have you actually read that PDF? Page 13 & 14 are most interesting..
 
Or here's an idea: we actually wait for the hardware instead of trying to brute force everything. You can already use AI at home - it won't be particularly fast, but it isn't something you need at home at this point in time. Developers can't even optimise for games properly, and you're asking for hardware that you probably won't even truly benefit from.
 
Oh, you mean this paper?

Have you actually read that PDF? Page 13 & 14 are most interesting..
Yeah, that's the exact paper I linked.
Page 13 is not really relevant since it only pertains to the bigger model. Page 14 is indeed where the fun is at, along with page 15, which has the comparison between a distilled model vs. one just using their techniques from scratch (Table 6), with this conclusion:
Therefore, we can draw two conclusions: First, distilling more powerful models into smaller ones yields excellent results, whereas smaller models relying on the large-scale RL mentioned in this paper require enormous computational power and may not even achieve the performance of distillation. Second, while distillation strategies are both economical and effective, advancing beyond the boundaries of intelligence may still require more powerful base models and larger-scale reinforcement learning.
It does reasonably well on reasoning benchmarks; however, if you compare them to their regular base models, the distilled ones aren't that impressive. On HF's leaderboard, the distilled DeepSeek models rank quite low:

I'll assume you don't have much experience with running LLMs locally. You could either go with ollama on the CLI, or you could try something like LM Studio:
I personally haven't used it (I just run ollama myself), but I've heard it makes it pretty easy for people that are not that tech-savvy to run LLMs, even though it's not the most performant stack.

This way you could give those different models a go, and even compare them to the big DeepSeek model somewhere and see how they fare.
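For reference, here's roughly what the ollama route looks like once it's installed and a model has been pulled - a minimal sketch against its local API, where the model tag and prompt are just examples:
```python
# Minimal sketch: query a locally running ollama server (default port 11434).
# Assumes you've already pulled a model, e.g. one of the distilled DeepSeek variants;
# the tag below is just an example.
import json
import urllib.request

payload = {
    "model": "deepseek-r1:8b",        # example tag, swap for whatever you've pulled
    "prompt": "In two sentences, what is a distilled model?",
    "stream": False,                  # ask for one complete JSON response
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```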
 
Ok, sure, moving on..
 
With the AI age being here, we need fast memory and lots of it, so we can host our favorite LLMs locally.
+1. With a 32GB AMD 9070 XT possibly coming, at least AMD is listening. NVIDIA has Project DIGITS. I think AMD has a good chance to make a unified-RAM platform based on the PS5/Xbox approach.

We also need DDR6 quad-channel (or something entirely new and faster) consumer desktop motherboards with up to 256 or 384GB RAM (my current B650 mobo supports only up to 128GB RAM)
Socket sTR5 Threadripper has 8x DDR5 channels, but motherboard prices are four figures, of course.

With LLMs expected to exceed human capabilities in all aspects within 1-2 years according to Anthropic, this topic is going to be huge and change humanity in ways none of us can imagine. Local LLMs will be part of the upcoming change. An excellent time for various hardware and software companies to ride the LLM wave.
 
I think AMD has good chance to make unified RAM platform based on PS5/XBOX.
That'd be Strix Halo: up to 128GB of unified memory on a 256-bit bus.
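Napkin math on what that 256-bit bus is worth, assuming LPDDR5X-8000 class memory (my assumption, actual SKUs may differ):
```python
# Peak bandwidth of a 256-bit LPDDR5X-8000 setup vs dual-channel (128-bit) DDR5-4800.
# Assumed speed grades; shipping configurations may differ.
for name, bus_bits, mt_per_s in [("256-bit LPDDR5X-8000", 256, 8000),
                                 ("128-bit DDR5-4800", 128, 4800)]:
    gb_s = mt_per_s * 1e6 * (bus_bits / 8) / 1e9
    print(f"{name}: ~{gb_s:.0f} GB/s peak")   # ~256 GB/s vs ~77 GB/s
```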
 
However, it seems to me that the current trend of running large LLMs locally will initially make powerful APUs like Strix Halo scarce. In a second phase, though, it will stimulate the development of bigger and better APUs. Just my theory, but I believe this will bring significant changes to the market.

The big three players will likely try to sell CPU+GPU as a single product, effectively eliminating the low-end and mid-range dGPU market in the medium term.
Agreed. With NVIDIA rumoured to launch an ARM APU in 2026, and AMD its Medusa Halo around the same timeframe (IIRC), the trend seems here to stay. Intel had better have something ready as well, otherwise they'll face even more difficulties than today.
 
Don't get the hate for OP. These companies are artificially holding us back, and not just in the AI space.

Either give us more VRAM or developers will optimize for CPU and crash your stock prices in the process. Hell, tons of people are getting M2s just for this stuff, and that's just sad.

Yup, that's gotta be it. See ya.
He's literally right though. Flux.1-dev =/= Flux.1-schnell, either.
 
Don't get the hate for OP.
It's not hate for the OP. It's that cards for this kind of use already exist. They're called professional cards. Right now they come in 32GB, 48GB and 64GB flavors. We don't need consumer cards with those memory capacities. That failure of understanding is common. Don't sweat it.

He's literally right though. Flux.1-dev =/= Flux.1-schnell, either.
And another.. :rolleyes:
 
It's not hate for the OP. It's that cards for this kind of use already exist. They're called professional cards. Right now they come in 32GB, 48GB and 64GB flavors. We don't need consumer cards with those memory capacities. That failure of understanding is common. Don't sweat it.
Ah, yes, the big fat "We" dictating consumer "needs". I guess "we", the consumers, should just get the exact same card and move on. It's not like different consumers have different needs or different tastes or anything which drives the market in the first place, after all.

And no, those cards the OP is talking about don't exist, because they aren't "cheap" at all - which is the entire point of his initial post.
And another.. :rolleyes:
I don't understand this response. Sorry.
 
which is the entire point of his initial post.
Yes, and there are some of us trying to help them, and everyone else parroting this idea, understand that it is not going to happen. And it does NOT take a genius to figure that out.

Either spend the money for the professional compute cards or live with the reduced speed of the consumer cards. Those are the choices. There are no others.

I don't understand this response. Sorry.
That's ok, no worries.
 
What we need is cheaper cards without the AI crap - bring GTX back.
If you want the AI crap, there should be dedicated GPUs with the RTX and AI stuff and you pay for them, not use the normal GPUs.
All in due time.

First, the conclusion must be drawn that RT is too costly for all those involved.
 
All in due time.

First, the conclusion must be drawn that RT is too costly for all those involved.

I think we're past that; unless you have a very high-end card, there is no point in turning RT on.
 
I think we're past that; unless you have a very high-end card, there is no point in turning RT on.
The response you'll get is "lies, because we can use DLSS". Also, engines have indeed added software-based RT.
 