
Dear AMD, NVIDIA, Intel and others: we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host and run inference on AI/LLMs

But even then you only have 24GB, or 32GB if you pony up $2K and can even find a 5090.

If you get a refurb M chip, you can get 64GB unified (~50-55GB usable to load the model) for $2,500, or over 110GB if you can snag a 128GB machine.
Yeah. But - important detail - there's going to be a tradeoff in speed. If you're using a model that fits into 24/32GB, Nvidia's stuff is going to run it faster. That's especially applicable to things like text2video or img2video; with writing/assistant-type requests the wait times aren't painful. Even the M4 isn't all that fast. Depends on the needs, but sure, Macs are a decent option.
 
Yeah. But - important detail - there's going to be a tradeoff in speed. If you're using a model that fits into 24/32GB, Nvidia's stuff is going to run it faster. That's especially applicable to things like text2video or img2video; with writing/assistant-type requests the wait times aren't painful. Even the M4 isn't all that fast. Depends on the needs, but sure, Macs are a decent option.
Very true -- they are slower at that size, especially for the reasoning models. Ideal would be a 64GB 5090 or Titan-class card, but considering the H100 80GB is selling for $30K, I don't think that's going to happen.

5090s for $2000 seem like charity compared to what the AI companies are paying.

If AMD really wanted to take some market share, they could do it with some high-density cards; then people who don't want to pony up for Nvidia would buy in and hopefully improve the software stack.
 
So, and I want to be serious. Balls.

I'd love to end there, but most people would probably assume that this is a low-quality post...because it sounds like a troll. It, surprisingly enough, is not. About a decade ago people started making ball robots...think back to BB-8, when Star Wars was just getting out of Lucas's hands. These things were just coming out of universities and the like...and they were hot. Now...in the last two years they've basically become something special...because China has decided to lead the world with them.

Don't believe me...because I sound like I've drunk the bong water? CCTV on YouTube

That's right. An idea goes open source and all of a sudden China is leading the world with their unique new future. You listen to them describe what most people would absolutely consider technology from a decade ago...given it exists in cheapo hoverboards...as though they were the first people to ever think of it. You listen to how these ball robots can patrol streets...surprisingly empty of any obstacles or challenges, and see that they are absolutely the cost-effective way for a policing force that literally cannot be stopped by morality...and you laugh. This crap was on Hackaday 15 years ago, with hobbyists creating them 9 years ago. BB-8 on Hackaday



So...this is one of many things that leads me to the conclusion that if it magics into existence in China, after it's open-sourced elsewhere, it's probably a monetary enrichment scheme...given that it promises exactly what China wants: a way to use cheap hardware to run AI that gets around the Western ban on selling them the good AI chips. This goes right along with being able to fabricate 5 nm chips by hand, the litany of silicon startups that no longer exist in China now that they have to demonstrate something, and the general vibe. As such, DeepSeek is likely just an AI-taught AI...which limits system resources and training time...but also limits the AI. As with any copy of a copy, some definition is lost. In China, that's just par for the course.


Wrapping back around, the requested accelerator card from the OP is silly. It'll exist when there is a market for it...and no better option exists. We do not need AI in the same way we need raw computation...read: folding@home...but asking for anything to be cheap is the icing on the cake: this is a request from somebody who doesn't understand why things are expensive at all.
 
No, we really don't. DeepSeek has proven very well that one needs only a 6GB GPU and a Raspberry Pi 4 or 5 to get the job done in a very good way. Your "request" is not very applicable to the general consumer anyway.
DeepSeek hasn't proven anything. Their actually impressive model is 671B params in size, which requires at least 350GB of VRAM/RAM to run; that's not modest.
The models you are talking about that ran on a 6GB GPU and a Raspberry Pi are the distilled models, which are based on existing models (Llama and Qwen).
Larger models of the same generation always have better quality than smaller ones.
Of course, as time goes on the smaller models improve as well, but so do their larger counterparts.
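To put rough numbers on that (a back-of-the-envelope sketch only; the ~20% padding for KV cache and runtime overhead is an assumption, and real usage grows with context length):

Code:
# Rough memory estimate for running an LLM locally: weights-only size,
# padded ~20% for KV cache and runtime overhead (assumption, not a benchmark).
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Full R1 vs. the distilled models people actually run on small GPUs
for name, params_b in [("DeepSeek-R1 671B", 671), ("R1-Distill-Qwen-32B", 32),
                       ("R1-Distill-Llama-8B", 8), ("R1-Distill-Qwen-1.5B", 1.5)]:
    print(f"{name:>22}: ~{model_memory_gb(params_b, 4):6.1f} GB at Q4, "
          f"~{model_memory_gb(params_b, 16):7.1f} GB at FP16")

That lines up with the 350GB+ figure for the full model at Q4, and shows why only the small distills fit on a 6GB card.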
Seriously, how many people need AI at all? Hmm? What would they use it for?
(those are rhetorical questions, meaning they do not need answers)
With the above said, I do agree with those rhetorical questions. It's just too much entitlement for something that's a hobby.
If it's not a hobby, then one should have enough money to pony up for professional stuff.

Right now Apple M chips with the unified memory are crushing local LLM development. I think as AI devs flock to Apple devices, Nvidia will react and release their N1X chips with unified memory, or start offering higher-VRAM consumer cards.
Nvidia has that DIGITS product now.
The DeepSeek 14B model is capable of fitting into the 4090's buffer, but it's far inferior to the 32B model that's available (e.g., if you ask it to code a TypeScript website, it will create JSX files and make a bunch of basic mistakes). IMO the 32B model is better than ChatGPT-4o. The 32B runs brutally slow and only uses 60% of the GPU since it runs out of framebuffer -- I would love to be able to run the larger models.
Q4 quants are a thing. The problem comes with the 70B models; those end up requiring 2 GPUs even at Q4.
But even then you only have 24GB, or 32GB if you pony up $2K and can even find a 5090.

If you get a refurb M chip, you can get 64GB unified (~50-55GB usable to load the model) for $2,500, or over 85GB if you can get the 96GB version for $3,200.

For the same money you would build a 5090 rig. Granted, the models will run a lot slower, but if you're looking for RAM size, the M4 Max might be the best price/performance.
2x3090s should cost less than a 5090 and would give you 48GB, while being way faster than any M4 Max.
For >50GB models, yeah, going for unified memory is the most cost-effective way currently.
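If you do go the 2x3090 route, splitting a ~40GB Q4 70B across the two cards is straightforward; here's a minimal sketch with llama-cpp-python (the GGUF path is a placeholder, the even split is just illustrative, and you need a CUDA-enabled build):

Code:
# Sketch: spread a Q4 70B GGUF across two 24GB GPUs with llama-cpp-python.
# The model path below is a placeholder for whatever file you've downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # weights split evenly across GPU 0 and GPU 1
    n_ctx=8192,               # context length; the KV cache also eats VRAM
)

out = llm("Explain KV cache memory usage in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])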
If AMD really wanted to take some market share, they could do it with some high-density cards; then people who don't want to pony up for Nvidia would buy in and hopefully improve the software stack.
Strix Halo with 128GB should fill this niche nicely.
Too bad it doesn't seem like it'll have higher-RAM models available; a 192/256GB model would be hella cool.

Call me stupid, but I think I do, because I haven't the faintest idea what someone would need LLMs for on a home PC.
I may not be your average user, but I use it as a coding assistant most of the time (with a mix of Claude/GPT-4 as well), and for some academic projects (some related to chatbots, others related to RAG stuff).
Using it as a "helper" while writing academic papers is pretty useful as well.
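For anyone wondering what the "RAG stuff" looks like in practice, here's a minimal retrieval sketch with sentence-transformers (the documents and question are made-up examples, and the generation step is left as a comment because it depends on whichever local server you run):

Code:
# Minimal RAG retrieval sketch: embed documents, find the closest one to the
# question, then hand it to a local LLM as context. Example data is made up.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "The 5090 ships with 32 GB of GDDR7.",
    "Apple M-series chips share one pool of unified memory between CPU and GPU.",
    "Q4 quantization stores weights in roughly 4 bits each.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

question = "How much memory does quantization save?"
q_emb = model.encode(question, normalize_embeddings=True)

scores = doc_emb @ q_emb              # cosine similarity (embeddings normalized)
context = docs[int(np.argmax(scores))]

# prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
# ...send `prompt` to your local model (llama.cpp, Ollama, etc.) for the answer.
print(context)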
 
There are B650 boards with 256GB RAM support even on the budget side (Gigabyte B650M DS3H, MSI B650 Gaming Plus WiFi...). I'm surprised (well, not really, cuz Asus) that the Asus one doesn't have that.


I think more VRAM or more system RAM alone is not a good solution to this; AMD's new laptops with up to 128GB of shared RAM, specifically targeted at LLM workloads, are a better approach. It solves both problems, really. GPU power and bandwidth are the next problems, but they can only get better from here. And DeepSeek kinda showed that LLMs will also get better and become more usable on consumer devices day by day.
I'm already building 256GB systems - though I am having to run at JEDEC 4800 CL40 using 9950X. Maybe there's a combination of kit, board and BIOS that's guaranteed to run faster speeds with 100% stability, but at this point it's almost certainly dual-rank, dual-channel, dual-DIMMs per channel and that's why 5600 or 6000 isn't happening.
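For a rough sense of why that memory speed matters so much for LLM work on these boxes: token generation is mostly bandwidth-bound, since every weight has to be read once per generated token. A back-of-the-envelope (nominal peak figures; real throughput is lower):

Code:
# Back-of-the-envelope: tokens/s is roughly memory bandwidth / model size
# for memory-bound generation. Peak numbers only; reality is lower.
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bits: int = 64) -> float:
    return mt_per_s * channels * (bus_bits / 8) / 1000  # GB/s

def rough_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

for speed in (4800, 5600, 6000):
    bw = peak_bandwidth_gbs(speed, channels=2)
    print(f"DDR5-{speed}, 2ch: {bw:5.1f} GB/s -> "
          f"~{rough_tokens_per_s(bw, 40):.1f} tok/s on a ~40GB Q4 70B")

So the jump from 4800 to 6000 buys maybe 25% more tokens/s; the bigger lever is the number of channels, which is the Threadripper/Xeon argument.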
 
I'm already building 256GB systems - though I am having to run at JEDEC 4800 CL40 using 9950X. Maybe there's a combination of kit, board and BIOS that's guaranteed to run faster speeds with 100% stability, but at this point it's almost certainly dual-rank, dual-channel, dual-DIMMs per channel and that's why 5600 or 6000 isn't happening.
Did you build a 256GB system using UDIMMs, or was it on a RDIMM platform? Given that you said a 9950x, I'm going to assume it's the former.
Where did you get those sticks? I'm really looking towards a 9950x + 256GB build, even did a thread about this:
 
256GB on a 9950X is nuts. Would love to see what combo works for that and at what speed.
 
More channels = more CPU pins = more mobo traces = higher costs.
Even Threadripper non-pro only does 1DPC with its 4 channels, likely for market segmentation reasons.
Now that motherboard prices are pretty high, I believe 4 channels wouldn't be too expensive, especially if restricted to 1 DPC.
 
"Capable of running" is not in the same solar system as "good at running".

Actually quite good - but for 20th century simulations, not the 21st century ones.

If you want the latter for a nonstandard consumer use case you're not a consumer, you're a professional, and you need to pull the stick outta your a** and pony up the cash for professional products.
The "professional" products are for large corporations with large budgets, boring goals and no original ideas.

The affordable compute is needed for hobbyists and small startups to innovate. And it benefits manufacturers by opening new markets for their products.

To see my point, just look at Nvidia - it would not be in the position it is now if its GPUs hadn't been affordable back when people first started using them for compute.
 
What we need is cheaper cards without the AI crap - bring GTX back.
If you want AI crap, there should be dedicated GPUs with the RTX and AI stuff that you pay for, instead of it being on the normal GPUs.
 
If you get a refurb M chip, you can get 64GB unified (~50-55GB usable to load the model) for $2,500, or over 85GB if you can get the 96GB version for $3,200.

For the same money you would build a 5090 rig. Granted, the models will run a lot slower, but if you're looking for RAM size, the M4 Max might be the best price/performance.
Can you link the model, pls?

Off-topic: I saw news of a 5090 with a bigger RAM size, 48-96GB. Get 3 of them and power-limit them to 350W, et voilà. But… what you are running locally aren't the DeepSeek models but Llama 3 or Qwen 2.5. I dunno, are we back in the '90s with 56k modem speeds, just measured in tokens/s instead? And in 10 years the technology should have advanced enough.
 
Can you link the model, pls?

Off-topic: I saw news of a 5090 with a bigger RAM size, 48-96GB. Get 3 of them and power-limit them to 350W, et voilà. But… what you are running locally aren't the DeepSeek models but Llama 3 or Qwen 2.5. I dunno, are we back in the '90s with 56k modem speeds, just measured in tokens/s instead? And in 10 years the technology should have advanced enough.
Sure:
Apple Mac Studio - USFF - M2 Max - 12-Core CPU - 38-Core GPU - 96 GB RAM - 512 GB SSD - Silver - Z17Z-2002206686 - Towers - CDW.com

Apple Mac Studio with M2 Ultra Z17Z00073 B&H Photo Video - 60 core GPU

also helpful
Can someone please tell me how many tokens per second you get on MACs unified memory : r/LocalLLaMA

also:
  • Short queries (one-liners): ~10-50 tokens
  • Medium queries (paragraph-length): ~100-300 tokens
  • Detailed responses (code, explanations, multi-paragraph answers): ~500-1000 tokens
  • Extensive responses (deep analysis, large code blocks): ~1000-4000 tokens
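A quick, purely illustrative wait-time estimate from those response sizes (the tokens/s figures below are placeholders, not benchmarks - see the linked thread above for real Mac numbers):

Code:
# Illustrative wait times = response tokens / generation speed.
# The speeds are placeholder assumptions, not measured results.
response_tokens = {"short": 50, "medium": 300, "detailed": 1000, "extensive": 4000}
speeds_tok_s = {"big model on unified memory": 8, "small model in 24GB VRAM": 60}

for hw, tps in speeds_tok_s.items():
    for kind, toks in response_tokens.items():
        print(f"{hw:>28} | {kind:>9}: ~{toks / tps:6.1f} s")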

Might want to wait for the M4 Ultra, which should be out shortly. Apple has MLX and some accelerators, but anything running Nvidia-optimized models is far faster than the Macs. Seems like it depends on the use case.
Whisper: Nvidia RTX 4090 vs M1Pro with MLX (updated with M2/M3) - Oliver Wehrens
 
Outside of the companies producing AI hardware, firmware, and software and those who market it, has anyone else actually made any real money from AI?
 
What we need is AI to die.
Thanks,
 
It's the pinnacle of wishful thinking. You won't get hardware that advanced on AMD B650 platform money, not 10 years ago, not now, and I dare say not in 10 years from now.

If you want to run LLMs on an advanced platform at a discount, you'll have to go an earlier generation. Buy yourself a quality X299 motherboard, 256 GB of DDR4 and a Core i9-7980XE. No need to bother with the 9980XE or 10980XE; they are a lot more expensive and are incremental upgrades. I'd tell you to go Threadripper, but I'm fairly sure these LLM engines are very heavily SIMD-optimized and probably do run on AVX-512, which Zen 1/Zen 1+ Threadrippers don't support.

For the GPU, used Quadro GP100 and Titan V cards are finally coming down to earth; buy yourself two or three at ~$400 a pop.

Either way, that is not a cheap system to build, but it's a far cry from the cost of a latest-generation workstation.
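If you do go the older-HEDT route, it's worth confirming AVX-512 before committing; here's a quick check (Linux-only sketch reading /proc/cpuinfo - on Windows, CPU-Z shows the same instruction-set flags):

Code:
# Quick AVX-512 check on Linux; returns False if /proc/cpuinfo isn't available.
def has_avx512() -> bool:
    try:
        with open("/proc/cpuinfo") as f:
            flags = f.read()
    except OSError:
        return False
    return "avx512f" in flags  # the foundation subset; other AVX-512 sets build on it

print("AVX-512 foundation support:", has_avx512())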
 
Vega 56/64/FE and the Radeon VII were ahead of their time, I guess.

Blame Raja

What we need is AI to die.
Thanks,
When AI is powerful enough, I'm going to use it to build a time machine and go back and stop AI.
 
Vega 56/64/FE and the Radeon VII were ahead of their time, I guess.

Blame Raja

Ahead of their time as in a major source of the problem. AMD just gives you more RAM, better RAM, because they don't know what else to do, leaving Nvidia free to do crazy stuff and move the market in a crazy direction.
 
Nvidia would do that anyway, especially when 90% of consumers vote with their wallets to allow Nvidia to ruin the market.

always the same argument, AMD is amazing, everyone is an idiot. Sure mate, scream at the wind.
 
Did you build a 256GB system using UDIMMs, or was it on a RDIMM platform? Given that you said a 9950x, I'm going to assume it's the former.
Where did you get those sticks? I'm really looking towards a 9950x + 256GB build, even did a thread about this:
MSI B650 Tomahawk and Kingston ValueRAM. Think it was this 2Rx8 single 6400 CL52 module (CUDIMM) from Exertis UK: KVR64A52BD8-64

I absolutely could not get it running at anything other than bone-stock 4800 CL40, but if you need more than 192GB RAM without stepping up to a Xeon/Threadripper, you can't be choosy.

I just read your thread from yesterday - seems like you found the same stuff. I found that model by reading about MSI's support for 256GB - they posted a screengrab of CPU-Z with the DIMM model number, so that's what made me confident it would work, and I have no idea how they got it working at 6400 with 4 sticks!

If you really need a lot of RAM, you might have to go Threadripper though.
 
MSI B650 Tomahawk and Kingston ValueRAM. Think it was this 2Rx8 single 6400 CL52 module (CUDIMM) from Exertis UK: KVR64A52BD8-64

I absolutely could not get it running at anything other than bone-stock 4800 CL40, but if you need more than 192GB RAM without stepping up to a Xeon/Threadripper, you can't be choosy.
Wholesome, thank you for the info!
Good to know that you managed to get it running at 4800 CL40; I'd consider that great already. I'll likely be buying those same sticks along with a 9950X as well.
 
Wholesome, thank you for the info!
Good to know that you managed to get it running at 4800 CL40; I'd consider that great already. I'll likely be buying those same sticks along with a 9950X as well.
Needed a BIOS flash on the board too; it was in limp mode on the 9950X at 0.56GHz and didn't work with the 64GB DIMM until I updated to the December BIOS with AGESA 1.2.0.2b.

Also, make sure you go 9000-series, I couldn't get the CUDIMMs working on a 7900X which makes me sad, because that's the bulk of our workstations. I have a nasty feeling AMD don't support them on 7000-series and either don't plan to, or physically can't.

FYI, Raptor Lake has solid CUDIMM support. Most of the rabbit holes I dove into when hunting for 64GB DIMMs were LGA1700.
 