
Dear AMD, NVIDIA, Intel and others: we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host and run inference on AI/LLMs

But even then you only have 24GB, or 32GB if you pony up $2K and can even find a 5090.

If you get a refurb M chip, you can get 64GB unified (~50-55GB usable to load the model) for $2,500, or over 110GB if you can snag a 128GB machine.
Yeah. But - important detail - there's going to be a tradeoff in speed. If you're using a model that fits into 24/32GB, Nvidia's stuff is going to run it faster. That's especially applicable to things like text2video or img2video; with writing/assistant-type requests the wait times aren't painful. Even the M4 isn't all that fast. Depends on the needs, but sure, Macs are a decent option.
 
Yeah. But - important detail - there's going to be a tradeoff in speed. If you're using a model that fits into 24/32GB, Nvidia's stuff is going to run it faster. That's especially applicable to things like text2video or img2video; with writing/assistant-type requests the wait times aren't painful. Even the M4 isn't all that fast. Depends on the needs, but sure, Macs are a decent option.
Very true -- they are slower at that size, especially for the reasoning models. Ideal would be a 64GB 5090 or Titan-class card, but considering the H100 80GB is selling for $30K, I don't think that's going to happen.

5090s for $2000 seem like charity compared to what the AI companies are paying.

If AMD really wanted to take some market share, they could do it with some high-density cards; then people who don't want to pony up for Nvidia would buy in and hopefully improve the software stack.
 
So, and I want to be serious. Balls.

I'd love to end there, but most people would probably assume that this is a low-quality post...because it sounds like a troll. It, surprisingly enough, is not. About a decade ago people started making ball robots...think back to BB-8, when Star Wars was just getting out of Lucas's hands. These things were just coming out of universities and the like...and they were hot. Now...in the last two years they've basically become something special...because China has decided to lead the world with them.

Don't believe me...because I sound like I've drunk the bong water? CCTV on YouTube

That's right. An idea goes open source and all of a sudden China is leading the world with their unique new future. You listen to them describe what most people would absolutely consider technology from a decade ago...given it exists in cheapo hoverboards...as though they were the first people to ever think of it. You listen to how these ball robots can patrol streets...surprisingly empty of any obstacles or challenges, and see that they are absolutely the cost-effective way for a policing force that literally cannot be stopped by morality...and you laugh. This crap was on Hackaday 15 years ago, with hobbyists creating them 9 years ago. BB-8 on Hackaday



So...this is one of many things that leads me to the conclusion that if it magics into existence in China, after it's open-sourced elsewhere, it's probably a monetary enrichment scheme...given that it promises exactly what China wants: a way to use cheap hardware to run AI that gets around the Western ban on selling them the good AI chips. This goes right along with being able to fabricate 5 nm chips by hand, the litany of silicon startups that no longer exist in China now that they have to demonstrate something, and the general vibe. As such, DeepSeek is likely just an AI-taught AI...which limits system resources and training time...but also limits the AI. As with any copy of a copy, some definition is lost. In China, that's just par for the course.


Wrapping back around, the requested accelerator card from the OP is silly. It'll exist when there is a market for it...and no better option exists. We do not need AI in the same way we need raw computation...read: folding@home...but asking for anything to be cheap is the icing on the cake: this is a request from somebody who doesn't understand why things are expensive at all.
 
No, we really don't. DeepSeek has proven very well that one needs only a 6GB GPU and a Raspberry Pi 4 or 5 to get the job done in a very good way. Your "request" is not very applicable to the general consumer anyway.
DeepSeek hasn't proven anything. Their actually impressive model is 671B params in size, which requires at least 350GB of VRAM/RAM to run; that's not modest.
The models you are talking about that ran on a 6GB GPU and a Raspberry Pi are the distilled models, which are based on existing models (Llama and Qwen).
Larger models of the same generation always have better quality than smaller ones.
Of course, as time goes on the smaller models improve as well, but so do their larger counterparts.
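To put rough numbers on that (a back-of-the-envelope sketch only; the ~20% padding for KV cache and runtime overhead is an assumption, and real usage grows with context length):

Code:
# Rough memory estimate for running an LLM locally: weights-only size,
# padded ~20% for KV cache and runtime overhead (assumption, not a benchmark).
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Full R1 vs. the distilled models people actually run on small GPUs
for name, params_b in [("DeepSeek-R1 671B", 671), ("R1-Distill-Qwen-32B", 32),
                       ("R1-Distill-Llama-8B", 8), ("R1-Distill-Qwen-1.5B", 1.5)]:
    print(f"{name:>22}: ~{model_memory_gb(params_b, 4):6.1f} GB at Q4, "
          f"~{model_memory_gb(params_b, 16):7.1f} GB at FP16")

That lines up with the 350GB+ figure for the full model at Q4, and shows why only the small distills fit on a 6GB card.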
Seriously, how many people need AI at all? Hmm? What would they use it for?
(those are rhetorical questions, meaning they do not need answers)
With the above said, I do agree with those rhetorical questions. It's just too much entitlement for something that's a hobby.
If it's not a hobby, then one should have enough money to pony up for professional stuff.

Right now Apple M chips with the unified memory are crushing local LLM development. I think as AI devs flock to Apple devices, Nvidia will react and release their N1X chips with unified memory, or start offering higher-VRAM consumer cards.
Nvidia has that DIGITS product now.
The DeepSeek 14B model is capable of fitting into the 4090's buffer, but it's far inferior to the 32B model that's available (e.g., if you ask it to code a TypeScript website, it will create JSX files and make a bunch of basic mistakes). IMO the 32B model is better than ChatGPT-4o. The 32B runs brutally slow and only uses 60% of the GPU since it runs out of framebuffer -- I would love to be able to run the larger models.
Q4 quants are a thing. The problem comes with the 70B models; those end up requiring 2 GPUs even at Q4.
But even then you only have 24GB, or 32GB if you pony up $2K and can even find a 5090.

If you get a refurb M chip, you can get 64GB unified (~50-55GB usable to load the model) for $2,500, or over 85GB if you can get the 96GB version for $3,200.

For the same money you would build a 5090 rig. Granted, the models will run a lot slower, but if you're looking for RAM size, the M4 Max might be the best price/performance.
2x3090s should cost less than a 5090 and would give you 48GB, while being way faster than any M4 Max.
For >50GB models, yeah, going for unified memory is the most cost-effective way currently.
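If you do go the 2x3090 route, splitting a ~40GB Q4 70B across the two cards is straightforward; here's a minimal sketch with llama-cpp-python (the GGUF path is a placeholder, the even split is just illustrative, and you need a CUDA-enabled build):

Code:
# Sketch: spread a Q4 70B GGUF across two 24GB GPUs with llama-cpp-python.
# The model path below is a placeholder for whatever file you've downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # weights split evenly across GPU 0 and GPU 1
    n_ctx=8192,               # context length; the KV cache also eats VRAM
)

out = llm("Explain KV cache memory usage in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])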
If AMD really wanted to take some market share, they could do it with some high-density cards; then people who don't want to pony up for Nvidia would buy in and hopefully improve the software stack.
Strix Halo with 128GB should fill this niche nicely.
Too bad it doesn't seem like it'll have higher-RAM models available; a 192/256GB model would be hella cool.

Call me stupid, but I think I do, because I haven't the faintest idea what someone would need LLMs for on a home PC.
I may not be your average user, but I use it as a coding assistant most of the time (with a mix of Claude/GPT-4 as well), and for some academic projects (some related to chatbots, others related to RAG stuff).
Using it as a "helper" while writing academic papers is pretty useful as well.
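For anyone wondering what the "RAG stuff" looks like in practice, here's a minimal retrieval sketch with sentence-transformers (the documents and question are made-up examples, and the generation step is left as a comment because it depends on whichever local server you run):

Code:
# Minimal RAG retrieval sketch: embed documents, find the closest one to the
# question, then hand it to a local LLM as context. Example data is made up.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "The 5090 ships with 32 GB of GDDR7.",
    "Apple M-series chips share one pool of unified memory between CPU and GPU.",
    "Q4 quantization stores weights in roughly 4 bits each.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

question = "How much memory does quantization save?"
q_emb = model.encode(question, normalize_embeddings=True)

scores = doc_emb @ q_emb              # cosine similarity (embeddings normalized)
context = docs[int(np.argmax(scores))]

# prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
# ...send `prompt` to your local model (llama.cpp, Ollama, etc.) for the answer.
print(context)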
 
There are B650 boards with 256GB RAM support even on the budget side (Gigabyte B650M DS3H, MSI B650 Gaming Plus WiFi...). I'm surprised (well, not really, cuz Asus) that the Asus one doesn't have that.


I think more VRAM or more system RAM alone is not a good solution to this; AMD's new laptops with up to 128GB of shared RAM, specifically targeted at LLM workloads, are a better approach. It solves both problems, really. GPU power and bandwidth are the next problems, but they can only get better from here. And DeepSeek kinda showed that LLMs will also get better and become more usable on consumer devices day by day.
I'm already building 256GB systems - though I am having to run at JEDEC 4800 CL40 using 9950X. Maybe there's a combination of kit, board and BIOS that's guaranteed to run faster speeds with 100% stability, but at this point it's almost certainly dual-rank, dual-channel, dual-DIMMs per channel and that's why 5600 or 6000 isn't happening.
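For a rough sense of why that memory speed matters so much for LLM work on these boxes: token generation is mostly bandwidth-bound, since every weight has to be read once per generated token. A back-of-the-envelope (nominal peak figures; real throughput is lower):

Code:
# Back-of-the-envelope: tokens/s is roughly memory bandwidth / model size
# for memory-bound generation. Peak numbers only; reality is lower.
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bits: int = 64) -> float:
    return mt_per_s * channels * (bus_bits / 8) / 1000  # GB/s

def rough_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

for speed in (4800, 5600, 6000):
    bw = peak_bandwidth_gbs(speed, channels=2)
    print(f"DDR5-{speed}, 2ch: {bw:5.1f} GB/s -> "
          f"~{rough_tokens_per_s(bw, 40):.1f} tok/s on a ~40GB Q4 70B")

So the jump from 4800 to 6000 buys maybe 25% more tokens/s; the bigger lever is the number of channels, which is the Threadripper/Xeon argument.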
 
I'm already building 256GB systems - though I am having to run at JEDEC 4800 CL40 using 9950X. Maybe there's a combination of kit, board and BIOS that's guaranteed to run faster speeds with 100% stability, but at this point it's almost certainly dual-rank, dual-channel, dual-DIMMs per channel and that's why 5600 or 6000 isn't happening.
Did you build a 256GB system using UDIMMs, or was it on a RDIMM platform? Given that you said a 9950x, I'm going to assume it's the former.
Where did you get those sticks? I'm really looking towards a 9950x + 256GB build, even did a thread about this:
 
256GB on a 9950X is nuts. Would love to see what combo works for that and at what speed.
 
More channels = more CPU pins = more mobo traces = higher costs.
Even Threadripper non-pro only does 1DPC with its 4 channels, likely for market segmentation reasons.
Now that motherboard prices are pretty high, I believe 4 channels wouldn't be too expensive, especially if restricted to 1 DPC.
 
"Capable of running" is not in the same solar system as "good at running".

Actually quite good - but for 20th century simulations, not the 21st century ones.

If you want the latter for a nonstandard consumer use case you're not a consumer, you're a professional, and you need to pull the stick outta your a** and pony up the cash for professional products.
The "professional" products are for large corporations with large budgets, boring goals and no original ideas.

The affordable compute is needed for hobbyists and small startups to innovate. And it benefits manufacturers by opening new markets for their products.

To see my point, just look at Nvidia - it would not be in the position it is now if its GPUs hadn't been affordable back when people first started using them for compute.
 
What we need is cheaper cards without the AI crap - bring GTX back.
If you want AI crap, there should be dedicated GPUs with the RTX and AI stuff that you pay for, instead of it being on the normal GPUs.
 
If you get a refurb M chip, you can get 64GB unified (~50-55GB usable to load the model) for $2,500, or over 85GB if you can get the 96GB version for $3,200.

For the same money you would build a 5090 rig. Granted, the models will run a lot slower, but if you're looking for RAM size, the M4 Max might be the best price/performance.
Can you link the model, pls?

Off-topic: I saw news of a 5090 with a bigger RAM size, 48-96GB. Get 3 of them and power-limit them to 350W, et voilà. But… what you are running locally aren't the DeepSeek models but Llama 3 or Qwen 2.5. I dunno, are we back in the '90s with 56k modem speeds, just measured in tokens/s instead? And in 10 years the technology should have advanced enough.
 
Can you link the model, pls?

Off-topic: I saw news of a 5090 with a bigger RAM size, 48-96GB. Get 3 of them and power-limit them to 350W, et voilà. But… what you are running locally aren't the DeepSeek models but Llama 3 or Qwen 2.5. I dunno, are we back in the '90s with 56k modem speeds, just measured in tokens/s instead? And in 10 years the technology should have advanced enough.
Sure:
Apple Mac Studio - USFF - M2 Max - 12-Core CPU - 38-Core GPU - 96 GB RAM - 512 GB SSD - Silver - Z17Z-2002206686 - Towers - CDW.com

Apple Mac Studio with M2 Ultra Z17Z00073 B&H Photo Video - 60 core GPU

also helpful
Can someone please tell me how many tokens per second you get on MACs unified memory : r/LocalLLaMA

also:
  • Short queries (one-liners): ~10-50 tokens
  • Medium queries (paragraph-length): ~100-300 tokens
  • Detailed responses (code, explanations, multi-paragraph answers): ~500-1000 tokens
  • Extensive responses (deep analysis, large code blocks): ~1000-4000 tokens
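A quick, purely illustrative wait-time estimate from those response sizes (the tokens/s figures below are placeholders, not benchmarks - see the linked thread above for real Mac numbers):

Code:
# Illustrative wait times = response tokens / generation speed.
# The speeds are placeholder assumptions, not measured results.
response_tokens = {"short": 50, "medium": 300, "detailed": 1000, "extensive": 4000}
speeds_tok_s = {"big model on unified memory": 8, "small model in 24GB VRAM": 60}

for hw, tps in speeds_tok_s.items():
    for kind, toks in response_tokens.items():
        print(f"{hw:>28} | {kind:>9}: ~{toks / tps:6.1f} s")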

Might want to wait for the M4 Ultra, which should be out shortly. Apple has MLX and some accelerators, but anything running Nvidia-optimized models is far faster than the Macs. Seems like it depends on the use case.
Whisper: Nvidia RTX 4090 vs M1Pro with MLX (updated with M2/M3) - Oliver Wehrens
 
Outside of the companies producing AI hardware, firmware, and software and those who market it, has anyone else actually made any real money from AI?
 
What we need is AI to die.
Thanks,
 
It's the pinnacle of wishful thinking. You won't get hardware that advanced on AMD B650 platform money, not 10 years ago, not now, and I dare say not in 10 years from now.

If you want to run LLMs on an advanced platform at a discount, you'll have to go an earlier generation. Buy yourself a quality X299 motherboard, 256 GB of DDR4 and a Core i9-7980XE. No need to bother with the 9980XE or 10980XE; they are a lot more expensive and are incremental upgrades. I'd tell you to go Threadripper, but I'm fairly sure these LLM engines are very heavily SIMD-optimized and probably do run on AVX-512, which Zen 1/Zen 1+ Threadrippers don't support.

For the GPU, used Quadro GP100 and Titan V cards are finally coming down to earth; buy yourself two or three at ~$400 a pop.

Either way, that is not a cheap system to build, but it's a far cry from the cost of a latest-generation workstation.
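If you do go the older-HEDT route, it's worth confirming AVX-512 before committing; here's a quick check (Linux-only sketch reading /proc/cpuinfo - on Windows, CPU-Z shows the same instruction-set flags):

Code:
# Quick AVX-512 check on Linux; returns False if /proc/cpuinfo isn't available.
def has_avx512() -> bool:
    try:
        with open("/proc/cpuinfo") as f:
            flags = f.read()
    except OSError:
        return False
    return "avx512f" in flags  # the foundation subset; other AVX-512 sets build on it

print("AVX-512 foundation support:", has_avx512())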
 
Vega 56/64/FE and the Radeon VII were ahead of their time, I guess.

Blame Raja

What we need is AI to die.
Thanks,
When AI is powerful enough, I'm going to use it to build a time machine and go back and stop AI.
 
Vega 56/64/FE and the Radeon VII were ahead of their time, I guess.

Blame Raja

Ahead of their time as in a major source of the problem. AMD just gives you more RAM, better RAM, because they don't know what else to do, leaving Nvidia free to do crazy stuff and move the market in a crazy direction.
 
Nvidia would do that anyway, especially when 90% of consumers vote with their wallets to allow Nvidia to ruin the market.

always the same argument, AMD is amazing, everyone is an idiot. Sure mate, scream at the wind.
 
Did you build a 256GB system using UDIMMs, or was it on a RDIMM platform? Given that you said a 9950x, I'm going to assume it's the former.
Where did you get those sticks? I'm really looking towards a 9950x + 256GB build, even did a thread about this:
MSI B650 Tomahawk and Kingston ValueRAM. Think it was this 2Rx8 single 6400 CL52 module (CUDIMM) from Exertis UK: KVR64A52BD8-64

I absolutely could not get it running at anything other than bone-stock 4800 CL40, but if you need more than 192GB RAM without stepping up to a Xeon/Threadripper, you can't be choosy.

I just read your thread from yesterday - seems like you found the same stuff. I found that model by reading about MSI's support for 256GB - they posted a screengrab of CPU-Z with the DIMM model number, so that's what made me confident it would work, and I have no idea how they got it working at 6400 with 4 sticks!

If you really need a lot of RAM, you might have to go Threadripper though.
 
MSI B650 Tomahawk and Kingston ValueRAM. Think it was this 2Rx8 single 6400 CL52 module (CUDIMM) from Exertis UK: KVR64A52BD8-64

I absolutely could not get it running at anything other than bone-stock 4800 CL40, but if you need more than 192GB RAM without stepping up to a Xeon/Threadripper, you can't be choosy.
Wholesome, thank you for the info!
Good to know that you managed to get it running at 4800 CL40; I'd consider that great already. I'll likely be buying those same sticks along with a 9950X as well.
 
Wholesome, thank you for the info!
Good to know that you managed to get it running at 4800 CL40; I'd consider that great already. I'll likely be buying those same sticks along with a 9950X as well.
Needed a BIOS flash on the board too; it was in limp mode on the 9950X at 0.56GHz and didn't work with the 64GB DIMM until I updated to the December BIOS with AGESA 1.2.0.2b.

Also, make sure you go 9000-series, I couldn't get the CUDIMMs working on a 7900X which makes me sad, because that's the bulk of our workstations. I have a nasty feeling AMD don't support them on 7000-series and either don't plan to, or physically can't.

FYI, Raptor Lake has solid CUDIMM support. Most of the rabbit holes I dove into when hunting for 64GB DIMMs were LGA1700.
 