
Dear AMD, NVIDIA, Intel, and others: we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host and run inference on AI/LLMs

Processor 7800X3D @ Curve Optimizer: All Core: -25
Motherboard TUF Gaming B650-Plus
Memory 2xKSM48E40BD8KM-32HM ECC RAM (ECC enabled in BIOS)
Video Card(s) 4070 @ 110W
Display(s) SAMSUNG S95B 55" QD-OLED TV
Power Supply RM850x
With the AI age upon us, we need fast memory and lots of it, so we can host our favorite LLMs locally.
Even Edward Snowden is complaining.
[Attached image: vram.png]


We also need quad-channel DDR6 (or something entirely new and faster) consumer desktop motherboards supporting up to 256 or 384GB of RAM (my current B650 mobo supports only up to 128GB), so we can self-host our favourite big MoE LLMs like DeepSeek-R1 (quants: e.g. 1, 2) (the real DeepSeek models are the ones without "Distill" in the name) or Llama-3.1 405B quants. MoE LLMs run much faster than dense LLMs with the same number of parameters; DeepSeek-R1 has been tested on a 4th-gen EPYC server at around 8 tokens/s. Bigger LLMs will always be better than smaller ones (everything else being equal, and neither being specialized). Please, don't hold humanity back.
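For intuition on those numbers: token generation is mostly memory-bandwidth-bound, so a rough ceiling is memory bandwidth divided by the bytes of active weights streamed per token. A minimal back-of-the-envelope sketch (every figure here is an illustrative assumption, not a benchmark):

```python
def tokens_per_s_ceiling(active_params_b, bits_per_param, mem_bw_gb_s):
    """Rough upper bound: each generated token streams the active weights once."""
    active_bytes = active_params_b * 1e9 * bits_per_param / 8
    return mem_bw_gb_s * 1e9 / active_bytes

# DeepSeek-R1 is ~671B total parameters, but only ~37B are active per token (MoE).
# Assume a ~4.5-bit quant on a 12-channel DDR5-4800 EPYC (~460 GB/s peak).
print(tokens_per_s_ceiling(37, 4.5, 460))    # ~22 tok/s ceiling for the MoE model
print(tokens_per_s_ceiling(671, 4.5, 460))   # ~1.2 tok/s if all 671B params were dense
```

Real-world throughput lands well below that ceiling (hence the ~8 tokens/s figure), but the MoE-vs-dense gap is the point: only the active experts have to be read per token.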
 
The response back, amidst peals of laughter:

Dear consumer, silicon production is a finite resource. It's expensive, and there are a lot of people out there who believe the next person who nails AI as something more than a novelty will rewrite our economy (and thus be stupidly rich). As such, the demand for processing cards which mirror or match those that used to be GPUs is functionally infinite. The only reason that we aren't royally paddling your wallet is that anti-trust lawsuits hurt, and now that billion dollar investments are happening even a slightly unfair pricing policy would trigger legal issues that we can entirely avoid and still print money.

In short, either make your own or pay for ours in a market that will take almost any price we can imagine.

-GPU Makers.



P.S. In a little bit of time, when this shell of an industry collapses under its own weight, we won't be bailing ourselves out. Because we now control enough resources critical to the defense industry, we literally are incapable of failing. Please continue your stupid team red vs. team green shenanigans, while we count our green all the way to the bank.




-I like these thought exercises. Imagining Nvidia as a Bond villain with steepled hands in a volcano seems...somehow fitting.
 
There are B650 boards with 256GB RAM support even on the budget side (Gigabyte B650M DS3H, MSI B650 Gaming Plus WiFi, ...). I'm surprised (well, not really, because Asus) that the Asus one doesn't have that.


I don't think more VRAM or more system RAM alone is a good solution to this; AMD's new laptops with up to 128GB of shared RAM, specifically targeted at LLM work, are a better approach. That solves both problems, really. GPU power and bandwidth are the next problems, but those can only get better from here. And DeepSeek kind of showed that LLMs themselves will keep getting better and more usable on consumer devices day by day.
 

This is where I get to the point of laughing.

Research DeepSeek...I'll wait a second.


1) The company came from nowhere.
2) The company is in China.
3) The company censors its responses; see 2.
4) The company's "breakthrough" is basically repackaging the technique of teaching a smaller LLM from a more complex one (distillation, sketched below), and thus requiring fewer resources.
5) The company is seeking large capital investment... because its "new" ideas will attract it.
6) The company has nothing really to show... but through social media manipulation it creates a spark of ignorance that burns the monopoly money making Nvidia stupid rich... promising the usual from China: they'll take something complex, copy it incompletely, repackage it, and claim it as their world-leading technology, until 6-18 months from now when it fails to deliver anything new and everyone disappears with the money.
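For what it's worth, the technique being mocked in point 4 is plain knowledge distillation. A minimal sketch of the core loss (standard PyTorch, nothing DeepSeek-specific; the temperature value is just a common default):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Match the student's softened output distribution to the teacher's (Hinton-style KD)."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * temperature ** 2
```

The savings come from the student training against a bigger model's outputs instead of learning everything from raw data.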


I love me some Ouroboros levels of cyclical stupidity... but the OP is asking for the horsepower of a supercar while paying for a four-banger. This is genuinely funny, because it's like asking Ferrari to make something like a Kia Soul. If it happened, I'm sure next week it would rain frogs and absolutely apoplectic former Ferrari personnel.
 

The Chinese-trained model is pretty useless; only the model structure is useful.

The interesting part will be when a model is trained on Western data with the DeepSeek code. Given that it's open source, that shouldn't take too long.

It could be useful to run the full model locally. You can do it with 1TB of RAM, and there are some ideas around for using server boards to do it not too expensively.
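For a rough sense of what 1TB of RAM buys you, weights only (the overhead factor is my own assumption for KV cache and runtime buffers):

```python
def weights_gb(total_params_b, bits_per_param, overhead=1.1):
    """Approximate RAM footprint of a quantized model's weights, plus ~10% headroom."""
    return total_params_b * 1e9 * bits_per_param / 8 / 1e9 * overhead

# DeepSeek-R1 is ~671B parameters:
for bits in (16, 8, 4, 2):
    print(f"{bits}-bit: ~{weights_gb(671, bits):.0f} GB")
# 16-bit (~1476 GB) won't fit in 1TB; 8-bit (~738 GB) and smaller quants will.
```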
 
HBM is even better, and I can think of far more useful things to do with my GPUs than play with AI. I think this is coming, as the high margins on professional chips will lead to overbuilt capacity. If we're really lucky, CPUs will converge to look like Xeon Max, but with more cores and far lower pricing, and then who needs GPUs?
 
With the AI age upon us, we need fast memory and lots of it, so we can host our favorite LLMs locally.
Even Edward Snowden is complaining.
You'll be seeing such products with higher VRAM amounts in their datacenter/workstation offerings, with a heavier price tag, not in the consumer space.

Don't expect those to be any cheaper than a 5090. All in all, if you want a large-VRAM consumer product, the 5090 exists for this sole reason.
consumer desktop motherboards with up to 256 or 384GB RAM (my current B650 mobo supports only up to 128GB RAM),
Most DDR5 motherboards should support 256GB of RAM. The problem is that there are no 64GB UDIMMs available for sale yet.
 
Most DDR5 motherboards should support 256GB of RAM. The problem is that there are no 64GB UDIMMs available for sale yet.
What would be nicer is to have more channels. Why are we limited to only 4 sticks?
 
What would be nicer is to have more channels. Why are we limited to only 4 sticks?
More channels = more CPU pins = more mobo traces = higher costs.
Even Threadripper non-pro only does 1DPC with its 4 channels, likely for market segmentation reasons.
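Rough math on what more channels would buy, counting each DIMM channel as 64 bits wide (peak theoretical numbers only):

```python
def peak_bw_gb_s(mt_per_s, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) * channels / 1e9

print(peak_bw_gb_s(5600, 2))   # dual-channel DDR5-5600 (typical AM5 desktop): ~89.6 GB/s
print(peak_bw_gb_s(5600, 4))   # quad-channel (Threadripper-style):            ~179.2 GB/s
print(peak_bw_gb_s(4800, 12))  # 12-channel EPYC Genoa, for comparison:        ~460.8 GB/s
```

Since local LLM token generation is usually bandwidth-bound, doubling the channel count roughly doubles the tokens/s ceiling, which is exactly why the extra pins and traces would matter here.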
 
You'd need an entirely new I/O die for that as well, TR reuses the EPYC(?) ones IIRC.
 
What is being asked for simply isn't consumer equipment. There is not a single thing stopping you from buying any of the things you want. That is what they are.....wants. They are not needs. Humanity needs a lot of things......nothing pertaining to computer equipment even comes close to making the list.
 
Don’t chase tech… just wait for the parts that bring you at least a 2-4x performance upgrade. Currently my 9900K compared to a 9800X3D is around 2.7x for gaming, and an RTX 3090 to an RTX 5090 is around 2.6x, so after this year I should be looking at over a 3.5x performance upgrade over my current PC…

I went from a 1070 Ti to a 3090 and that was just over 2x. I reckon I could have held out for the 40 series, but I had upgraded my monitor to 3440x1440, and while the 1070 Ti could play several titles well, a lot of the newer games struggled. This was back in 2020.
 
If only 3D XPoint hadn't been dropped, they could have made a killing in the AI boom.
 
They won't see your swan song here; go to their websites and email them.
Heck, get on their social media and their forums and make your requests known there.
 
More channels = more CPU pins = more mobo traces = higher costs.
Even Threadripper non-pro only does 1DPC with its 4 channels, likely for market segmentation reasons.

Market segmentation is right, and that's why Intel is having problems right now. They had a perfectly good competitor to GPUs in the Xeon Phi cards, but they had to segment it away to keep Xeon Phi from competing with their server processors. The result: lost market share and lost revenue streams they need to stay competitive.

What is being asked for simply isn't consumer equipment. There is not a single thing stopping you from buying any of the things you want. That is what they are.....wants. They are not needs. Humanity needs a lot of things......nothing pertaining to computer equipment even comes close to making the list.
You could say a computer capable of running nuclear explosion simulations is not consumer equipment either, and yet, quite likely, you have a phone in your pocket. Consumer equipment is simply whatever consumers are willing to buy. Some of it is frivolous, and some becomes indispensable as use increases.
 
US$9,999 is the best I can do.

It won't happen, because all these companies can sell the precious silicon at much better prices directly to hyperscalers.

However, it seems to me that the current trend of running large LLMs locally will initially make powerful APUs like Strix Halo scarce. In a second phase, however, it will stimulate the development of bigger and better APUs. Just my theory, but I believe this will bring significant changes to the market.

The big three players will likely try to sell CPU+GPU as a single product, effectively eliminating the low-end and mid-range dGPU market in the medium term.
 
The self-entitlement from the OP is exactly what I've come to expect from "AI" companies and the people who believe those companies are in any way shape or form useful to humanity.

You could say a computer capable of running nuclear explosion simulations is not consumer equipment either, and yet, quite likely, you have a phone in your pocket.
"Capable of running" is not in the same solar system as "good at running". If you want the latter for a nonstandard consumer use case you're not a consumer, you're a professional, and you need to pull the stick outta your a** and pony up the cash for professional products.
 
Dear AMD, NVIDIA, Intel, and others: we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host and run inference on AI/LLMs
No, we really don't. DeepSeek has proven very well that one needs only a 6GB GPU and a Raspberry Pi 4 or 5 to get the job done in a very good way. Your "request" is not very applicable to the general consumer anyway.

Seriously, how many people need AI at all? Hmm? What would they use it for?
(those are rhetorical questions, meaning they do not need answers)

Out of touch is my vote.
 
Right now, Apple M chips with unified memory are crushing local LLM development. I think as AI devs flock to Apple devices, Nvidia will react and release their N1X chips with unified memory or start offering higher-capacity consumer cards.

The DeepSeek 14B model is capable of fitting into the 4090's buffer, but it's far inferior to the available 32B model (for example, if you ask it to code a TypeScript website, it will create JSX files and make a bunch of basic mistakes). IMO the 32B model is better than ChatGPT 4o. The 32B runs brutally slow and only uses 60% of the GPU since it runs out of framebuffer; I would love to be able to run the larger models.
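One workaround when a quant doesn't quite fit is to split the layers between VRAM and system RAM. A minimal sketch using llama-cpp-python (the file name and layer count below are placeholders you'd have to tune, not a recommendation):

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA support)

# A Q4_K_M quant of a 32B model is roughly 19-20 GB of weights, so with the KV cache it
# overflows a 24 GB card; offload most layers to the GPU and leave the rest in system RAM.
llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=55,   # tune down until it stops running out of VRAM; -1 would mean all layers
    n_ctx=8192,
)

out = llm("Write a minimal TypeScript hello-world.", max_tokens=128)
print(out["choices"][0]["text"])
```

Partial offload is also roughly why GPU utilization sits around 60%: the layers left on the CPU become the bottleneck while the GPU waits.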

The models that run on a 6GB GPU have a difficult time answering simple math problems.

[Attached image: 1738677675131.png]


Something like this is probably the ideal development workstation if you want to run local LLMs, in terms of $/performance.
 
Used 3090s will be the cheapest ticket to running stuff locally for at least a couple more years. Sure, "Arc Pro B580 24GB" or whatever Intel will name it is on the way but nothing will work out of the box on that, ever + it's still going to be slower than a 3090. Then you have 4090s and 5090s. It is what it is - I'd love to see more options but let's be real they're not coming because there's a ton more money to be made elsewhere for anyone producing GPUs. Nobody's running a charity in this biz.
 
The self-entitlement from the OP is exactly what I've come to expect from "AI" companies and the people who believe those companies are in any way shape or form useful to humanity.

Sam Altman was talking about how we need to reform the social contract....
 
Used 3090s will be the cheapest ticket to running stuff locally for at least a couple more years. Sure, "Arc Pro B580 24GB" or whatever Intel will name it is on the way but nothing will work out of the box on that, ever + it's still going to be slower than a 3090. Then you have 4090s and 5090s. It is what it is - I'd love to see more options but let's be real they're not coming because there's a ton more money to be made elsewhere for anyone producing GPUs. Nobody's running a charity in this biz.
But even then you only have 24GB, or 32GB if you pony up $2k and can even find a 5090.

If you get a refurb M chip you can get 64GB unified (roughly 50-55GB usable to load the model into) for $2,500, or over 85GB if you can get the 96GB version for $3,200.

For the same money you would build a 5090 rig. Granted, the models will run a lot slower, but if you're looking for RAM size, the M4 Max might be the best price/performance.
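Putting those numbers side by side as dollars per GB of memory you can actually load a model into (prices are the ones quoted above; the used 3090 figure is my own assumption):

```python
# (price in USD, memory usable for model weights in GB)
options = {
    "RTX 5090 (32 GB VRAM)":            (2000, 32),
    "Used RTX 3090 (24 GB VRAM)":       (800, 24),   # assumed street price
    "M4 Max 64 GB refurb (~55 usable)": (2500, 55),
    "M4 Max 96 GB (~85 usable)":        (3200, 85),
}
for name, (usd, gb) in options.items():
    print(f"{name:34s} ${usd / gb:5.1f}/GB")
```

Raw capacity per dollar favours the Macs, but tokens/s per dollar is a different story, since the discrete GPUs have far more memory bandwidth per GB of VRAM.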
 
Seriously, how many people need AI at all? Hmm? What would they use it for?
I have maybe two uses for it, and even then:
1- Image generation is an ethics minefield, to say the least, due to the lack of proper crediting/repayment of the people whose works were used to build the AI's models.
2- Instead of spending a week trying to make the AI draw whatever it is I want, I could probably ask a human and get a better result in less time.
3- Asking an AI to write anything is bound to bring in errors of varied nature. I'm better off reading and writing things myself.
 