AMD Details DeepSeek R1 Performance on Radeon RX 7900 XTX, Confirms Ryzen AI Max Memory Sizes

Great time to uncancel Navi 41 and 42 then? Bring them to market with 30 and 36GB of VRAM.

You mean 40/48GB of ram? I doubt it was ever GDDR7, but it's possible.

I think N41 (partially) got canned because they know once people have >80TF and 24GB (essentially a 4090) most ain't upgrading for a long-long time. Those that wanted that at >$1000+ bought a 4090.
Cutting the price of 4080 from $1200 to $1000 probably also had something to do with it, as I think that's where AMD wanted to compete.
Similar reason for the gap in nV products. Why GB203 limited to <80TF (1 less cluster than half GB202 + PL locks) and doesn't have a 24GB option. Gotta milk needing those upgrades as long as possible...
Hence both wanted to get one more cycle in before that happened...or maybe just able to make it for a larger margin given the move to 3nm and 3GB GDDR7 (256-bit instead of 384-bit for 24GB spec).
Something like a $500 BOM (~GB203/N48 size; 100+ KGD per 20k wafer + ~$300 of 3GB GDDR7) makes a lot more sense than making a slightly-slower 4090 for ~$1200 MSRP.
They would've needed 12288sp @ 3640mhz to match a 4090...That's probably impossible, or at least close-to-impossible, to yield on 4/5nm for a gpu.
We may see with N48 that 3.4ghz is probably difficult enough to yield within decent power. I say that because if all N48 products can't hit 3.3ghz+ they've kinda failed; might as well buy a 6800xt/7800xt.
I'll be verrryyy curious if (binned) 3x8-pin designs will be able to hit anywhere around ~3.6ghz(+/-?), as that may have been the N4 goal, both (cancelled) large and (non-cancelled) smalls, with 24gbps ram.
Still think something like a 11264sp+ 3nm design is going to be a lot of people's last stop in this market for the most part. People with a 4090 (unless they have to have the best) probably already don't care.
Making ~1920sp*6/96 ROPs is just sooo much cheaper. It would only require 3900mhz to match 4090 which I think is very doable given how current 5nm GPU designs yield against 2.93/3.24 Apple products.
We don't know how N48 yielded against the 3460-3700mhz Apple products yet, or how much power it uses, but it should be interesting. Both clock yields and the power usage for those clocks on the curve.
This could be telling who has the better idea on 3nm.
nVIDIA is probably shooting for 12288sp@3780mhz/36000 like Apple's efficient clock on N3B, while AMD could perhaps be shooting for 11520sp @ ~3.87/40000+, more-similar to Apple's 4050mhz N3P.
Whatever they do, it'll be a lot cheaper to make than a 4090 or whatever AMD wanted to do with N41...chiplet or monolithic.
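
For anyone wanting to check the shader/clock pairs above: they line up with the usual peak-FP32 rule of thumb (shaders x 2 FLOPs per clock x clock). A rough sketch, assuming one FMA per shader per clock and ignoring dual-issue and real-world boost behaviour; the function names are made up for illustration:

```python
def peak_fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput, counting one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12


def required_clock_mhz(target_tflops: float, shaders: int) -> float:
    """Clock needed for a given shader count to reach a target peak FP32 figure."""
    return target_tflops * 1e12 / (2 * shaders) / 1e6


# Configurations quoted in the post above
big = peak_fp32_tflops(12288, 3640)      # hypothetical 4/5nm part
small = peak_fp32_tflops(1920 * 6, 3900)  # 11520sp 3nm design

print(f"{big:.2f} TF")    # 89.46 TF
print(f"{small:.2f} TF")  # 89.86 TF
print(f"{required_clock_mhz(big, 1920 * 6):.0f} MHz")  # 3883 MHz
```

Both land at roughly 89-90TF, which is the ">80TF" 4090-class target the post is working from; the ~3900mhz figure for the 11520sp design is just that same target divided across fewer shaders.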

At any rate, it's fascinating to see what's possible with this deepseek model; it's almost like pure hardware always wins out in the end versus software/marketing bullshit and artificial limitations!
It's amusing to see the hardware limitations exposed when not locked to their ecosystem.

Long-live the Fine Wine of actually well-matched hardware/vram that always prevails in the end.
 
So if the 7900 XTX is faster for AI than the 4090, and AMD mentions that RDNA3 specifically can run this model well because of hardware advantages over RDNA2, explain to me why the new FSR version was supposed to be exclusive to their new GPUs? I mean, even an RTX 2000 GPU can benefit from DLSS, so I'm just confused about this stuff.

IIRC RDNA3 has a lot higher throughput in some floating point formats (e.g. FP32) than Lovelace and vice versa for other formats.

I'm not sure what DLSS/FSR use and don't care.
 
I think N41 (partially) got canned because they know once people have >80TF and 24GB (essentially a 4090) most ain't upgrading for a long-long time. Those that wanted that at >$1000+ bought a 4090.
Cutting the price of 4080 from $1200 to $1000 probably also had something to do with it, as I think that's where AMD wanted to compete.
Similar reason for the gap in nV products. Why GB203 limited to <80TF (1 less cluster than half GB202 + PL locks) and doesn't have a 24GB option. Gotta milk needing those upgrades as long as possible...
Very good points
 
The DeepSeek-R1-Distill-* models are not the real DeepSeek; the real one is DeepSeek-R1 (without the Distill in its name), and it's 685B parameters. You can run quants, and the smallest one ("IQ1_S") requires around 134GB of memory at the minimum, but some say IQ2_* quants are the recommended minimum, which is 183GB. If Strix Halo only has 128GB (quad channel) RAM, then that's not enough and it really should have 256GB.
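
A quick sanity check on those sizes, as a sketch only: it takes the 685B / 134GB / 183GB figures from the post at face value, treats GB as 10^9 bytes, and ignores KV cache, context and OS overhead (so real headroom is smaller). The helper names are hypothetical:

```python
def implied_bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Average bits per parameter implied by a model file size (GB and billions both scale by 1e9)."""
    return size_gb * 8 / params_billion


def fits_in_ram(size_gb: float, ram_gb: float, reserve_gb: float = 0.0) -> bool:
    """True if the weights plus an optional reserve fit in the given memory."""
    return size_gb + reserve_gb <= ram_gb


PARAMS_B = 685  # parameter count quoted in the post, in billions
quants = {"IQ1_S": 134, "IQ2_*": 183}  # sizes in GB, as quoted in the post

for name, size_gb in quants.items():
    bpw = implied_bits_per_weight(size_gb, PARAMS_B)
    print(f"{name}: ~{bpw:.2f} bits/weight, "
          f"fits 128GB: {fits_in_ram(size_gb, 128)}, "
          f"fits 256GB: {fits_in_ram(size_gb, 256)}")
```

Under those assumptions neither quant fits in 128GB on weights alone, while both fit in 256GB with room left for context.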
 