AMD Details DeepSeek R1 Performance on Radeon RX 7900 XTX, Confirms Ryzen AI Max Memory Sizes

alwayssts · Jan 30, 2025

mb194dc said:
Great time to uncancel Navi 41 and 42 then? Bring them to market with 30 and 36GB of VRAM.

You mean 40/48GB of ram? I doubt it was ever GDDR7, but it's possible.

I think N41 (partially) got canned because they know once people have >80TF and 24GB (essentially a 4090) most ain't upgrading for a long-long time. Those that wanted that at >$1000+ bought a 4090.
Cutting the price of 4080 from $1200 to $1000 probably also had something to do with it, as I think that's where AMD wanted to compete.
Similar reason for the gap in nV products. Why GB203 limited to <80TF (1 less cluster than half GB202 + PL locks) and doesn't have a 24GB option. Gotta milk needing those upgrades as long as possible...
Hence both wanted to get one more cycle in before that happened...or maybe just able to make it for a larger margin given the move to 3nm and 3GB GDDR7 (256-bit instead of 384-bit for 24GB spec).
Something like a $500 BOM (~GB203/N48 size; 100+ KGD per $20k wafer + ~$300 of 3GB GDDR7) makes a lot more sense than making a slightly-slower 4090 for ~$1200 MSRP.
They would've needed 12288sp @ 3640mhz to match a 4090...That's probably impossible, if not close-to-impossible, to yield on 4/5nm for a gpu.
We may see with N48 that 3.4ghz is probably difficult enough to yield within decent power. I say that because if all N48 products can't hit 3.3ghz+ they've kinda failed; might as well buy a 6800xt/7800xt.
I'll be verrryyy curious if (binned) 3x8-pin designs will be able to hit anywhere around ~3.6ghz(+/-?), as that may have been the N4 goal, both (cancelled) large and (non-cancelled) smalls, with 24gbps ram.
Still think something like a 11264sp+ 3nm design is going to be a lot of people's last stop in this market for the most part. People with a 4090 (unless they have to have the best) probably already don't care.
Making ~1920sp*6/96 ROPs is just sooo much cheaper. It would only require 3900mhz to match 4090 which I think is very doable given how current 5nm GPU designs yield against 2.93/3.24 Apple products.
We don't know how N48 yielded against the 3460-3700mhz Apple products yet, or how much power it uses, but it should be interesting. Both clock yields and the power usage for those clocks on the curve.
This could be telling who has the better idea on 3nm.
nVIDIA is probably shooting for 12288sp@3780mhz/36000 like Apple's efficient clock on N3B, while AMD could perhaps be shooting for 11520sp @ ~3.87/40000+, more-similar to Apple's 4050mhz N3P.
Whatever they do, it'll be a lot cheaper to make than a 4090 or whatever AMD wanted to do with N41...chiplet or monolithic.
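
For anyone wanting to check the shader/clock figures in the post above, here is a minimal sketch of the usual peak-FP32 arithmetic. It assumes one fused multiply-add (2 FLOPs) per shader per clock and ignores dual-issue, boost behaviour, and power limits; the function names are made up for illustration, and the configurations are simply the ones quoted in the post.

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming 2 FLOPs (one FMA) per shader per clock."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12


def clock_needed_mhz(target_tflops: float, shaders: int) -> float:
    """Clock (MHz) a given shader count needs to reach a target TFLOPS figure."""
    return target_tflops * 1e12 / (2 * shaders) / 1e6


# Shader counts and clocks exactly as quoted in the post (speculative parts).
configs = [
    (12288, 3640),  # the "match a 4090" figure on 4/5nm
    (11520, 3900),  # 1920sp * 6
    (12288, 3780),  # guessed nVIDIA 3nm target
    (11520, 3870),  # guessed AMD 3nm target
]

for shaders, mhz in configs:
    print(f"{shaders}sp @ {mhz} MHz -> {fp32_tflops(shaders, mhz):.1f} TF")

# Clock an 11520sp part would need to equal 12288sp @ 3640 MHz.
target = fp32_tflops(12288, 3640)
print(f"11520sp needs ~{clock_needed_mhz(target, 11520):.0f} MHz for {target:.1f} TF")
```

Under those assumptions the first two configurations land within about half a TFLOP of each other, which seems to be the equivalence the post is leaning on.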

At any rate, it's fascinating to see what's possible with this deepseek model; it's almost like pure hardware always wins out in the end versus software/marketing bullshit and artificial limitations!
It's amusing to see the hardware limitations exposed when not locked to their ecosystem.

Long-live the Fine Wine of actually well-matched hardware/vram that always prevails in the end.

Beermotor · Jan 30, 2025

hatyii said:
So if the 7900 XTX is faster for AI than the 4090 and AMD mentions that RDNA3 specifically can run this model well because of hardware advantages over RDNA2, explain to me why the new FSR version was supposed to be exclusive to their new GPUs? I mean even an RTX 2000 GPU can benefit from DLSS, so I'm just confused about this stuff.

IIRC RDNA3 has a lot higher throughput in some floating point formats (e.g. FP32) than Lovelace and vice versa for other formats.

I'm not sure what DLSS/FSR use and don't care.

Vayra86 · Jan 30, 2025

alwayssts said:
I think N41 (partially) got canned because they know once people have >80TF and 24GB (essentially a 4090) most ain't upgrading for a long-long time. Those that wanted that at >$1000+ bought a 4090.
Cutting the price of 4080 from $1200 to $1000 probably also had something to do with it, as I think that's where AMD wanted to compete.
Similar reason for the gap in nV products. Why GB203 limited to <80TF (1 less cluster than half GB202 + PL locks) and doesn't have a 24GB option. Gotta milk needing those upgrades as long as possible...

Very good points

10tothemin9volts · Jan 30, 2025

The DeepSeek-R1-Distill-* models are not the real DeepSeek; the real one is DeepSeek-R1 (without "Distill" in its name), and it has 685B parameters. You can run quants, and the smallest one ("IQ1_S") requires around 134GB of memory, but some say IQ2_* quants are the recommended minimum, which is 183GB. If Strix Halo only has 128 GB (quad channel) RAM, then that's not enough and it really should have 256 GB.
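
A rough sketch of the sizing argument above, using only the numbers from the post (685B parameters, ~134GB for IQ1_S, ~183GB for IQ2_*). It treats the quoted sizes as weights only and assumes no extra room is needed for KV cache, the OS, or anything else, so real headroom would be tighter. The helper names are hypothetical.

```python
PARAMS_BILLIONS = 685  # DeepSeek-R1 parameter count as quoted in the post

# Approximate memory needed per quant, in GB, as quoted in the post.
QUANT_SIZES_GB = {"IQ1_S": 134, "IQ2_*": 183}


def bits_per_weight(size_gb: float, params_billions: float) -> float:
    """Effective bits per parameter implied by a quant's total size."""
    return size_gb * 8 / params_billions


def fits(size_gb: float, budget_gb: float) -> bool:
    """True if the quant fits in the memory budget (weights only, no overhead)."""
    return size_gb <= budget_gb


for budget_gb in (128, 256):
    for name, size_gb in QUANT_SIZES_GB.items():
        bpw = bits_per_weight(size_gb, PARAMS_BILLIONS)
        verdict = "fits" if fits(size_gb, budget_gb) else "does not fit"
        print(f"{name} (~{bpw:.2f} bits/weight, {size_gb} GB) {verdict} in {budget_gb} GB")
```

By these figures neither quant fits in 128 GB even before overhead, while both fit in 256 GB, which is the point being made.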

