
NVIDIA "Blackwell" GeForce RTX to Feature Same 5nm-based TSMC 4N Foundry Node as GB100 AI GPU

Nvidia can't come close to a 40% increase on the same node, and has never achieved this.

Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
 
Nvidia usually staggers node changes and architecture changes, so people moan when it's only a node change without a completely new architecture, or when it's a new architecture on an old node, but either way they usually deliver about the same generational uplift.

The biggest outlier in recent generations was Turing (20xx) in late 2018 on TSMC 12 nm (FinFET), which was just an optimized version of the node used by 2016's Pascal (10xx). It brought basically no raster uplift; the only real generational change was the inclusion of RT and tensor cores for RTX and DLSS, which took game developers a long time to actually implement (and by that time the 20xx series was basically obsolete).
 
Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
how about you show & cite an actual factual reference instead of posting arbitrary claims.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not an aggregate increase either.
 
Whatever architecture comes after Blackwell will consume 2000W at least, so it would be inappropriate to name it after a conventional (slim) scientist and use a 3-digit code. I propose Mr. Sherman Klump and no less than 4 digits. SK1000, SK2000 and so on.
 
how about you show & cite an actual factual reference instead of posting arbitrary claims.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not an aggregate increase either.
You probably should have specified from the start that your measuring stick is something that's quite arbitrary and, in essence, irrelevant. What matters is actual performance as it is delivered in a finished product. And, for example, the top GM204 (980) was 60% overall faster than the same-class previous-gen chip in its top version (680/770) while staying on the same node. Anything else is splitting hairs.
I mean, by the same-ish metric Zen 4 is what, only a couple of percent faster than Zen 3? Since if we lock two single-CCD chips to the same frequency and run CB or something, that would be the result. However, nobody sane is saying that Zen 4 is a minor improvement at best over Zen 3, right?
 
how about you show & cite an actual factual reference instead of posting arbitrary claims.

AnandTech's review database for the GeForce4 Ti 4600, GeForce 6800 Ultra, GeForce 8800 GTX, GeForce GTX 280, and GeForce GTX Titan X. This is a really simple task of looking at the performance reviews, and also having lived through each era and owned each of those generations.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not an aggregate increase either.

Aggregate means combination of all elements. Manufacturing improvements, clock speed, pipeline/shader block size, architecture improvements, shader optimization, software optimization, API improvements, per-application optimization. Everything rolled into one figure.
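
To make "rolled into one figure" concrete, here's a minimal sketch of one common way to aggregate a benchmark suite into a single uplift: the geometric mean of per-title performance ratios. The game names and FPS numbers are made-up placeholders, not data from any actual review.

```python
# Minimal sketch: "aggregate increase" as the geometric mean of per-title
# FPS ratios across a shared benchmark suite. All numbers are invented
# placeholders, not results from any real review.
from math import prod

def aggregate_uplift(old_fps, new_fps):
    """Return the aggregate increase, e.g. 0.49 means '49% faster overall'."""
    ratios = [new_fps[title] / old_fps[title] for title in old_fps]
    return prod(ratios) ** (1 / len(ratios)) - 1

old = {"Title A": 42.0, "Title B": 61.0, "Title C": 38.0}   # previous-gen card
new = {"Title A": 63.0, "Title B": 90.0, "Title C": 57.0}   # new-gen card
print(f"Aggregate increase: {aggregate_uplift(old, new):.1%}")
```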

If you want a great history lesson, and I highly recommend that you take one, check out reviews of NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
 
If you want a great history lesson, and I highly recommend that you take one, check out reviews of NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically and nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though it was precisely when I seriously got into hardware.
 
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically and nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though it was precisely when I seriously got into hardware.

Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)
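
If anyone wants to sanity-check those ratios, here's a quick sketch using approximate NV38 (FX 5950 Ultra) and NV40 (6800 Ultra) figures as I recall them; verify against the GPU database before quoting these.

```python
# Quick ratio check of the NV38 -> NV40 comparison above. The spec values
# are approximate, from memory -- treat them as placeholders and confirm
# against the TechPowerUp GPU database.
nv38 = {"die_mm2": 207, "transistors_m": 135, "clock_mhz": 475, "tdp_w": 74}
nv40 = {"die_mm2": 287, "transistors_m": 222, "clock_mhz": 400, "tdp_w": 81}

for key in nv38:
    ratio = nv40[key] / nv38[key]
    print(f"{key}: {nv38[key]} -> {nv40[key]}  ({ratio:.2f}x)")

# Roughly: ~1.39x die size, ~1.64x transistors, 0.84x clock,
# ~1.09x rated power (+7 W) -- in line with the figures quoted above.
```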

I pulled aggregate increases off launch-day reviews. Obviously in some of those performance metrics the 6800 did not do well, because driver maturity is a big factor. That's something Rankine never received, because it was stuck on its lopsided implementation of DX9a and required per-game tuning to achieve proper scaling from the architecture. Curie is full DX9c and received plentiful driver and software improvements, allowing later performance to eclipse Rankine's by as much as 2.2x. This is why the improvement is aggregated; architectural changes deliver more than just "more transistors, more better." NVIDIA was still designing chips using EDL programming, and that allowed fundamental changes for very little transistor cost every time the programming model was updated. Designs for SM3.0 were a paradigm shift in that regard.

Rankine's FP forward architecture and dual-issue (2fp/1int) scalar pipelines are an interesting rabbit hole to fall down if you want to see the pitfalls of ASIC design by programming limits. NVIDIA could only ever extract 8px/clock in one or two extremely niche scenarios while the TMU arrangement languished waiting for tex fetches.
 
Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)
I’d look it up on the database, I use it often, but I am currently on my phone and for some reason whenever I start opening several entries to compare it hits me with a captcha thinking I am a killbot from the future and asks me to prove I am not here for Sarah Connor. This gets annoying. And maybe I AM a killbot, what’s with this discrimination? So yeah, that’s why I was using my hazy memory here. Not a bad recollection, actually, seeing how it was 20 years ago.
 
Well so much for the rumor of the RTX 5090 being 100% faster than the 4090. Maybe in Ray-Tracing though.
This is more like the jump from Kepler to Maxwell.

I do think there's a fair amount of room to extract more performance from the same node, but not 100% like that one leaker on Twitter claimed.

It did seem like, even with the density increase from Samsung 8 nm to 4N, Nvidia was not able to extract all the performance it could out of that node. As far as die size goes, they can go bigger, but not much more than 20% bigger; 20% bigger puts the GB202 die into TU102 territory.
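
Rough napkin math on that point, using the commonly cited AD102 and TU102 die sizes; the 20%-larger GB202 figure is pure speculation on my part.

```python
# Back-of-the-envelope for the "20% bigger" claim above. AD102 and TU102
# die sizes are the commonly cited figures; the GB202 size is speculative.
ad102_mm2 = 609   # RTX 4090 die (TSMC 4N)
tu102_mm2 = 754   # RTX 2080 Ti die (TSMC 12 nm)

gb202_guess = ad102_mm2 * 1.20   # ~731 mm^2 if 20% larger than AD102
print(f"AD102 + 20%: ~{gb202_guess:.0f} mm^2 (TU102 was {tu102_mm2} mm^2)")
```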
 
I don’t think nVidia will release a beastly 5090 when AMD can’t even match the 4090.
A cut-down GB202, 20-25% faster than the 4090, and they’ll call it a day.
See you in 2027 again.
 