NVIDIA "Blackwell" GeForce RTX to Feature Same 5nm-based TSMC 4N Foundry Node as GB100 AI GPU

Joined
May 30, 2015
Messages
1,885 (0.57/day)
Location
Seattle, WA
Nvidia can't come close to 40% increase on the same node, & has never achieved this.

Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
 
Joined
May 11, 2018
Messages
1,009 (0.46/day)
Nvidia usually staggers node changes and architecture changes, so people moan when it's only a node change without a completely new architecture, or a new architecture on an old node, but both usually bring about the same generational uplift.

The biggest outlier in recent generations was Turing (20xx) in late 2018 on TSMC 12 nm (FinFET), which was just an optimized version of the node used for 2016's Pascal (10xx). It also brought basically no raster uplift; the only real generational change was the inclusion of tensor cores for RTX and DLSS, which took game developers a long time to actually implement (and by that time 20xx was basically obsolete).
 
Joined
Apr 30, 2020
Messages
868 (0.58/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Capellix
Memory Corsair Vengeance Pro RGB 3200MHz 16GB
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western Digital SATA 6.0 SSD 500GB + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+ 1600 watts
Mouse Corsair Scimitar
Keyboard Corsair K55 Pro RGB
Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
How about you show and cite an actual factual reference instead of posting arbitrary claims?

1. If that includes an increase in die size, it's not an aggregate.
2. If that includes an increase in clock speed, it's not an aggregate either.
 
Joined
Jan 3, 2021
Messages
2,815 (2.26/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Whatever architecture comes after Blackwell will consume 2000W at least, so it would be inappropriate to name it after a conventional (slim) scientist and use a 3-digit code. I propose Mr. Sherman Klump and no less than 4 digits. SK1000, SK2000 and so on.
 
Joined
Nov 27, 2023
Messages
1,257 (6.72/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original)
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (23H2)
How about you show and cite an actual factual reference instead of posting arbitrary claims?

1. If that includes an increase in die size, it's not an aggregate.
2. If that includes an increase in clock speed, it's not an aggregate either.
You probably should have specified from the start that your measuring stick is quite arbitrary and, in essence, irrelevant. What matters is actual performance as delivered in a finished product. For example, the top GM204 (980) was 60% faster overall than the top version of the same-class previous-gen chip (680/770) while staying on the same node. Anything else is splitting hairs.
I mean, by the same-ish metric Zen 4 is what, only a couple of percent faster than Zen 3? If we lock two single-CCD chips to the same frequency and run Cinebench or something, that would be the result. However, nobody sane is saying that Zen 4 is a minor improvement at best over Zen 3, right?
 
Joined
May 30, 2015
Messages
1,885 (0.57/day)
Location
Seattle, WA
How about you show and cite an actual factual reference instead of posting arbitrary claims?

AnandTech's review database for the GeForce4 Ti 4600, GeForce 6800 Ultra, GeForce 8800 GTX, GeForce GTX 280, and GeForce GTX Titan X. This is a really simple task of looking at the performance reviews, and also having lived through each era and owned each of those generations.

1. If that includes an increase in die size, it's not an aggregate.
2. If that includes an increase in clock speed, it's not an aggregate either.

Aggregate means the combination of all elements: manufacturing improvements, clock speed, pipeline/shader block size, architecture improvements, shader optimization, software optimization, API improvements, per-application optimization. Everything rolled into one figure.

If you want a great history lesson, and I highly recommend that you might, check out reviews on NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
 
Joined
Nov 27, 2023
Messages
1,257 (6.72/day)
If you want a great history lesson, and I highly recommend that you might, check out reviews on NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically, with nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though it was precisely when I seriously got into hardware.
 
Joined
May 30, 2015
Messages
1,885 (0.57/day)
Location
Seattle, WA
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically, with nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though it was precisely when I seriously got into hardware.

Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)

I pulled the aggregate increases from launch-day reviews. Obviously the 6800 did not do well in some of those performance metrics, because driver maturity is a big factor. That's something Rankine never received, because it was stuck on its lopsided implementation of DX9a and required per-game tuning to achieve proper scaling from the architecture. Curie is full DX9c and received plentiful driver and software improvements, allowing later performance to eclipse Rankine's by as much as 2.2x. This is why the improvement is aggregated: architectural changes amount to more than just "more transistors, more better." NVIDIA was still designing chips using EDL programming, and that allowed fundamental changes for very little transistor cost every time the programming model was updated. Designs for SM3.0 were a paradigm shift in that regard.
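
For anyone who wants to see how much of the NV38 to NV40 aggregate figure survives once die growth is divided out, here is a rough sketch using only the numbers quoted in this thread (63% at launch, up to 2.2x with mature drivers, roughly 1.4x die size, roughly 1.6x transistors). The function name and the simple divide-by-ratio normalization are my own assumptions for illustration; this ignores the lower clocks and everything else that goes into a real comparison.

```python
# Figures quoted in this thread for NV38 -> NV40 (all approximate).
LAUNCH_SPEEDUP = 1.63     # "63% aggregate increase" from launch-day reviews
MATURE_SPEEDUP = 2.2      # "as much as 2.2x" with later drivers and software
DIE_SIZE_RATIO = 1.4      # "just shy of 1.4x die size" (rounded up here)
TRANSISTOR_RATIO = 1.6    # "~1.6x transistors"


def normalized_uplift_pct(speedup: float, resource_ratio: float) -> float:
    """Percent uplift left after dividing a speedup by a resource scaling ratio.

    Hypothetical helper: a naive normalization, not an established metric.
    """
    return (speedup / resource_ratio - 1.0) * 100.0


launch_per_area = normalized_uplift_pct(LAUNCH_SPEEDUP, DIE_SIZE_RATIO)
launch_per_transistor = normalized_uplift_pct(LAUNCH_SPEEDUP, TRANSISTOR_RATIO)
mature_per_area = normalized_uplift_pct(MATURE_SPEEDUP, DIE_SIZE_RATIO)

print(f"Launch uplift per unit die area: {launch_per_area:.1f}%")
print(f"Launch uplift per transistor:    {launch_per_transistor:.1f}%")
print(f"Mature uplift per unit die area: {mature_per_area:.1f}%")
```

With these inputs the launch-day figure shrinks to roughly 16% per unit of die area, while the mature-driver figure stays around 57%, which is one way to see why the choice of reviews and normalization changes the answer so much.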

Rankine's FP forward architecture and dual-issue (2fp/1int) scalar pipelines are an interesting rabbit hole to fall down if you want to see the pitfalls of ASIC design by programming limits. NVIDIA could only ever extract 8px/clock in one or two extremely niche scenarios while the TMU arrangement languished waiting for tex fetches.
 
Joined
Nov 27, 2023
Messages
1,257 (6.72/day)
Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)
I’d look it up on the database, I use it often, but I am currently on my phone and for some reason whenever I start opening several entries to compare it hits me with a captcha thinking I am a killbot from the future and asks me to prove I am not here for Sarah Connor. This gets annoying. And maybe I AM a killbot, what’s with this discrimination? So yeah, that’s why I was using my hazy memory here. Not a bad recollection, actually, seeing how it was 20 years ago.
 
Joined
Jul 4, 2018
Messages
110 (0.05/day)
Location
Seattle area, Wa
System Name We call it 'x79', aka The x79 system
Processor E5 1680 v2
Motherboard Rampage IV Extreme
Cooling Soft tube loop w/ Black Ice GTX 360 and EK Supremacy Evo
Memory 32gb (4x8gb) 2400mhz cl11 Corsair Dominator
Video Card(s) Radeon VII, XFX with Samsung HBM dies
Storage Samsung 860 EVO 1tb
Display(s) old 27" Viewsonic 1080p, Asus 1080p, Viewsonic 4k
Case Thermaltake Core x9
Power Supply Corsair HX1200
Benchmark Scores Cinebench R15: 1680v2 @ 4.6GHz with XMP enabled, 1648; 1680v2 @ 4.7GHz with RAM @ stock 1333MHz, 1696
Well so much for the rumor of the RTX 5090 being 100% faster than the 4090. Maybe in Ray-Tracing though.
This is more like the jump from Kepler to Maxwell.

I do think there's a fair amount of room to extract more performance from the same node, though not 100% like that one leaker on Twitter claimed.

With the density increase from Samsung 8nm to 4N, it did seem like Nvidia was not able to extract all the performance they could out of that node. As far as die size goes, they can go bigger, but not much more than 20% bigger. 20% bigger would put the GB202 die into TU102 territory.
 
Joined
Apr 14, 2022
Messages
671 (0.86/day)
Location
London, UK
Processor AMD Ryzen 7 5800X3D
Motherboard ASUS B550M-Plus WiFi II
Cooling Noctua U12A chromax.black
Memory Corsair Vengeance 32GB 3600Mhz
Video Card(s) Palit RTX 4080 GameRock OC
Storage Samsung 970 Evo Plus 1TB + 980 Pro 2TB
Display(s) Asus XG35VQ
Case Asus Prime AP201
Audio Device(s) Creative Gigaworks - Razer Blackshark V2 Pro
Power Supply Corsair SF750
Mouse Razer Viper
Software Windows 11 64bit
I don’t think NVIDIA will release a beastly 5090 when AMD can’t even match the 4090.
A cut-down GB202, 20-25% faster than the 4090, and they call it a day.
See you in 2027 again.
 