
NVIDIA "Blackwell" GeForce RTX to Feature Same 5nm-based TSMC 4N Foundry Node as GB100 AI GPU

Nvidia can't come close to a 40% increase on the same node, and has never achieved this.

Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
 
Nvidia usually staggers node changes and architecture changes, so people moan when it's only a node change without a completely new architecture, or when it's a new architecture on an old node, but either way they usually deliver about the same generational uplift.

The biggest outlier in recent generations was Turing (20xx) in late 2018 on TSMC 12 nm (FinFET), which was just an optimized version of the node used by 2016's Pascal (10xx). It brought basically no raster uplift; the only real generational change was the inclusion of RT and tensor cores for RTX and DLSS, which took game developers a long time to actually implement (and by that time the 20xx series was basically obsolete).
 
Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
how about you show & cite an actual factual reference instead of posting arbitrary claims.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not an aggregate increase either.
 
Whatever architecture comes after Blackwell will consume 2000W at least, so it would be inappropriate to name it after a conventional (slim) scientist and use a 3-digit code. I propose Mr. Sherman Klump and no less than 4 digits. SK1000, SK2000 and so on.
 
how about you show & cite an actual factual reference instead of posting arbitrary claims.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not an aggregate increase either.
You probably should have specified from the start that your measuring stick is something that's quite arbitrary and, in essence, irrelevant. What matters is actual performance as it is delivered in a finished product. And, for example, the top GM204 (980) was 60% overall faster than the same-class previous-gen chip in its top version (680/770) while staying on the same node. Anything else is splitting hairs.
I mean, by the same-ish metric Zen 4 is what, only a couple of percent faster than Zen 3? Since if we lock two single-CCD chips to the same frequency and run CB or something, that would be the result. However, nobody sane is saying that Zen 4 is a minor improvement at best over Zen 3, right?
 
how about you show & cite an actual factual reference instead of posting arbitrary claims.

AnandTech's review database for the GeForce4 Ti 4600, GeForce 6800 Ultra, GeForce 8800 GTX, GeForce GTX 280, and GeForce GTX Titan X. This is a really simple task of looking at the performance reviews, and also having lived through each era and owned each of those generations.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not an aggregate increase either.

Aggregate means combination of all elements. Manufacturing improvements, clock speed, pipeline/shader block size, architecture improvements, shader optimization, software optimization, API improvements, per-application optimization. Everything rolled into one figure.
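
To make "rolled into one figure" concrete, here's a minimal sketch of one common way to aggregate a benchmark suite into a single uplift: the geometric mean of per-title performance ratios. The game names and FPS numbers are made-up placeholders, not data from any actual review.

```python
# Minimal sketch: "aggregate increase" as the geometric mean of per-title
# FPS ratios across a shared benchmark suite. All numbers are invented
# placeholders, not results from any real review.
from math import prod

def aggregate_uplift(old_fps, new_fps):
    """Return the aggregate increase, e.g. 0.49 means '49% faster overall'."""
    ratios = [new_fps[title] / old_fps[title] for title in old_fps]
    return prod(ratios) ** (1 / len(ratios)) - 1

old = {"Title A": 42.0, "Title B": 61.0, "Title C": 38.0}   # previous-gen card
new = {"Title A": 63.0, "Title B": 90.0, "Title C": 57.0}   # new-gen card
print(f"Aggregate increase: {aggregate_uplift(old, new):.1%}")
```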

If you want a great history lesson, and I highly recommend that you take one, check out reviews of NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
 
If you want a great history lesson, and I highly recommend that you take one, check out reviews of NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically and nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though it was precisely when I seriously got into hardware.
 
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically and nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though it was precisely when I seriously got into hardware.

Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)
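
If anyone wants to sanity-check those ratios, here's a quick sketch using approximate NV38 (FX 5950 Ultra) and NV40 (6800 Ultra) figures as I recall them; verify against the GPU database before quoting these.

```python
# Quick ratio check of the NV38 -> NV40 comparison above. The spec values
# are approximate, from memory -- treat them as placeholders and confirm
# against the TechPowerUp GPU database.
nv38 = {"die_mm2": 207, "transistors_m": 135, "clock_mhz": 475, "tdp_w": 74}
nv40 = {"die_mm2": 287, "transistors_m": 222, "clock_mhz": 400, "tdp_w": 81}

for key in nv38:
    ratio = nv40[key] / nv38[key]
    print(f"{key}: {nv38[key]} -> {nv40[key]}  ({ratio:.2f}x)")

# Roughly: ~1.39x die size, ~1.64x transistors, 0.84x clock,
# ~1.09x rated power (+7 W) -- in line with the figures quoted above.
```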

I pulled aggregate increases off launch-day reviews. Obviously in some of those performance metrics the 6800 did not do well, because driver maturity is a big factor. That's something Rankine never received, because it was stuck on its lopsided implementation of DX9a and required per-game tuning to achieve proper scaling from the architecture. Curie is full DX9c and received plentiful driver and software improvements, allowing later performance to eclipse Rankine's by as much as 2.2x. This is why the improvement is aggregated; architectural changes deliver more than just "more transistors, more better." NVIDIA was still designing chips using EDL programming, and that allowed fundamental changes for very little transistor cost every time the programming model was updated. Designs for SM3.0 were a paradigm shift in that regard.

Rankine's FP forward architecture and dual-issue (2fp/1int) scalar pipelines are an interesting rabbit hole to fall down if you want to see the pitfalls of ASIC design by programming limits. NVIDIA could only ever extract 8px/clock in one or two extremely niche scenarios while the TMU arrangement languished waiting for tex fetches.
 
Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)
I’d look it up on the database, I use it often, but I am currently on my phone and for some reason whenever I start opening several entries to compare it hits me with a captcha thinking I am a killbot from the future and asks me to prove I am not here for Sarah Connor. This gets annoying. And maybe I AM a killbot, what’s with this discrimination? So yeah, that’s why I was using my hazy memory here. Not a bad recollection, actually, seeing how it was 20 years ago.
 
Well so much for the rumor of the RTX 5090 being 100% faster than the 4090. Maybe in Ray-Tracing though.
This is more like the jump from Kepler to Maxwell.

I do think there's a fair amount of room to extract more performance from the same node, but not 100% like that one leaker on Twitter claimed.

It did seem like, even with the density increase from Samsung 8 nm to 4N, Nvidia was not able to extract all the performance it could out of that node. As far as die size goes, they can go bigger, but not much more than 20% bigger; 20% bigger puts the GB202 die into TU102 territory.
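
Rough napkin math on that point, using the commonly cited AD102 and TU102 die sizes; the 20%-larger GB202 figure is pure speculation on my part.

```python
# Back-of-the-envelope for the "20% bigger" claim above. AD102 and TU102
# die sizes are the commonly cited figures; the GB202 size is speculative.
ad102_mm2 = 609   # RTX 4090 die (TSMC 4N)
tu102_mm2 = 754   # RTX 2080 Ti die (TSMC 12 nm)

gb202_guess = ad102_mm2 * 1.20   # ~731 mm^2 if 20% larger than AD102
print(f"AD102 + 20%: ~{gb202_guess:.0f} mm^2 (TU102 was {tu102_mm2} mm^2)")
```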
 
I don’t think nVidia will release a beastly 5090 when AMD can’t even match the 4090.
A cut-down GB202, 20-25% faster than the 4090, and they’ll call it a day.
See you in 2027 again.
 