Tuesday, March 19th 2024

NVIDIA "Blackwell" GeForce RTX to Feature Same 5nm-based TSMC 4N Foundry Node as GB100 AI GPU

Following Monday's blockbuster announcements of the "Blackwell" architecture and NVIDIA's B100, B200, and GB200 AI GPUs, all eyes are now on its client graphics derivatives: the GeForce RTX GPUs that implement "Blackwell" as a graphics architecture. Leading the effort will be the new GB202 ASIC, successor to the AD102 powering the current RTX 4090, and NVIDIA's biggest GPU with raster graphics and ray tracing capabilities. The GB202 is rumored to be followed by the GB203 in the premium segment, the GB205 a notch lower, and the GB206 further down the stack. Kopite7kimi, a reliable source for NVIDIA leaks, says that the GB202 silicon will be built on the same TSMC 4N foundry node as the GB100.

TSMC 4N is a derivative of the company's mainline N4P node; the "N" in 4N stands for NVIDIA. This is a nodelet that TSMC designed and optimized for NVIDIA SoCs, and one that TSMC still considers a derivative of its 5 nm EUV node. There is very little public information on the power and transistor-density improvements of TSMC 4N over TSMC N5. For reference, the N4P, which TSMC also regards as a 5 nm derivative, offers a 6% transistor-density improvement and a 22% power-efficiency improvement. In related news, Kopite7kimi says that with "Blackwell," NVIDIA is focusing on enlarging the L1 caches of the streaming multiprocessors (SM), which suggests a design focus on increasing performance at the SM level.
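For a rough sense of what those published percentages translate to, here is a back-of-the-envelope sketch in Python; the baseline density, die size, and board power below are illustrative assumptions, not disclosed 4N figures:

# Back-of-the-envelope math for the published N4P-vs-N5 deltas.
# All baseline figures are illustrative assumptions, not real 4N data.

N5_DENSITY_MTR_MM2 = 130.0  # assumed N5 logic density in MTr/mm^2
DENSITY_GAIN = 0.06         # N4P: +6% transistor density vs. N5
EFFICIENCY_GAIN = 0.22      # N4P: +22% power efficiency vs. N5

die_area_mm2 = 600.0        # hypothetical large GPU die

n5_mtr = N5_DENSITY_MTR_MM2 * die_area_mm2
n4p_mtr = n5_mtr * (1 + DENSITY_GAIN)
print(f"Same {die_area_mm2:.0f} mm^2 die: {n5_mtr / 1000:.1f} -> "
      f"{n4p_mtr / 1000:.1f} billion transistors")

# Reading "+22% efficiency" as doing the same work at 1/1.22 the power:
n5_power_w = 450.0          # hypothetical board power on N5
n4p_power_w = n5_power_w / (1 + EFFICIENCY_GAIN)
print(f"Iso-performance power: {n5_power_w:.0f} W -> {n4p_power_w:.0f} W")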
Sources: Kopite7kimi (Twitter), #2, VideoCardz

60 Comments on NVIDIA "Blackwell" GeForce RTX to Feature Same 5nm-based TSMC 4N Foundry Node as GB100 AI GPU

#51
Bwaze
Nvidia usually staggers node changes and architecture changes - so people moan when it's only a node change without a completely new architecture, or when it's a new architecture but on an old node - but they usually bring about the same generational uplift either way.

The biggest outlier in recent generations was Turing (20xx) in late 2018 on TSMC 12 nm (FinFET), which was just an optimized version of the node used for 2016's Pascal (10xx), with basically no raster uplift either; the only real generational change was the inclusion of tensor cores for RTX and DLSS, which took game developers a long time to actually implement (and by that time the 20xx series was basically obsolete).
#52
DemonicRyzen666
Fouquin: Citation needed. Actually, let's just debunk this one right here and now.

TSMC 150nm, NV20 to NV25: 44% aggregate increase.
TSMC 130nm, NV38 to NV40: 63% aggregate increase.
TSMC 90nm, G71 to G80: 88.7% aggregate increase.
TSMC 65nm, G92 to GT200: 49.8% aggregate increase.
TSMC 28nm, GK110 to GM200: 49.3% aggregate increase.
How about you show & cite an actual factual reference instead of posting arbitrary claims.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not aggregate either.
#53
Wirko
Whatever architecture comes after Blackwell will consume 2000W at least, so it would be inappropriate to name it after a conventional (slim) scientist and use a 3-digit code. I propose Mr. Sherman Klump and no less than 4 digits. SK1000, SK2000 and so on.
#54
Onasi
DemonicRyzen666: How about you show & cite an actual factual reference instead of posting arbitrary claims.

1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not aggregate either.
You probably should have specified from the start that your measuring stick is something quite arbitrary and, in essence, irrelevant. What matters is actual performance as delivered in a finished product. And, for example, the top GM204 (the 980) was 60% faster overall than the top version of the same-class previous-gen chip (680/770) while staying on the same node. Anything else is splitting hairs.
I mean, by the same-ish metric, Zen 4 is what, only a couple of percent faster than Zen 3? Because if we lock two single-CCD chips to the same frequency and run CB or something, that would be the result. However, nobody sane is saying that Zen 4 is, at best, a minor improvement over Zen 3, right?
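As an aside, the iso-frequency measuring stick being criticized here reduces to a one-line calculation; the variable names and Cinebench scores below are hypothetical, for illustration only:

# Hypothetical Cinebench nT scores with both single-CCD chips
# locked to the same frequency (illustrative numbers, not real data).
zen3_score = 14800
zen4_score = 15100

ipc_gain = zen4_score / zen3_score - 1
print(f"Iso-frequency uplift: {ipc_gain * 100:.1f}%")  # prints ~2.0%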
#55
Fouquin
DemonicRyzen666: How about you show & cite an actual factual reference instead of posting arbitrary claims.
AnandTech's review database for the GeForce4 Ti 4600, GeForce 6800 Ultra, GeForce 8800 GTX, GeForce GTX 280, and GeForce GTX Titan X. This is a really simple task of looking at the performance reviews, and also having lived through each era and owned each of those generations.
DemonicRyzen666: 1. If that includes an increase in die size, it's not an aggregate increase.
2. If that includes an increase in clock speed, it's not aggregate either.
Aggregate means the combination of all elements: manufacturing improvements, clock speed, pipeline/shader block size, architecture improvements, shader optimization, software optimization, API improvements, per-application optimization. Everything rolled into one figure.
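To make "rolled into one figure" concrete, here is a minimal sketch of how such a roll-up can be computed, using a geometric mean of per-title uplifts so no single game dominates the result. The titles and FPS numbers below are placeholders for illustration, not data from those reviews:

from math import prod

# Launch-day results per title: (new-gen FPS, old-gen FPS).
# Purely illustrative placeholder numbers, not review data.
results = {
    "Game A": (92.0, 61.0),
    "Game B": (144.0, 95.0),
    "Game C": (70.0, 48.0),
}

# Per-title uplift ratios, combined with a geometric mean so a
# single outlier title can't skew the aggregate figure.
ratios = [new / old for new, old in results.values()]
geomean = prod(ratios) ** (1 / len(ratios))
print(f"Aggregate generational uplift: {(geomean - 1) * 100:.1f}%")
# Prints roughly: Aggregate generational uplift: 49.4%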

If you want a great history lesson, and I highly recommend it, check out the reviews of NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
#56
Onasi
Fouquin: If you want a great history lesson, and I highly recommend it, check out the reviews of NV40 and NV45 in relation to NV38. There you will find your 40% clock-for-clock, millimeter-for-millimeter increase.
I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically, and with nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though that was precisely when I seriously got into hardware.
#57
Fouquin
Onasi: I mean, if we are really being nerdy and pedantic, I seem to remember that NV40/45 were significantly larger chips than NV38. I think 1.5 times larger physically, and with nearly double the transistors. I may not be entirely correct here; I am hazy on the Rankine/Curie era, even though that was precisely when I seriously got into hardware.
Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)

I pulled the aggregate increases from launch-day reviews. Obviously the 6800 did not do well in some of those performance metrics, because driver maturity is a big factor. That's something Rankine never received, because it was stuck on its lopsided implementation of DX9a and required per-game tuning to achieve proper scaling from the architecture. Curie is full DX9c and received plentiful driver and software improvements, allowing its later performance to eclipse Rankine's by as much as 2.2x. This is why the improvement is aggregated; architectural changes go beyond just "more transistors, more better." NVIDIA was still designing chips using EDL programming, and that allowed fundamental changes at very little transistor cost every time the programming model was updated. Designs for SM3.0 were a paradigm shift in that regard.

Rankine's FP forward architecture and dual-issue (2fp/1int) scalar pipelines are an interesting rabbit hole to fall down if you want to see the pitfalls of ASIC design by programming limits. NVIDIA could only ever extract 8px/clock in one or two extremely niche scenarios while the TMU arrangement languished waiting for tex fetches.
#58
Onasi
Fouquin: Just shy of 1.4x die size, ~1.6x transistors, but also 4x logical pipelines with associated 1:1 TMU count, and double the vector pipelines, AND clocked lower with a mere 7W (~9%) increase in rated power. If only we had an excellent and detailed database of graphics card specs to use. :)
I'd look it up in the database, I use it often, but I am currently on my phone, and for some reason whenever I start opening several entries to compare, it hits me with a captcha, thinking I am a killbot from the future, and asks me to prove I am not here for Sarah Connor. This gets annoying. And maybe I AM a killbot, what's with this discrimination? So yeah, that's why I was using my hazy memory here. Not a bad recollection, actually, seeing how it was 20 years ago.
#59
grammar_phreak
Well, so much for the rumor of the RTX 5090 being 100% faster than the 4090. Maybe in ray tracing, though.
This is more like the jump from Kepler to Maxwell.

I do think there's a fair amount of room to extract more performance from the same node, but not 100% like that one leaker on Twitter claimed.

It did seem like, with the density increase from Samsung 8nm to 4N, Nvidia was not able to extract all the performance it could out of that node. As far as die size goes, they can go bigger, but not much more than 20% bigger; 20% bigger would put the GB202 die into TU102 territory (AD102 is roughly 609 mm², so a 20% increase lands around 730 mm², close to TU102's 754 mm²).
#60
gffermari
I don't think nVidia will release a beastly 5090 when AMD can't even match the 4090.
A cut-down GB202, 20-25% faster than the 4090, and they'll call it a day.
See you in 2027 again.