
NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration

AleksandarK

News Editor
Staff member
A die-shot of NVIDIA's GB202, the silicon powering the RTX 5090, has surfaced online, providing detailed insight into the "Blackwell" architecture's physical layout. The annotated images, shared by hardware analyst Kurnal and provided by ASUS China general manager Tony Yu, compare the GB202 to its AD102 predecessor and outline key architectural components. The die's central region houses 128 MB of L2 cache (96 MB enabled on the RTX 5090), surrounded by memory interfaces. Eight 64-bit memory controllers make up the 512-bit GDDR7 interface, with the physical interfaces positioned along the top, left, and right edges of the die. Twelve graphics processing clusters (GPCs) surround the central cache. Each GPC contains eight texture processing clusters (TPCs) of two streaming multiprocessors (SMs) each, for 16 SMs per GPC. The complete die configuration enables 24,576 CUDA cores, arranged as 128 cores per SM across 192 SMs. With the RTX 5090 offering "only" 21,760 CUDA cores, the full GB202 die appears to be reserved for workstation GPUs.

The SM design includes four slices sharing 128 KB of L1 cache and four texture mapping units (TMUs). Individual SM slices contain dedicated register files, L0 instruction caches, warp schedulers, load-store units, and special function units. Central to the die's layout is a vertical strip containing the media processing components—NVENC and NVDEC units—running from top to bottom. The RTX 5090 implementation enables three of four available NVENC encoders and two of four NVDEC decoders. The die includes twelve raster engine/3D FF blocks for geometry processing. At the bottom edge sits the PCIe 5.0 x16 interface and display controller components. Despite its substantial size, the GB202 remains smaller than NVIDIA's previous GH100 and GV100 dies, which exceeded 814 mm². Each SM integrates specialized hardware, including new 5th-generation Tensor cores and 4th-generation RT cores, contributing to the die's total of 192 RT cores, 768 Tensor cores, and 768 texture units.
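For those checking the math, the unit counts in the article all follow from the cluster hierarchy. A quick sanity-check sketch, assuming NVIDIA's usual arrangement of two SMs per TPC and the per-SM RT/Tensor/TMU counts stated above:

```python
# GB202 unit hierarchy as described in the article (full die, not the
# cut-down RTX 5090 configuration). Two SMs per TPC is NVIDIA's usual
# arrangement and is implied by "8 TPCs / 16 SMs per GPC".
GPCS = 12
TPCS_PER_GPC = 8
SMS_PER_TPC = 2
CORES_PER_SM = 128

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC   # 192 SMs
cuda_cores = sms * CORES_PER_SM           # 24,576 CUDA cores
rt_cores = sms * 1                        # one RT core per SM -> 192
tensor_cores = sms * 4                    # four Tensor cores per SM -> 768
tmus = sms * 4                            # four TMUs per SM -> 768

print(sms, cuda_cores, rt_cores, tensor_cores, tmus)
```

These totals match the figures in the article (192 SMs, 24,576 CUDA cores, 192 RT cores, 768 Tensor cores, 768 texture units).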



View at TechPowerUp Main Site | Source
 
GEEZUZ looks like a wicked construction :nutkick:
 
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
 
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
Seems to me like the real start of the AI GPU era.
 
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
The chip is simply too large to realistically segment products like that because of yields.
 
The chip is simply too large to realistically segment products like that because of yields.
Here are the last few generation chip sizes. Looks like normal gen to gen variation to me. I don’t see anything out of the ordinary.

[attached chart: die sizes across the last few GPU generations]
 
News heading mentions a wrong number - 756 instead of 576 :)
 
How so... exactly because of the sheer size it is quite suitable for segmenting - 50% enabled shaders would be a good RTX 5060 candidate.
That's not how this works, throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous, yields are much better if you simply make a chip 50% smaller.

By the way, -50% shaders wouldn't mean a 5060; it would mean a 5080, which is exactly what that GPU is: half of a GB202. Except that they didn't choose to simply disable half of a GB202; instead they made a different chip, because that's way more cost effective.
 
That's not how this works, throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous, yields are much better if you simply make a chip 50% smaller.

What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?

By the way -50% shaders wouldn't mean a 5060, it would mean a 5080

According to the greedy black-leather-jacketed shitshow's products? I guess he's testing his clients' intelligence.
 
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting
Yes. If it made sense to use a GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.
 
Future 5090 Ti GPU? For only 3000$ MSRP!
 
Yes. If it made sense to use a GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.

It doesn't make sense. It was estimated that the cost to make one RTX 5090 is between $450 and $500. Selling the defective dies for anything above those values is profit, which is still better than throwing the materials (expensive wafers) in the bin.
 
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?
Depends on what exactly yield and defect patterns are. Generally, if it is mass-produced and sold as a product the yield numbers for dies suitable for some SKU are not as bad as you'd expect. GPUs are huge but contain a lot of identical parallel units. Disable a few and there you go. If indeed you need to resort to disabling half a chip, then producing that thing in the first place is pretty suspect. Not that one or another company has not manufactured dies with horrible-horrible yields but these are exceptions rather than a rule.
 
5090 dies can be used this way:

Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.
 
@AleksandarK Brother, that math ain't mathin'. 128 CUDA cores*192 SM = 24576, not 24756 ;)

5090 dies can be used this way:

Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.

AD102 never shipped in a full configuration even at the enterprise segment, wouldn't be surprised if this happened again tbh
 
Selling the defective dies for anything above those values is profit, still better than throwing the materials in the bin.
No, you still don't understand. In order to sell those defective dies, it must make sense to waste that much of a wafer versus a wafer of much smaller chips. Yields don't scale linearly: bigger chips waste far more area, because some defects make an entire die unusable in a way that can't be salvaged by simply disabling SMs for a lower-end SKU. So instead of losing 350 mm², you lose 750 mm², or whatever.
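A rough back-of-the-envelope sketch of that argument (assumed numbers, not from the thread: a 300 mm wafer, a defect density of 0.1/cm², a simple Poisson zero-defect yield model, edge waste and salvage ignored):

```python
import math

D = 0.1  # assumed defect density, defects per cm^2
WAFER_AREA_MM2 = math.pi * 150**2  # 300 mm wafer, ~70,686 mm^2

def fully_good_dies(die_area_mm2):
    """Candidate dies per wafer times the Poisson zero-defect yield."""
    candidates = WAFER_AREA_MM2 // die_area_mm2       # ignores edge waste
    yield_frac = math.exp(-D * die_area_mm2 / 100)    # P(zero defects)
    return candidates * yield_frac

big = fully_good_dies(750)    # a GB202-sized die
small = fully_good_dies(375)  # a die half that size

# Halving the die area roughly triples the number of fully good dies,
# because yield improves on top of the doubled candidate count.
print(big, small, small / big)
```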
 
Here are the last few generation chip sizes. Looks like normal gen to gen variation to me. I don’t see anything out of the ordinary.
Next in line: the 6090 with a 600 mm² die, 24,576 shaders enabled out of 30,720, and a 384-bit bus because it's impossible to fit 512. That's the evolution. Which probably means 12,288 for the 6080.
 
How so... exactly because of the sheer size it is quite suitable for segmenting - 50% enabled shaders would be a good RTX 5060 candidate.
The main idea of a product line is that you won't even be getting defect rates that high; at worst your GB202 chip will still have something like 70~80% of its units functional. If you often get chips worse than that, then there's no reason to even fab that chip to begin with.
5090 dies can be used this way:

Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.
That's considering only the consumer RTX parts, those same chips are also used for their (née) Tesla/Quadro cards.
I thought it did... Was it really almost impossible to make a fully functional chip?
The high-end AD102 only had 2 SMs disabled IIRC. I guess that's a good safety margin on such a big chip, or maybe they indeed couldn't get it 100% functional often enough to give it a proper product.
 
I thought it did... Was it really almost impossible to make a fully functional chip?

I believe so. The RTX 6000 Ada Generation has 142 out of the 144 SMs enabled, with the RTX 4090 coming in at 128 out of 144. A full L2 cache slice is also disabled on the 4090, reducing L2 from 96 to 72 MB.
 
AD102 never shipped in a full configuration even at the enterprise segment, wouldn't be surprised if this happened again tbh
Truth.

The number of defects introduced during lithography and production on a chip this complex rules out a fully working die being feasible at volume. I'm sure they get some that have all their parts working; I would guess they keep those for internal use.

All it takes is a few atoms of carbon or aluminum at these node sizes.
 
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting ?

Technically you could call the 5090 a salvage part, as it is not fully enabled for yield reasons.
 
The main idea of a product line is that you won't even be getting defect rates that high; at worst your GB202 chip will still have something like 70~80% of its units functional. If you often get chips worse than that, then there's no reason to even fab that chip to begin with.

That's considering only the consumer RTX parts, those same chips are also used for their (née) Tesla/Quadro cards.

The high-end AD102 only had 2 SMs disabled IIRC. I guess that's a good safety margin on such a big chip, or maybe they indeed couldn't get it 100% functional often enough to give it a proper product.
We know that TSMC's N5 had a defect density of 0.1 per square centimeter in the summer of 2020. Plugging in the numbers for the 5090 suggests a yield of roughly 49% for fully functional dies. After harvesting defective dies and fusing off the damaged portions, the overall usable yields must be fairly high.
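For what it's worth, that ~49% figure is consistent with Murphy's yield model under assumed numbers of D = 0.1 defects/cm² and a ~750 mm² GB202 die; the simpler Poisson model lands around 47%. A sketch of both:

```python
import math

D = 0.1          # assumed defect density, defects per cm^2
AREA_CM2 = 7.5   # assumed GB202 die area, 750 mm^2 in cm^2

# Poisson model: yield is the probability a die catches zero defects.
poisson = math.exp(-D * AREA_CM2)

# Murphy's model: assumes a triangular defect-density distribution,
# slightly less pessimistic than Poisson for large dies.
murphy = ((1 - math.exp(-D * AREA_CM2)) / (D * AREA_CM2)) ** 2

print(f"Poisson yield: {poisson:.1%}")  # ~47%
print(f"Murphy yield:  {murphy:.1%}")   # ~49%
```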
 