
NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration

AleksandarK

News Editor
Staff member
A die-shot of NVIDIA's GB202, the silicon powering the RTX 5090, has surfaced online, providing detailed insight into the "Blackwell" architecture's physical layout. The annotated images, shared by hardware analyst Kurnal and provided by ASUS China general manager Tony Yu, compare the GB202 to its AD102 predecessor and outline key architectural components. The die's central region houses 128 MB of L2 cache (96 MB enabled on the RTX 5090), surrounded by memory interfaces. Eight 64-bit memory controllers make up the 512-bit GDDR7 interface, with the physical interfaces positioned along the top, left, and right edges of the die. Twelve graphics processing clusters (GPCs) surround the central cache. Each GPC contains eight texture processing clusters (TPCs) of two streaming multiprocessors (SMs) each, for 16 SMs per GPC. The complete die configuration enables 24,576 CUDA cores, arranged as 128 cores per SM across 192 SMs. With the RTX 5090 offering "only" 21,760 CUDA cores, the full GB202 die appears to be reserved for workstation GPUs.

The SM design includes four slices sharing 128 KB of L1 cache and four texture mapping units (TMUs). Individual SM slices contain dedicated register files, L0 instruction caches, warp schedulers, load-store units, and special function units. Central to the die's layout is a vertical strip containing the media processing components—NVENC and NVDEC units—running from top to bottom. The RTX 5090 implementation enables three of four available NVENC encoders and two of four NVDEC decoders. The die includes twelve raster engine/3D FF blocks for geometry processing. At the bottom edge sits the PCIe 5.0 x16 interface and display controller components. Despite its substantial size, the GB202 remains smaller than NVIDIA's previous GH100 and GV100 dies, which exceeded 814 mm². Each SM integrates specialized hardware, including new 5th-generation Tensor cores and 4th-generation RT cores, contributing to the die's total of 192 RT cores, 768 Tensor cores, and 768 texture units.
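For those checking the math, the unit counts in the article all follow from the cluster hierarchy. A quick sanity-check sketch, assuming NVIDIA's usual arrangement of two SMs per TPC and the per-SM RT/Tensor/TMU counts stated above:

```python
# GB202 unit hierarchy as described in the article (full die, not the
# cut-down RTX 5090 configuration). Two SMs per TPC is NVIDIA's usual
# arrangement and is implied by "8 TPCs / 16 SMs per GPC".
GPCS = 12
TPCS_PER_GPC = 8
SMS_PER_TPC = 2
CORES_PER_SM = 128

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC   # 192 SMs
cuda_cores = sms * CORES_PER_SM           # 24,576 CUDA cores
rt_cores = sms * 1                        # one RT core per SM -> 192
tensor_cores = sms * 4                    # four Tensor cores per SM -> 768
tmus = sms * 4                            # four TMUs per SM -> 768

print(sms, cuda_cores, rt_cores, tensor_cores, tmus)
```

These totals match the figures in the article (192 SMs, 24,576 CUDA cores, 192 RT cores, 768 Tensor cores, 768 texture units).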



View at TechPowerUp Main Site | Source
 
GEEZUZ looks like a wicked construction :nutkick:
 
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
 
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
Seems to me like the real start of the AI GPU era.
 
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
The chip is simply too large to realistically segment products like that because of yields.
 
The chip is simply too large to realistically segment products like that because of yields.
Here are the last few generation chip sizes. Looks like normal gen to gen variation to me. I don’t see anything out of the ordinary.

[attached chart: die sizes across the last few GPU generations]
 
News heading mentions a wrong number - 756 instead of 576 :)
 
How so... exactly because of the sheer size it is quite suitable for segmenting - 50% enabled shaders would be a good RTX 5060 candidate.
That's not how this works, throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous, yields are much better if you simply make a chip 50% smaller.

By the way, -50% shaders wouldn't mean a 5060; it would mean a 5080, which is exactly what that GPU is: half of a GB202. Except that they didn't choose to simply disable half of a GB202; instead they made a different chip, because that's way more cost effective.
 
That's not how this works, throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous, yields are much better if you simply make a chip 50% smaller.

What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?

By the way -50% shaders wouldn't mean a 5060, it would mean a 5080

According to the greedy black-leather-jacketed shitshow's products? I guess he's testing his clients' intelligence.
 
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting
Yes. If it made sense to use a GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.
 
Future 5090 Ti GPU? For only 3000$ MSRP!
 
Yes. If it made sense to use a GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.

It doesn't make sense. It was estimated that the cost to make one RTX 5090 is between $450 and $500. Selling the defective dies for anything above those values is profit, which is still better than throwing the materials (expensive wafers) in the bin.
 
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?
Depends on what exactly yield and defect patterns are. Generally, if it is mass-produced and sold as a product the yield numbers for dies suitable for some SKU are not as bad as you'd expect. GPUs are huge but contain a lot of identical parallel units. Disable a few and there you go. If indeed you need to resort to disabling half a chip, then producing that thing in the first place is pretty suspect. Not that one or another company has not manufactured dies with horrible-horrible yields but these are exceptions rather than a rule.
 
5090 dies can be used this way:

Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.
 
@AleksandarK Brother, that math ain't mathin'. 128 CUDA cores*192 SM = 24576, not 24756 ;)

5090 dies can be used this way:

Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.

AD102 never shipped in a full configuration even at the enterprise segment, wouldn't be surprised if this happened again tbh
 
Selling the defective dies for anything above those values is profit, still better than throwing the materials in the bin.
No, you still don't understand. In order to sell those defective dies, it must make sense to waste that much of a wafer versus a wafer of much smaller chips. Yields don't scale linearly: bigger chips waste far more area, because some defects make an entire die unusable in a way that can't be salvaged by simply disabling SMs for a lower-end SKU. So instead of losing 350 mm², you lose 750 mm², or whatever.
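A rough back-of-the-envelope sketch of that argument (assumed numbers, not from the thread: a 300 mm wafer, a defect density of 0.1/cm², a simple Poisson zero-defect yield model, edge waste and salvage ignored):

```python
import math

D = 0.1  # assumed defect density, defects per cm^2
WAFER_AREA_MM2 = math.pi * 150**2  # 300 mm wafer, ~70,686 mm^2

def fully_good_dies(die_area_mm2):
    """Candidate dies per wafer times the Poisson zero-defect yield."""
    candidates = WAFER_AREA_MM2 // die_area_mm2       # ignores edge waste
    yield_frac = math.exp(-D * die_area_mm2 / 100)    # P(zero defects)
    return candidates * yield_frac

big = fully_good_dies(750)    # a GB202-sized die
small = fully_good_dies(375)  # a die half that size

# Halving the die area roughly triples the number of fully good dies,
# because yield improves on top of the doubled candidate count.
print(big, small, small / big)
```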
 
Here are the last few generation chip sizes. Looks like normal gen to gen variation to me. I don’t see anything out of the ordinary.
Next in line: the 6090 with a 600 mm² die, 24,576 shaders enabled out of 30,720, and a 384-bit bus because it's impossible to fit 512. That's the evolution. Which probably means 12,288 for the 6080.
 
How so... exactly because of the sheer size it is quite suitable for segmenting - 50% enabled shaders would be a good RTX 5060 candidate.
The main idea of a product line is that you won't even be getting defect rates that high; at worst your GB202 chip will still have something like 70~80% of its units functional. If you often get chips worse than that, then there's no reason to even fab that chip to begin with.
5090 dies can be used this way:

Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.
That's considering only the consumer RTX parts, those same chips are also used for their (née) Tesla/Quadro cards.
I thought it did... Was it really almost impossible to make a fully functional chip?
The high-end AD102 only had 2 SMs disabled IIRC. I guess that's a good safety margin on such a big chip, or maybe they indeed couldn't get it 100% functional often enough to give it a proper product.
 
I thought it did... Was it really almost impossible to make a fully functional chip?

I believe so. The RTX 6000 Ada Generation has 142 out of the 144 SMs enabled, with the RTX 4090 coming in at 128 out of 144. A full L2 cache slice is also disabled on the 4090, reducing L2 from 96 to 72 MB.
 
AD102 never shipped in a full configuration even at the enterprise segment, wouldn't be surprised if this happened again tbh
Truth.

The number of defects introduced during lithography and production on a chip this complex rules out a fully working die being feasible at volume. I'm sure they get some that have all their parts working; I would guess they keep those for internal use.

All it takes is a few atoms of carbon or aluminum at these node sizes.
 
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting ?

Technically you could call the 5090 a salvage part, as it is not fully enabled for yield reasons.
 
The main idea of a product line is that you won't even be getting defect rates that high; at worst your GB202 chip will still have something like 70~80% of its units functional. If you often get chips worse than that, then there's no reason to even fab that chip to begin with.

That's considering only the consumer RTX parts, those same chips are also used for their (née) Tesla/Quadro cards.

The high-end AD102 only had 2 SMs disabled IIRC. I guess that's a good safety margin on such a big chip, or maybe they indeed couldn't get it 100% functional often enough to give it a proper product.
We know that TSMC's N5 had a defect density of 0.1 per square centimeter in the summer of 2020. Plugging in the numbers for the 5090 suggests a yield of roughly 49% for fully functional dies. After harvesting defective dies and fusing off the damaged portions, the overall usable yields must be fairly high.
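For what it's worth, that ~49% figure is consistent with Murphy's yield model under assumed numbers of D = 0.1 defects/cm² and a ~750 mm² GB202 die; the simpler Poisson model lands around 47%. A sketch of both:

```python
import math

D = 0.1          # assumed defect density, defects per cm^2
AREA_CM2 = 7.5   # assumed GB202 die area, 750 mm^2 in cm^2

# Poisson model: yield is the probability a die catches zero defects.
poisson = math.exp(-D * AREA_CM2)

# Murphy's model: assumes a triangular defect-density distribution,
# slightly less pessimistic than Poisson for large dies.
murphy = ((1 - math.exp(-D * AREA_CM2)) / (D * AREA_CM2)) ** 2

print(f"Poisson yield: {poisson:.1%}")  # ~47%
print(f"Murphy yield:  {murphy:.1%}")   # ~49%
```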
 