
AMD Releases its CDNA2 MI250X "Aldebaran" HPC GPU Block Diagram

btarunr

Editor & Senior Moderator
In its Hot Chips 2022 presentation, AMD released a block diagram of its biggest AI-HPC processor, the Instinct MI250X. Based on the CDNA2 compute architecture, at the heart of the MI250X is the "Aldebaran" MCM (multi-chip module). This MCM contains two logic dies (GPU dies) and eight HBM2E stacks, four per GPU die. The two GPU dies are connected by a 400 GB/s Infinity Fabric link. Each die also has up to 500 GB/s of external Infinity Fabric bandwidth for inter-socket communication, and PCI-Express 4.0 x16 as the host system bus for AIC form-factors. The two GPU dies together add up to 58 billion transistors, and are fabricated on the TSMC N6 (6 nm) node.
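For a sense of how that two-die layout looks to software, here is a minimal HIP sketch (our illustration, not AMD's code) of how the two "Aldebaran" GCDs typically enumerate as separate devices, with the in-package Infinity Fabric link surfaced through peer access; the exact enumeration behaviour is an assumption on our part:

```cpp
// Minimal HIP sketch: each "Aldebaran" GCD typically shows up as its own
// device, with the in-package Infinity Fabric link exposed as peer access.
// Assumes a ROCm/HIP install; error handling kept minimal for brevity.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);                     // one MI250X card -> usually two devices
    std::printf("HIP devices: %d\n", count);

    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        std::printf("device %d: %s, %d CUs\n", i, prop.name, prop.multiProcessorCount);
    }

    if (count >= 2) {
        int canAccess = 0;
        hipDeviceCanAccessPeer(&canAccess, 0, 1);  // die-to-die path over Infinity Fabric
        std::printf("GCD0 -> GCD1 peer access: %s\n", canAccess ? "yes" : "no");
    }
    return 0;
}
```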

The component hierarchy of each GPU die sees eight Shader Engines share a last-level L2 cache. The eight Shader Engines total 112 Compute Units, or 14 CUs per engine. The CDNA2 compute unit contains 64 stream processors making up the Shader Core, and four Matrix Core Units, which are specialized hardware for matrix/tensor math operations. That works out to 7,168 stream processors per GPU die, and 14,336 per package. AMD claims a 100% increase in double-precision compute performance over CDNA (MI100), which it attributes to increases in frequencies, efficient data paths, extensive operand reuse and forwarding, and power optimizations that enable those higher clocks. The MI200 already powers the Frontier supercomputer, and AMD is chasing more design wins in the HPC space. The company also dropped a major hint that the MI300, based on CDNA3, will be an APU. It will incorporate GPU dies, core logic, and CPU CCDs onto a single package, a rival solution to NVIDIA's Grace Hopper Superchip.
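Purely as back-of-the-envelope arithmetic on those shader counts (the clock and the two-FLOPs-per-cycle FMA figure below are our assumptions for illustration, not numbers from AMD's presentation):

```cpp
// Back-of-the-envelope math for the shader counts quoted above.
// The 1.7 GHz clock and FMA = 2 FLOPs/cycle per stream processor are
// assumed illustrative values, not figures from AMD's slides.
#include <cstdio>

int main() {
    const int shaderEngines = 8;
    const int cusPerEngine  = 14;     // 8 x 14 = 112 CUs per die
    const int spsPerCU      = 64;
    const int dies          = 2;

    const int spsPerDie     = shaderEngines * cusPerEngine * spsPerCU;  // 7,168
    const int spsPerPackage = spsPerDie * dies;                         // 14,336

    const double clockGHz   = 1.7;    // assumed boost clock
    const double fp64Tflops = spsPerPackage * 2.0 * clockGHz / 1000.0;  // FMA = 2 FLOPs

    std::printf("SPs per die: %d, per package: %d\n", spsPerDie, spsPerPackage);
    std::printf("rough FP64 vector peak: %.1f TFLOPS (all CUs, assumed clock)\n", fp64Tflops);
    return 0;
}
```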



View at TechPowerUp Main Site | Source
 
AMD: We are better
Nvidia: No we are!!
:nutkick::laugh:
 
In-package Infinity Fabric slower than external? I expected the in-package fabric to be faster than the HBM, but then again, it's compute.
 
In-package Infinity Fabric slower than external? I expected the in-package fabric to be faster than the HBM, but then again, it's compute.
That 400 GB/s link could be lower power/latency. But yes, that is strange. It's probably also why it's still recognized as two independent chips and not a single chip (among other things, like the scheduler, etc.).

Also, some features could be enabled on the 400 GB/s link that require additional bandwidth for control. Still, they will have to improve that in the future, because Apple and IBM have much better die-to-die interfaces than AMD right now.

Double that (or half the HBM bandwidth per die) would have made more sense. From initial benchmarks floating around the internet, they are super fast when your code can run independently on each die, but performance starts to collapse if you need die-to-die access.
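If anyone wants to see that die-to-die cliff for themselves, a rough HIP timing loop like this would show it (a sketch only, assuming the two GCDs enumerate as devices 0 and 1; the buffer size and iteration count are arbitrary):

```cpp
// Rough HIP sketch: time GCD0 -> GCD1 copies to estimate the in-package
// Infinity Fabric bandwidth. Assumes devices 0 and 1 are the two dies of
// one MI250X; sizes and iteration counts are arbitrary illustration values.
#include <hip/hip_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    const size_t bytes = 1ULL << 30;   // 1 GiB per copy
    const int iters = 20;

    void *src = nullptr, *dst = nullptr;
    hipSetDevice(0); hipMalloc(&src, bytes);
    hipSetDevice(1); hipMalloc(&dst, bytes);

    hipSetDevice(0);
    hipDeviceEnablePeerAccess(1, 0);   // allow die-to-die access over Infinity Fabric
    hipDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        hipMemcpyPeer(dst, 1, src, 0, bytes);   // cross-die copy, GCD0 -> GCD1
    hipDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    const double sec = std::chrono::duration<double>(t1 - t0).count();
    std::printf("GCD0 -> GCD1: %.1f GB/s\n", bytes * iters / sec / 1e9);

    hipSetDevice(0); hipFree(src);
    hipSetDevice(1); hipFree(dst);
    return 0;
}
```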
 
Pretty cool to see Nvidia and AMD going at it in this space (and AMD actually getting some wins), going for sort of the same overall design but from each other's opposite areas of expertise.
 
Looks promising, but those odd IF bandwidth numbers might point to some continuing inter-die latency issues, with higher external fabric speeds to compensate. Either way, very nice to see team red get serious about HPC.
 
Bodes well for RDNA3 which is also MCP and TSMC 6nm

In-package Infinity Fabric slower than external? I expected the in-package fabric to be faster than the HBM, but then again, it's compute.
HBM2 is exceptionally wide but quite slow, so whilst the bandwidth HBM2 offers is very good, that bandwidth comes mostly from the bus width, meaning that latencies will likely be order(s) of magnitude higher than Infinity Fabric's.
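Quick numbers to back that up (the per-pin data rate below is an assumed illustrative value, not something AMD has broken out):

```cpp
// Width-vs-speed arithmetic for the HBM2E bandwidth point above.
// The 3.2 Gbps per-pin rate is an assumed illustrative value, not a
// figure from AMD's presentation.
#include <cstdio>

int main() {
    const int stacks        = 8;      // per MI250X package
    const int bitsPerStack  = 1024;   // HBM2E stack bus width
    const double gbpsPerPin = 3.2;    // assumed per-pin data rate

    const double tbPerSec = stacks * bitsPerStack * gbpsPerPin / 8.0 / 1000.0;
    std::printf("aggregate HBM2E bandwidth: ~%.1f TB/s from a %d-bit bus\n",
                tbPerSec, stacks * bitsPerStack);
    return 0;
}
```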
 
Bodes well for RDNA3 which is also MCP and TSMC 6nm


HBM2 is exceptionally wide but quite slow, so whilst the bandwidth HBM2 offers is very good, that bandwidth comes mostly from the bus width, meaning that latencies will likely be order(s) of magnitude higher than Infinity Fabric's.
RDNA3 GCDs (graphics core dies) are 5 nm, while the cache dies and the IOD are 6 nm, at least on Navi 31 and 32 (which each have their own unique GCD; in other words, Navi 31 is NOT just two Navi 32 GCDs like many initially believed). Navi 33 is monolithic and is on 6 nm...at least according to the most recent, agreed-upon leaks.

The tile structure should allow RDNA3 to be relatively much cheaper to manufacture than Nvidia's monolithic Lovelace. In the latest leaks, the RDNA3 GCDs for Navi 31 and 32 are really small, less than 250 mm² if I remember correctly.
 
RDNA3 GCDs (graphics core dies) are 5 nm, while the cache dies and the IOD are 6 nm, at least on Navi 31 and 32 (which each have their own unique GCD; in other words, Navi 31 is NOT just two Navi 32 GCDs like many initially believed). Navi 33 is monolithic and is on 6 nm...at least according to the most recent, agreed-upon leaks.

The tile structure should allow RDNA3 to be relatively much cheaper to manufacture than Nvidia's monolithic Lovelace. In the latest leaks, the RDNA3 GCDs for Navi 31 and 32 are really small, less than 250 mm² if I remember correctly.
RDNA3 for desktop is already much cheaper for AMD to produce than Lovelace is for Nvidia. AMD won't be under any pressure on price; Nvidia will need to slash margins to compete on price.
 