
AMD CDNA2 "Aldebaran" MI200 HPC Accelerator with 256 CU (16,384 cores) Imagined

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,362 (7.68/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
AMD Instinct MI200 will be an important product for the company in the HPC and AI supercomputing market. It debuts the CDNA2 compute architecture and is based on a multi-chip module (MCM) codenamed "Aldebaran." PC enthusiast Locuza, who draws up highly detailed architecture diagrams based on public information, imagined what "Aldebaran" could look like. The MCM contains two logic dies and eight HBM2E stacks. Each of the two dies has a 4096-bit HBM2E interface, which talks to 64 GB of memory (128 GB per package). A silicon interposer provides microscopic wiring among the ten dies.
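
As a rough sanity check, the memory figures above can be worked through in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes the eight HBM2E stacks are split evenly between the two dies, which the article implies but does not state outright.

```python
# Back-of-the-envelope memory layout for the imagined "Aldebaran" MCM.
# Inputs are the figures quoted in the article; names are illustrative.
LOGIC_DIES = 2
HBM_STACKS = 8
BUS_BITS_PER_DIE = 4096
GB_PER_DIE = 64

# Assumption: the stacks are shared out evenly between the two dies.
stacks_per_die = HBM_STACKS // LOGIC_DIES
bits_per_stack = BUS_BITS_PER_DIE // stacks_per_die
gb_per_stack = GB_PER_DIE // stacks_per_die
package_gb = GB_PER_DIE * LOGIC_DIES
dies_on_interposer = LOGIC_DIES + HBM_STACKS

print(f"{stacks_per_die} stacks per die, {bits_per_stack}-bit and "
      f"{gb_per_stack} GB per stack")
print(f"{package_gb} GB per package, {dies_on_interposer} dies on the interposer")
```

Under that assumption each die talks to four stacks over 1024 bits apiece, which matches the standard HBM2E stack width.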

Each of the two logic dies, or chiplets, has eight shader engines with 16 compute units (CU) each. The CDNA2 compute unit is capable of full-rate FP64 and packed FP32 math, and includes Matrix Engines V2 (fixed-function hardware for matrix multiplication, accelerating DNN building, training, and AI inference). With 128 CUs per chiplet, assuming the CDNA2 CU has 64 stream processors, one arrives at 8,192 SP. Two such dies add up to a whopping 16,384, more than three times that of the "Navi 21" RDNA2 silicon. Each die further features its own independent PCIe interface and XGMI (AMD's rival to CXL), an interconnect designed for high-density HPC scenarios. A rudimentary VCN (Video Core Next) component is also present. It's important to note here that the CDNA2 CU, as well as the "Aldebaran" MCM itself, cannot double as a GPU, since it lacks much of the hardware needed for graphics processing. The MI200 is expected to launch later this year.
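
The stream processor count can be reproduced with a short Python sketch. The 64 SP per CU figure is the article's own assumption, and the Navi 21 count of 5,120 SP (80 CUs x 64) is brought in here for the comparison; it is not stated in the article. All names are illustrative.

```python
# Reproduce the article's stream-processor arithmetic.
CUS_PER_DIE = 128   # per-chiplet CU count quoted in the article
SP_PER_CU = 64      # the article's assumption for the CDNA2 CU
DIES = 2
NAVI21_SP = 5120    # assumption: full Navi 21, 80 CUs x 64 SP

sp_per_die = CUS_PER_DIE * SP_PER_CU
total_cus = CUS_PER_DIE * DIES
total_sp = sp_per_die * DIES
ratio_to_navi21 = total_sp / NAVI21_SP

print(f"{sp_per_die} SP per die, {total_cus} CUs and {total_sp} SP per package")
print(f"{ratio_to_navi21:.1f}x the SP count of Navi 21")
```

This is a raw unit count only; it says nothing about how the two architectures compare in delivered performance.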



 
Joined
Oct 6, 2020
Messages
15 (0.01/day)
System Name Da Rig
Processor AMD Ryzen 2600x
Motherboard Gigabyte Aorus Gaming Ultra X470
Cooling Cryorig R5
Memory G. Skill Flare X 3200mhz 14CL
Video Card(s) Sapphire Nitro+ Rx 570
Display(s) Acer XF240H 144hz 1080p 24"
Case Fractal Meshify C
Power Supply Evga SuperNOVA 750W G+ 80 Plus Gold
Mouse CoolerMaster
Keyboard CoolerMaster
I wonder how well this could mine, just curious of course.:twitch:
 
Joined
Apr 24, 2020
Messages
2,560 (1.76/day)
Two such dies add up to a whopping 16,384, more than three times that of the "Navi 21" RDNA2 silicon

RDNA and CDNA cannot be easily compared with each other in this manner. CDNA uses the compute units of old (the same system from GCN 1.0 through Vega and now CDNA). That is: 16-wide native x 4 clock ticks x 4 ALUs per CU == 64 physical lanes per CU executing 256 threads every 4 clock ticks.

RDNA had extremely major changes: 32-wide native x 4 ALUs per WGP which executes 128 threads every 1 clock tick. In RDNA terms, they call the WGP a "dual-compute unit", because 128-threads per RDNA clock tick is kinda-sorta like 2x256-threads every 4 CDNA clock ticks.
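
A tiny Python model makes the comparison above concrete. Treat it as a sketch of peak issue rates under the numbers in this post, not as a description of real hardware behavior: `ShaderBlock` and its fields are invented for illustration, and it ignores latency, occupancy, and anything else that affects actual throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderBlock:
    """Hypothetical peak-rate model of a GCN/CDNA CU or an RDNA WGP."""
    name: str
    simd_width: int      # physical lanes per SIMD unit
    simds: int           # SIMD units per block
    ticks_per_wave: int  # clock ticks one SIMD spends per wavefront instruction

    @property
    def lanes(self) -> int:
        # Physical lanes in the whole block.
        return self.simd_width * self.simds

    @property
    def wave_size(self) -> int:
        # Threads covered by one wavefront instruction.
        return self.simd_width * self.ticks_per_wave

    def threads_in(self, ticks: int) -> int:
        # Peak thread-instructions retired in `ticks` clock ticks.
        return self.lanes * ticks


gcn_cu = ShaderBlock("GCN/CDNA CU", simd_width=16, simds=4, ticks_per_wave=4)
rdna_wgp = ShaderBlock("RDNA WGP", simd_width=32, simds=4, ticks_per_wave=1)

for block in (gcn_cu, rdna_wgp):
    print(f"{block.name}: {block.lanes} lanes, wave{block.wave_size}, "
          f"{block.threads_in(4)} threads per 4 ticks")
```

In this model the WGP retires twice as many threads as a CU over the same four ticks, which is the "dual-compute unit" relationship described above.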

--------

RDNA2 also has 1024 x 32-bit registers per ALU. CDNA only has 256 x 32-bit registers per hardware thread (but given the 4x clock ticks for 4x different threads, it's kinda-sorta like having 1024 registers across 4 different threads). There are similarities between the two because they're both made by AMD, but... the differences are quite striking and will probably lead to major performance differences between the two platforms.

RDNA2 quite possibly is faster in some scenarios, while CDNA is faster in other scenarios. It's really difficult to compare the two on any microarchitectural level. AMD really did make a huge number of changes.
 
Joined
Mar 10, 2010
Messages
11,878 (2.30/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
I have not seen the CDNA architecture analyzed; do you have links? Also, given this is CDNA gen 2, maybe 3, it is possible they have made more changes, at least this time. This is a projection.
 
Joined
May 30, 2015
Messages
1,872 (0.58/day)
Location
Seattle, WA
I have not seen the CDNA architecture analyzed; do you have links? Also, given this is CDNA gen 2, maybe 3, it is possible they have made more changes, at least this time. This is a projection.

This includes some good technical info amongst the marketing.

 
Joined
Mar 10, 2010
Messages
11,878 (2.30/day)
Location
Manchester uk
This includes some good technical info amongst the marketing.

I just read the AMD whitepaper here.


Between the two sources, it is clear that some things are essentially the same as classic GCN, but every element has been considered, cut, or improved, and they are vague in some areas. I am only suggesting that some minor updates could have been made, given that they had to retouch everything anyway.

Cheers though, more info is always nice.
 
Joined
Apr 24, 2020
Messages
2,560 (1.76/day)
I have not seen the CDNA architecture analyzed; do you have links? Also, given this is CDNA gen 2, maybe 3, it is possible they have made more changes, at least this time. This is a projection.


CDNA 1.0 had its ISA released late last year. It's clear that AMD believes that GCN (16 SIMD-lanes x 4 clock ticks x 4 per compute unit) is a worthwhile architecture (even if RDNA / graphics requires something with lower latency). The ISA doc only has basic information on performance, and the ISA itself is almost identical to GCN documents from the past.

CDNA has new matrix multiplication instructions, and that's about it. Otherwise, its programming model is much more like GCN than RDNA.
 
Joined
Jul 3, 2019
Messages
300 (0.17/day)
Location
Bulgaria
Processor 6700K
Motherboard M8G
Cooling D15S
Memory 16GB 3k15
Video Card(s) 2070S
Storage 850 Pro
Display(s) U2410
Case Core X2
Audio Device(s) ALC1150
Power Supply Seasonic
Mouse Razer
Keyboard Logitech
Software 21H2
What's this beast built on, 7nm or 5nm?
 
