AMD Launches Instinct MI325X Accelerator for AI Workloads: 256 GB HBM3E Memory and 2.6 PetaFLOPS FP8 Compute

AleksandarK · Oct 10, 2024

During its "Advancing AI" conference today, AMD updated its AI accelerator portfolio with the Instinct MI325X, designed to succeed the MI300X. Built on the CDNA 3 architecture, the Instinct MI325X brings a suite of improvements over the old SKU. The MI325X features 256 GB of HBM3E memory with 6 TB/s of bandwidth. The memory capacity alone is a roughly 1.3x improvement over the MI300X, which features 192 GB of regular HBM3 memory. Providing more memory capacity is crucial, as upcoming AI workloads involve training models with parameter counts measured in trillions, as opposed to the billions of today's models. When it comes to compute resources, the Instinct MI325X provides 1.3 PetaFLOPS at FP16 and 2.6 PetaFLOPS at FP8 for training and inference. This represents a 1.3x improvement over the Instinct MI300.
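
To put the capacity figures in perspective, here is a rough back-of-the-envelope sketch, not an AMD sizing tool. It assumes that only the model weights need to fit in HBM (activations, optimizer state, and KV cache are ignored), that an FP16 parameter takes 2 bytes and an FP8 parameter takes 1 byte, and that capacities are in decimal gigabytes. The function names and the one-trillion-parameter example are illustrative assumptions.

```python
import math

# Storage size per weight, in bytes (assumption: weights only, no overhead).
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

# Hypothetical example model size: one trillion parameters.
ONE_TRILLION = 1_000_000_000_000


def weights_gb(param_count, dtype):
    """Return the size of the model weights in decimal gigabytes."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


def accelerators_needed(param_count, dtype, capacity_gb):
    """Return how many accelerators are needed just to hold the weights."""
    return math.ceil(weights_gb(param_count, dtype) / capacity_gb)


for dtype in ("fp16", "fp8"):
    for name, capacity_gb in (("MI325X", 256), ("MI300X", 192)):
        count = accelerators_needed(ONE_TRILLION, dtype, capacity_gb)
        print(f"{dtype} weights on {name} ({capacity_gb} GB): {count} accelerators")
```

Under these assumptions, the FP16 weights of a one-trillion-parameter model come to 2,000 GB, which needs eight 256 GB accelerators but eleven 192 GB ones. Real deployments need considerably more headroom than this weights-only estimate.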

A chip alone is worthless without a good platform, so AMD made the Instinct MI325X OAM modules a drop-in replacement for the current platform designed for the MI300X, as the two are pin-compatible. Systems packing eight MI325X accelerators offer 2 TB of HBM3E memory with 48 TB/s of aggregate memory bandwidth. Such a system achieves 10.4 PetaFLOPS of FP16 and 20.8 PetaFLOPS of FP8 compute performance. AMD uses NVIDIA's H200 HGX as the reference for its competitive claims: the company says that an Instinct MI325X system outperforms the NVIDIA H200 HGX system by 1.3x across the board in memory bandwidth and FP16 / FP8 compute performance, and by 1.8x in memory capacity.
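
The eight-accelerator platform numbers are simply the per-accelerator figures multiplied by eight. The sketch below reproduces that arithmetic from the values quoted in this article; the dictionary keys and the `platform_totals` helper are illustrative names, and linear scaling of peak figures is an assumption that says nothing about real-world multi-GPU efficiency.

```python
# Per-accelerator peak figures for the Instinct MI325X, as quoted in the article.
MI325X = {
    "hbm_capacity_gb": 256,
    "hbm_bandwidth_tbs": 6.0,
    "fp16_pflops": 1.3,
    "fp8_pflops": 2.6,
}


def platform_totals(per_accelerator, count=8):
    """Scale every per-accelerator figure linearly by the accelerator count."""
    return {name: round(value * count, 2) for name, value in per_accelerator.items()}


totals = platform_totals(MI325X)
for name, value in totals.items():
    print(f"{name}: {value}")

# 2048 GB is the "2 TB" of HBM3E quoted for the eight-way system.
print(f"HBM3E capacity: {totals['hbm_capacity_gb'] / 1024:.0f} TB")
```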

At the core of the accelerator is the ROCm software stack. We recently covered AMD's plan to bring ROCm to every GPU, even consumer models, and the company reiterated that point. Another important point was AMD's work with the open-source community to integrate the latest features into its ROCm stack, especially from frameworks like PyTorch, Triton, ONNX, etc. AMD also said it is preparing the Instinct MI350X family for the second half of 2025. By then, we should be getting a CDNA 4 Instinct MI355X accelerator built on a TSMC 3 nm node, with 288 GB of HBM3E. The new CDNA 4 architecture adds support for lower-precision data types such as FP4 and FP6. The upcoming chip will deliver a massive 2.3 PetaFLOPS of FP16 and 4.6 PetaFLOPS of FP8 compute capability. The new FP4 and FP6 formats will allow a single CDNA 4 Instinct MI355X to reach 9.2 PetaFLOPS of compute capability.
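
For a quick sense of the generational jump, the following sketch divides the MI355X figures by the MI325X figures quoted in this article. These are announced peak numbers for an unreleased part, so treat the ratios as indicative only; the dictionary keys and the `uplift` helper are illustrative names.

```python
# Peak figures per accelerator, as quoted in the article.
MI325X = {"fp16_pflops": 1.3, "fp8_pflops": 2.6, "hbm_gb": 256}
MI355X = {"fp16_pflops": 2.3, "fp8_pflops": 4.6, "hbm_gb": 288}


def uplift(new, old):
    """Return new/old for every metric present in the older part's figures."""
    return {name: new[name] / old[name] for name in old}


ratios = uplift(MI355X, MI325X)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")

# FP4/FP6 peak (9.2 PetaFLOPS) relative to the same chip's FP8 peak.
low_precision_gain = 9.2 / MI355X["fp8_pflops"]
print(f"FP4/FP6 vs FP8 on MI355X: {low_precision_gain:.1f}x")
```

On these numbers, FP16 and FP8 throughput both rise by a little under 1.8x, memory capacity grows by about 1.1x, and the FP4/FP6 peak is double the FP8 peak.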


natr0n · Oct 10, 2024

Who's going to try to play Crysis on this with some passthrough gimmick?

Also, Lisa's face looks like my Mom's, with the grey hair.

Steevo · Oct 10, 2024

natr0n said:
Who's going to try to play Crysis on this with some passthrough gimmick?

Also, Lisa's face looks like my Mom's, with the grey hair.

AMD has aged her, taking a company from the verge of bankruptcy to thriving. They are a few steps away from greatness. She has some hard decisions in the next 18-24 months that will make them either a seriously competitive company or just another option. I don't envy her position, now more than ever; they are down to the nut cutting. I hope she makes the right decisions to ensure AMD's continued success.

AleksandarK · Oct 10, 2024

I've added a section about the future MI355X; that is a true beast.

darkangel0504 · Oct 10, 2024

MI 355X

Chomiq · Oct 10, 2024

This is why you won't get a high-end AMD GPU this next generation.

Wirko · Oct 10, 2024

AleksandarK said:
In systems packing eight MI325X accelerators, there are 20 TB of HBM3E memory

A small (although the missing 18 TB are costly) mistake here

Steevo said:
AMD has aged her, taking a company from the verge of bankruptcy to thriving. They are a few steps away from greatness. She has some hard decisions in the next 18-24 months that will make them either a seriously competitive company or just another option. I don't envy her position, now more than ever; they are down to the nut cutting. I hope she makes the right decisions to ensure AMD's continued success.

I'm not saying she doesn't care but she doesn't have to. The technology of golden parachutes has improved greatly.

Minus Infinity · Oct 11, 2024

No FP4 like Blackwell. What about FP2 and FP1 to pump those PetaFLOPS?

TumbleGeorge · Oct 11, 2024

Minus Infinity said:
No FP4 like Blackwell. What about FP2 and FP1 to pump those PetaFLOPS?

AI doesn't get dumber over time, it just runs on less and less precise hardware.

Neo_Morpheus · Oct 11, 2024

Chomiq said:
This is why you won't get a high-end AMD GPU this next generation.

It's called tough love and I'm glad they are doing it.

The fanbois are already lined up to pay $2500 for a 5090, and even if AMD released a GPU that was faster AND cheaper than the 5090, they would still buy the 5090.

This one at least is honest and upfront about it:

[Attachment 367046]

Prima.Vera · Oct 11, 2024

There is never going to be another video card like the AMD HD 5870, one of their best ever.
Also, most people do not buy AMD since their FSR implementation is garbage compared to DLSS, quality-wise.

Chomiq · Oct 11, 2024

Neo_Morpheus said:
It's called tough love and I'm glad they are doing it.

The fanbois are already lined up to pay $2500 for a 5090, and even if AMD released a GPU that was faster AND cheaper than the 5090, they would still buy the 5090.

This one at least is honest and upfront about it:

[Attachment 367046]

Nothing to do with fanboys, it's all about wafer space allocation.

Neo_Morpheus · Oct 11, 2024

Chomiq said:
Nothing to do with fanboys, it's all about wafer space allocation.

You are correct, they allocated the wafers to the more profitable corporate offerings because they knew that no matter what they released, nobody was going to buy.

So in other words, my original post stands.

tpuuser256 · Oct 11, 2024

The theme is now blue, good call
