Thursday, July 6th 2023

Two-ExaFLOP El Capitan Supercomputer Starts Installation Process with AMD Instinct MI300A

When Lawrence Livermore National Laboratory (LLNL) announced the creation of a two-ExaFLOP supercomputer named El Capitan, we heard that AMD would power it with its Instinct MI300 accelerator. Today, LLNL published a Tweet that states, "We've begun receiving & installing components for El Capitan, @NNSANews' first #exascale #supercomputer. While we're still a ways from deploying it for national security purposes in 2024, it's exciting to see years of work becoming reality." As the published images show, HPE racks filled with AMD Instinct MI300 accelerators are now arriving at LLNL's facility, and the supercomputer is expected to go operational in 2024. This could mean that the November 2023 TOP500 list update won't feature El Capitan, as bringing the system up within the roughly four months remaining would be very hard.

The El Capitan supercomputer is expected to run on the AMD Instinct MI300A accelerator, which features 24 Zen4 cores, CDNA3 architecture, and 128 GB of HBM3 memory. Four of these accelerators go inside each HPE node, which is also water-cooled. While we don't have many further details on the memory and storage of El Capitan, we know that the system will exceed two ExaFLOPS at peak and will consume close to 40 MW of power.
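As a rough back-of-envelope sketch, the two headline figures quoted here (a peak of about two ExaFLOPS and a power draw of close to 40 MW) imply an energy efficiency of roughly 50 GFLOPS per watt. The function name and the exact inputs below are illustrative assumptions; both figures are approximate peak values, not measured results.

```python
def gflops_per_watt(peak_exaflops: float, power_megawatts: float) -> float:
    """Return peak efficiency in GFLOPS per watt.

    Both arguments are assumed, rounded figures (peak, not measured).
    """
    flops = peak_exaflops * 1e18      # 1 ExaFLOPS = 10^18 FLOPS
    watts = power_megawatts * 1e6     # 1 MW = 10^6 W
    return flops / watts / 1e9        # convert FLOPS/W to GFLOPS/W


# Assumed inputs taken from the article: ~2 ExaFLOPS peak, ~40 MW.
el_capitan_efficiency = gflops_per_watt(2.0, 40.0)
print(f"{el_capitan_efficiency:.0f} GFLOPS/W")
```

Since the system is expected to exceed two ExaFLOPS while drawing somewhat under 40 MW, the real peak figure would land a little above this estimate.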
Source: LLNL (Twitter)

29 Comments on Two-ExaFLOP El Capitan Supercomputer Starts Installation Process with AMD Instinct MI300A

#1
TumbleGeorge
WoW! These employees are in very dirty clothes and shoes?
Posted on Reply
#2
john_
And I guess they might replace some 300As with 300Xs if the GPU compute is more important based on how much AI has skyrocketed lately.
Posted on Reply
#3
Daven
john_: And I guess they might replace some 300As with 300Xs if the GPU compute is more important based on how much AI has skyrocketed lately.
I wonder how that would work since the 300Xs have no CPUs in them. Would the motherboard need to be replaced with one that also has Epyc sockets?
Posted on Reply
#4
Patriot
TumbleGeorge: WoW! These employees are in very dirty clothes and shoes?
What? How bad are your...
TumbleGeorge: I didn't recognise the formula in the image. My science is very bad. Eyes too.
Oh, yeah that makes more sense.
Clean clothes, worn knees from... gasp, kneeling to get to the bottom servers.
What an odd take...

This is aiming for the same >2 exaflops Aurora is aiming at, but at 40 MW instead of 70 MW.
Curious to see how far off both systems will be. The Slingshot networking doesn't seem to scale as well as expected (Frontier hit a bit lower than expected), but it's also groundbreaking, and factors of scale not previously encountered are sure to be popping up.

They won't and can't just pop in MI300X as per reasons stated. These are purpose-built for El Capitan, and the CPU "on die" is supposed to help with scaling. The 128 GB vs 192 GB doesn't matter when you scale to this node count... keeping scaling as linear as possible does.

The MI250X is showing 70-80% of A100 performance in AI, and absolutely obliterates it in FP64/traditional HPC work. The claimed 8x AI improvement the MI300A is bringing should make it very competitive against the H100.
AMD's datacenter show was clearly too technical for investors to grasp; the 55B parameter model on 1 GPU was absolutely insane.
Posted on Reply
#5
Wirko
In a liquid-cooled installation like this, does air cooling (by means of fans and fins) exist at all? Probably everything has to be cooled by liquid, including the chips in network switches, the SSDs in storage nodes and the power supplies.
Posted on Reply
#6
Daven
As much as Nvidia is dominating, you don't hear too much about exascale deployments of Nvidia accelerators. The all-Intel Aurora and the all-AMD Frontier and El Capitan are the only ones so far.
Posted on Reply
#7
david salsero
AMD Instinct MI300A, which has 24 Zen4 cores, CDNA3 architecture and 128 GB of HBM3 memory. The captain will continue to command this sector with this gigantic change.
We all know that this Instinct MI300A is superior to Nvidia.
We will have to see what functions the great CAPTAIN will do
Posted on Reply
#8
Wirko
AleksandarK: The El Capitan supercomputer is expected to run on AMD Instinct MI300A accelerator
The illustration actually shows an MI300X, the one with four GPU chiplets and no CPU part. The MI300A is this:

What's also interesting is that the frame looks like a ... socket! Strange but apparently AMD is planning to also release socketed variants of the chip, or else they wouldn't have made this illustration.
Posted on Reply
#9
Daven
Wirko: The illustration actually shows an MI300X, the one with four GPU chiplets and no CPU part. The MI300A is this:

What's also interesting is that the frame looks like a ... socket! Strange but apparently AMD is planning to also release socketed variants of the chip, or else they wouldn't have made this illustration.
I could be completely wrong but I believe the metal parts are part of the top stiffener of an OCP Accelerator Module (OAM).

www.opencompute.org/documents/ocp-accelerator-module-design-specification-v1p5-final-20220223-docx-1-pdf

See page 10.

Edit: Never mind, it looks more like an SP6 socket.
Posted on Reply
#10
chaoshusky
It's AMD, couldn't care less. Hopefully they don't go nuclear and melt their own copper interconnects, silicon and crack the die! ;)
Posted on Reply
#11
kapone32
chaoshusky: It's AMD, couldn't care less. Hopefully they don't go nuclear and melt their own copper interconnects, silicon and crack the die! ;)
Here comes Hot Wheels vs Matchbox! What an obtuse statement.
Posted on Reply
#12
AleksandarK
News Editor
Wirko: The illustration actually shows an MI300X, the one with four GPU chiplets and no CPU part. The MI300A is this:

What's also interesting is that the frame looks like a ... socket! Strange but apparently AMD is planning to also release socketed variants of the chip, or else they wouldn't have made this illustration.
Posted on Reply
#14
kondamin
What are national security workloads that need a super computer?
Posted on Reply
#15
kapone32
kondamin: What are national security workloads that need a super computer?
There are so many uses. One example is discovering threats to fuel delivery systems.
Posted on Reply
#16
TumbleGeorge
kapone32: There are so many uses. As an example to discover threats to fuel delivery systems.
No supercomputer is needed for this.
Posted on Reply
#17
TheDeeGee
TumbleGeorge: WoW! These employees are in very dirty clothes and shoes?
All AMD money going to FSR exclusives, no more money for new clothing.
Posted on Reply
#18
AleksandarK
News Editor
Wirko: Sorry, Twitter has left us non-birds staring at a closed door. Can you add some description? Thanks.
The text from the Tweet:
MI300A #APU @AMDInstinct the bedrock of #ElCap getting delivered. A long-time co-design effort with @ENERGY @HPE_Cray becoming a reality. Our #ROCm stack will continue to enable the #MI300 series GPUs seamlessly. Stay tuned for exciting stack updates this fall. @Livermore_Comp
Posted on Reply
#19
Avro Arrow
Well, that's going to be quite the beast. I only hope that they're using a clean energy source for it like hydro or nuclear.
Posted on Reply
#20
Tek-Check
kondamin: What are national security workloads that need a super computer?
Wikipedia: "Its principal responsibility is ensuring the safety, security and reliability of the nation's nuclear weapons through the application of advanced science, engineering, and technology. The laboratory also applies its special expertise and multidisciplinary capabilities towards preventing the proliferation and use of weapons of mass destruction, bolstering homeland security, and solving other nationally important problems, including energy and environmental needs, scientific research and outreach, and economic competitiveness. "

So, lots of simulations for nuclear and other weapons, their impact and development, but also environmental disasters, etc. Such simulations and calculations need a lot of horsepower, both CPU and GPU... MI300 is a perfect tool for this job.
john_: And I guess they might replace some 300As with 300Xs if the GPU compute is more important based on how much AI has skyrocketed lately.
That depends. We do not know the exact structure of the system. It might be APUs only, as they do not do LLMs but complex simulations with hundreds of variables, so they need both CPU and GPU power.
Wirko: In a liquid-cooled installation like this, does air cooling (by means of fans and fins) exist at all? Probably everything has to be cooled by liquid, including the chips in network switches, the SSDs in storage nodes and the power supplies.
No air-cooling. Too loud, too dusty.
Daven: As much as Nvidia is dominating, you don't hear too much about exascale deployments of Nvidia accelerators. The all-Intel Aurora and the all-AMD Frontier and El Capitan are the only ones so far.
Nvidia has a few too.
www.eenewseurope.com/en/nvidia-launches-first-commercial-exascale-supercomputer/
Posted on Reply
#21
dragontamer5788
kondamin: What are national security workloads that need a super computer?
Designing a COVID19 vaccine in one week.

Weather modeling.

Nuclear research (shhhhhhhh, that one's on the hush-hush except everyone knows that Department of Energy is the USA's nuke experts. And given that a lot of these supercomputers are top-secret, we can only assume what's going on...)

Like, what do a bunch of nuclear scientists want with a top-secret supercomputer that they aren't allowed to tell us the details of? Hmmm, I wonder.... Fortunately, these strategic supercomputers have plenty of downtime from their main mission, so the rest of the scientific community can run on them in their spare cycles. I've heard of obscure mathematical theories being tested on these supercomputers, Ph.D. theses being written on data discovered in these, etc. etc. So it's still to the benefit of the general USA's scientific community (at least when it's not doing whatever nuclear research is going on...)
Posted on Reply
#22
Tek-Check
chaoshusky: It's AMD, couldn't care less. Hopefully they don't go nuclear and melt their own copper interconnects, silicon and crack the die! ;)
Humour is always welcome, stupid comments are spam.
Avro Arrow: Well, that's going to be quite the beast. I only hope that they're using a clean energy source for it like hydro or nuclear.
Nuclear is clean as long as you are not one of the countries that need to store nuclear waste for centuries...
Posted on Reply
#23
Patriot
Tek-Check: Nvidia has a few too.
www.eenewseurope.com/en/nvidia-launches-first-commercial-exascale-supercomputer/
Nvidia likes comparing apples to oranges, DLSS3 with native.
In the same manner, it is impossible for 256 or 1024 Grace superchips to be anywhere near an exaflop, as they are 67 TFLOPS a pop in FP32, which is how supercomputers are measured.
They could at most hit... 67 Petaflops with that announced and undeployed Euro cluster.

If we apply Nvidia's metrics to AMD's MI300A-based El Capitan, it should measure >64 exaflops, but Nvidia isn't listing its metric: it could be FP16, bfloat16, FP8, INT8, or even INT4, since they say exascale rather than exaflop...

My numbers are based on AMD's MI300A (228 CU) vs MI250 (220 CU) scaling of FP32 performance and the 8x AI improvement claim, but like Nvidia's claim, we don't know what precision that is in.
Posted on Reply
#24
trparky
Tek-Check: Nuclear is clean as long as you are not one of the countries that need to store nuclear waste for centuries...
There are ways of reusing the spent fuel but nobody wants to actually do it.
Posted on Reply
#25
TumbleGeorge
Off topic: Could someone explain to the others what percentage of the uranium contained in the fuel is actually "burned"? The truth is that the fuel is used far less efficiently than in a modern nuclear weapon...
Posted on Reply