Tuesday, May 7th 2019

AMD Collaborates with US DOE to Deliver the Frontier Supercomputer

The U.S. Department of Energy today announced a contract with Cray Inc. to build the Frontier supercomputer at Oak Ridge National Laboratory, which is anticipated to debut in 2021 as the world's most powerful computer with a performance of greater than 1.5 exaflops.

Scheduled for delivery in 2021, Frontier will accelerate innovation in science and technology and maintain U.S. leadership in high-performance computing and artificial intelligence. The total contract award is valued at more than $600 million for the system and technology development. The system will be based on Cray's new Shasta architecture and Slingshot interconnect and will feature high-performance AMD EPYC CPU and AMD Radeon Instinct GPU technology.

By solving calculations up to 50 times faster than today's top supercomputers (exceeding a quintillion, or 10^18, calculations per second), Frontier will enable researchers to deliver breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. As a second-generation AI system, following the world-leading Summit system deployed at ORNL in 2018, Frontier will provide new capabilities for deep learning, machine learning and data analytics for applications ranging from manufacturing to human health.

"Frontier's record-breaking performance will ensure our country's ability to lead the world in science that improves the lives and economic prosperity of all Americans and the entire world," said U.S. Secretary of Energy Rick Perry. "Frontier will accelerate innovation in AI by giving American researchers world-class data and computing resources to ensure the next great inventions are made in the United States."

Since 2005, Oak Ridge National Laboratory has deployed Jaguar, Titan, and Summit, each the world's fastest computer in its time. The combination of traditional processors with graphics processing units to accelerate the performance of leadership-class scientific supercomputers is an approach pioneered by ORNL and its partners and successfully demonstrated through ORNL's No. 1-ranked Titan and Summit supercomputers.

"ORNL's vision is to sustain the nation's preeminence in science and technology by developing and deploying leadership computing for research and innovation at an unprecedented scale," said ORNL Director Thomas Zacharia. "Frontier follows the well-established computing path charted by ORNL and its partners that will provide the research community with an exascale system ready for science on day one."

Researchers with DOE's Exascale Computing Project are developing exascale scientific applications today on ORNL's 200-petaflop Summit system and will seamlessly transition their scientific applications to Frontier in 2021. In addition, the lab's Center for Accelerated Application Readiness is now accepting proposals from scientists to prepare their codes to run on Frontier.
Researchers will harness Frontier's powerful architecture to advance science in such applications as systems biology, materials science, energy production, additive manufacturing and health data science. Visit the Frontier website to learn more about what researchers plan to accomplish in these and other scientific fields.
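
For a rough sense of scale, the peak figures quoted in this announcement can be compared directly. The following is a minimal Python sketch, not part of the original announcement: the constant and function names are hypothetical, and the inputs are the quoted peak numbers (greater than 1.5 exaflops for Frontier, 200 petaflops for Summit) rather than measured benchmark results. The resulting ratio compares peak figures only and should not be read as the same measure as the "up to 50 times faster" figure quoted earlier.

```python
# Peak figures as quoted in the announcement (not measured benchmark results).
FRONTIER_PEAK_FLOPS = 1.5e18  # "greater than 1.5 exaflops", so a lower bound
SUMMIT_PEAK_FLOPS = 200e15    # "200-petaflop Summit system"


def peak_ratio(new_flops: float, old_flops: float) -> float:
    """Return how many times larger the new peak figure is than the old one."""
    if old_flops <= 0:
        raise ValueError("old_flops must be positive")
    return new_flops / old_flops


if __name__ == "__main__":
    ratio = peak_ratio(FRONTIER_PEAK_FLOPS, SUMMIT_PEAK_FLOPS)
    print(f"Frontier / Summit quoted peak ratio: at least {ratio:.1f}x")
```

Because the Frontier figure is stated as "greater than" 1.5 exaflops, the computed ratio is best treated as a lower bound on the quoted peak-to-peak comparison.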

Frontier will offer best-in-class traditional scientific modeling and simulation capabilities while also leading the world in artificial intelligence and data analytics. Closely integrating artificial intelligence with data analytics and modeling and simulation will drastically reduce the time to discovery by automatically recognizing patterns in data and guiding simulations beyond the limits of traditional approaches.

"We are honored to be part of this historic moment as we embark on supporting extreme-scale scientific endeavors to deliver the next U.S. exascale supercomputer to the Department of Energy and ORNL," said Peter Ungaro, president and CEO of Cray. "Frontier will incorporate foundational new technologies from Cray and AMD that will enable the new exascale era-characterized by data-intensive workloads and the convergence of modeling, simulation, analytics, and AI for scientific discovery, engineering and digital transformation."

Frontier will incorporate several novel technologies co-designed specifically to deliver a balanced scientific capability for the user community. The system will be composed of more than 100 Cray Shasta cabinets with high-density compute blades powered by HPC and AI-optimized AMD EPYC processors and Radeon Instinct GPU accelerators purpose-built for the needs of exascale computing. The new accelerator-centric compute blades will support a 4:1 GPU to CPU ratio with high-speed AMD Infinity Fabric links and coherent memory between them within the node. Each node will have one Cray Slingshot interconnect network port for every GPU with streamlined communication between the GPUs and network to enable optimal performance for high-performance computing and AI workloads at exascale.
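
The node layout described above can be sketched as a small data model. This is a minimal illustration in Python under stated assumptions, not an official specification: the class and function names are hypothetical, the one-CPU-per-node default is an assumption (the announcement gives only the 4:1 ratio), and the node count in the usage example is a placeholder because no node count is stated.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical model of one compute node; not an official specification."""

    cpus: int = 1          # assumption: only the ratio is stated, not the CPU count
    gpus_per_cpu: int = 4  # from the stated 4:1 GPU-to-CPU ratio

    @property
    def gpus(self) -> int:
        return self.cpus * self.gpus_per_cpu

    @property
    def network_ports(self) -> int:
        # The announcement describes one Slingshot network port for every GPU.
        return self.gpus


def system_totals(node: NodeConfig, node_count: int) -> Dict[str, int]:
    """Scale per-node counts to a system of identical nodes."""
    return {
        "cpus": node.cpus * node_count,
        "gpus": node.gpus * node_count,
        "network_ports": node.network_ports * node_count,
    }


if __name__ == "__main__":
    node = NodeConfig()
    # node_count is a placeholder; the announcement does not give a node count.
    print(system_totals(node, node_count=1000))
```

Tying the port count to the GPU count, rather than to the node, reflects the announcement's emphasis on direct communication between each GPU and the network.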

To make this performance easy for developers to use, Cray and AMD are co-designing and developing enhanced GPU programming tools optimized for performance, productivity and portability. This will include new capabilities in the Cray Programming Environment and AMD's ROCm open compute platform that will be integrated into the Cray Shasta software stack for Frontier.

"AMD is proud to be working with Cray, Oak Ridge National Laboratory and the Department of Energy to push the boundaries of high performance computing with Frontier," said Lisa Su, AMD president and CEO. "Today's announcement represents the power of collaboration between private industry and public research institutions to deliver groundbreaking innovations that scientists can use to solve some of the world's biggest problems."

Frontier leverages a decade of exascale technology investments by DOE. The contract award includes technology development funding, a center of excellence, several early-delivery systems, the main Frontier system, and multi-year systems support. The Frontier system is expected to be delivered in 2021, and acceptance is anticipated in 2022.

Frontier will be part of the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility. ORNL is managed by UT-Battelle for DOE's Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE's Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit DOE's website.

47 Comments on AMD Collaborates with US DOE to Deliver the Frontier Supercomputer

#1
NdMk2o1o
What an advert for EPYC and Radeon Instinct! A massive government contract. No doubt others will follow suit, as I think the uptake of EPYC hasn't been as swift as AMD probably would have liked. If it's good enough for the DOE, I suspect other government agencies, educational institutions, etc. will begin to look at EPYC as a viable Xeon alternative.
#2
Wavetrex
Whatever Lisa is doing, she's doing good.
Grats to AMD for the contract!

It was about dam' time to get them into the driver seat... even if only in some areas of the market.
#4
SoNic67
US DOE spreads the "wealth" around :)
AMD, nVidia, Intel, PowerPC... my tax money at work.

31MW of power. Talking about "global warming"? :laugh:

Also AMD is only saying that the GPUs are “based on the Radeon Instinct family” and have “yet to be announced.”
#5
R-T-B
SoNic67 said:
PS: 31MW of power. Talking about "global warming"
It'd be chump change for a meaningful understanding of what's going on up in the atmosphere so the world can stop bickering and make a decision... I mean honestly, compared to the world's heat output in a day, you are aware how little this is, right?

SoNic67 said:
US DOE spreads the "wealth" around :)
AMD, nVidia, Intel, PowerPC... my tax money at work.
China's been kicking our ass here, bad. It started with Loongson and spiraled out of control since then... We need advances here or one day we'll be on the tail end of basic important technology like encryption. You DO NOT want that.

Far be it from me to SUPPORT Trump, but I think him boosting funding to projects like this is something he accidentally got right.
#6
SoNic67
I was sarcastic.

As an EE, I would love to be on the design team for the support building and utilities - power, HVAC, water... We are used to seeing something like max 30kW per cabinet :)
And I would rather have the tax money spent here in the US than on rebuilding failed countries.
#7
Vayra86
This really is a confirmation AMD is back in the CPU game. A huge win for them in mindshare.

Well played.
#8
Vya Domus
SoNic67 said:
Oh, no, selling a mirage!
You know well enough any dedicated GPU is yet to see the light of day from Intel. There's no mirage here, mate.
#9
yakk
Nice Halo project win for AMD. Important one too for their server & data center efforts.
#10
TheGuruStud
SoNic67 said:
US DOE spreads the "wealth" around :)
AMD, nVidia, Intel, PowerPC... my tax money at work.

31MW of power. Talking about "global warming"? :laugh:

Also AMD is only saying that the GPUs are “based on the Radeon Instinct family” and have “yet to be announced.”

Will the AMD fans cry now like they cried when the Intel announcement was made on a similar note? Oh, no, selling a mirage! This is collusion, evil Intel AMD at work!
You're a bit special aren't ya? Intel has no product, period. AMD currently has one, another releasing first quarter of next year and then the next one in the pipe. What does Intel have? Marketing lies lol.
#11
R-T-B
SoNic67 said:
I was sarcastic.

As an EE, I would love to be on the design team for the support building and utilities - power, HVAC, water... We are used to seeing something like max 30kW per cabinet :)
And I would rather have the tax money spent here in the US than on rebuilding failed countries.
I agree with all that, short of the "failed countries" bit. I'd like to believe all countries can one day succeed.

TheGuruStud said:
You're a bit special aren't ya? Intel has no product, period. AMD currently has one, another releasing first quarter of next year and then the next one in the pipe. What does Intel have? Marketing lies lol.
Xeon... is a product? They do have those, say what you will about them.
#12
nickbaldwin86
going to need a nuclear power plant to run all those AMD chips :p
#13
notb
I'm not surprised by the EPYC part, but using Radeon Instinct is a slight concern.

Maybe Cray will help AMD write a proper API. Or port CUDA...
#14
R-T-B
notb said:
Maybe Cray will help AMD write a proper API. Or port CUDA...
I have a feeling they are using OpenCL, which AMD has excellent support for.
#15
Mark Little
R-T-B said:
I agree with all that, short of the "failed countries" bit. I'd like to believe all countries can one day succeed.

Xeon... is a product? They do have those, say what you will about them.
He's talking about Intel Xe (unreleased) vs. Instinct/Epyc series which have been released.

notb said:
I'm not surprised by the EPYC part, but using Radeon Instinct is a slight concern.

Maybe Cray will help AMD write a proper API. Or port CUDA...
From the Anandtech article,

"And as the principle processor provider, AMD will also be taking on a lot of the responsibility for developing the software stack as well, with the company working with Cray to develop an enhanced version of their ROCm environment to best extract performance from the massive cluster of CPUs and GPUs."
#16
notb
R-T-B said:
I have a feeling they are using OpenCL, which AMD has excellent support for.
Who's "they"? :-)
These supercomputers are used by researchers. You have a project, you apply for access and they decide whether you're worthy or not. ;-)

It seems like going for Nvidia GPUs would be more flexible. Also, the majority of their clusters use Nvidia GPUs already.
Suddenly ORNL ordered 2 supercomputers with GPUs made by Intel and AMD. It's slightly surprising - that's all.
#17
SoNic67
R-T-B said:
I'd like to believe all countries can one day succeed.
Maybe "one day". But for now some countries are in tribal, medieval time, with no signs of bettering.

R-T-B said:
I have a feeling they are using OpenCL
I doubt they will use open solutions.
#18
Vayra86
notb said:
Who's "they"? :)
These supercomputers are used by researchers. You have a project, you apply for access and they decide whether you're worthy or not. ;-)

It seems like going for Nvidia GPUs would be more flexible. Also, the majority of their clusters use Nvidia GPUs already.
Suddenly ORNL ordered 2 supercomputers with GPUs made by Intel and AMD. It's slightly surprising - that's all.
Might be a versatility move. They now have two different setups with, I reckon, the AMD one being noticeably cheaper but perhaps equally good at getting a job done. Also, new Nvidia GPUs carry different hardware that may or may not be useful for the objectives they have in mind; and I'm not sure they still had the possibility for major requests of Pascal GPUs for example.
#19
notb
Vayra86 said:
Might be a versatility move. They now have two different setups with I reckon the AMD being noticeably cheaper but perhaps equally good at getting a job done.
Well, the reality is that Nvidia cluster can run CUDA and this can't. That covers the "versatility" issue.
Whether this is cheaper or not - I have no idea. Maybe they simply wanted a customized GPU, in which case AMD is an easier partner.
Also, new Nvidia GPUs carry different hardware that may or may not be useful for the objectives they have in mind; and I'm not sure they still had the possibility for major requests of Pascal GPUs for example.
Once again: this is an all-round cluster, not built for a particular task. So the "additional hardware" is a plus. Especially when it's made for machine learning (it's quite popular, really :-P).
Anyway, both V100 and P100 are still offered by Nvidia. I'm not sure about K80 - maybe it's limited to existing clients.
#20
phill
I love seeing posts and news about AMD winning over contracts like this, as many have said, AMD are back in the game and so rightfully too :)

AMD, hats off to you sir/maam :)
#21
R-T-B
SoNic67 said:
I doubt they will use open solutions.
A lot of mainframe/supercomputer projects depend on open source so it would surprise me if they didn't.

notb said:
Who's "they"?
Unless you want to code to metal, you use the framework provided.

So effectively everyone.

notb said:
Well, the reality is that Nvidia cluster can run CUDA and this can't. That covers the "versatility" issue.
There is also the other versatility, you know, the one of being able to operate on more platforms and with more software (open vs closed drivers).


notb said:
It's slightly surprising - that's all.
I don't disagree. That's why I suspect a strange software stack that needs an open source driver for some part of the solution. I don't see another justification.
#22
mtcn77
phill said:
I love seeing posts and news about AMD winning over contracts like this, as many have said, AMD are back in the game and so rightfully too :)

AMD, hats off to you sir/maam :)
This is some deep necropost. I have been discussing this previously a lot, AMD is not in the back, AMD is in the lead.
It is all there in fine print.
#23
R-T-B
mtcn77 said:
This is some deep necropost. I have been discussing this previously a lot, AMD is not in the back, AMD is in the lead.
It is all there in fine print.
What? A bulldozer era article?

*Walks away confused*
#24
notb
R-T-B said:
There is also the other versatility, you know, the one of being able to operate on more platforms and with more software (open vs closed drivers).
I know you're advocating open source a lot, but this argument makes no sense.
If you're moving to a platform with a different API, you have to rewrite everything. It doesn't matter if it's open or closed.

Those who already have access to existing Nvidia clusters will likely stay there (especially for AI-related computing). New users will be moved to Frontier.

People have been using CUDA for a decade. It's the de facto standard.
Sure, I'd rather have something market-wide in case Thanos snaps his fingers and we're unlucky enough to lose the whole Nvidia team. But this standard should be CUDA. It's excellent. And everyone already uses it.
AMD and Intel should simply pay Nvidia and port it instead of wasting money on developing alternatives.

Anyway, we're going to see one more exa cluster announcement in the USA (for LLNL). One went to Intel, one to AMD. Maybe that was the idea: provide 3 different architectures.
#25
R-T-B
notb said:
It doesn't matter if it's open or closed.
And if the driver is a closed binary unavailable for your platform?

Least resistance. It is arguably easier to port a driver than pay for a new closed one to integrate with a moving target (OSS kernel).

I do advocate open source, but only when it's actually helpful. Their build design choices suggest it must be, or they'd probably have gone with a CUDA-based system for the versatility of the end users running code.