Friday, February 16th 2024

NVIDIA Unveils "Eos" to Public - a Top Ten Supercomputer

Providing a peek at the architecture powering advanced AI factories, NVIDIA released a video that offers the first public look at Eos, its latest data-center-scale supercomputer. An extremely large-scale NVIDIA DGX SuperPOD, Eos is where NVIDIA developers create their AI breakthroughs using accelerated computing infrastructure and fully optimized software. Eos is built from 576 NVIDIA DGX H100 systems connected by NVIDIA Quantum-2 InfiniBand networking, paired with NVIDIA's software stack, and delivers a total of 18.4 exaflops of FP8 AI performance. Revealed in November at the Supercomputing 2023 trade show, Eos—named for the Greek goddess said to open the gates of dawn each day—reflects NVIDIA's commitment to advancing AI technology.

Eos Supercomputer Fuels Innovation
Each DGX H100 system is equipped with eight NVIDIA H100 Tensor Core GPUs, giving Eos a total of 4,608 H100 GPUs. As a result, Eos can handle the largest AI workloads: training large language models, recommender systems, quantum simulations and more. It's a showcase of what NVIDIA's technologies can do when working at scale. Eos is arriving at the perfect time. People are changing the world with generative AI, from drug discovery to chatbots to autonomous machines and beyond. To achieve these breakthroughs, they need more than AI expertise and development skills. They need an AI factory: a purpose-built AI engine that's always available and can help ramp their capacity to build AI models at scale. Eos delivers. Ranked No. 9 on the TOP500 list of the world's fastest supercomputers, Eos pushes the boundaries of AI technology and infrastructure.
It includes NVIDIA's advanced accelerated computing and networking alongside sophisticated software offerings such as NVIDIA Base Command and NVIDIA AI Enterprise.
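
As a quick sanity check of the headline figure, here is a minimal sketch (assuming NVIDIA's published ~3,958 TFLOPS FP8-with-sparsity peak per H100 SXM GPU; sustained throughput on real workloads is lower):

```python
# Back-of-the-envelope check of Eos's aggregate FP8 figure.
# Assumes NVIDIA's published ~3,958 TFLOPS sparse FP8 peak per H100 SXM GPU.
DGX_SYSTEMS = 576
GPUS_PER_DGX = 8
FP8_TFLOPS_PER_GPU = 3958  # H100 SXM FP8 Tensor Core peak, with sparsity

total_gpus = DGX_SYSTEMS * GPUS_PER_DGX            # 4,608 GPUs
total_exaflops = total_gpus * FP8_TFLOPS_PER_GPU / 1e6

print(f"{total_gpus} GPUs -> ~{total_exaflops:.1f} exaflops FP8")
# 4608 GPUs -> ~18.2 exaflops FP8, in line with NVIDIA's quoted 18.4
```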


Eos's architecture is optimized for AI workloads that demand ultra-low latency and high-throughput interconnectivity across a large cluster of accelerated computing nodes, making it an ideal solution for enterprises looking to scale their AI capabilities. Based on NVIDIA Quantum-2 InfiniBand with In-Network Computing technology, its network architecture supports data transfer speeds of up to 400 Gb/s, facilitating the rapid movement of large datasets essential for training complex AI models.
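
For a rough sense of what 400 Gb/s means in practice, a minimal sketch (the 10 TB dataset is a hypothetical figure, and a single idealized link at full line rate is assumed, ignoring protocol overhead):

```python
# Idealized time to move a training dataset over one 400 Gb/s link.
# Real-world throughput is lower due to protocol overhead and congestion.
LINK_GBPS = 400                # NVIDIA Quantum-2 InfiniBand line rate
dataset_tb = 10                # hypothetical 10 TB dataset

link_gb_per_s = LINK_GBPS / 8  # 50 GB/s
seconds = dataset_tb * 1000 / link_gb_per_s
print(f"{dataset_tb} TB over one {LINK_GBPS} Gb/s link: ~{seconds:.0f} s")
# 10 TB over one 400 Gb/s link: ~200 s
```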

At the heart of Eos lies the groundbreaking DGX SuperPOD architecture powered by NVIDIA's DGX H100 systems. The architecture is built to provide the AI and computing fields with tightly integrated full-stack systems capable of computing at an enormous scale. As enterprises and developers worldwide seek to harness the power of AI, Eos stands as a pivotal resource, promising to accelerate the journey towards AI-infused applications that fuel every organization.
Sources: NVIDIA Blog, ServeTheHome

20 Comments on NVIDIA Unveils "Eos" to Public - a Top Ten Supercomputer

#1
xrli
TOP500 should make a separate ranking based on FP16 performance. This and other H100 supercomputers are clearly not trying to compete in FP64 HPC workloads but are focused only on AI.
#2
GreiverBlade
awwww, no mention of the CPU used?

iirc the DGX SuperPOD used AMD Rome Epyc, do they still?

interesting top 10 nonetheless
#4
Patriot
P4-630: Probably NVIDIA Grace CPUs?...
www.nvidia.com/en-us/data-center/grace-cpu/
The NVIDIA Grace Superchip is 2 CPUs, or 1 CPU and 1 GPU, not a DGX SuperPOD.
The A100 DGX was AMD-based; rumor is they wouldn't give them a discount this time around, so they went with Intel, who would.
#5
Daven
At this rate Aurora won’t even be in the top ten for more than one list.
#6
AnarchoPrimitiv
Patriot: The NVIDIA Grace Superchip is 2 CPUs, or 1 CPU and 1 GPU, not a DGX SuperPOD.
The A100 DGX was AMD-based; rumor is they wouldn't give them a discount this time around, so they went with Intel, who would.
I have a feeling that a deep discount is the primary reason anyone chooses Intel in these applications
#7
evanevanevan
GreiverBlade: awwww, no mention of the CPU used?

iirc the DGX SuperPOD used AMD Rome Epyc, do they still?

interesting top 10 nonetheless
Celeron, quad core
#8
Patriot
AnarchoPrimitiv: I have a feeling that a deep discount is the primary reason anyone chooses Intel in these applications
Yes and no; not every application requires heavy CPU usage. Sometimes fewer, higher-clocked cores are better. From what I understand, even with the MI300X, initial testing showed SP outperforming Genoa.
My guess is they need to try some mid-core-count Genoa SKUs with higher clocks, but it might be architectural and scheduler issues. And divide by ~60 to get the FP64 rating.
#9
Daven
Patriot: Yes and no; not every application requires heavy CPU usage. Sometimes fewer, higher-clocked cores are better. From what I understand, even with the MI300X, initial testing showed SP outperforming Genoa.
My guess is they need to try some mid-core-count Genoa SKUs with higher clocks, but it might be architectural and scheduler issues. And divide by ~60 to get the FP64 rating.
AMD sells lower core count Epyc SKUs. You don’t have to buy the 96 core version. So if what you are saying is true, the only motivation to buy Intel over AMD is still discounts.
#10
mechtech
serious question

how long does it take to build a supercomputer - from tech spec to fully commissioned and operational?
#11
evernessince
Patriot: Yes and no; not every application requires heavy CPU usage. Sometimes fewer, higher-clocked cores are better. From what I understand, even with the MI300X, initial testing showed SP outperforming Genoa.
My guess is they need to try some mid-core-count Genoa SKUs with higher clocks, but it might be architectural and scheduler issues. And divide by ~60 to get the FP64 rating.
The article is about supercomputers, which inherently means the workloads are designed for high parallelization. NVIDIA is absolutely not putting out the best product by going with the cheaper Intel CPUs; that will increase the TCO over time for its customers.
mechtech: serious question

how long does it take to build a supercomputer - from tech spec to fully commissioned and operational?
It varies a lot. $100 million to $1 billion plus. Those numbers are from before the AI boom, mind you, and given NVIDIA's prices I would not be surprised if it exceeds that figure.
#12
mechtech
evernessince: The article is about supercomputers, which inherently means the workloads are designed for high parallelization. NVIDIA is absolutely not putting out the best product by going with the cheaper Intel CPUs; that will increase the TCO over time for its customers.



It varies a lot. $100 million to $1 billion plus. Those numbers are from before the AI boom, mind you, and given NVIDIA's prices I would not be surprised if it exceeds that figure.
Not dollars.......time....years?
#13
evernessince
mechtech: Not dollars.......time....years?
Years usually.
#14
atomek
To put things into perspective, the most powerful supercomputer from 2000 (ASCI White, ~12 TFLOPS) was roughly 1,500,000 times slower than this one.
#15
Count von Schwalbe
mechtech: Not dollars.......time....years?
I think Frontier (ORNL) was around 4 years, IIRC.
#16
Wirko
mechtech: Not dollars.......time....years?
Depends on how fast you're able to burn those dollars, hehe.

But if Nvidia decides to make Eos a sellable physical product, mostly identical to this first Eos, then it shouldn't take more than a few months. Large companies might be interested: they would get a field-tested system with predictable performance and a relatively short delivery time.
#17
evernessince
Wirko: Depends on how fast you're able to burn those dollars, hehe.

But if Nvidia decides to make Eos a sellable physical product, mostly identical to this first Eos, then it shouldn't take more than a few months. Large companies might be interested: they would get a field-tested system with predictable performance and a relatively short delivery time.
Depends on how long planning takes and whether a suitable location needs to be built. Often the facility that houses a supercomputer is purpose-built or refurbished for it.
#18
Denver
4,608 GPUs (H100) × 3.95 PFLOPS = 18.2 exaflops FP8
4,000 GPUs (MI300X) × 5.2 PFLOPS = 20.8 exaflops FP8
4,608 × MI300X = 23.9 exaflops

:cool:
#20
Leiesoldat
lazy gamer & woodworker
mechtech: serious question

how long does it take to build a supercomputer - from tech spec to fully commissioned and operational?
The actual procurement, install, and initial optimization prior to the Top500 run of Frontier took around 5 to 6 years. The project started in FY2016 and the install occurred in December 2021.

The Top500 run was in May of 2022. The first design talks for an exascale supercomputer started at the beginning of the 2010s, and the primary concern at the time was whether an exascale computer could be built while consuming 25 MW of electricity or less. This was a constraint imposed by the US Department of Energy because the government didn't want to spend a buttload of money on energy costs. The cost of Frontier was around 500 to 600 million USD. The cost of the actual Exascale Computing Project (updating many large software and application products to use CPUs/GPUs at these large scales) is 1.8 billion USD.

Source: Al Geist's (corporate fellow, ORNL) presentation talk at the Exascale Computing Project's 2023 Independent Project Review
Source2: I work in the project office for the ECP
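
That 25 MW ceiling translates into a concrete machine-level efficiency target; a minimal sketch of the arithmetic:

```python
# Efficiency implied by the US DOE's 25 MW exascale power cap.
EXAFLOP = 1e18        # FP64 operations per second (1 exaflop)
POWER_CAP_W = 25e6    # the 25 MW ceiling mentioned above

gflops_per_watt = EXAFLOP / POWER_CAP_W / 1e9
print(f"Required efficiency: ~{gflops_per_watt:.0f} GFLOPS/W")
# Required efficiency: ~40 GFLOPS/W
```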