Monday, May 22nd 2023

Frontier Remains As Sole Exaflop Machine on TOP500 List

Increasing its HPL score from 1.02 Eflop/s in November 2022 to an impressive 1.194 Eflop/s on this list, Frontier improved upon its score after a period of stagnation between June 2022 and November 2022. Considering exascale was only a goal to aspire to just a few years ago, a roughly 17% increase here is an enormous success. Additionally, Frontier earned a score of 9.95 Eflop/s on the HPL-MxP benchmark, which measures performance for mixed-precision calculations. This is also an increase over the 7.94 Eflop/s that the system achieved on the previous list and nearly ten times the machine's HPL score. Frontier is based on the HPE Cray EX235a architecture and utilizes AMD EPYC 64C 2 GHz processors. It has 8,699,904 cores and an incredible energy efficiency rating of 52.59 Gflops/Watt. It also relies on gigabit ethernet for data transfer.
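
For readers curious how these Flop/s figures are produced: HPL has the machine solve one very large, dense system of linear equations via LU factorization and credits the run with (2/3)n^3 + 2n^2 floating-point operations for an n x n matrix, so the score is that operation count divided by the wall-clock time. Below is a minimal sketch of the same measurement in Python; the matrix size n is an arbitrary small choice for illustration, and NumPy's solver stands in for the real, heavily tuned HPL code.

```python
import time
import numpy as np

# HPL-style measurement sketch (illustrative only, not the real benchmark):
# solve a dense n x n linear system and credit the run with the standard
# LU operation count of (2/3)*n^3 + 2*n^2 flops. Real HPL runs size the
# matrix to fill most of the machine's memory; n here is deliberately tiny.
n = 2048
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)  # LU factorization plus triangular solves
elapsed = time.perf_counter() - start

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
print(f"~{flops / elapsed / 1e9:.1f} Gflop/s in {elapsed:.2f} s")
```
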
The Fugaku system at the Riken Center for Computational Science (R-CCS) in Kobe, Japan, also remained at the No. 2 spot that it earned on the previous list. The system held steady at its previous HPL score of 0.442 Eflop/s.

The LUMI system at EuroHPC/CSC in Finland entered the list at No. 3 in June 2022. It holds on to the No. 3 spot after an upgrade of the system last November and has an HPL score of 0.3091 Eflop/s. With this it remains the largest system in Europe.

The Leonardo system at EuroHPC/CINECA in Bologna, Italy, remains at the No. 4 spot. It also saw upgrades that allowed it to improve upon its score, achieving an HPL score of 0.239 Eflop/s compared to its previous score of 0.174 Eflop/s.

Here is a summary of the systems in the Top 10:
  • Frontier is the No. 1 system in the TOP500. This HPE Cray EX system is the first US system with a performance exceeding one Exaflop/s. It is installed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, where it is operated for the Department of Energy (DOE). It achieved 1.194 Eflop/s using 8,699,904 cores. The HPE Cray EX architecture combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators and a Slingshot-10 interconnect.
  • Fugaku, the No. 2 system, is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan. It has 7,630,848 cores which allowed it to achieve an HPL benchmark score of 442 Pflop/s.
  • The LUMI system, another HPE Cray EX system installed at the EuroHPC center at CSC in Finland, is No. 3 with a performance of 0.3091 Eflop/s. The European High-Performance Computing Joint Undertaking (EuroHPC JU) is pooling European resources to develop top-of-the-range Exascale supercomputers for processing big data. One of the pan-European pre-Exascale supercomputers, LUMI, is located in CSC's data center in Kajaani, Finland.
  • The No. 4 system Leonardo is installed at a different EuroHPC site in CINECA, Italy. It is an Atos BullSequana XH2000 system with Xeon Platinum 8358 32C 2.6 GHz as main processors, NVIDIA A100 SXM4 40 GB as accelerators, and Quad-rail NVIDIA HDR100 InfiniBand as interconnect. It achieved a Linpack performance of 238.7 Pflop/s.
  • Summit, an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, is again listed at the No. 5 spot worldwide with a performance of 148.8 Pflop/s on the HPL benchmark, which is used to rank the TOP500 list. Summit has 4,356 nodes, each one housing two POWER9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs each with 80 streaming multiprocessors (SM). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
  • Sierra, a system at the Lawrence Livermore National Laboratory, CA, USA, is at No. 6. Its architecture is very similar to that of the No. 5 system, Summit. It is built with 4,320 nodes, each with two POWER9 CPUs and four NVIDIA Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.
  • Sunway TaihuLight, a system developed by China's National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, in China's Jiangsu province, is listed at the No. 7 position with 93 Pflop/s.
  • Perlmutter, at No. 8, is based on the HPE Cray "Shasta" platform and is a heterogeneous system with AMD EPYC-based nodes and 1,536 NVIDIA A100-accelerated nodes. Perlmutter achieved 64.6 Pflop/s.
  • Selene, now at No. 9, is an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA in the USA. The system is based on AMD EPYC processors with NVIDIA A100 GPUs for acceleration and a Mellanox HDR InfiniBand network, and it achieved 63.4 Pflop/s.
  • Tianhe-2A (Milky Way-2A), a system developed by China's National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is now listed as the No. 10 system with 61.4 Pflop/s.
Other TOP500 Highlights
The TOP500 list shows that AMD, Intel, and IBM processors are the preferred choice for HPC systems. Of the Top 10, four systems use AMD processors (Frontier, LUMI, Perlmutter, and Selene), two use Intel processors (Leonardo and Tianhe-2A), and two use IBM processors (Summit and Sierra).

Much like on the previous list, China and the United States earned most of the entries on the entire TOP500 list. The United States increased its lead from 126 machines on the last list to 150 on the current one, while China dropped from 162 systems to 134. In terms of continents, Asia placed 192 machines on the list, North America 160, and Europe 133.

In terms of system interconnects, Ethernet was still the clear winner despite dropping from 233 machines to 227. InfiniBand interconnects increased their presence on the list from 194 machines to 200, and Omni-Path dropped from 36 machines to 35. Custom interconnects saw a massive increase, from 4 systems to 31.

GREEN500 Results
The No. 1 spot on the GREEN500 was again earned by the Henri system at the Flatiron Institute in New York City, United States, with an energy efficiency of 65.40 Gflops/Watt. What's more, improvements to the system allowed it to achieve an impressive jump on the TOP500 list from the No. 405 spot to No. 255 with a current HPL score of 2.88 Pflop/s, an increase over the previous list's score of 2.038 Pflop/s. Henri has 8,288 cores and is a Lenovo ThinkSystem SR670 with Intel Xeon Platinum processors and NVIDIA H100 GPUs.

The No. 2 spot was achieved by the Frontier Test & Development System (TDS) at ORNL in the United States with an energy efficiency rating of 62.20 Gflops/Watt. The Frontier TDS system is essentially a single rack identical to those in the full Frontier system, and it has an HPL score of 19.2 Pflop/s.

The No. 3 spot was taken by the Adastra system, an HPE Cray EX235a machine with AMD EPYC processors and AMD Instinct MI250X accelerators.

Additionally, the actual Frontier system deserves an honorable mention for its energy efficiency. Despite being the No. 1 system on the TOP500 list with an HPL score of 1.194 Eflop/s, the machine still achieved the No. 6 spot on the GREEN500 with an energy efficiency rating of 52.59 Gflops/Watt.
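
Dividing the published HPL score by the published efficiency rating recovers the machine's approximate power draw during the run. A quick back-of-the-envelope check in Python, using only the two Frontier figures quoted above:

```python
# Implied power draw = HPL score (flop/s) / efficiency (flop/s per watt),
# using only the two Frontier figures quoted in the article above.
hpl_score = 1.194e18        # 1.194 Eflop/s
efficiency = 52.59e9        # 52.59 Gflops/Watt
power_megawatts = hpl_score / efficiency / 1e6
print(f"Implied power draw: ~{power_megawatts:.1f} MW")  # roughly 22.7 MW
```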

The HPL performance of each of these systems shows that immense computing power does not have to come at the cost of energy efficiency.

HPCG Results
The TOP500 list has incorporated the High-Performance Conjugate Gradient (HPCG) benchmark results, which provide an alternative metric for assessing supercomputer performance. This score is meant to complement the HPL measurement to give a fuller understanding of the machine.
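
Where HPL stresses dense, compute-bound matrix math, HPCG solves a large sparse linear system with a conjugate gradient method, a workload dominated by memory bandwidth and irregular access rather than raw arithmetic, which is why HPCG scores are a small fraction of HPL scores. The following simplified, unpreconditioned conjugate gradient loop in Python is a sketch of the kernel class HPCG exercises, not the benchmark itself (HPCG proper uses a multigrid-preconditioned CG on a 3-D problem).

```python
import numpy as np
from scipy.sparse import diags

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive-definite A.
    Illustrative only: HPCG runs a multigrid-preconditioned variant,
    but the dominant kernel is the same sparse matrix-vector product."""
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual
    p = r.copy()               # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p             # sparse mat-vec: memory-bound, irregular access
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small 1-D Poisson-style test system (tridiagonal, SPD)
n = 500
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = conjugate_gradient(A, b)
print("final residual:", np.linalg.norm(b - A @ x))
```
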

The Fugaku system once again achieved the top position on the HPCG ranking, holding steady at its previous score of 16.0 HPCG-Pflop/s. Frontier claimed the No. 2 spot with 14.05 HPCG-Pflop/s, and No. 3 was captured by LUMI with a score of 3.41 HPCG-Pflop/s.

HPL-MxP Results (Formerly HPL-AI)
The HPL-MxP benchmark seeks to highlight the use of mixed-precision computations. Traditional HPC uses 64-bit floating-point computations. Today, we see hardware with various levels of floating-point precision: 32-bit, 16-bit, and even 8-bit. The HPL-MxP benchmark demonstrates that by using mixed precision during computation, much higher performance is possible. With the right mathematical techniques, a mixed-precision run can deliver the same accuracy as straight 64-bit precision.
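
A minimal sketch of the core trick in Python follows (illustrative only, not the HPL-MxP code): factor the matrix once in cheap 32-bit precision, then apply a few rounds of iterative refinement in 64-bit to recover full double-precision accuracy. The diagonally dominant test matrix is an assumption of this toy example so that the refinement converges quickly.

```python
import numpy as np
import scipy.linalg as la

def mixed_precision_solve(A, b, refinements=5):
    """Solve Ax = b by factoring in float32 and refining in float64.
    Sketch of the iterative-refinement idea behind HPL-MxP."""
    lu, piv = la.lu_factor(A.astype(np.float32))       # cheap low-precision LU
    x = la.lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(refinements):
        r = b - A @ x                                  # residual in full precision
        d = la.lu_solve((lu, piv), r.astype(np.float32))
        x += d.astype(np.float64)                      # correct the solution
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)        # diagonally dominant test matrix
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print("error vs. plain float64 solve:", np.linalg.norm(x - np.linalg.solve(A, b)))
```
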

The clear winner of the HPL-MxP benchmark was Frontier, with a stunning score of 9.95 Eflop/s, a substantial improvement over its previous score of 7.94 Eflop/s. Second place went to LUMI with a score of 2.2 Eflop/s, and third place was earned by Fugaku with a score of 2.0 Eflop/s.

About the TOP500 List
The first version of what became today's TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.

13 Comments on Frontier Remains As Sole Exaflop Machine on TOP500 List

#1
simlariver
It also relies on gigabit ethernet for data transfer.
Hmmm, typo?
Posted on Reply
#2
A Computer Guy
It also relies on gigabit ethernet for data transfer.
simlariver: Hmmm, typo?
LOL I knew it. It's just an oversized Synology NAS.
Posted on Reply
#3
Warigator
How would a zettaflops supercomputer even be built? How would it work?

An average supercomputer is still around 1 petaflops. Let's first make the average 100 petaflops and then start thinking about zettascale.
Posted on Reply
#4
trsttte
simlariver: Hmmm, typo?
Probably the WAN connection to the outside world, inside the cluster is definitely much more than that. Not that impressive nowadays but not terrible.
Posted on Reply
#5
Warigator
How do these HPL petaflops translate to real-world actual useful performance? I guess not very well. It's easier to increase flops than to increase actual performance of useful workloads.
Posted on Reply
#6
Patriot
trsttte: Probably the WAN connection to the outside world, inside the cluster is definitely much more than that. Not that impressive nowadays but not terrible.
It also says Slingshot-10, which is 100 GbE; Slingshot-11 would be 200 GbE.
Posted on Reply
#7
Redwoodz
Warigator: How do these HPL petaflops translate to real-world actual useful performance? I guess not very well. It's easier to increase flops than to increase actual performance of useful workloads.
:slap: It's a mixed load computation. Real world is likely better. Just ask Intel how useful it is.
Posted on Reply
#8
TumbleGeorge
Warigator: How would a zettaflops supercomputer even be built? How would it work?

An average supercomputer is still around 1 petaflops. Let's first make the average 100 petaflops and then start thinking about zettascale.
1.9+ Pflops average on the last list.
But the TOP500 is a political tool, and the information in it is only partially correct.
Posted on Reply
#9
dragontamer5788
Warigator: How do these HPL petaflops translate to real-world actual useful performance? I guess not very well. It's easier to increase flops than to increase actual performance of useful workloads.
For dense workloads which are limited by matrix-multiplication operations, it's a perfect benchmark. Alas: supercomputers do more than just simple matrix multiplications these days. For other workloads, it's a terrible benchmark.

So perfect benchmark for some, highly imperfect... flawed even... for other workloads.
Posted on Reply
#10
Daven
Where is Aurora?
Posted on Reply
#13
trsttte
dragontamer5788: Should be ready in.... 2018 or so.
To be fair, the projected computing power was also scaled up by a lot (from 180 petaflops in 2018, to 1 exaflop in 2021, and now planned for 2 exaflops this year).

But the point still stands: increasing targets doesn't matter if you never deliver.
Posted on Reply