Thursday, February 8th 2024

NVIDIA CG100 "Grace" Server Processor Benchmarked by Academics

The Barcelona Supercomputing Center (BSC) and the State University of New York (Stony Brook and Buffalo campuses) have pitted NVIDIA's relatively new CG100 "Grace" Superchip against several rival products in a "wide variety of HPC and AI benchmarks." Team Green's marketing material has focused mainly on the overall GH200 "Grace Hopper" package, so it is interesting to see technical institutes concentrate on the company's "first true" server processor (ARM-based) rather than the ever-popular GPU aspect. The Next Platform's article summarized the chip's internal makeup: "(NVIDIA's) Grace CPU has a relatively high core count and a relatively low thermal footprint, and it has banks of low-power DDR5 (LPDDR5) memory—the kind used in laptops but gussied up with error correction to be server class—of sufficient capacity to be useful for HPC systems, which typically have 256 GB or 512 GB per node these days and sometimes less."

Benchmark results were revealed at last week's HPC Asia 2024 conference in Nagoya, Japan; BSC and SUNY also uploaded their findings to the ACM Digital Library (link #1 & #2). BSC's MareNostrum 5 system contains an experimental cluster portion consisting of NVIDIA Grace-Grace and Grace-Hopper superchips. We have heard plenty about the latter (in press releases), but the former is a novel concept, as outlined by The Next Platform: "Put two Grace CPUs together into a Grace-Grace superchip, a tightly coupled package using NVLink chip-to-chip interconnects that provide memory coherence across the LPDDR5 memory banks and that consumes only around 500 watts, and it gets plenty interesting for the HPC crowd. That yields a total of 144 Arm Neoverse "Demeter" V2 cores with the Armv9 architecture, and 1 TB of physical memory with 1.1 TB/sec of peak theoretical bandwidth. For some reason, probably relating to yield on the LPDDR5 memory, only 960 GB of that memory capacity and only 1 TB/sec of that memory bandwidth is actually available."
BSC's older MareNostrum 4 supercomputer is based on "nodes comprised of a pair of 24-core Skylake Xeon SP-8160 Platinum processors running at 2.1 GHz." The almost seven-year-old Team Blue-based system was bested by the NVIDIA-fortified MareNostrum 5: the latter's worst results were still 67% faster, while its best showed a 4.49x performance advantage. The SUNY researchers fielded a wider range of rival solutions against their own NVIDIA setup, in "Grace-Grace" (CPU-CPU pair) and "Grace-Hopper" (CPU-GPU pair) configurations. The competition included Intel Sapphire Rapids and Ice Lake, AMD Milan, plus the ARM-based Amazon Graviton 3 and Fujitsu A64FX processors. Tom's Hardware checked SUNY's comparison data: "The Grace Superchip easily beat the Graviton 3, the A64FX, an 80-core Ice Lake setup, and even a 128-core configuration of Milan in all benchmarks. However, the Sapphire Rapids server with two 48-core Xeon Max 9468s stopped Grace's winning streak."
They continued: "Against Sapphire Rapids in HBM mode, Grace only won in three of the eight tests—though it was able to outperform in five tests when in DDR5 mode. It's a surprisingly mixed bag for Nvidia considering that Grace has 50% more cores and uses TSMC's more advanced 4 nm node instead of Intel's aging Intel 7 (formerly 10 nm) process. It's not entirely out of left field, though: Sapphire Rapids also beat AMD's EPYC Genoa chips for a spot in an MI300X-powered Azure instance, indicating that, despite Sapphire Rapids' shortcomings, it still has plenty of potency for HPC...On the other hand, NVIDIA might have a crushing victory in efficiency. The Grace Superchip is rated for 500 watts, while the Xeon Max 9468 is rated for 350 watts, which means two would have a TDP of 700 watts. The paper doesn't detail power consumption on either chip, but if we assume each chip was running at its TDP, then the comparison becomes very favorable for NVIDIA."
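
To make that back-of-the-envelope efficiency argument concrete, here is a minimal sketch of the perf-per-watt arithmetic. It assumes each chip runs at its rated TDP (500 W for the Grace Superchip, 2 x 350 W for the dual Xeon Max 9468s) and uses hypothetical benchmark scores, since neither paper reports measured power draw:

# Back-of-the-envelope perf-per-watt comparison. Assumes each chip runs
# at its rated TDP; the papers report no measured power figures, and the
# benchmark scores below are hypothetical placeholders.
GRACE_TDP_W = 500          # Grace Superchip (two Grace CPUs), rated TDP
XEON_NODE_TDP_W = 2 * 350  # two Xeon Max 9468s at 350 W each

def perf_per_watt(score: float, watts: float) -> float:
    """Normalize a benchmark score by the assumed power draw."""
    return score / watts

# Even if the Xeon node were, say, 10% faster outright...
grace_score, xeon_score = 1.00, 1.10
ratio = perf_per_watt(grace_score, GRACE_TDP_W) / perf_per_watt(xeon_score, XEON_NODE_TDP_W)
print(f"Grace perf/W advantage: {ratio:.2f}x")  # ~1.27x

Under those TDP assumptions, Grace comes out roughly 27% ahead per watt even while losing the raw benchmark, which is the scenario the quoted passage is gesturing at.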

The Next Platform believes that Team Green's CG100 server processor is truly bolstered by its onboard neighbor: "any CPU paired with the same Hopper GPU would probably do as well. On the CPU-only Grace-Grace unit, the Gromacs performance is almost as potent as a pair of 'Sapphire Rapids' Xeon Max Series CPUs. It is noteworthy that the HBM memory on this chip doesn't help that much for Gromacs. Hmmmm. Anyway, that is some food for thought about the Grace CPU and HPC workloads."
Sources: Next Platform, Tom's Hardware, ACM Digital Library #1, ACM Digital Library #2

11 Comments on NVIDIA CG100 "Grace" Server Processor Benchmarked by Academics

#1
AnotherReader
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
#2
ScaLibBDP
Attention! There are a lot of issues and inconsistencies in the article and results.

- It is from academia, and in some cases academic researchers are better at writing publications than at delivering high-quality HPC production code.
Note: Show me a piece of code and I'll tell you whether it was implemented by a PhD computer scientist or by a highly experienced software engineer.

- In HPC we do not measure Peak Processing Power (PPP) in FLOPs per clock! It is always measured in floating-point operations per second (FLOPS). Take a look at the www.top500.org numbers and supercomputer specs and you'll see that the core clocks of CPUs and GPUs are always different.

- Is it right to compare the performance of a 48-core processor against a 144-core processor with different core clock frequencies without normalizing the results?
Note: It is absolutely useless without normalization! If I normalize the result of the 48-core processor against the 144-core processor (multiply by three), then the ARM processor is faster! See the sketch at the end of this post.
Is it right to compare the fuel efficiency of hybrid cars of different sizes and masses?

>>...HBM memory on this chip doesn't help that much for Gromacs...
- This is because processing in Gromacs is compute-bound (CPU-bound) rather than memory-bandwidth-bound.
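
Here is a minimal Python sketch of both points, the FLOPS definition and the core-count normalization. All numbers are hypothetical placeholders, NOT figures from either paper:

# FLOPS is operations per SECOND: peak = cores x clock x FLOPs per cycle.
# The clock and per-cycle figures below are illustrative, not Grace's spec.
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak processing power in FLOPS (per second, never per clock)."""
    return cores * clock_hz * flops_per_cycle

print(f"{peak_flops(144, 3.0e9, 16) / 1e12:.1f} TFLOPS")  # 6.9 TFLOPS

# Core-count normalization with hypothetical aggregate scores: the raw
# result alone says nothing about architectural efficiency when core
# counts differ by a factor of three.
grace_score, grace_cores = 130.0, 144   # 144-core Grace Superchip
xeon_score, xeon_cores = 60.0, 48       # one 48-core Xeon Max 9468

print(grace_score > xeon_score)                      # True: Grace "wins" raw
print(f"{grace_score / grace_cores:.2f} per core")   # 0.90
print(f"{xeon_score / xeon_cores:.2f} per core")     # 1.25: the winner flips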
#3
TumbleGeorge
Off/How many academics does it take to change a light bulb?/end off
#4
Denver
What an unflattering comparison the university has presented... They depict the EPYC 7763 (Zen 3) as having 128 cores, when in reality it only has 64 physical cores. They've mixed desktop and server components, among other discrepancies. It would be more interesting to see MI300 vs H100, both paired with Genoa, versus the NVIDIA "Super" Chip.

At this point I can only say that something smells bad. lol
#5
AnotherReader
Denver said:
What an unflattering comparison the university has presented... They depict the EPYC 7763 (Zen 3) as having 128 cores, when in reality it only has 64 physical cores. They've mixed desktop and server components, among other discrepancies. It would be more interesting to see MI300 vs H100, both paired with Genoa, versus the NVIDIA "Super" Chip.

At this point I can only say that something smells bad. lol
They may have been using a dual-socket system. Still, to be fair, they should have included Zen 4-based SKUs like the EPYC 9754 (128 cores) or EPYC 9654 (96 cores).
#6
AnarchoPrimitiv
AnotherReader said:
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
Yeah, or, I don't know, how about we compare it to the NEW EPYC chips instead of the Zen 3 ones? Here's what I'm interested in: Phoronix compared the Xeon Max with HBM against Genoa-X (the large-cache variants), and Genoa-X easily beat the Xeons with HBM... so if the Xeons with HBM beat the NVIDIA CPUs, and EPYC Genoa-X with extra cache beat the Xeons with HBM, does that mean EPYC Genoa-X will beat the NVIDIA CPU?

That's why I was so disappointed they tested with Zen 3 EPYC.
#7
Minus Infinity
AnotherReader said:
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
Why not compare it to the MI300s, the direct competitors?
#9
ScaLibBDP
Denver said:
At this point I can only say that something smells bad. lol
I don't think you're the only one who thinks so!

Intel, AMD, and ARM are very concerned that NVIDIA has stepped into the server-CPU market with a new-generation system (CPU+GPU). It is possible that the work was financially supported by one of these companies, of course not directly.

Another thing is that all these companies are absolutely jealous of NVIDIA's current revenues and hardware advances. All of them could only dream about hardware orders similar to Meta's order of 350,000 NVIDIA H100 GPUs, worth around 10.5 billion US dollars! It is possible that the publication is an attempt to harm NVIDIA's reputation, something like, "...look, our 3rd Gen CPUs are better than the latest, most advanced system from NVIDIA..." in order to boost the number of orders for older Intel Xeon and AMD EPYC CPUs.

That is why Microsoft and OpenAI are talking about investing billions of dollars into new chip-making factories. Once again, all of them are simply jealous and dream about NVIDIA's revenues.

Also, take a look at an article on www.hpcwire.com:

www.hpcwire.com/2024/02/06/nvidias-dominance-in-ai-chips-challenged-by-big-tech-companies
#10
Denver
ScaLibBDP said:
I don't think you're the only one who thinks so!

Intel, AMD, and ARM are very concerned that NVIDIA has stepped into the server-CPU market with a new-generation system (CPU+GPU). It is possible that the work was financially supported by one of these companies, of course not directly.

Another thing is that all these companies are absolutely jealous of NVIDIA's current revenues and hardware advances. All of them could only dream about hardware orders similar to Meta's order of 350,000 NVIDIA H100 GPUs, worth around 10.5 billion US dollars! It is possible that the publication is an attempt to harm NVIDIA's reputation, something like, "...look, our 3rd Gen CPUs are better than the latest, most advanced system from NVIDIA..." in order to boost the number of orders for older Intel Xeon and AMD EPYC CPUs.

That is why Microsoft and OpenAI are talking about investing billions of dollars into new chip-making factories. Once again, all of them are simply jealous and dream about NVIDIA's revenues.

Also, take a look at an article on www.hpcwire.com:

www.hpcwire.com/2024/02/06/nvidias-dominance-in-ai-chips-challenged-by-big-tech-companies
The tests show the NVIDIA chip ahead in some cases. This test does not favor AMD; I think it was just done poorly.
#11
Redwoodz
Cherry-picked bench slides, lmao.