
NVIDIA Unveils Grace CPU Superchip with 144 Cores and 1 TB/s Bandwidth

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,224 (0.91/day)
NVIDIA has today announced its Grace CPU Superchip, a monstrous design focused on heavy HPC and AI processing workloads. Team green had previously teased an in-house developed CPU meant to go into servers and create an entirely new segment for the company, and today we got a more detailed look at the plan with the Grace CPU Superchip. The Superchip is a package of two Grace processors, each containing 72 cores. These cores implement the Arm v9 instruction set architecture, and the two CPUs together provide 144 cores in the Superchip module. They are surrounded by an as-yet-unspecified amount of LPDDR5X memory with ECC, delivering 1 TB/s of total bandwidth.

The NVIDIA Grace CPU Superchip uses the NVLink-C2C cache-coherent interconnect, which delivers 900 GB/s of bandwidth, roughly seven times more than a PCIe 5.0 x16 link. The company targets a two-fold performance-per-Watt improvement over today's CPUs, aiming to bring efficiency and performance together. We also have some preliminary benchmark information provided by NVIDIA: in the SPECrate2017_int_base integer benchmark, the Grace CPU Superchip scores over 740 points, although this is a simulated result for now, meaning the final performance figure is not locked down and could end up higher. The company expects to ship the Grace CPU Superchip in the first half of 2023, with an already-supported ecosystem of software, including the NVIDIA RTX, HPC, NVIDIA AI, and NVIDIA Omniverse software stacks and platforms.
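For those curious where the "seven times" figure comes from, here is a minimal back-of-envelope sketch in Python comparing NVIDIA's quoted 900 GB/s NVLink-C2C figure against the theoretical bidirectional bandwidth of a PCIe 5.0 x16 link. The PCIe constants are standard spec values rather than anything NVIDIA has published for Grace, and the two totals may not be counted identically, so treat this as illustrative only.

# Rough check of the "7x PCIe 5.0" claim (illustrative numbers only)
PCIE5_GT_PER_S = 32.0              # PCIe 5.0 signalling rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding
LANES = 16

per_direction_gbs = PCIE5_GT_PER_S * ENCODING_EFFICIENCY * LANES / 8   # ~63 GB/s
bidirectional_gbs = 2 * per_direction_gbs                              # ~126 GB/s

NVLINK_C2C_GBS = 900.0   # NVIDIA's quoted total for the chip-to-chip link
print(f"PCIe 5.0 x16 (both directions): {bidirectional_gbs:.0f} GB/s")
print(f"NVLink-C2C advantage: {NVLINK_C2C_GBS / bidirectional_gbs:.1f}x")   # ~7.1x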


 
Joined
Jun 29, 2018
Messages
456 (0.21/day)
This will probably make NVIDIA fully independent in terms of building supercomputers. They have most of the hardware in-house (CPU, GPU, NVLink for internal and Mellanox for external networking), with only storage being third-party. The iron grip on software is nothing new with CUDA & co.
I am not sure how to feel about this... On one hand it's quite an achievement, but on the other it's a bit monopolistic.
 
Joined
Oct 6, 2021
Messages
1,424 (1.54/day)
When I saw the title I thought for a moment it was something really special.
 
Joined
Jun 25, 2021
Messages
146 (0.14/day)
This will probably make NVIDIA fully independent in terms of building supercomputers. They have most of the hardware in-house (CPU, GPU, NVLink for internal and Mellanox for external networking), with only storage being third-party. The iron grip on software is nothing new with CUDA & co.
I am not sure how to feel about this... On one hand it's quite an achievement, but on the other it's a bit monopolistic.

Monolithic, yes. Monopolistic... not yet. Intel has a full stack if their Arc GPU works as intended, and AMD bought Xilinx to flesh out the networking end of their own stack. Nvidia's greatest advantage in the space is their headstart with CUDA adoption; Teams Red and Blue will have to convince customers that it's worth their while to recode for their own hardware.
 
Joined
Jun 29, 2018
Messages
456 (0.21/day)
Monolithic, yes. Monopolistic... not yet. Intel has a full stack if their Arc GPU works as intended, and AMD bought Xilinx to flesh out the networking end of their own stack. Nvidia's greatest advantage in the space is their headstart with CUDA adoption; Teams Red and Blue will have to convince customers that it's worth their while to recode for their own hardware.
Intel has been trying to break the CUDA software monopoly for quite some time, and they are not succeeding. What is AMD's answer to it even? OpenCL? HIP?
The head start of CUDA looks almost insurmountable at this point.

Edit: You also mentioned Xilinx as an equivalent of Mellanox, but this is not the case. Xilinx's top offering is an FPGA-based 2x100GbE network card, while NVIDIA has announced a 64-port 800GbE switch and has been selling 400G adapters since 2021.
 
Joined
Jul 16, 2014
Messages
8,116 (2.28/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Arctic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse SteelSeries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
Ahhh, I see why Nvidia wanted ARM; this chip was in R&D prior to that broken deal. I wonder how bad the licensing fees are for Nvidia.

Is anyone familiar with this benchmark tool? Is 740 good?
 
Joined
Feb 20, 2020
Messages
9,340 (6.14/day)
Location
Louisiana
System Name Ghetto Rigs z490|x99|Acer 17 Nitro 7840hs/ 5600c40-2x16/ 4060/ 1tb acer stock m.2/ 4tb sn850x
Processor 10900k w/Optimus Foundation | 5930k w/Black Noctua D15
Motherboard z490 Maximus XII Apex | x99 Sabertooth
Cooling oCool D5 res-combo/280 GTX/ Optimus Foundation/ gpu water block | Blk D15
Memory Trident-Z Royal 4000c16 2x16gb | Trident-Z 3200c14 4x8gb
Video Card(s) Titan Xp-water | evga 980ti gaming-w/ air
Storage 970evo+500gb & sn850x 4tb | 860 pro 256gb | Acer m.2 1tb/ sn850x 4tb| Many2.5" sata's ssd 3.5hdd's
Display(s) 1-AOC G2460PG 24"G-Sync 144Hz/ 2nd 1-ASUS VG248QE 24"/ 3rd LG 43" series
Case D450 | Cherry Entertainment center on Test bench
Audio Device(s) Built in Realtek x2 with 2-Insignia 2.0 sound bars & 1-LG sound bar
Power Supply EVGA 1000P2 with APC AX1500 | 850P2 with CyberPower-GX1325U
Mouse Redragon 901 Perdition x3
Keyboard G710+x3
Software Win-7 pro x3 and win-10 & 11pro x3
Benchmark Scores Are in the benchmark section
Hi,
The next super miner :clap:
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Ahhh, I see why Nvidia wanted ARM; this chip was in R&D prior to that broken deal. I wonder how bad the licensing fees are for Nvidia.

Is anyone familiar with this benchmark tool? Is 740 good?
740 is pretty good, even for 144 cores. Anandtech's latest Ice Lake Xeon review has a comparison:

So, 2x the score of 2x64 cores of EPYC?


Scratch that: 2x64 threads of EPYC. The 75F3 is a 32c64t chip.

Makes me wonder why they're only quoting integer though - are they thinking that all FP compute will be offloaded to the GPUs?


As for the relation of this to the ARM acquisition attempt: this is just more proof that the world dodged a bullet on that one. This is a clear indication that if Nvidia controlled ARM and their licences, they would be strongly incentivized to use these in anticompetitive ways to bolster their own performance and competitiveness. The only reason x86 works decently is that the two major players are completely bound up in cross-licensing deals, meaning neither can strong-arm the other or limit their access to anything without hurting themselves too. Nothing similar exists for ARM.

Edit: derp, see above.

Edit2: Here you go:

Still good, though I have to wonder how much of this is due to the package being designed specifically for the power delivery and cooling of SXM5. EPYC tops out at 280W no matter what; I'd expect this to go higher.
 
Joined
Jun 29, 2018
Messages
456 (0.21/day)
Still good, though I have to wonder how much of this is due to the package being designed specifically for the power delivery and cooling of SXM5. EPYC tops out at 280W no matter what; I'd expect this to go higher.
Is it SXM5? It looks different from the H100 photo. I haven't yet read the entirety of the white paper, but on page 10 there is a similar package that contains one Hopper and one Grace together:
hopper.png

It too looks different from the H100 modules and has very special-looking VRMs on the Hopper side as well (which are probably expensive and different from those on the H100).
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Is it SXM5? It looks different from the H100 photo. I haven't yet read the entirety of the white paper, but on page 10 there is a similar package that contains one Hopper and one Grace together:
View attachment 240873
It too looks different from the H100 modules and has very special-looking VRMs on the Hopper side as well (which are probably expensive and different from those on the H100).
Good question, but it would be really weird if Nvidia designed these around several mezzanine card standards. SXM is their own (AMD uses the open OAM standard), so I would expect every HPC part Nvidia makes that isn't for PCIe to be for that - they want you to buy the full system from them, after all.

That combo package is really interesting. The trace density between those two packages must be absolutely insane. Also a bit weird, given that mezzanine cards are typically meant for grids of 2x4 cards or more - maybe this is for bespoke installations where it's of specific value to have very low latency between CPU and GPU?

Hmmmm... a thought: do these cards cover two SXM5 ports/sockets/whatever they're called? That would kind of make sense, and the PCB-to-GPU scale would seem to indicate that.
 
Joined
Jun 29, 2018
Messages
456 (0.21/day)
Good question, but it would be really weird if Nvidia designed these around several mezzanine card standards. SXM is their own (AMD uses the open OAM standard), so I would expect every HPC part Nvidia makes that isn't for PCIe to be for that - they want you to buy the full system from them, after all.

That combo package is really interesting. The trace density between those two packages must be absolutely insane. Also a bit weird, given that mezzanine cards are typically meant for grids of 2x4 cards or more - maybe this is for bespoke installations where it's of specific value to have very low latency between CPU and GPU?
They also show a third version:
hopper.png


Hmmmm... a thought: do these cards cover two SXM5 ports/sockets/whatever they're called? That would kind of make sense, and the PCB-to-GPU scale would seem to indicate that.
I don't think so. It looks like they made two mezzanine module types and a dedicated high-density board. Also, the Grace elements are not going to be launched together with H100 - they are aimed at H1 2023. Which makes sense: they are introducing a whole new CPU platform, so keeping the GPUs the same while the CPU changes makes the transition easier. We still don't know what is going to power the H100 boards in DGX boxes - is it some Intel Xeon with PCIe 5? Is it Zen 4? Or maybe POWER10 with built-in NVLinks? Scratch that, the whitepaper says:
Using its PCIe Gen 5 interface, H100 can interface with the highest performing x86 CPUs and SmartNICs / DPUs [...]
 
Joined
Apr 17, 2021
Messages
523 (0.48/day)
System Name Jedi Survivor Gaming PC
Processor AMD Ryzen 7800X3D
Motherboard Asus TUF B650M Plus Wifi
Cooling ThermalRight CPU Cooler
Memory G.Skill 32GB DDR5-5600 CL28
Video Card(s) MSI RTX 3080 10GB
Storage 2TB Samsung 990 Pro SSD
Display(s) MSI 32" 4K OLED 240hz Monitor
Case Asus Prime AP201
Power Supply FSP 1000W Platinum PSU
Mouse Logitech G403
Keyboard Asus Mechanical Keyboard
The reason nVidia wanted to buy ARM is because they want to heavily invest in ARM's ecosystem just like with the Grace CPU chip. If ARM is owned by them they can directly access all those employees, all that engineering power to enhance their product. You can't just easily hire people yourself with the same expertise. They are not looking for "anti-competitive" advantages.

Another point is money. A good way to look at it is like buying the land around a new subway station. If you are investing billions of dollars in the subway station you can capture that added value in the surrounding properties. In the same way if you are investing billions of dollars in making ARM CPUs you are enhancing ARM the company itself and you can capture that value and make money off it. nVidia is now spending huge amounts towards making ARM the company stronger but not getting all the value of their investment. Someone else owns those surrounding properties. That's not ideal. In fact, now that ARM is not part of nVidia, their best play is to actually offer attractive pay to ARM employees and poach them for their own CPU design efforts. Can't buy the company, buy all the employees instead.

Imagine if Microsoft was a software behemoth without an OS. If they decide to invest billions in making Mac OS software, they might want to own a piece of Apple. In the same way Sony has bought a large stake in Epic for Unreal engine.
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
They also show a third version:
View attachment 240880


I don't think so. It looks like they made two mezzanine module types and a dedicated high-density board. Also, the Grace elements are not going to be launched together with H100 - they are aimed at H1 2023. Which makes sense: they are introducing a whole new CPU platform, so keeping the GPUs the same while the CPU changes makes the transition easier. We still don't know what is going to power the H100 boards in DGX boxes - is it some Intel Xeon with PCIe 5? Is it Zen 4? Or maybe POWER10 with built-in NVLinks? Scratch that, the whitepaper says:
That third board I am 99% sure is for automotive use - previous Nvidia automotive boards have used similar connectors, at least. As for the medium sized one, it's essentially anyone's guess. Given that they're developing proprietary platforms they can do what they want, but I would be very surprised if the CPUs weren't mounted in the same Nvlink array as everything else - longer traces are harder to work with and more expensive, after all. I guess we'll see when they announce the actual products.
The reason nVidia wanted to buy ARM is because they want to heavily invest in ARM's ecosystem just like with the Grace CPU chip. If ARM is owned by them they can directly access all those employees, all that engineering power to enhance their product. You can't just easily hire people yourself with the same expertise. They are not looking for "anti-competitive" advantages.

Another point is money. A good way to look at it is like buying the land around a new subway station. If you are investing billions of dollars in the subway station you can capture that added value in the surrounding properties. In the same way if you are investing billions of dollars in making ARM CPUs you are enhancing ARM the company itself and you can capture that value and make money off it. nVidia is now spending huge amounts towards making ARM the company stronger but not getting all the value of their investment. Someone else owns those surrounding properties. That's not ideal. In fact, now that ARM is not part of nVidia, their best play is to actually offer attractive pay to ARM employees and poach them for their own CPU design efforts. Can't buy the company, buy all the employees instead.

Imagine if Microsoft was a software behemoth without an OS. If they decide to invest billions in making Mac OS software, they might want to own a piece of Apple. In the same way Sony has bought a large stake in Epic for Unreal engine.
Wow, this actually made me laugh out loud. Well done. First, "they're not looking for "anti-competitive" advantages" - why the scare quotes? Are you insinuating that there is no such thing? But then, the real kicker, you go on to aptly metaphorize the exact kind of anticompetitive practices that people have been warning of. This is one of the better self-owns I've seen in quite some time.

So: buying the land around a new public transit hub makes sense why? Because of the perceived demand and increased attractiveness of the area allowing you to hike up prices. Buying up the land means forcing out competitors, allowing you to set the rents and prices. This also rests on an (uninterrogated in this case) assumption that hiking up prices because you can is unproblematic, which... Put it this way: if that transit hub got built and nobody started pushing up prices, they would stay the same. There is no inherent causal relation here - that's base-level predatory capitalism. The only way this logic applies to ARM is if you are arguing that Nvidia was planning to increase licensing prices. Which, again, would be anticompetitive when they make ARM chips themselves and own the licence, as they would be disadvantaging competitors for their own benefit. I mean, the naivete behind arguing that this isn't anticompetitive is downright staggering.

As for what you're saying about Nvidia "investing in ARM", that is pure nonsense. If they bought ARM and wanted to invest in their R&D, they would need to hire more engineers - more money by itself doesn't create innovation. Where would those come from? Outside of both companies, obviously. And that same source is available to Nvidia now. Heck, Apple, Qualcomm, Mediatek, AMD, Ampere, Amazon and a bunch of others have engineers with experience making high performance ARM designs. And those companies also got their engineers from somewhere.

And Nvidia making fast ARM designs mainly helps Nvidia. Sure, ARM is strengthened by there being more good designs on their licences, but there's no lack of those. Nvidia would just be another one on the list.
 
Joined
Jul 25, 2009
Messages
147 (0.03/day)
Location
AZ
Processor AMD Threadripper 3970x
Motherboard Asus Prime TRX40-Pro
Cooling Custom loop
Memory GSkil Ripjaws 8x32GB DDR4-3600
Video Card(s) Nvidia RTX 3090 TI
Display(s) Alienware AW3420DW
Case Thermaltake Tower 900
Power Supply Corsair HX1200
I'm still on the fence about ARM being the new successor in the foreseeable future.

Team blue is investing heavily in RISC-V, from manufacturing chips to supplying them with its own IP.

RISC-V still has the advantage that it's royalty-free.

Moving forward, it depends on how much software gets adopted for each ISA, which will determine which one dominates.

The only reason ARM is big right now is smartphones.
 
Joined
Oct 6, 2021
Messages
1,424 (1.54/day)
That third board I am 99% sure is for automotive use - previous Nvidia automotive boards have used similar connectors, at least. As for the medium sized one, it's essentially anyone's guess. Given that they're developing proprietary platforms they can do what they want, but I would be very surprised if the CPUs weren't mounted in the same Nvlink array as everything else - longer traces are harder to work with and more expensive, after all. I guess we'll see when they announce the actual products.

Wow, this actually made me laugh out loud. Well done. First, "they're not looking for "anti-competitive" advantages" - why the scare quotes? Are you insinuating that there is no such thing? But then, the real kicker, you go on to aptly metaphorize the exact kind of anticompetitive practices that people have been warning of. This is one of the better self-owns I've seen in quite some time.

So: buying the land around a new public transit hub makes sense why? Because of the perceived demand and increased attractiveness of the area allowing you to hike up prices. Buying up the land means forcing out competitors, allowing you to set the rents and prices. This also rests on an (uninterrogated in this case) assumption that hiking up prices because you can is unproblematic, which... Put it this way: if that transit hub got built and nobody started pushing up prices, they would stay the same. There is no inherent causal relation here - that's base-level predatory capitalism. The only way this logic applies to ARM is if you are arguing that Nvidia was planning to increase licensing prices. Which, again, would be anticompetitive when they make ARM chips themselves and own the licence, as they would be disadvantaging competitors for their own benefit. I mean, the naivete behind arguing that this isn't anticompetitive is downright staggering.

As for what you're saying about Nvidia "investing in ARM", that is pure nonsense. If they bought ARM and wanted to invest in their R&D, they would need to hire more engineers - more money by itself doesn't create innovation. Where would those come from? Outside of both companies, obviously. And that same source is available to Nvidia now. Heck, Apple, Qualcomm, Mediatek, AMD, Ampere, Amazon and a bunch of others have engineers with experience making high performance ARM designs. And those companies also got their engineers from somewhere.

And Nvidia making fast ARM designs mainly helps Nvidia. Sure, ARM is strengthened by there being more good designs on their licences, but there's no lack of those. Nvidia would just be another one on the list.
Well, ARM's engineers have not done a good job on the latest cores - in practice, small gains and a loss of efficiency compared to their predecessors (something strange for a mobile SoC).
 
Joined
Jul 16, 2014
Messages
8,116 (2.28/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Arctic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse SteelSeries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
is it some Intel Xeon with PCIe 5? Is it Zen 4?
I think it would be ironic if it's Zen4.

740 is pretty good, even for 144 cores. Anandtech's latest Ice Lake Xeon review has a comparison:

So, 2x the score of 2x64 cores of EPYC?


Scratch that: 2x64 threads of EPYC. The 75F3 is a 32c64t chip.

Makes me wonder why they're only quoting integer though - are they thinking that all FP compute will be offloaded to the GPUs?


As for the relation of this to the ARM acquisition attempt: this is just more proof that the world dodged a bullet on that one. This is a clear indication that if Nvidia controlled ARM and their licences, they would be strongly incentivized to use these in anticompetitive ways to bolster their own performance and competitiveness. The only reason x86 works decently is that the two major players are completely bound up in cross-licensing deals, meaning neither can strong-arm the other or limit their access to anything without hurting themselves too. Nothing similar exists for ARM.

Edit: derp, see above.

Edit2: Here you go:

Still good, though I have to wonder how much of this is due to the package being designed specifically for the power delivery and cooling of SXM5. EPYC tops out at 280W no matter what; I'd expect this to go higher.
I tried to find those graphs without luck; where did you find them?
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Well, ARM's engineers have not done a good job on the latest cores - in practice, small gains and a loss of efficiency compared to their predecessors (something strange for a mobile SoC).
I'm not arguing with that, but that is entirely beside the point. There is literally zero reason to expect this would change if Nvidia bought ARM - heck, that merger would likely have taken 5-10 years to complete in the first place. The assumption that ARM would be able to make better core designs if they were bought by Nvidia is just that - an assumption, without any factual basis whatsoever. It's pure speculation, and speculation of a particularly bad kind: naively optimistic speculation that ignores the obvious negative effects of the same move.
I think it would be ironic if it's Zen4.
They used EPYC for their previous generation, but they've also stated that they don't like being reliant on competitors for CPUs, so it's essentially anyone's guess what they'll choose for what looks to be a hold-over generation until Grace arrives. Given the performance difference I would kind of expect Zen4 - Ice Lake servers are good, but not spectacular, and EPYC still holds the performance crown. Though the launch timing might make Sapphire Rapids an option? Or maybe they'll go Ampere, to smooth the transition to ARM? There are plenty of options out there regardless.
I tried to find those graphs without luck; where did you find them?
Here :) Here too. Not sure which of the two I used. From some of their most recent reviews in the Enterprise & IT section.
 
Joined
Sep 17, 2014
Messages
20,906 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
Have Nvidia just trumped Apple's M1?

Certainly in cost per unit and epeen value :)

The reason nVidia wanted to buy ARM is because they want to heavily invest in ARM's ecosystem just like with the Grace CPU chip. If ARM is owned by them they can directly access all those employees, all that engineering power to enhance their product. You can't just easily hire people yourself with the same expertise. They are not looking for "anti-competitive" advantages.

Another point is money. A good way to look at it is like buying the land around a new subway station. If you are investing billions of dollars in the subway station you can capture that added value in the surrounding properties. In the same way if you are investing billions of dollars in making ARM CPUs you are enhancing ARM the company itself and you can capture that value and make money off it. nVidia is now spending huge amounts towards making ARM the company stronger but not getting all the value of their investment. Someone else owns those surrounding properties. That's not ideal. In fact, now that ARM is not part of nVidia, their best play is to actually offer attractive pay to ARM employees and poach them for their own CPU design efforts. Can't buy the company, buy all the employees instead.

Imagine if Microsoft was a software behemoth without an OS. If they decide to invest billions in making Mac OS software, they might want to own a piece of Apple. In the same way Sony has bought a large stake in Epic for Unreal engine.

Valantar said it all :D What you're describing as 'not ideal' is the whole problem of hypercapitalism, and the reason why we have regulators and parties that can block such mergers. The fact is, consolidation like that has already gone on for far too long and competition is dying because of it; (multinational) corporations have gained power over governments, and with that, our entire society and system of democracy suffers. Microsoft is already under a magnifying glass wrt what they do in and with Windows, precisely because they have such a major market influence that nobody can realistically compete with. It is because of this that they can push their cloud ahead of everyone else - not because it's the best software solution or service, but because whole generations have grown up knowing the company and its OS. That has absolutely nothing to do with 'free markets' and 'fair competition'. Similar things apply to Intel, which can push an arguably worse product out and still come out winning YoY - this is actually happening right now.

What you are describing is literally a dystopia, and you're nodding in agreement as if that is the way to do business. A winner-takes-all mentality is fundamental to many, if not most, of the issues we have today, casually omitting that for every winner, someone else has to lose. It's not a world you'll like living in, that's guaranteed.
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Taking a look at their video, some of the claims about Grace seem ... well, dubious. They say "We expect the Grace superchip to be the highest performance, and twice the energy efficiency of the best CPU at that time." (Around 42:10 in the video.) Now, there's leeway here - if "the best CPU" means the fastest, then that's likely not (by far) the most efficient of the competition. Maybe they have access to early Sapphire Rapids samples, and know they perform great but gobble down power? That's an entirely plausible explanation for their numbers, though it would still be pretty questionable: 2x280W EPYC a year ago delivered 72.6% of that performance. I mean ... that already invalidates those claims. That 2x280W EPYC at 537.3 SPECintRate delivers 0.96 SPECintRate/W, while the estimated 500W Grace Superchip at 740 SPECintRate delivers 1.48 SPECintRate/W. That's way less than 2x the efficiency - it's just 54% better, and that includes a full node advantage in Nvidia's estimates. And by the time Grace hits the market, EPYC will be on Zen4 (and possibly Zen4c). If Zen4 delivers a 15% IPC increase and keeps the same clocks and power consumption, they'll hit ~618 SPECintRate for 128c/256t at 560W, leaving Nvidia with a 34% efficiency advantage. That's still good, but far from a 2x efficiency advantage - and given that Zen4 will be on 5nm (if not some special version like what Nvidia is using here), it's likely there will be either power reductions, clock increases, or core count increases in the same power envelope at the same clocks.

Looking a bit closer at AT's EPYC testing, they're actually reporting less than 280W real-world power draw for those EPYCs under load too - 265W package power for the EPYC 7763 on a Gigabyte production motherboard (as opposed to early tests run on an updated older-generation motherboard that had elevated idle and I/O power for some reason). They list 1.037 SPECintRate/W, which leaves the Grace at just a 42% advantage against a year-old platform.

Now, as I said, it's quite possible that Nvidia knows Sapphire Rapids will kick butt, but gobble down power. Let's say it comes close, maybe 700 SPECintRate for a 2S system? It would need to consume 945W for that to be half the efficiency of 740 at 500W. Is that even remotely plausible? I have no idea. It would certainly only be feasible with water cooling, but that's true for Grace as well, as the renders show.

This is still really impressive for a first-generation product, but ... yeah, it's not blowing anything out of the water.
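Since there are quite a few numbers flying around in this post, here's a quick Python sketch of the arithmetic. All inputs are the estimates quoted above rather than measurements, the 500W Grace figure is just the estimate used here, and the Zen4 line is pure guesswork on my part:

def perf_per_watt(score, watts):
    return score / watts

grace_est  = perf_per_watt(740.0, 500.0)           # NVIDIA's simulated score, ~500 W Superchip estimate
milan_tdp  = perf_per_watt(537.3, 2 * 280)         # 2x EPYC 7763 at nominal TDP (AnandTech score)
milan_meas = 1.037                                 # AnandTech's reported SPECintRate/W from measured power
zen4_guess = perf_per_watt(537.3 * 1.15, 2 * 280)  # hypothetical Zen4: +15% IPC, same power

print(f"Grace vs Milan (TDP):      {grace_est / milan_tdp:.2f}x")    # ~1.54x
print(f"Grace vs Milan (measured): {grace_est / milan_meas:.2f}x")   # ~1.43x
print(f"Grace vs Zen4 guess:       {grace_est / zen4_guess:.2f}x")   # ~1.34x

# Power a hypothetical 700-point Sapphire Rapids 2S system would need to draw
# for Grace to come out twice as efficient:
print(f"SPR break-even power: {700.0 / (grace_est / 2):.0f} W")      # ~946 W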

They also show a third version:
View attachment 240880


I don't think so. Looks like they made 2 mezzanine module types and a dedicated high density board. Also the Grace elements are not going to be launched together with H100 - they are aimed at H1 2023. Which makes sense, they are introducing a whole new CPU platform so keeping the GPUs the same while the CPUs changes makes the transition easier. We still don't know what is going to power the H100 boards in DGX boxes, is it some Intel Xeon with PCIe 5? Is it Zen 4? Or maybe POWER10 with built-in NVLinks? Scratch that, the whitepaper says:
Looking at the video again, it seems that (thanks to those massive interconnects of theirs) they're actually separating the CPUs out onto their own board, so those Grace superchips or Grace-Hopper superchips live entirely separately from the SXM grid where the GPUs live. It seems like that carrier board is some entirely different form factor. It's really impressive that they have the bandwidth and latency to do so, and it lets them make some really flexible configurations.
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
This is an interesting observation from ServeTheHome's coverage (my emphasis):
For some context, the AMD EPYC 7773X Milan-X that we just looked at we are getting over 824 in our SPECrate2017_int_base already, and there is room for improvement. Those two AMD chips have 1.5GB of L3 cache (likely NVIDIA’s 396MB is including more.) While AMD has two 280W TDP parts, and they do not include the 1TB and 1TB/s LPDDR5X memory subsystem, nor NVLink/ PCIe Gen5, they are also available today with up to 160 PCIe Gen4 lanes in a system. As a 2023 500W TDP CPU, if NVIDIA does not have significant acceleration onboard, the Grace CPU Superchip has the Integer compute of 2020 AMD CPUs and the 2021 Ampere Altra Max at 128 cores is already right at that level. The bottom line is, it would be very hard to green-light the Grace CPU Superchip unless it has some kind of heavy acceleration because it is not going to be competitive as a 2023 part versus 2022 parts on its integer performance. Still, there is “over 740” which means that 1400 could be “over 740” but it is a different claim to make.
This is ... well, interesting. Whether Nvidia's numbers are based on extrapolating from early silicon at (much) lower clocks or software simulations, and even if such estimates are nearly always on the low side, that number still seems weirdly low. Sadly it seems STH doesn't actually mention their SPEC testing in their Milan-X coverage (they're more focused on individual real-world benchmarks), but apparently they've still run it. The increase over AT's Milan testing is also interesting, though if I were to guess I'd expect STH to compile their SPEC tests with more per-platform optimizations than AT does.
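For what it's worth, running the same rough score-per-TDP-watt math on STH's Milan-X number shows just how tight this comparison already is. Both "over" figures are lower bounds and TDP is not measured power draw, so treat this little Python check as a sanity check only:

milan_x_eff = 824.0 / (2 * 280)   # 2x EPYC 7773X, STH's "over 824" SPECrate2017_int_base
grace_eff   = 740.0 / 500         # NVIDIA's "over 740" estimate for the 500 W Superchip
print(f"Milan-X: {milan_x_eff:.2f} per W, Grace: {grace_eff:.2f} per W")   # ~1.47 vs ~1.48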

About that three-chip board I thought might be for automotive use, they also say that "this may just be a 2023-era supercomputer building block that we are looking at," speculating that it looks designed for a specific chassis and doesn't look like the aesthetically pleasing, symmetrical concept boards typically presented for things like this. That would definitely be an interesting use of this - though it also raises the question of who this design is for.
 