
NVIDIA Claims Grace CPU Superchip is 2X Faster Than Intel Ice Lake

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,229 (0.91/day)
When NVIDIA announced its Grace CPU Superchip, the company officially showed its effort to create an HPC-oriented processor to compete with Intel and AMD. The Grace CPU Superchip combines two Grace CPU modules over NVLink-C2C to deliver 144 Arm v9 cores and 1 TB/s of memory bandwidth. Each core is an Arm Neoverse N2 Perseus design, configured for the highest throughput and bandwidth. As far as performance is concerned, the only detail NVIDIA provides on its website is an estimated SPECrate 2017_int_base score of over 740. Thanks to our colleagues over at Tom's Hardware, we have another performance figure to look at.

NVIDIA has published a slide comparing the chip with Intel's Ice Lake server processors. One Grace CPU Superchip was compared to two Xeon Platinum 8360Y Ice Lake CPUs configured in a dual-socket server node. The Grace CPU Superchip outperformed the Ice Lake configuration by a factor of two and delivered 2.3 times the efficiency in a WRF simulation. This HPC application is CPU-bound, allowing the new Grace CPU to show off, thanks to Arm v9 Neoverse N2 cores that pair efficiency with outstanding performance. NVIDIA also made a graph showcasing the HPC applications running on Arm today, with many more to come, which you can see below. Remember that this information is provided by NVIDIA, so we will have to wait for the 2023 launch to see it in action.


View at TechPowerUp Main Site | Source
 

ARF

Joined
Jan 28, 2020
Messages
3,947 (2.55/day)
Location
Ex-usa
It has been known for a while that the ARM architecture is more efficient than the x86 architecture. This is the reason why all of our smartphones run ARM chips and not x86 chips.
 
Joined
Jul 5, 2013
Messages
25,559 (6.48/day)
It has been known for a while that the ARM architecture is more efficient than the x86 architecture. This is the reason why all of our smartphones run ARM chips and not x86 chips.
While that is true, x86 can often do more in less time than ARM. There are trade-offs with each. ARM is tailor-made for simple instruction workloads. X86/X64 is made for heavy, complex instruction workloads.
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Twice the cores, and 1 TB/s of bandwidth vs 0.2 TB/s, in a use case that definitely prefers bandwidth. RAM bandwidth for sure, but the nice interconnect in the Grace CPU Superchip might also come in quite handy.
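Rough back-of-the-envelope numbers on that, taking the bandwidth figures above at face value (the 0.2 TB/s and the 72 Ice Lake cores of a dual 8360Y node are the numbers from this thread, not measured values):

```c
/* Back-of-the-envelope sketch using the figures quoted in this thread:
 * 144 Grace cores sharing ~1 TB/s vs. 72 Ice Lake cores (2x Xeon 8360Y)
 * sharing ~0.2 TB/s. These are the thread's numbers, not measurements. */
#include <stdio.h>

int main(void)
{
    const double grace_bw_gbs   = 1000.0; /* ~1 TB/s claimed for the Superchip */
    const double icelake_bw_gbs =  200.0; /* ~0.2 TB/s as quoted above         */
    const int    grace_cores    = 144;
    const int    icelake_cores  = 72;     /* 2 sockets x 36 cores (8360Y)      */

    printf("Grace:    %.1f GB/s per core\n", grace_bw_gbs / grace_cores);
    printf("Ice Lake: %.1f GB/s per core\n", icelake_bw_gbs / icelake_cores);
    printf("Per-core bandwidth ratio: %.1fx\n",
           (grace_bw_gbs / grace_cores) / (icelake_bw_gbs / icelake_cores));
    return 0;
}
```

So even per core, the claimed configuration has roughly 2.5x the bandwidth to play with, which lines up with a bandwidth-bound workload like WRF being the chosen showcase.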

It has been known for a while that the ARM architecture is more efficient than the x86 architectures. This is the reason why all of our smartphones run ARM chips and not x86 chips.
Are they? Atoms have been quite on par for a while, and top-of-the-line SoCs currently beat x86 while using a manufacturing process that is a gen or two ahead, which plays a hell of a lot larger part in mobile than it does elsewhere. The reason why all our smartphones run ARM chips is not so much technical; it has a lot to do with ARM being cheaper and open (which in this context is not necessarily a good thing).
 
Joined
Sep 6, 2013
Messages
2,978 (0.77/day)
Location
Athens, Greece
System Name 3 desktop systems: Gaming / Internet / HTPC
Processor Ryzen 5 5500 / Ryzen 5 4600G / FX 6300 (12 years later got to see how bad Bulldozer is)
Motherboard MSI X470 Gaming Plus Max (1) / MSI X470 Gaming Plus Max (2) / Gigabyte GA-990XA-UD3
Cooling Νoctua U12S / Segotep T4 / Snowman M-T6
Memory 16GB G.Skill RIPJAWS 3600 / 16GB G.Skill Aegis 3200 / 16GB Kingston 2400MHz (DDR3)
Video Card(s) ASRock RX 6600 + GT 710 (PhysX)/ Vega 7 integrated / Radeon RX 580
Storage NVMes, NVMes everywhere / NVMes, more NVMes / Various storage, SATA SSD mostly
Display(s) Philips 43PUS8857/12 UHD TV (120Hz, HDR, FreeSync Premium) ---- 19'' HP monitor + BlitzWolf BW-V5
Case Sharkoon Rebel 12 / Sharkoon Rebel 9 / Xigmatek Midguard
Audio Device(s) onboard
Power Supply Chieftec 850W / Silver Power 400W / Sharkoon 650W
Mouse CoolerMaster Devastator III Plus / Coolermaster Devastator / Logitech
Keyboard CoolerMaster Devastator III Plus / Coolermaster Devastator / Logitech
Software Windows 10 / Windows 10 / Windows 7
I thought everything today is twice as fast and twice as efficient as Ice Lake Xeons.
 
Joined
Oct 12, 2005
Messages
682 (0.10/day)
Twice the cores, and 1 TB/s of bandwidth vs 0.2 TB/s, in a use case that definitely prefers bandwidth. RAM bandwidth for sure, but the nice interconnect in the Grace CPU Superchip might also come in quite handy.

Are they? Atoms have been quite on par for a while, and top-of-the-line SoCs currently beat x86 while using a manufacturing process that is a gen or two ahead, which plays a hell of a lot larger part in mobile than it does elsewhere. The reason why all our smartphones run ARM chips is not so much technical; it has a lot to do with ARM being cheaper and open (which in this context is not necessarily a good thing).

Well, ARM could be more efficient if our CPUs were super simple. But these days, the power impact of running an ARM binary versus an x86-64 binary is marginal.

For CPUs with the same goal, the ISA has minimal impact and it's really the design of the CPU that matters. People have compared high-performance CPU efficiency with cellphone CPUs and declared that ARM was more efficient, but when ARM designs aim at high performance, they already start to be way less power efficient. The M1, for example, had to run on a more advanced node than Intel and AMD to stay barely ahead. We will see how comparable architectures on the same or similar nodes (Raptor Lake, Zen 4) will do against the M1.

Like Jim Keller said, the ISA does not really matter a lot these days because CPUs are so complex. Once the instructions get decoded, it's a flat field for everyone and ARM has no specific advantage after that. The overhead of decoding x86-64 vs ARM is not significant for a complex CPU like what we have today.

Also, NVIDIA is joining the trend of comparing things not even released yet against old products. If this gets released a year from now, Ice Lake Xeons will be two years old at that point, so I hope it will be better. (And it's convenient that they don't compare it against EPYC, the actual performance leader right now.)
 
Joined
Jan 8, 2017
Messages
8,931 (3.35/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
It has been known for a while that the ARM architecture is more efficient than the x86 architecture. This is the reason why all of our smartphones run ARM chips and not x86 chips.
It hardly has anything to do with the fact that it's ARM or x86. If you want to prioritize efficiency, you make an architecture that prioritizes efficiency; what ISA it uses is of no real importance.
 
Joined
Nov 30, 2021
Messages
128 (0.15/day)
Location
USA
System Name Star Killer
Processor Intel 9900K
Motherboard Gigabyte Aorus Master z390 v.2
Cooling Lian Li Galahad 360mm AIO
Memory 32gb Gskill Trident Z RGB 4000mhz 4x8
Video Card(s) MSI RTX 3080 12GB Gaming Trio
Storage 1TB Samsung 980 x 1 | 1TB Crucial Gen 4 SSD x 1 | 500GB Samsung 980 SSD | 1TB Teamgroup SSD x 1
Display(s) 35inch Asus TUF 1440p 100hz UWQHD, 32 inch lenovo legion 1440p 144hz.
Case Lian Li O11D White
Audio Device(s) Logitech z333 speakers, G733 wireless surround sound headset
Power Supply EVGA 1000watt G6 Gold
Mouse Razer Basalisk, Logitech g600 mmo, Logitech g602 wireless
Keyboard Corsair K65 60% Keyboard (Cherry Silver Speeds) | Custom Varmillo keyboard
Software Windows 10 pro
Benchmark Scores Cinnebench R23 26,453
The thing I hate most about tech companies is their graphs and fancy charts. What the heck is a "Traditional CPU"? Show us some real numbers instead of a tailored chart. Literally nobody buys a product based on their fake info, so why not just give us a real number?
 
Joined
Aug 6, 2020
Messages
729 (0.54/day)
Well, ARM could be more efficient if our CPUs were super simple. But these days, the power impact of running an ARM binary versus an x86-64 binary is marginal.

For CPUs with the same goal, the ISA has minimal impact and it's really the design of the CPU that matters. People have compared high-performance CPU efficiency with cellphone CPUs and declared that ARM was more efficient, but when ARM designs aim at high performance, they already start to be way less power efficient. The M1, for example, had to run on a more advanced node than Intel and AMD to stay barely ahead. We will see how comparable architectures on the same or similar nodes (Raptor Lake, Zen 4) will do against the M1.

Like Jim Keller said, the ISA does not really matter a lot these days because CPUs are so complex. Once the instructions get decoded, it's a flat field for everyone and ARM has no specific advantage after that. The overhead of decoding x86-64 vs ARM is not significant for a complex CPU like what we have today.

Also, NVIDIA is joining the trend of comparing things not even released yet against old products. If this gets released a year from now, Ice Lake Xeons will be two years old at that point, so I hope it will be better. (And it's convenient that they don't compare it against EPYC, the actual performance leader right now.)


When do you expect to see Sapphire Rapids Server Chips?

Oh yeah, they haven't announced shit

Best case, Sapphire Rapids will arrive at the end of the year (mass availability halfway through 2023!)
 
Last edited:

aQi

Joined
Jan 23, 2016
Messages
645 (0.21/day)
While that is true, x86 can often do more in less time than ARM. There are trade-offs with each. ARM is tailor-made for simple instruction workloads. X86/X64 is made for heavy, complex instruction workloads.
Does that apply to RISC vs CISC architecture differences as well?
 

BrainChild510

New Member
Joined
Apr 1, 2022
Messages
5 (0.01/day)
Location
San Jose
With NVIDIA sprinting to develop a competitor to Intel's CPUs, and Intel running up behind AMD/NVIDIA with their own GPUs, which company do you think will reign supreme in terms of sheer compute performance? Looking forward to the apples-to-apples benchmark comparison! My guess is LTT will be one of the first to test them both side by side, but when? And which CPU/GPU components do you think will lead the pack, and why?
 
Joined
Jan 27, 2015
Messages
1,065 (0.32/day)
System Name loon v4.0
Processor i7-11700K
Motherboard asus Z590TUF+wifi
Cooling Custom Loop
Memory ballistix 3600 cl16
Video Card(s) eVga 3060 xc
Storage WD sn570 1tb(nvme) SanDisk ultra 2tb(sata)
Display(s) cheap 1080&4K 60hz
Case Roswell Stryker
Power Supply eVGA supernova 750 G6
Mouse eats cheese
Keyboard warrior!
Benchmark Scores https://www.3dmark.com/spy/21765182 https://www.3dmark.com/pr/1114767
The thing I hate most about tech companies is their graphs and fancy charts. What the heck is a "Traditional CPU"? Show us some real numbers instead of a tailored chart.
Literally nobody buys a product based on their fake info, so why not just give us a real number?
Though the post here mentioned an Ice Lake Xeon, Tom's article is pretty descriptive (purposely editing in the transitional sentence :p ):
(Beware, this is a vendor-provided benchmark result and is based on a simulation of the Grace CPU, so take Nvidia's claims with a grain of salt.)
. . .
And make no mistake, that enhanced memory throughput plays right to the strengths of the Grace CPU Superchip in the Weather Research and Forecasting (WRF) model above. Nvidia says that its simulations of the 144-core Grace chip show that it will be 2X faster and provide 2.3X the power efficiency of two 36-core 72-thread Intel 'Ice Lake' Xeon Platinum 8360Y processors in the WRF simulation. That means we're seeing 144 Arm threads (each on a physical core), facing off with 144 x86 threads (two threads per physical core).

The various permutations of WRF are real-world workloads commonly used for benchmarking, and many of the modules have been ported over for GPU acceleration with CUDA. We followed up with Nvidia about this specific benchmark, and the company says this module hasn't yet been ported over to GPUs, so it is CPU-centric. Additionally, it is very sensitive to memory bandwidth, giving Grace a leg up in both performance and efficiency. Nvidia's estimates are "based on standard NCAR WRF, version 3.9.1.1 ported to Arm, for the IB4 model (a 4km regional forecast of the Iberian peninsula)."

TL;DR: it can move data fast. It really says nothing about IPC, and considering Nvidia is experienced enough via CUDA (they might know some [more?] shortcuts), forget that too.

those guys/gals in marketing get paychecks to earn and simulators to play with. :p
 
Joined
Oct 12, 2005
Messages
682 (0.10/day)
When do you expect to see Sapphire Rapids Server Chips?

Oh yeah, they haven't announced shit

Best case, Sapphire Rapids will arrive at the end of the year (mass availability halfway through 2023!)
Sapphire Rapids will start to ship soon, as per Intel, as production is currently ramping up. NVIDIA, like they did with Hopper, just did a paper launch to try to stay in the lead.

They compare an unreleased and not even taped-out chip with things that have already been available on the market for quite some time.

Companies that do that (not only NVIDIA; Intel and AMD did it too in the past) are generally companies that know they will fall behind and try to build some hype before competitors get released. Companies with a huge lead try to have a huge, impactful launch day to get as much mind share as possible.

And that does not resolve the main issue: they are comparing against a year-old, second-place CPU. The lead is currently held by Milan and Milan-X, and by that time we will have Genoa (Zen 4) available.

Does that apply to RISC vs CISC architecture differences as well?
The thing is, there are very few real RISC CPUs. If you look at how large the ARM instruction set has grown, it's hard to call that a reduced instruction set anymore.

Probably the slight advantage ARM has in simplicity over x86-64 is its fixed instruction length versus the variable instruction length of x86. This allows a bit simpler front-end instruction decoding. But again, this has a marginal impact on the overall CPU these days because CPUs are huge, massive and complex. If things were much simpler, like in the '90s and early 2000s, it could actually have made a significant difference.
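A toy sketch of that front-end difference (the byte stream and length table below are invented for illustration; this is not real x86 or Arm encoding):

```c
/* Toy illustration of fixed vs. variable instruction length decode.
 * The "ISA" here is made up for the example; it is not real x86 or Arm. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fixed 4-byte instructions (AArch64-style): slot i always starts at i*4,
 * so a wide front end can point several decoders at slots 0..N in parallel. */
static size_t fixed_start(size_t i) { return i * 4; }

/* Variable length (x86-style): the start of instruction i+1 is only known
 * after the length of instruction i has been worked out, which is why
 * parallel decode needs extra length-predecode/marking logic. */
static size_t toy_length(uint8_t opcode)
{
    if (opcode < 0x40) return 1;   /* invented opcode -> length rules */
    if (opcode < 0xC0) return 3;
    return 7;
}

int main(void)
{
    const uint8_t code[] = { 0x10, 0x55, 0x00, 0x00, 0xC3, 0, 0, 0, 0, 0, 0, 0x20 };
    size_t off = 0, i = 0;

    while (off < sizeof code) {               /* must walk sequentially */
        printf("variable: insn %zu starts at byte %zu\n", i++, off);
        off += toy_length(code[off]);
    }
    for (size_t j = 0; j < 3; j++)            /* any slot addressable directly */
        printf("fixed:    insn %zu starts at byte %zu\n", j, fixed_start(j));
    return 0;
}
```

Either way, as noted above, this front-end cost is a small slice of a modern out-of-order core.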
 
Joined
Dec 28, 2012
Messages
3,478 (0.84/day)
System Name Skunkworks
Processor 5800x3d
Motherboard x570 unify
Cooling Noctua NH-U12A
Memory 32GB 3600 mhz
Video Card(s) asrock 6800xt challenger D
Storage Sabarent rocket 4.0 2TB, MX 500 2TB
Display(s) Asus 1440p144 27"
Case Old arse cooler master 932
Power Supply Corsair 1200w platinum
Mouse *squeak*
Keyboard Some old office thing
Software openSUSE tumbleweed/Mint 21.2
Oh yeah, we've heard THESE claims before. :rolleyes: It'll turn out to be one specific benchmark that's been re-optimized for these specific chips and in no way reflects real-world performance.

Until Nvidia produces working chips that can be bought and verified by third parties, I'mma call this a big fat LIE.
 
Joined
Apr 24, 2020
Messages
2,560 (1.75/day)
It hardly has anything to do with the fact that it's ARM or x86. If you want to prioritize efficiency, you make an architecture that prioritizes efficiency; what ISA it uses is of no real importance.

This here, at least for 90% of situations.

Intel vs AMD vs Via/Centaur CPUs shows just how much you can vary CPU-performance and CPU-power-usage.

Similarly: ARM's N1 core vs Apple M1 vs Fujitsu A64fx are all different CPU-performance vs CPU-power-usage points. The Fujitsu A64fx is a freaking GPU-like design of all things (heavily focused on 512-bit SVE instructions) and is the #1 beast in the world for supercomputers currently, but it has strict RAM limitations because it's stuck on HBM2 RAM (so ~64GBs of RAM per chip), while other DDR4 or LPDDR5 CPU chips have access to more RAM.
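For what it's worth, SVE code is written vector-length-agnostic, which is how the same binary can use A64fx's 512-bit units or a narrower implementation; a minimal sketch with the ACLE intrinsics (assumes an SVE-capable toolchain and target, e.g. building with -march=armv8-a+sve):

```c
/* Minimal vector-length-agnostic SVE sketch (ACLE intrinsics).
 * The same loop fills 512-bit vectors on A64fx or narrower vectors on other
 * SVE implementations; svcntw() reports the lane count at run time. */
#include <arm_sve.h>
#include <stdint.h>

void scaled_add(float *dst, const float *a, const float *b, float s, int64_t n)
{
    for (int64_t i = 0; i < n; i += svcntw()) {         /* step = lanes per vector  */
        svbool_t    pg = svwhilelt_b32_s64(i, n);        /* predicate masks the tail */
        svfloat32_t va = svld1_f32(pg, &a[i]);
        svfloat32_t vb = svld1_f32(pg, &b[i]);
        svfloat32_t vr = svmla_n_f32_x(pg, va, vb, s);   /* vr = va + vb * s         */
        svst1_f32(pg, &dst[i], vr);
    }
}
```

That "write once, run at whatever vector width the silicon has" model is a big part of why it maps well onto a GPU-like, bandwidth-heavy design.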

-----

The 10% that matters comes down to memory-model details that almost all programmers are ignorant of. If you know about load-acquire and store-release, maybe you'll prefer the ARM instruction set over the x86 instruction set. But this is extremely, extremely niche and irrelevant in the vast, vast, vast majority of programs. In fact, x86's slightly better designed AES/crypto instructions are probably more important in practice: a single "AESENC" instruction to encrypt a round on x86, while on ARM you gotta do "AESE + AESMC" (2 instructions per AES loop).
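Roughly what one middle AES round looks like with each set of intrinsics; a sketch only, since the two ISAs fold the round-key XOR in at different points and real code schedules its round keys accordingly:

```c
/* Sketch of one "middle" AES encryption round on each ISA.
 * x86: AESENC = ShiftRows + SubBytes + MixColumns, then XOR the round key.
 * Arm: AESE   = AddRoundKey (XOR) + SubBytes + ShiftRows; AESMC = MixColumns,
 *      so the key is applied at the start of the round rather than the end. */
#if defined(__AES__)                      /* x86 built with AES-NI (-maes) */
#include <immintrin.h>

__m128i aes_round(__m128i state, __m128i round_key)
{
    return _mm_aesenc_si128(state, round_key);        /* one instruction */
}

#elif defined(__ARM_FEATURE_CRYPTO)       /* AArch64 built with +crypto */
#include <arm_neon.h>

uint8x16_t aes_round(uint8x16_t state, uint8x16_t round_key)
{
    return vaesmcq_u8(vaeseq_u8(state, round_key));   /* AESE then AESMC */
}
#endif
```

Current Arm cores fuse the AESE+AESMC pair in the pipeline, which is exactly the macro-op fusion mentioned later in this thread.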

-------

ARM's N1 / N2 / V1 don't even seem to be as good as current-generation AMD EPYC or Intel designs IMO. Apple's M1 is the only outstanding design, but even then the M1 has a lot of tradeoffs (an absolutely HUGE core, bigger than anyone else's; Apple's M1 is so physically huge it won't scale to higher core counts very easily).

Well... okay. The Fujitsu A64fx is an incredible ARM-based design for supercomputers. But almost nobody wants an ARM chip grafted onto HBM2e RAM. That's just too niche.
 
Last edited:

ARF

Joined
Jan 28, 2020
Messages
3,947 (2.55/day)
Location
Ex-usa
It hardly has anything to do with the fact that it's ARM or x86. If you want to prioritize efficiency, you make an architecture that prioritizes efficiency; what ISA it uses is of no real importance.

While that is true, x86 can often do more in less time than ARM. There are trade-offs with each. ARM is tailor-made for simple instruction workloads. X86/X64 is made for heavy, complex instruction workloads.

The x86 is overloaded with too many instruction set extensions: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
I guess this is why Intel wanted, or is still in the design stage of, a brand new x86 architecture which would drop all those legacy modes and let the transistors work on modern apps.
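That pile of optional extensions is also why portable x86 software ends up probing the CPU at run time and dispatching; a minimal sketch using GCC/Clang's __builtin_cpu_supports (x86-only and compiler-specific):

```c
/* Minimal runtime feature probe: each of the x86 extensions listed above is
 * optional per CPU, so software checks before taking an optimized path.
 * __builtin_cpu_supports is a GCC/Clang builtin for x86 targets. */
#include <stdio.h>

int main(void)
{
    printf("sse4.2: %s\n", __builtin_cpu_supports("sse4.2") ? "yes" : "no");
    printf("avx:    %s\n", __builtin_cpu_supports("avx")    ? "yes" : "no");
    printf("avx2:   %s\n", __builtin_cpu_supports("avx2")   ? "yes" : "no");
    printf("fma:    %s\n", __builtin_cpu_supports("fma")    ? "yes" : "no");
    return 0;
}
```

Presumably any clean-slate x86 would still need this kind of dispatch for whatever extensions it kept.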
 
Joined
Jul 5, 2013
Messages
25,559 (6.48/day)
Does that apply to RISC vs CISC architecture differences as well ?
That's exactly what I'm talking about. RISC = Reduced Instruction Set Computing. CISC = Complex Instruction Set Computing.
The great thing about RISC is that it's very efficient when the compiled code is properly optimized. When it's not, instructions that aren't properly coded/optimized have to be completed in software instead of hardware, which is MUCH slower. CISC is not as efficient, but most compiled code can run on hardware instead of in software. It's FAR more complicated than this brief explanation, but you get the general idea.

Which ISA standard you choose will depend greatly on what you want your code to do and how fast.

The thing is, there are very few real RISC CPUs. If you look at how large the ARM instruction set has grown, it's hard to call that a reduced instruction set anymore.
That would not be correct. Yes, RISC SoCs are more complex than they were in the past, but so too is CISC. RISC CPU hardware instructions have roughly doubled in the last 20 years; CISC (X86/X64) has at least quadrupled in the same amount of time.

So while ARM designs and instructions have become more complex, they are still very much "reduced" in comparison to X86/X64 and even PowerPC.
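One well-known instance of that hardware-vs-software split is integer division on older 32-bit ARM cores; a small illustration (the library-call behavior described in the comment assumes the GNU toolchain):

```c
/* Example of work falling back to software: integer division.
 * Older 32-bit ARM cores (e.g. Cortex-A9) have no hardware divide in the ARM
 * instruction set, so the compiler turns the '/' below into a call to a
 * support routine (__aeabi_idiv with GCC/Clang). Cores that implement the
 * SDIV/UDIV extension execute it as a single instruction instead. */
int average(int sum, int count)
{
    return sum / count;
}
```

Same C source, very different cost depending on what the core implements in hardware.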
 
Last edited:
  • Like
Reactions: aQi

ARF

Joined
Jan 28, 2020
Messages
3,947 (2.55/day)
Location
Ex-usa
That's exactly what I'm talking about. RISC = Reduced Instruction Set Computing. CISC = Complex Instruction Set Computing.
The great thing about RISC is that it's very efficient when the compiled code is properly optimized. When it's not, instructions that aren't properly coded/optimized have to be completed in software instead of hardware, which is MUCH slower. CISC is not as efficient, but most compiled code can run on hardware instead of in software. It's FAR more complicated than this brief explanation, but you get the general idea.

Which ISA standard you choose will depend greatly on what you want your code to do and how fast.


That would not be correct. Yes, RISC SoCs are more complex than they were in the past, but so too is CISC. RISC CPU hardware instructions have roughly doubled in the last 20 years; CISC (X86/X64) has at least quadrupled in the same amount of time.

So while ARM designs and instructions have become more complex, they are still very much "reduced" in comparison to X86/X64 and even PowerPC.

I guess it is easier and faster to pay for good software developers than for AMD or Intel to design good semiconductors.
 
Joined
Jul 5, 2013
Messages
25,559 (6.48/day)
The x86 is overloaded with too many instruction set extensions: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
Overloaded is one term. I just call it CISC.
I guess this is why Intel wanted, or is still in the design stage of, a brand new x86 architecture which would drop all those legacy modes and let the transistors work on modern apps.
Never gonna happen. WAY too much software needs legacy ISA support, even modern apps.
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
The x86 is overloaded with too many instruction set extensions: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
So, that Neoverse N2 core in the Grace CPU: it has ARMv9-A A32/T32/A64, RAS, SVE, and SVE2 (with backwards compatibility to NEON), plus stuff like TMR, CCA and MME, and this is likely not an exhaustive list. x86 has older and more heavily iterated extensions like the whole MMX/SSE/AVX lineage, where new ones partially replaced the previous extensions; ARM has had some other extensions for the same goals, but they basically standardized on NEON/SVE/SVE2.
 
Last edited:
Joined
Apr 24, 2020
Messages
2,560 (1.75/day)
The x86 is overloaded with too many instruction set extensions: MMX, MMX+, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, x86-64, AMD-V, AES, AVX, AVX2, FMA3, SHA.
I guess this is why Intel wanted, or is still in the design stage of, a brand new x86 architecture which would drop all those legacy modes and let the transistors work on modern apps.

Erm... ARM has a ton of instruction set extensions too. Or did you forget that ARM literally didn't have "division" or "modulus", and that division/modulus are extensions on top of its original instruction set? ARM literally had to add AES and SHA instructions to their CPUs because, guess what? Modern "https" systems use AES / SHA so often that it makes sense to change our CPU cores to specifically include "SHA" instructions or "AES" instructions.

EDIT: There's a reason why it's called "ARMv8" (as well as ARMv8.1, ARMv8.2, ARMv8.3...): because there was ARMv1, ARMv2, ARMv3... ARMv7. And that's ignoring all the dead paths, like the Jazelle instructions (aka: Java instructions for ARM), Thumb v1, Thumb v2, etc. etc.

Even the SSE / AVX mistake is being repeated by ARM yet again, because ARM made the NEON instructions (128-bit) when Intel/AMD were working on 256-bit AVX. Those NEON instructions are now obsolete as ARM is working on SVE.

Do you even work with ARM instructions? ARM is a CISC processor at this point. Do you know what the ARM "fjcvtzs" instruction does? Do you know the history of this?

--------

RISC vs CISC has played out. CISC won. All RISC instruction sets are glorified CISC processors with macro-op fusion (ex: the aese + aesmc instructions in ARM, merging two instructions to make a macro-op), SIMD instructions (NEON and its various incarnations), multiple memory models (lol, ARMv7 started with load-consume / store-release, which turned out to be an awful memory model, so ARMv8 had to introduce a whole slew of new load/store commands called load-acquire / store-release), etc. etc.
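That load-acquire / store-release pair is what C11/C++11 expose as acquire/release memory orders; a minimal message-passing sketch (on AArch64 the flag accesses compile to LDAR/STLR, while on x86 they are plain loads/stores because its TSO memory model is already strong enough):

```c
/* Minimal acquire/release message-passing sketch (C11 atomics + pthreads).
 * The release store publishes 'payload'; the acquire load guarantees the
 * consumer sees it. On AArch64 these map to STLR/LDAR; on x86 to plain MOVs. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;                 /* ordinary data being handed off    */
static atomic_int ready;            /* synchronization flag, initially 0 */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* write data   */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* then publish */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                    /* spin until the flag is seen */
    printf("payload = %d\n", payload);       /* guaranteed to observe 42    */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

(Build with -pthread.)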

CPUs are very hard. They have to constantly change their instruction sets: ARM, x86, etc. etc. The only CPUs that don't change are dead ones (ex: MIPS, may it rest in peace). CPUs are all turd piled up on more turd being used as lipstick on very ugly pigs. ARM tried to be RISC but has effectively turned into a giant mess of a core, much like x86 has. Everything turns into CISC as time goes on; that's just the nature of this industry.

--------

EDIT: Instruction sets become complicated over time because decoding instructions is really easy compared to everything else the CPU does. CPUs today are superscalar (multiple instructions executing per clock cycle, as many as 8 instructions per clock on Apple's M1 chip), hyperthreaded (each core works with 2, 4, or 8 threads at a time), pipelined (each instruction gets split up into 30+ steps for other bits of the processor to handle), out-of-order (literally executing "later" instructions before "earlier" instructions), cache-coherence-snooping (spying on other CPU cores' memory reads/writes to automatically adjust their own view of memory) complicated beasts.

This whole RISC vs CISC thing is a question of how complicated a decoder you wanna make. But decoders aren't even that big on today's CPUs, because everything else the CPU does is far more complicated and costly in terms of area/power/RAM/price/silicon. I think I can safely declare RISC vs CISC a dead discussion. Today's debate is really CPU vs GPU (or really: single-threaded with a bit of SIMD like x86/ARM/POWER... vs SIMD-primarily like Turing/RDNA2).
 
Last edited:

ARF

Joined
Jan 28, 2020
Messages
3,947 (2.55/day)
Location
Ex-usa
Yeah, I understand everything you wrote, but in the end the RISC ARM Qualcomm SM8150 Snapdragon 855 is 5 watts and is as fast as a 15-watt Ryzen U.

Qualcomm Snapdragon 855 - Benchmark, Test and specs (cpu-monkey.com)



Qualcomm Snapdragon 855 SoC - Benchmarks and Specs - NotebookCheck.net Tech
 

Joined
Apr 24, 2020
Messages
2,560 (1.75/day)
Yeah, I understand everything you wrote, but in the end the RISC ARM Qualcomm SM8150 Snapdragon 855 is 5 watts and is as fast as a 15-watt Ryzen U.

Are you sure?

(benchmark screenshot attached)


Benchmarks are surprisingly inconsistent these days. There are a lot of conflicting reports and conflicting information. My opinion of stock ARM is pretty low, actually. Neoverse looks like a decent set of cores, but they're still a bit out of date. ARM designs from Apple / Fujitsu are world-class processors though.

I'm all for good competition. But there's a reason why the computer industry has continued to use Intel Xeon / AMD EPYC in power-constrained datacenter workloads: because in practice, AMD EPYC is the most power-efficient system (followed by Intel Xeon at #2; very close competition, but AMD has the lead this year).
 
Joined
Dec 23, 2021
Messages
23 (0.03/day)
System Name SunMaster special
Processor 5950x
Motherboard Gigabyte X570 Aorus Master
Cooling Arctic Cooling Liquid Freezer 420 AIO
Memory 4x16GB@3800
Video Card(s) Nvidia 970
Storage WD Black SN850 512GB
Display(s) 2x Philips BDM3270
Case Fractal Meshify 2 XL
Power Supply EVGA Supernova GA 850
Keyboard Corsair K95
Software Windows 11
Yeah, I understand everything you wrote, but in the end the RISC ARM Qualcomm SM8150 Snapdragon 855 is 5 watts and is as fast as a 15-watt Ryzen U.

Qualcomm Snapdragon 855 - Benchmark, Test and specs (cpu-monkey.com)


Qualcomm Snapdragon 855 SoC - Benchmarks and Specs - NotebookCheck.net Tech

The Ryzen 3k series are 12 nm 4-core CPUs (with HT), which you are comparing to Qualcomm's 8-core 7 nm CPU. I don't think that's comparing apples to apples.
 
Joined
Apr 24, 2020
Messages
2,560 (1.75/day)
The Ryzen 3k series are 12 nm 4-core CPUs (with HT), which you are comparing to Qualcomm's 8-core 7 nm CPU. I don't think that's comparing apples to apples.

To be fair though... the Qualcomm 855 is a 1-very-big-core + 3-big-core + 4-small-core system. I'd personally describe the Qualcomm 855 as a 4-core system, frankly.

But in any case, the process difference (12 nm vs 7 nm) is pretty big; that's a 50% cut in power IIRC, so that's a valuable point to bring up. Manufacturing differences are the big reason why we techies talk about nanometers so much...

-------

IIRC, there was something about cell phones disabling their power limiters when they detected Geekbench (!!!!), so there's also a lack of apples-to-apples when it comes to benchmarking. Don't trust the specs: something can be a 5 W CPU but will temporarily disable its power limiter to draw 10 or 15 watts during a benchmark. This leads to grossly different performance characteristics when different people run different benchmarks on their own systems.

The only way to get the truth is to hook up wires and measure the power usage of the CPU (or system) during a benchmark, like what's done here on TPU (or on Anandtech and other testing sites). I don't really trust random benchmark numbers on the internet anymore.
 