
IPC Comparisons Between Raptor Cove, Zen 4, and Golden Cove Spring Surprising Results

Don't think that's true. The 5800X has one CCD measuring ~85 mm²; the 5950X has two CCDs, double that area.

When it comes to cooling, what I mean by easy or hard to cool is iso-wattage with the same cooler. With a TDP of 170 W, the 7950X will probably go above 200 W even at stock.
The cores per CCD are the same (8); if anything, the 5950X should put out more heat since it has two of them, but the opposite is true, as AMD refined their manufacturing process.
By hard to cool, everyone compares similar coolers in similar use conditions, not watt for watt.
Also, if the TDP is 170 W, it will draw at most that at stock.
 
You don't understand the fundamental parts of thermodynamics. The 5950X is power limited to the same wattage as the 5800X, but that wattage is spread out across double the die area, which is why it's easier to cool.

Zen 4 has an even smaller die size but even higher power draw, which will make it way harder to cool than Zen 3. On the other hand, Raptor Lake will have a bigger die than Alder Lake but similar power draw, which makes it easier. Assuming the Zen 4 rumors are true and the 7950X draws north of 200 W, it will be way harder to cool than the 13900K at 250 W. That's just physics.
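The power-density argument above can be put in numbers with a back-of-the-envelope sketch. The ~85 mm² CCD figure comes from the thread; the 142 W Zen 3 package power limit is an assumption, and real heat flow also depends on the IO die, hotspot layout, and the IHS/TIM stack:

```python
# Rough power-density comparison for the same package power limit
# spread over one vs two CCDs. All figures illustrative.

def power_density(watts, die_area_mm2):
    """Average power per square millimetre of silicon."""
    return watts / die_area_mm2

# Zen 3: same ~142 W package power limit, one vs two ~85 mm^2 CCDs
ryzen_5800x = power_density(142, 85)    # one CCD
ryzen_5950x = power_density(142, 170)   # two CCDs

print(f"5800X: {ryzen_5800x:.2f} W/mm^2")   # ~1.67 W/mm^2
print(f"5950X: {ryzen_5950x:.2f} W/mm^2")   # ~0.84 W/mm^2
```

Same wattage, half the power density on the 16-core part, which is the whole "easier to cool" claim in one division.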
 
The die size is the same; the 5950X uses two 8-core chiplets, while the 5800X uses one of them.
We'll see once the chips are out, but something tells me the 13900k will be another miniature stove while the 7950x will be reasonable.
 
You are confusing the IHS with the die. The IHS is the same, yes; the die isn't. The 5950X has two CCDs of ~85 mm² each. The 5800X has one.
 
Raptor Cove is still superior on an older node... Intel's architecture is more advanced.
But AMD is matching Intel's performance using significantly fewer transistors, so clearly AMD is still superior.

The reality is they are both very different, and it looks like both have good designs; AMD and Intel will be competing pretty much directly overall.
Those scores are pretty close, if not within the margin of error. It's like splitting hairs here... I also think BIOS immaturity with RPL could be a handicap.
It's just one test, but it is pretty insane just how close these very different architectures perform when normalized to the same clock; I would not have expected that at all.
 
IPC is a constant (and depends on the task), and it is independent of core frequency (which is why you multiply the two together to approximate performance, FYI).

The higher the core frequency, the more the result will be skewed by bus/IMC/DRAM performance, and the higher the chance of throttling based on cooling/power requirements …
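The performance ≈ IPC × frequency relationship mentioned above can be shown with a trivial sketch (both cores and all numbers here are invented):

```python
# Two hypothetical cores running the same 10-billion-instruction
# workload: performance is the product of IPC and clock frequency.

instructions = 10e9

core_a = {"ipc": 2.0, "ghz": 5.0}   # narrower core, higher clock
core_b = {"ipc": 2.5, "ghz": 4.0}   # wider core, lower clock

for name, core in (("A", core_a), ("B", core_b)):
    seconds = instructions / (core["ipc"] * core["ghz"] * 1e9)
    print(f"core {name}: {seconds:.2f} s")
# Both finish in 1.00 s: equal IPC x frequency means equal throughput.
```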
This is a typical misconception.
Real IPC is a constant given by the architectural design: it's the architecture's ability to process instructions across "any" workload, measured in instructions per clock. Real IPC isn't possible for us to measure, so we approximate it by locking the clock speed far below any throttling point, choosing memory hopefully fast enough not to cause a bottleneck, and hopefully selecting a good range of workloads able to saturate a single core. What we get is a relative IPC, which is an approximation, and the quality of this approximation depends on the aforementioned factors, which will affect the benchmark scores.
 
How do you account for the fact that different instructions take different numbers of cycles to execute, from zero (sometimes, if the front end manages to fuse two instructions into one micro-op) to several tens (division, whose time to execute also depends on the actual data being divided)?
How do you account for the fact that, as an example, a Skylake core can do four non-vector additions at the same time (they probably execute in one cycle, but I haven't checked) but only one division (which, again, takes many cycles to execute)?
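The point about per-instruction cycle counts can be made concrete with a toy throughput model. The 4-adds-per-cycle figure is from the post above; the ~20-cycle division cost is an illustrative assumption, not a measured latency:

```python
# Toy model: best-case IPC for a core that can retire up to 4
# independent scalar adds per cycle but only one division roughly
# every 20 cycles. Numbers are illustrative, not measured.

def effective_ipc(n_adds, n_divs, adds_per_cycle=4, div_cycles=20):
    """Best-case IPC for a workload of independent adds and divisions."""
    cycles = n_adds / adds_per_cycle + n_divs * div_cycles
    return (n_adds + n_divs) / cycles

print(f"pure adds:     {effective_ipc(1000, 0):.2f}")   # 4.00
print(f"1% divisions:  {effective_ipc(990, 10):.2f}")
print(f"10% divisions: {effective_ipc(900, 100):.2f}")
```

Even a few percent of slow instructions drags the measured IPC well below the machine's peak, which is why the instruction mix of the benchmark matters so much.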
 

Because that is the actual real-world effect of architecture on IPC in real-world software at a set frequency, so we can determine the efficiency of an architecture at a given task.


I seriously don't know how that is so hard to understand by so many.

Architecture A may be great at software X, while architecture Y may excel with software Z, and it's a balancing act to make one great at everything. That's also why an architecture great at in-order execution has a long/deep pipeline, but an out-of-order architecture must have either a shallow pipeline and/or a great branch prediction unit and lots of cache.


Why are ARM CPUs so good in phones and closed environments? They have a closed environment and can be optimized for typical handheld devices. The same program can run significantly faster on a desktop CPU through an emulator, though, so which architecture is superior? Which has higher IPC?

[attached benchmark screenshots]
 
ARM is built on a RISC architecture, which means it has a simpler and smaller instruction set, which means less die area and lower power.
x86 is a CISC architecture, which means it has a wider set of instructions, some of which are very complex and take a lot of hardware and power to implement.

The advantage of RISC is efficiency on small tasks; the advantage of CISC is performance on highly complex tasks. Neither is superior in absolute terms.
In other words, the x86 CPU can do the same thing with fewer instructions, so this doesn't really reflect IPC.
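A quick sketch of why instruction count breaks cross-ISA IPC comparisons; all counts and IPC figures below are invented for illustration:

```python
# IPC alone doesn't determine performance when instruction counts
# differ: a CISC-style ISA may need fewer instructions for the same
# work. All numbers here are invented.

def runtime_cycles(instruction_count, ipc):
    return instruction_count / ipc

# The same task compiled for two hypothetical ISAs
risc = runtime_cycles(1500, 3.0)   # more, simpler instructions, higher IPC
cisc = runtime_cycles(1000, 2.2)   # fewer, more complex instructions

print(f"RISC: {risc:.0f} cycles, CISC: {cisc:.0f} cycles")
# The lower-IPC CISC core finishes first here, because it executes
# fewer instructions overall.
```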
 
Real IPC is a constant and is given by the architectural design, it's the architecture's ability to process instructions across "any" workload
So the real IPC of the Haswell or Skylake architecture is 6, is that what you mean? It's been calculated by people who seem to know the architecture well enough.

The big surprise here is just how good the "Gracemont" E-cores are in SPECint. OneRaichu made a distinction between the "Gracemont" E-cores of "Alder Lake" (GLC-12) and those of "Raptor Lake" (GLC-13,) as the latter have double the amount of shared L2 cache per E-core cluster. The E-core is fast approaching IPC levels comparable to that of "Skylake," which really is Intel's calculation in giving its processors a large number of E-cores next to a small number of P-cores. The idea is that the E-cores will soak up all the moderately-intensive compute workloads and background processes, keeping the P-cores free for gruelling compute-heavy tasks.
This was single-threaded benchmarking. While it does reveal a lot, it would have been great if it was also done with two threads and four threads.

2 threads on a single P core vs. 2 threads on the same E core cluster: each thread's performance on P should drop sharply (by 35% or so) but what about E?

4 threads on two P cores vs. 4 threads on the same E core cluster: similar but the E cores would be even more constrained because they share L2 and access to L3 and bus.

There may be optimisations (or regressions, for that matter) in how a P core handles SMT, and such benchmarking would have exposed that.
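The proposed 2-thread test on one P core is easy to reason about numerically. The ~35% per-thread drop is the guess from the post above, not a measurement:

```python
# Back-of-the-envelope SMT scaling: if each of two threads on a P
# core drops ~35% versus running alone, aggregate throughput is
# still well above one thread. The 35% figure is a guess, not data.

single_thread = 1.00
per_thread_smt = single_thread * (1 - 0.35)   # each thread at ~65%

aggregate = 2 * per_thread_smt
print(f"per-thread: {per_thread_smt:.2f}x, aggregate: {aggregate:.2f}x")
# -> per-thread: 0.65x, aggregate: 1.30x of a single thread alone
```

Running the same two threads on an E-core cluster instead would give two independent data points per configuration, which is exactly what the proposed benchmark would reveal.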
 