
AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

Epycs are running this stuff at 2-3GHz. Limiting a desktop (or laptop) CPU to that has a pretty profound effect on any load that is single-core or depends on few cores. Games are the obvious practical example - some will work just fine with a minor performance hit but in general you'd take a sizeable one. Benchmark results for anything like that - maybe Cinebench - would also be quite devastating.

It is a bit of a never-ending conundrum with chips: the desired optimization point. As a manufacturer, do you want to (or can you) sell efficiency as the main point? CPUs are a little trickier in that respect, but GPUs might be an easier example: would you want an RTX 4090 at a 300W power limit? How about 150W? Keep in mind that everything that goes into such a product remains the same, meaning the cost would also be the same.

There is always the possibility of limiting the larger CPU (or GPU) to the desired spot. AMD even has ECO mode. AMD, Intel, and Nvidia all have configurable power limits and, depending on the specific product and needs, frequency limits as well. Basically, take a 7800X3D, limit its frequency to 3GHz, set the power limit at 24W, and see where that leaves you and whether you would be willing to pay the cost for the results you get. It would be an interesting test, to be honest.
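On Linux, the frequency half of that experiment can be tried through the cpufreq sysfs interface. This is a minimal sketch, assuming a system whose cpufreq driver exposes `scaling_max_freq` (in kHz) under `/sys/devices/system/cpu/cpu*/cpufreq/`; writing it needs root, and the driver may round or clamp the value. The helper `cap_max_frequency` and its `dry_run`/`sysfs_root` parameters are hypothetical. It does not touch the power limit, which on AMD desktop parts is normally set in firmware (ECO mode / PPT) or vendor tools.

```python
import glob
import os


def cap_max_frequency(ghz: float,
                      sysfs_root: str = "/sys/devices/system/cpu",
                      dry_run: bool = True) -> list[tuple[str, int]]:
    """Cap every CPU's maximum frequency via cpufreq (hypothetical helper).

    Returns the (path, kHz) pairs written, or the ones that would be
    written when dry_run is True.
    """
    khz = int(round(ghz * 1_000_000))  # cpufreq uses kHz: 3 GHz -> 3_000_000
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    writes = []
    for path in sorted(glob.glob(pattern)):
        writes.append((path, khz))
        if not dry_run:
            with open(path, "w") as f:  # needs root on a real system
                f.write(str(khz))
    return writes
```

As root, `cap_max_frequency(3.0, dry_run=False)` would request a 3GHz ceiling on every CPU; undoing it means writing back the value from the sibling `cpuinfo_max_freq` file.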
C-cores are perfectly fine at 3GHz since they're just there for multithreaded performance; they don't need to boost.
Power efficiency is the key here: 16-core, 32-thread laptops at a 45W power limit are entirely plausible, and that would shake up the market a lot.


These are different in that, even when other chips are power-limited and optimised, these run well under the wattage you can achieve on anything else. I played with a Zen 3+ DDR5 laptop and it drew 3-4W per core in MT and 6W peak in ST, 2-3x higher than these EPYC cores, which again are not based on the C-core design.

Don't you think the limited-release 5600X3D seems like the perfect thing to pair with a bunch of C-cores? Memory-light tasks get the 3D cache, core/thread-heavy tasks get the C-cores.


The big deal here is that the C-cores are also physically smaller: AMD can fit more of them in the same area and get more per wafer. That helps the bottom line a lot.
 
From everything we know about AMD C-cores, they should not have a power efficiency benefit. From what AMD says, they are very much the same core, with less L3 cache and no way to add 3D V-Cache, but that was their choice. They trade a boost-clock penalty for a die-area benefit. Since their chosen optimization point is at low frequencies, the penalty does not really come into play.

AMD already has a 16-core mobile CPU in the 7945HX - https://www.amd.com/en/product/13016 - with a slightly higher TDP range, though. The lower end of the range, 55W, should match your 45W idea pretty well, given the inherent efficiency handicap of using chiplets. Granted, laptops with it usually run something like an 88W power limit, but surely that is configurable.

Regarding running stuff at more optimized settings: my post might have come across as more critical than intended; the same line of thought has been on my mind quite a number of times. I am currently running a 5800X3D limited to 76W with a -30 Curve Optimizer negative offset.
 
we already have data on that




Across all of these benchmarks carried out, the EPYC 9754 2P on average had a 385 Watt power draw... In comparison the EPYC 9654 2P had a 447 Watt average and the EPYC 9684X 2P had a 464 Watt average. And need we mention the Xeon Platinum 8490H 60-core processor consuming even more power with a 568 Watt average. The EPYC 9754 power consumption results surpassed my expectations in frankly not expecting Zen 4C to deliver such power efficiency improvements while still performing so well.
 
That Phoronix review really does not tell us much about power efficiency of Zen4c vs Zen4. Considerably more cores at lower clocks and generally well-threaded tests...
9754 is 128c/256t at 2.25/3.1/3.1 GHz
9654 is 96c/192t at 2.4/3.55/3.7 GHz
Those few hundred MHz alone make a noticeable difference in efficiency.
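A rough way to see what those clocks mean: with identical IPC (per the article title), aggregate throughput in a perfectly threaded test scales roughly with cores x clock. This sketch uses the base and all-core boost figures above, assuming the three numbers are base / all-core boost / max boost; it ignores memory bandwidth, SMT, and the clocks each chip actually sustains per test, which are unknown.

```python
# (cores, base GHz, all-core boost GHz) from the spec lines above
SPECS = {
    "EPYC 9754 (Zen 4c)": (128, 2.25, 3.1),
    "EPYC 9654 (Zen 4)": (96, 2.4, 3.55),
}


def core_ghz(cores: int, ghz: float) -> float:
    """Crude throughput proxy: cores x clock, assuming identical IPC."""
    return cores * ghz


for name, (cores, base, all_core) in SPECS.items():
    print(f"{name}: {core_ghz(cores, base):.1f} core-GHz at base, "
          f"{core_ghz(cores, all_core):.1f} at all-core boost")

base_ratio = core_ghz(128, 2.25) / core_ghz(96, 2.4)
boost_ratio = core_ghz(128, 3.1) / core_ghz(96, 3.55)
print(f"9754/9654: {base_ratio:.2f}x at base, {boost_ratio:.2f}x at all-core boost")
```

On paper the 9754 has roughly 16-25% more raw core-clock while each of its cores runs 150-450 MHz lower, which is exactly why a whole-chip power average cannot separate the core design from the operating point.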
 
efficiency is not being calculated in relation to the clocks, but in relation to the work done
 
Efficiency calculation includes power consumption that is absolutely affected by clocks. Power consumption figures around 2-3GHz are on a quite steep slope.

More cores at lower clocks will be more efficient. The details are hard to see from a general result like that; plus, we do not really know the clock distribution across tests. For example, look at the 9654 and 9554 in the same lineup, where the former is ~20% faster at the same power draw. It is not quite the same level of difference as 9754 vs 9654, but it is still a noticeable efficiency difference (also, both of these are probably running at 3+GHz, but we do not know exactly).

9654 is 96c/192t at 2.4/3.55/3.7 GHz
9554 is 64c/128t at 3.1/3.75/3.75 GHz

Edit:
I am not saying that Zen4c is not more efficient but this is not the data point that would show that in any clear manner, much less getting some idea how much more efficient.
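The "steep slope" point can be illustrated with the textbook CMOS dynamic-power relation P ≈ C·V²·f, where voltage has to rise with frequency. This is a toy model only: the linear voltage/frequency curve in `v_at` and its coefficients are invented for illustration, not measured Zen 4 data, and static and uncore power are ignored. It compares two ways of reaching the same nominal throughput.

```python
def v_at(ghz: float) -> float:
    """Hypothetical V/f curve: 0.7 V at 2 GHz, +0.15 V per extra GHz."""
    return 0.7 + 0.15 * (ghz - 2.0)


def package_power(cores: int, ghz: float, c_eff: float = 1.0) -> float:
    """Dynamic power in arbitrary units: cores * C * V^2 * f."""
    return cores * c_eff * v_at(ghz) ** 2 * ghz


def perf_per_watt(cores: int, ghz: float) -> float:
    """Work done (cores * f, identical IPC assumed) divided by power."""
    return (cores * ghz) / package_power(cores, ghz)


# Same nominal throughput (cores x GHz = 288) reached two ways.
wide = perf_per_watt(128, 2.25)
narrow = perf_per_watt(96, 3.0)
print(f"128 cores @ 2.25 GHz: {wide:.3f} work/W")
print(f" 96 cores @ 3.00 GHz: {narrow:.3f} work/W")
```

In this model the core count cancels out of perf/W entirely; efficiency is set by the voltage the clock demands, so "more cores at lower clocks" wins purely through the operating point, independent of any core-design change.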
 
Zen 3D - add more cache
This is Zen 1D, less cache!

(For certain workloads the cache matters less, so the product makes sense)
I myself thought about this as well.
Since it seems that Zen 4c cores use 20-30% less power at the same core count, adding 3D V-Cache to such a processor would give 1.5 times the cache and better power efficiency, which would appeal to everyone. But to work better than the 7900X3D and 7950X3D, which perform worse in games than the 7800X3D, AMD should put it on both CCDs. I doubt it would happen with Zen 4, more likely with Zen 5, so I'll use 8000-series naming in my examples.
AMD EPYC™ 9754 128c 256t 360W TDP
AMD EPYC™ 9654n 96c 192t 360W TDP
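The two 360W TDPs above give a quick per-core comparison. This is a rough sketch: TDP is a rating rather than measured draw, it also covers the I/O die and uncore, and the two parts run at different clocks, so the per-core figures are averages of the budget, not core power.

```python
TDP_W = 360  # both EPYC parts are rated at 360 W (figures from the post)

chips = {"EPYC 9754 (Zen 4c)": 128, "EPYC 9654 (Zen 4)": 96}
per_core = {name: TDP_W / cores for name, cores in chips.items()}

for name, watts in per_core.items():
    print(f"{name}: {watts:.4f} W per core at TDP")

saving = 1 - per_core["EPYC 9754 (Zen 4c)"] / per_core["EPYC 9654 (Zen 4)"]
print(f"Per-core budget reduction: {saving:.0%}")

# A 16-core part at the 9754's per-core budget:
print(f"16 cores -> {16 * per_core['EPYC 9754 (Zen 4c)']:.0f} W")
```

The 25% per-core reduction sits inside the 20-30% range assumed here, and 16 cores at that budget works out to 45W, although a client chip would also have to fit its I/O, memory controller and graphics into its total.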

Example (pure fiction; I have assumed 30% less power draw):
  • Ryzen 5

    8600X TDP 65 W, L3 32 MB
    8600 TDP 65 W, L3 32 MB
    8600C (C-cores and 3D V-Cache) TDP 44 W, L3 48 MB

  • Ryzen 7 (700 & 800)

    8700X TDP 105 W, L3 32 MB
    8800X TDP 120 W, L3 32 MB
    8700 / 8800 TDP 65 W, L3 32 MB

    8700C TDP 75 W, L3 48 MB
    8800C TDP 85 W, L3 48 MB

    8800X3D TDP 120 W, L3 96 MB

  • Ryzen 9 (I'm only doing the 900 here)

    8900X TDP 170 W, L3 64 MB
    8900 TDP 65 W, L3 64 MB

    8900X3D TDP 120 W, L3 96 MB -> if they use the same configuration as the 7900X3D
    8900C TDP 144 W, L3 96 MB -> this one has 3D V-Cache on both CCDs

If they scale as well as I think, an 8900C would be a great choice for those who want great gaming performance and more cores, with the added benefit of better power efficiency, and the same goes for the others. In the case of the 8800 lineup, it would be a middle ground between a productivity CPU and a gaming CPU with 3D V-Cache, and so on.
 
Efficiency is "work done" divided by wattage.
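Applying that definition to the Phoronix averages quoted earlier (385 W for the EPYC 9754 2P, 447 W for the 9654 2P) also needs a performance figure, which the lost charts carried. This sketch therefore takes relative performance as an input; the 1.0 and 1.2 values in the loop are hypothetical placeholders, not results from the review, and the helper name is made up.

```python
AVG_POWER_W = {  # average power draws quoted from the Phoronix review
    "EPYC 9754 2P": 385,
    "EPYC 9654 2P": 447,
    "EPYC 9684X 2P": 464,
    "Xeon Platinum 8490H": 568,
}


def relative_efficiency(rel_perf: float, chip: str, baseline: str) -> float:
    """Perf/W of `chip` divided by perf/W of `baseline`.

    rel_perf is chip performance divided by baseline performance.
    """
    return rel_perf * AVG_POWER_W[baseline] / AVG_POWER_W[chip]


# Hypothetical relative-performance inputs, not review results:
for rel_perf in (1.0, 1.2):
    eff = relative_efficiency(rel_perf, "EPYC 9754 2P", "EPYC 9654 2P")
    print(f"9754 at {rel_perf:.1f}x the 9654's performance -> {eff:.2f}x perf/W")
```

Even at equal performance the 9754 comes out about 16% ahead on these averages; the open question in the thread is how much of that comes from the core design and how much from the lower clocks.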

IMO compact C-cores can't use 3D V-Cache as of now, but I hope somebody else can confirm.

Provisioning for TSVs (vertical wiring vias for connecting 3D V-Cache) has been eliminated, which saved space on the chip.
 
The Zen 4c analysis by SemiAnalysis confirms that TSVs have been eliminated to save space.
The L3 also lacks the arrays of Through-Silicon Vias (TSV) for 3D V-Cache, giving a small area saving.
 
efficiency is "work done" divided by wattage
Do you want to say that neither "work done" nor wattage is affected by clocks?
 
It's an entirely different thing, and clear distinctions need to be made.

It's like quoting a current in amps but refusing to state the volts or watts: some metrics are a combination of others, and single ones are often worthless without the corresponding data.


from the link @AnotherReader posted above
The trend is that performance per Watt in any given workload is the most important factor, and as such can command a significant price premium. Look no further than the AMD Milan to Genoa transition, where AMD was able to command an 80% price increase simply due to the increased deployment density and performance per watt.
performance per watt and overall efficiency is what AMD makes their money from, and we're going to see consumer products reflecting that.

laptops and OEM desktops will love the tits off this, because it means smaller lighter products that need less cooling, and that means more profits.


Looks like they already are
This means an identical IPC and ISA feature level, which simplifies integration on the client side. In fact, AMD is also silently swapping some Zen 4 cores with Zen 4c cores in its lower-end 4nm Ryzen 7000U “Phoenix” mobile processors. On Bergamo, Zen 4c allows AMD to increase core counts from 96 to 128 while saving on area and cost. This bifurcation in design philosophy will increase in future generations of hardware.


The further details on the current designs match something AMD said in TPU's recent interview: they have performance concerns about going past a certain threshold without faster RAM to back it up, so they've limited how many performance cores they can have with the current design (using 8 of 12 CCX links).

However, the truly stunning thing here is the die size. 16 Zen 4c cores are barely larger than 8 Zen 4 cores
This is where things will change, as the cores are individually slower they can slap in twice as many cores for an overall performance gain as well as an efficiency gain - and possibly use the unused CCX links.

256-core CPUs in the server world are entirely plausible and probably already being worked on.
 
Does this mean they do not require Windows 11? Windows 10 should be fine then?
 
I haven't seen benchmarks, but if you're talking about the Phoenix 2 die, which maxes out at 2+4 cores (Zen 4 + Zen 4c), I would expect Windows 10 to do fine. Windows 10 is aware of preferred cores and I think that's all it needs for best results, and under laptop power limits, worst-case results will only be a little worse than best case.
 
They aren't like Intel mixing two types of cores, so nothing special is needed at the OS scheduler level.
 
Considering that the only major difference between Zen 4 and Zen 4c is clock speed, which is something Intel had even back on 11th gen with Turbo Boost 3.0, I'd say, absolutely.
 
Well, cache.

These are per-CCX cache values, so CPUs with two CCXs combine them:

16MB (Phoenix 'G' APU)
16MB (Dionysus/Zen 4c)
32MB (per CCX) in regular Zen 4 (Raphael)
96MB (X3D)

The Zen 4c parts initially come across as an APU without the APU, but they fit twice as many cores in the same space, mostly due to changing the SRAM used, it would seem.
In addition to the reduced core footprint, die space is further saved in the Zen 4c CCD via the use of denser 6T dual-port SRAM cells and an overall reduction of L3 cache to 16 MB per 8-core CCX. Zen 4c cores have the same sized L1 and L2 caches as Zen 4 cores but the cache die area in Zen 4c cores is lower due to using denser SRAM and slower cache

Using denser, slower L3 cache let them make it physically smaller and slap in double the cores, but since L1 and L2 are the same, the basic performance matches Zen 4 in general.
It's like a reversal of the X3D chips: since some tasks didn't benefit from the extra cache (rendering, extremely long workloads, etc.), they made a chip with less, slower cache to fit that need.
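Putting the per-CCX numbers on a per-core basis makes the trade-off clearer. This sketch assumes 8 cores per CCX in every case (as in the quote above for Zen 4c); the 96 MB X3D figure is the 32 MB on-die L3 plus the 64 MB stacked die.

```python
CORES_PER_CCX = 8  # assumption: every entry below is an 8-core CCX

l3_per_ccx_mb = {
    "Phoenix APU": 16,
    "Zen 4c (Bergamo CCX)": 16,
    "Zen 4 (Raphael)": 32,
    "Zen 4 X3D": 96,  # 32 MB on-die + 64 MB stacked 3D V-Cache
}

for name, mb in l3_per_ccx_mb.items():
    print(f"{name}: {mb / CORES_PER_CCX:.0f} MB of L3 per core")
```

Bergamo pairs two such CCXs on each 16-core CCD, so the die still carries 32 MB in total, the same as an 8-core Zen 4 CCD; the halving is per core, not per die.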


Can't wait for something with 8 3D cores and 32 C-cores, that'll be the thing to blast every benchmark off the map.
 
The only chip with both Zen 4 and Zen 4c cores is Phoenix 2, and in Phoenix 2 all cores share the same L3 cache. (The Zen 4c-only chip, Bergamo, does have denser and more widely shared L3 cache.)
 
I assume they'll make more hybrid designs in the future, they've barely begun on it.
 
Well, cache.

Yeah, but the L3 cache is just one single entity shared across the whole CPU, so the scheduler doesn't need to do anything special to account for it. L1 and L2 are the same across Zen 4 and Zen 4c cores.
 
Using denser, slower L3 cache let them make it physically smaller and slap in double the cores, but since L1 and L2 are the same the basic performance matches Zen4 in general.
Given that 32 MB of L3 in a Zen 4c die takes about the same die space as in a Zen 4 die, it's rather unlikely that it's any denser or slower than the L3 in regular Zen 4. For large SRAM arrays, wire delay contributes significantly to the access time so a smaller array should be a little faster than a large array. Of course, larger wires can be used for the larger array to decrease wire delay. Another example is the extra cache in the 7950X3D which is denser than regular SRAM and is on a different die, but it only incurs 4 more cycles of latency.
 
Yeah, but the L3 cache is just one single entity shared across the whole CPU, so the scheduler doesn't need to do anything special to account for it. L1 and L2 are the same across Zen 4 and Zen 4c cores.
100% agreed

They used higher-density cache (and less of it), which is something the OS doesn't know or care about, so all those core types appear the same.
The only thing needed is something the chipset driver already does, with a way to push games onto cores with higher cache if available

Zen4C fits twice as many cores in the same space - they stick 16 cores where 8 Zen4 cores fit. They are a LOT denser.
 
The OS certainly has reason to care about L3 cache. If one application has 2 threads which frequently communicate and share data, then it's hugely beneficial for those two threads to share the same L3 cache pool. This would be a concern in Zen and Zen 2 where every four cores has a separate L3 cache and there's a long latency penalty between blocks. And I'm not aware of the chipset playing a role in this. The OS chooses which core a thread will run on.
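On Linux that L3 sharing is visible to software: each `/sys/devices/system/cpu/cpuN/cache/indexM/` directory has a `level` file and a `shared_cpu_list` file listing the CPUs that share that cache. A minimal sketch, assuming that sysfs layout; `parse_cpu_list` and `l3_groups` are hypothetical helper names.

```python
import glob
import os


def parse_cpu_list(text: str) -> frozenset[int]:
    """Parse a kernel CPU list such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def l3_groups(sysfs_root: str = "/sys/devices/system/cpu") -> set[frozenset[int]]:
    """Return the distinct sets of CPUs that share an L3 cache."""
    groups = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for idx_dir in glob.glob(pattern):
        try:
            with open(os.path.join(idx_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue  # skip L1/L2 entries
            with open(os.path.join(idx_dir, "shared_cpu_list")) as f:
                groups.add(parse_cpu_list(f.read()))
        except FileNotFoundError:
            continue
    return groups


if __name__ == "__main__":
    for group in sorted(l3_groups(), key=min):
        print(sorted(group))
```

To keep two cooperating threads on one L3 pool, a process could pin itself to one group with `os.sched_setaffinity(0, group)` (Linux only); normally the scheduler's topology awareness already handles this.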
The article says that the cores themselves are 35% denser. The rest of the density increase comes from using half the L3 cache per core (but twice as many cores) with a denser cache design. That's for Bergamo.

Where Zen 4 and Zen 4c are used together is Phoenix 2, and in Phoenix 2 both types of cores share the same pool of L3 cache, so in Phoenix 2 there is literally no difference between the cores with respect to cache.
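The 35% figure from the article title allows a quick sanity check on the "16 cores where 8 fit" density. This sketch assumes the 35% applies to the core alone and counts only core logic; it says nothing about how the remaining savings split between L3 and other layout changes.

```python
ZEN4C_CORE_AREA = 1 - 0.35  # relative to one Zen 4 core, per the headline figure


def core_area(n_cores: int, rel_area: float) -> float:
    """Total core area in units of one Zen 4 core."""
    return n_cores * rel_area


zen4c_16 = core_area(16, ZEN4C_CORE_AREA)
zen4_8 = core_area(8, 1.0)
print(f"16 Zen 4c cores = {zen4c_16:.1f} Zen 4 core-areas "
      f"({zen4c_16 / zen4_8:.2f}x the core logic of 8 Zen 4 cores)")
```

On this simple model, 16 Zen 4c cores still need about 30% more core area than 8 Zen 4 cores, so a CCD that is "barely larger" also depends on savings outside the cores themselves.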
 
Good points - there are a variety of designs there and that does make it more confusing when talking about it.

Summary: scheduling changes aren't needed at the OS level to use these CPUs. Hybrid designs need something to push programs to the best choice, but the worst case won't be like Intel's E-cores, where programs can outright crash or have massive performance losses.
 
I've never heard of crashes being caused by Intel E-core scheduling. Any app should work properly even if the OS moves it to an E-core, unless the app was designed to fail specifically in this case, like anti-cheat software. But the potential performance hit from a scheduling mistake could indeed be a lot worse than for Zen 4/4c.
 
Zen4C fits twice as many cores in the same space - they stick 16 cores where 8 Zen4 cores fit. They are a LOT denser.
It's the cores that are much denser. The L3 seems to be the same.
 