AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

It should theoretically be cooler as well because it'll be clocked lower. I hope we see mainstream 32 cores soon :pimp:
 
It's all good, I just loathe Intel's E-cores because they used a name that is the exact opposite of the product to mislead people about them.

They're more efficient at single-threaded tasks, and then Intel uses them exclusively for multi-threaded tasks.
Just... Ugh.
This is from my own experimenting (albeit on Windows 10, which doesn't have Intel's pre-configured scheduler).

By default in Windows 10, E-cores are heavily favoured: pretty much all single-threaded tasks are loaded onto them and the P-cores are parked. This happens even if parking is disabled in the power profile (Ultimate Performance). ParkControl also can't override this behaviour.

If I adjust the heterogeneous thread scheduling policy, I can manipulate this behaviour; it's a hidden setting in Windows. Setting it to either "all processors" or "performant" starts letting P-cores be used. The latter, however, almost blocks use of the E-cores, so it's not ideal if you still want them to be used. But it would be a quick and dirty fix: e.g. if you want to fire up a single-threaded game, it would make it almost certain to use a P-core without having to worry about affinity settings. You could use something like 'AutoPowerOptionsOk' to automate the solution.

Setting it to "all processors" would likely require something like Process Hacker to get things working in an optimal way with automation, e.g. affinity for svchost and browsers to E-cores and affinity for games to P-cores (good for security as well, as E-cores don't have HTT).

Both of these scheduling options still automatically favour the fastest two P-cores for single-threaded Cinebench, which is nice; on my Ryzen CPUs this doesn't happen. It also doesn't happen on my 9900K, which is a reason why I went to an all-core clock speed on the 9900K. But my testing on Ryzen and the 9900K was done on 1809, whilst the 13700K was on 21H2, so it's possible 1809 has no programming for "favoured cores", as that was introduced later, I think.
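For anyone wanting to try the affinity approach, here is a rough sketch of how the two affinity masks could be worked out. It assumes the 13700K's 8 Hyper-Threaded P-cores are enumerated first as logical CPUs 0-15 and the 8 E-cores follow as 16-23; that ordering is an assumption, so check it on your own system before relying on it. The function name is made up for illustration.

```python
# Assumed layout (verify on your own machine): 8 P-cores with 2 threads each
# come first, followed by 8 single-threaded E-cores.
P_CORES = 8
E_CORES = 8
THREADS_PER_P_CORE = 2


def affinity_mask(logical_cpus):
    """Return a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


p_thread_count = P_CORES * THREADS_PER_P_CORE
p_logical = range(0, p_thread_count)                         # logical CPUs 0..15
e_logical = range(p_thread_count, p_thread_count + E_CORES)  # logical CPUs 16..23

p_mask = affinity_mask(p_logical)
e_mask = affinity_mask(e_logical)

print("P-core mask:", hex(p_mask))
print("E-core mask:", hex(e_mask))
```

Hex masks like these are the form that affinity tools (or `start /affinity` in a Windows command prompt) generally expect, e.g. the E-core mask for browsers and the P-core mask for games.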

AMD of course has this problem as well with some of their processors, for different reasons.

I assume the improvements in Windows 11 are just better default behaviour when specific CPUs are recognised, for a better out-of-box experience.
 
At least it's nice for them to finally have a real name. These Zen 4c cores are literally just APU grade Zen 4, jammed into a chiplet. Better than having to call them "reduced-cache Zen" every time to distinguish them.

A Zen 4c core does everything a Zen 4 core does, just probably a bit slower.

Yes, much more capable than an E-core, but in heterogeneous applications it requires no less scheduling optimization. Half-cache Zen 2 and Zen 3 perform slightly worse per clock in productivity and significantly worse in games/cache heavy workloads.
 
It seems to me that the simplification of the design has the weakness of not reaching clocks as high as Zen4. But this is not a problem on CPUs intended for servers...

"The only thing that's changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD."

If that's true, why does the slide say 35% smaller when comparing just core + L2?
1686749568345.png
 
at 35% smaller,
So all that's left unknown is the effect on max boost clocks.

I don't think the enterprise version ever needed the high-frequency capability that Zen has, so these probably cannot run as fast.

But it's intriguing.
Unless there is a very heavy clock deficit, there should be no reason for keeping plain old Zen 4 around any more.
 
Did you pull that from the 35% decrease in size, and just hoped the math is the same?

Cause uh, halving the cache likely decreases those quite a bit
Cache actually doesn't consume much energy, so it doesn't have a large effect on temps. The bigger contributor to lower temps will be the reduced clockspeed.
 
As long as they're making regular Zen 4 based chips for AM5 this will continue. Also, I just had a look at fleabay recently: awesome value on some of those previous-gen EPYCs o_O
 
"The only thing that's changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD."

If that's true, why does the slide say 35% smaller when comparing just core + L2?
Last week, TechPowerUp reported on an analysis by SemiAnalysis that went over how AMD made Zen 4c smaller. While it's behind a paywall, the first part covering the physical design is free to read. The core sans the L2 cache is 44% smaller, i.e. nearly half the size of a Zen 4 core. It's an impressive feat of physical design.
TLDR:
  1. reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density
  2. a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
  3. lower clock speed target allows denser circuits
  4. The L3 also lacks the arrays of Through-Silicon Vias (TSV) for 3D V-Cache, giving a small area saving. This means that there's no possibility of a stacked L3 cache for Zen 4c.
    1686750687998.png
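
One way to see how "44% smaller without L2" and "35% smaller with L2" can both hold is a quick back-of-envelope calculation. The sketch below assumes, purely for illustration, that the L2 block itself does not shrink at all; that is my simplification, not something the article states, so treat the result as a ballpark only. The function name is made up.

```python
def implied_l2_share(core_shrink, core_plus_l2_shrink):
    """Fraction of the original core+L2 area taken by L2, ASSUMING only the
    core logic shrinks and the L2 area stays the same (illustrative only).

    If the core is a fraction c of the original area and shrinks by
    core_shrink, the total shrinks by core_shrink * c, so
    c = core_plus_l2_shrink / core_shrink.
    """
    core_share = core_plus_l2_shrink / core_shrink
    return 1.0 - core_share


l2_share = implied_l2_share(core_shrink=0.44, core_plus_l2_shrink=0.35)
print(f"Implied L2 share of Zen 4 core+L2 area: {l2_share:.1%}")
```

Under that assumption the L2 would account for roughly a fifth of the original core+L2 area, which is why the combined figure comes out lower than the core-only figure.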
 
Thank you, that is a much clearer and more detailed explanation. Zen 4c would be a much better efficiency core if AMD decides to beat Intel at its own game.
 
It's all good, I just loathe Intel's E-cores because they used a name that is the exact opposite of the product to mislead people about them.

They're more efficient at single-threaded tasks, and then Intel uses them exclusively for multi-threaded tasks.
Just... Ugh.
It's kind of funny because people have bought into the marketing fluff when it comes to their desktop product stack. Efficiency cores are just as bloated as the Performance cores, because they are using the Skylake architecture for those cores.
 
It seems to me that the simplification of the design has the weakness of not reaching clocks as high as Zen4. But this is not a problem on CPUs intended for servers...

"The only thing that's changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD."

If that's true, why does the slide say 35% smaller when comparing just core + L2?
An architectural change that affects performance...
It has been power and density optimized, allowing for 16 cores per CCX... this means 16 cores share the same cache that 8 shared before...
Now you get 128 cores with 8 core chiplets vs. 12 chiplets for 96 cores on Zen 4.
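
The chiplet arithmetic can be sanity-checked with a few lines. This is only a sketch using the figures quoted in this thread (8 chiplets of 16 cores vs. 12 chiplets of 8 cores, and 2 MB vs. 4 MB of effective L3 per core); the function and variable names are made up for illustration.

```python
def package_totals(chiplets, cores_per_chiplet, l3_per_core_mb):
    """Return (total cores, total L3 in MB) for a multi-chiplet package."""
    cores = chiplets * cores_per_chiplet
    return cores, cores * l3_per_core_mb


# Figures as quoted in the thread, not independently verified here.
zen4c_cores, zen4c_l3 = package_totals(chiplets=8, cores_per_chiplet=16, l3_per_core_mb=2)
zen4_cores, zen4_l3 = package_totals(chiplets=12, cores_per_chiplet=8, l3_per_core_mb=4)

print(f"Zen 4c: {zen4c_cores} cores, {zen4c_l3} MB L3")
print(f"Zen 4:  {zen4_cores} cores, {zen4_l3} MB L3")
```

So with these numbers the per-chiplet L3 stays the same (32 MB) while the core count per chiplet doubles, and the package total drops because there are fewer chiplets.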
 
I thought at one point AMD had slides to show that the "c" version would just be cache-reduced, but that when it is implemented on the consumer big.LITTLE side they will combine the prior generation "c" cores with the next generation "p" cores. I might be misremembering, or it was just a rumor, because I can't find definitive information on this with a quick Google search.

If Intel were to use their server Xeon Phi Atom cores, they could get "feature parity" with their "P" cores, at least when it comes to Hyper-Threading and AVX-512. The Atom cores would still be a heck of a lot slower.
 
If Intel were to use their server Xeon Phi Atom cores, they could get "feature parity" with their "P" cores, at least when it comes to Hyper-Threading and AVX-512. The Atom cores would still be a heck of a lot slower.
The Xeon Phi Atom cores are much slower than the Gracemont cores used alongside Golden Cove and Raptor Cove. Their only saving grace is AVX-512.
 
It's all good, I just loathe Intel's E-cores because they used a name that is the exact opposite of the product to mislead people about them.

They're more efficient at single-threaded tasks, and then Intel uses them exclusively for multi-threaded tasks.
Just... Ugh.
Looking at how Sapphire Rapids struggles against Zen 3 TR at equal core count while using more power, I'm really not surprised that they are being used in that manner. If RPL is already disgusting when it comes to power draw, a 16 P-core i9 might have been uglier to witness on consumer platforms. A 65 W locked 7950X is still faster than Golden Cove going at 200 W. (Note that Puget is enforcing PL1 125 W and PL2 253 W on the Core i9, since those are the reference values set by Intel, and it's still faster than the Xeon.)
1686755696624.png
1686755825489.png
 
Apples and oranges; there is a bigger gap between the E-cores and P-cores than this.

E-cores are single-threaded and have fewer resources, less capability, and a reduced ISA (no AVX, for example).

So yes, Intel does smaller cores, but they are also weaker, less capable, and actually require process scheduler interaction.

A Zen 4c core does everything a Zen 4 core does, just probably a bit slower.

Intel's E-cores have AVX2 via three 128-bit SIMD units, hence they are closer to AMD's Zen 1.x's quad 128-bit SIMD units.

Intel's E-Cores do not have AVX-512.
 
The Xeon Phi atom cores are much slower than the Gracemont cores used along side Golden Cove and Raptor Cove. Their only saving grace is AVX-512.
The point is they could bolt on HT and AVX-512 as they have before, and they have the "roadmap" for how to do it in the next version if they wanted. Being Intel, they won't until they are forced to by AMD.
 
Looking at how Sapphire Rapids struggles against Zen 3 TR at equal core count while using more power, I'm really not surprised that they are being used in that manner. If RPL is already disgusting when it comes to power draw, a 16 P-core i9 might have been uglier to witness on consumer platforms. A 65 W locked 7950X is still faster than Golden Cove going at 200 W. (Note that Puget is enforcing PL1 125 W and PL2 253 W on the Core i9, since those are the reference values set by Intel, and it's still faster than the Xeon.)

Cinebench R23 doesn't use AVX-512.

Are you using Cinema 4D R25 or Blender 3.x?

-------------------

Intel-Core-i9-13900KS-CineBench-R23-Multi-Core-Mode-Benchmark-Results.png

After a 10-minute run, the Intel Core i9-13900KS's scores are lower.
 
Intel can pack 4 E-Cores in the same size as 1 P-Core. What about AMD? How many Zen4c cores for one Zen 4 core?
Still one, as Zen 4c is only ~35% smaller. In order to pack two Zen 4c cores into the same area, Zen 4c would need to be half the size of regular Zen 4.
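
To put a number on that, here is a minimal sketch of the area ratio. It only uses the ~35% figure from the slide and ignores everything else on the die (L3, fabric, I/O), which is a simplification on my part; the function name is made up.

```python
import math


def cores_per_reference_area(shrink):
    """How many shrunken cores fit in the area of one full-size core,
    given the fractional area reduction (e.g. 0.35 for 35% smaller)."""
    return 1.0 / (1.0 - shrink)


ratio = cores_per_reference_area(0.35)
whole_cores = math.floor(ratio)

print(f"Zen 4c cores per Zen 4 core area: {ratio:.2f} (whole cores: {whole_cores})")
print(f"Shrink needed for 2 cores per area: {cores_per_reference_area(0.50):.0f}x at 50%")
```

So about one and a half Zen 4c cores fit in the footprint of one Zen 4 core + L2, which rounds down to one.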

However, I could see a possible two-chiplet AM5 version where one chiplet uses 8 Zen 4 cores and another uses 16 Zen 4c cores, giving a total of 24c/48t, albeit with less L3 per core than the 7950X (and 7950X3D).
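
Using the per-core L3 figures quoted earlier in the thread (4 MB for Zen 4, 2 MB for Zen 4c), the hypothetical two-chiplet part can be tallied like this. It is only a sketch: the hybrid chip does not exist, two threads per core is assumed for both core types, and the helper name is made up.

```python
def summarize(chiplets):
    """chiplets: list of (core_count, l3_per_core_mb) tuples.
    Assumes 2 threads per core on every chiplet."""
    cores = sum(count for count, _ in chiplets)
    l3_mb = sum(count * l3 for count, l3 in chiplets)
    return {
        "cores": cores,
        "threads": cores * 2,
        "l3_mb": l3_mb,
        "l3_per_core_mb": l3_mb / cores,
    }


hybrid = summarize([(8, 4), (16, 2)])   # hypothetical: 8x Zen 4 + 16x Zen 4c
r9_7950x = summarize([(8, 4), (8, 4)])  # two 8-core Zen 4 chiplets

print("Hybrid:", hybrid)
print("7950X: ", r9_7950x)
```

With those figures the total L3 comes out the same for both, and the difference is in how many cores share it.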

I'm not sure there is a market for such a chip, as it would be a multi-threaded focused product that would likely suffer the same or worse problems in games as the 7950X does, and would lose to X3D parts for sure. However, there is an argument to be made that a regular 7950X could be replaced by this with a small performance hit in cache-sensitive workloads, because 7950X buyers likely care more about core counts than cache.

Also, I'm not sure if it's viable to make a model that has two chiplets with different core counts because, correct me if I'm wrong, thus far all AMD models that have used two chiplets have used the same core count on each chiplet?
 
Cinebench R23 doesn't use AVX-512.

Are you using Cinema 4D R25 or Blender 3.x?

-------------------

After a 10-minute run, the Intel Core i9-13900KS's scores are lower.
I'm using a mix of both, but the point that I was trying to make is that the E-cores are being used for MT on the consumer platform because a 16 P-core i9 wouldn't have been competitive against Ryzen, especially with Intel 7 having to carry Intel until late 2024.
The E-cores are not just marketing; they're literally what allows Intel to stay relevant on the consumer side for people who are not just gaming. Them lacking AVX-512 isn't ideal, but it's either that or let the competition take the performance and efficiency crown across the board.
 
So what is the catch?
Looking at the posts in the thread, lower clocks and no 3d cache.

That is absolutely fine for server type usage.

I'm using a mix of both, but the point that I was trying to make is that the E-cores are being used for MT on the consumer platform because a 16 P-core i9 wouldn't have been competitive against Ryzen, especially with Intel 7 having to carry Intel until late 2024.
The E-cores are not just marketing; they're literally what allows Intel to stay relevant on the consumer side for people who are not just gaming. Them lacking AVX-512 isn't ideal, but it's either that or let the competition take the performance and efficiency crown across the board.
Yep, the E-cores are keeping Intel in the game on production-type workloads, like software encoding, compressing, and compiling software. So they're absolutely used to keep multithreading competitive with AMD.

The p-cores keep them ahead on typical consumer use like gaming, office apps, web browsing, media playback.
 
That is absolutely fine for server type usage.
Absolutely fine for regular desktops as well. I'd rather get a 5 GHz chip with 10% less ST performance than the 7950X and 50-100% more cores. I bet if they decided to release a full lineup they could wipe Intel clean across lots of segments with their massive price and (MT) performance advantage! The catch for consumers, though, is that AMD makes less from desktops, so they probably won't concentrate on this for at least half a year.
 
Intel's E-cores have AVX2 via three 128-bit SIMD units, hence they are closer to AMD's Zen 1.x's quad 128-bit SIMD units.

Intel's E-Cores do not have AVX-512.
Didn't know that.
 
Intel can pack 4 E-Cores in the same size as 1 P-Core. What about AMD? How many Zen4c cores for one Zen 4 core?
It is true that those are small enough, but the main issue with the Atom E-cores is that they do not support Hyper-Threading and AVX-512, which is one of the reasons there was a complete mess with AVX-512 on Alder Lake and Raptor Lake CPUs. Hence, Intel nerfed AVX-512 and owners cannot benefit from it.

Zen 4c cores are fully capable cores with a smaller L3 cache. They support HT and AVX-512 workloads; perfect for cloud.

So, the Sierra Forest CPU next year will have 144 Atom cores: 144C/144T. The Bergamo CPU has 128C/256T. It's a monster chip for cloud computing, trumping both Intel and ARM solutions by several times, while easily slotting into the same 6096 socket. Data centre partners will not need to buy new server motherboards either.
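
The thread counts follow directly from whether each core runs one or two threads. A tiny sketch, assuming one thread per Atom core (no Hyper-Threading, as noted above) and two per Zen 4c core; the function name is just for illustration and the core counts are the ones given in this post.

```python
def logical_threads(cores, threads_per_core):
    """Total hardware threads for a CPU with uniform cores."""
    return cores * threads_per_core


sierra_forest = logical_threads(cores=144, threads_per_core=1)
bergamo = logical_threads(cores=128, threads_per_core=2)
thread_ratio = bergamo / sierra_forest

print(f"Sierra Forest: 144C/{sierra_forest}T")
print(f"Bergamo:       128C/{bergamo}T")
print(f"Bergamo has {thread_ratio:.2f}x the hardware threads")
```

Note this only counts threads; it says nothing about how fast each thread is.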

Next year, Turin's Zen 5 c-cores should bring another evolution in the design of 16-core chiplets: namely, the current two 8-core CCXs will be unified into a single 16-core CCX/CCD. If they want to increase the core count to 192 c-cores, they will have to change the packaging and I/O in order to place two additional chiplets, as there is no space left on the current package due to communication pathways. That's why the 16-core chiplets on Bergamo are placed apart and not right next to each other.
 
Yep, the E-cores are keeping Intel in the game on production-type workloads, like software encoding, compressing, and compiling software. So they're absolutely used to keep multithreading competitive with AMD.
As with the P-cores, Intel has clocked the E-cores too high. Clocking them closer to 3 GHz would make them true E-cores: more efficient than P-cores. Chips and Cheese found Gracemont to be more efficient than Golden Cove at a variety of tasks if clock speeds were kept in check. Notably, these more efficient clock speeds were lower than Intel's defaults for the 12900K.

1686764626387.png
 
