
Intel Prepares Raptor Lake Designs With 24 Cores and 32 Threads, More E-Cores This Time

Would have preferred it if they'd added 2 more P-cores and had fewer E-core clusters
 
Intel have to do the E-cores because their main cores are too power hungry, and they want to win multi-threaded benchmarks


I'd rather have 4-8 E-cores dedicated to the OS, and the big boys for programs and games
 
I like the E-cores after having used them. Maybe in the perfect no-crapware, no-background-apps bench setups used by reviewers they don't make much sense, but for my use case they seem to work great.

I've tested my max overclock on the 12600K at 5.4 GHz and 47 ring with E-cores off, and my 24/7 5.3 GHz with 43 ring with E-cores on (overclocked to 4.3 GHz), and having them on very noticeably eliminates intermittent stutters in Cyberpunk and Far Cry 6. Could be just my setup, but they seem to really work (especially because I'm too lazy to shut down all my background stuff).

Also I get 90% of 12700K multithread perf at ~186W, which is pretty nice - not something 8 P-cores by themselves can do afaik. A 12900K with 10 P-cores would probably get lower multithreaded performance than the current 12900K 8P/8E for 100W more draw, and zero benefit in virtually any current real-world application.



This is probably true. Stacked cache looks insane.
Finally, info from someone who has intentionally overclocked the E-cores. Have you tried aggressively reducing the P-core multipliers to see if the E-cores can clock higher when heat output from the P-cores isn't a primary limitation? The E-cores appear more efficient than the P-cores relative to the die space they occupy, so clocking them higher seems like a no-brainer if you want higher overall performance. How did you go about overclocking them - have you tried BCLK? One advantage of a BCLK overclock is that it raises the memory speed along with the ratios, much like Infinity Fabric overclocking, so you can probably get better memory results as well.

The situation you describe is exactly where the strength of the E-cores lies: background CPU utilization contention that bogs down P-core performance. Since the E-cores, which occupy less die space, soak up that background load, the P-cores get bogged down less and you end up with higher overall performance under certain general-use circumstances. There are certainly design balances between the core types, but I like the trade-off myself. Your results look encouraging. I've wanted to see more of this kind of overclocking on Alder Lake and how it impacts results. That is actually 3W less than the stock 12600K multithreaded result TPU measured - is that undervolted? Seems wild given you've got both the P-cores and the E-cores overclocked over stock, though maybe it wasn't measured under the same workload while stress testing with Cinebench.
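
Just to put rough numbers on the memory side of a BCLK overclock, here's a toy sketch (assuming the DRAM clock simply scales with BCLK; gear ratios and memory retraining complicate this in practice):

# Toy sketch: DRAM speed scaling with BCLK (all numbers hypothetical)
base_bclk = 100.0        # MHz, stock BCLK
oc_bclk = 102.5          # MHz, a modest BCLK bump
dram_stock = 3600        # MT/s at stock BCLK, e.g. DDR4-3600

dram_oc = dram_stock * (oc_bclk / base_bclk)
print(f"DRAM at {oc_bclk} MHz BCLK: ~{dram_oc:.0f} MT/s")   # ~3690 MT/s

So a couple of MHz on BCLK nudges the memory to speeds the fixed multipliers alone can't hit, which is the Infinity-Fabric-style appeal.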

Intel have to do the E-cores because their main cores are too power hungry, and they want to win multi-threaded benchmarks


I'd rather have 4-8 E-cores dedicated to the OS, and the big boys for programs and games
The thing is, Intel can't push the P-core frequency curve much higher at this point; the voltage and heat output needed to do so are completely asinine. Even with carbon nanotubes and a move away from silicon, the power draw would still be crazy for a rather tiny increase in frequency scaling. E-cores are the right choice, and more of them. A better balance between E-cores and P-cores with other core designs would further improve things, but that won't happen overnight. I think we'll eventually see a kind of stacked pyramid and inverted pyramid design of sorts with TSV shingling.
 
Would have preferred it if they'd added 2 more P-cores and had fewer E-core clusters
See, the thing is they don't want melting stock VRMs as a defect.
 
Intel Raptor Lake
or
How to get away with 300W TDPs.

In stores now.
 
I think E-cores are the way.

With Alder and Raptor Lake, Intel's laying a foundation for high-performance manycore processors in the future. I believe the company will focus on increasing E-cores' performance, while retaining the density advantage. With Foveros 3D packaging + densely packed E-core clusters, they may very well achieve GPU-like core counts per socket without giving up on IPC, and that's where the master stroke is.

I would not be surprised to see HEDT processors with wild configs like 16 P-cores + 128 E-cores in the future.
 
Finally, info from someone who has intentionally overclocked the E-cores. Have you tried aggressively reducing the P-core multipliers to see if the E-cores can clock higher when heat output from the P-cores isn't a primary limitation?
I have not -- they seem to be limited by the core voltage of the P-cores (my board uses the same voltage domain for both), so I am sure if I pushed volts above 1.32v I would be able to push them harder. The E-cores themselves never get that hot at the sensor (68C during CB), so I don't think heat is their main limitation -- also, when they crash, they crash instantly (4.5 GHz won't even boot into Windows), so stability is pretty binary. Below are some shots during/after Cinebench R23 at 4.3 GHz.

[screenshots attached]



The E-cores appear more efficient than the P-cores relative to the die space they occupy, so clocking them higher seems like a no-brainer if you want higher overall performance. How did you go about overclocking them - have you tried BCLK? One advantage of a BCLK overclock is that it raises the memory speed along with the ratios, much like Infinity Fabric overclocking, so you can probably get better memory results as well.
I have actually - my issue on this board with BCLK OC is that if I touch it at all, one of my SATA drives disappears in Windows and my USB ports randomly shut off, so I just leave it at 100. It does help to dial in max ring / E-core clocks, but I don't have separate clock domains.

The situation you describe is exactly where the strength of the E-cores lies: background CPU utilization contention that bogs down P-core performance. Since the E-cores, which occupy less die space, soak up that background load, the P-cores get bogged down less and you end up with higher overall performance under certain general-use circumstances. There are certainly design balances between the core types, but I like the trade-off myself. Your results look encouraging. I've wanted to see more of this kind of overclocking on Alder Lake and how it impacts results.

I want to take some time to see if I can get frame-pacing software set up to show the difference between E-cores on and off, with all my garbage that I run plus YouTube going in the background. This is what my Task Manager usually looks like when I fire up a game:
[screenshot attached]
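
Roughly what I have in mind for comparing the two runs - just a sketch, assuming a PresentMon-style frame-time log in CSV form with an MsBetweenPresents column (untested; the column name may differ depending on the capture tool and version):

# Quick sketch: compare frame pacing from two frame-time logs (E-cores on vs off).
# Assumes CSV logs with an "MsBetweenPresents" column (PresentMon-style output).
import csv, statistics

def frame_stats(path):
    with open(path, newline="") as f:
        times = [float(row["MsBetweenPresents"]) for row in csv.DictReader(f)]
    times.sort()
    avg = statistics.mean(times)
    p99 = times[int(len(times) * 0.99)]   # 99th-percentile frame time (worst 1%)
    return avg, p99

for label, path in [("E-cores ON", "ecores_on.csv"), ("E-cores OFF", "ecores_off.csv")]:
    avg, p99 = frame_stats(path)
    print(f"{label}: avg {avg:.2f} ms, 99th percentile {p99:.2f} ms")

Average frame time plus the 99th-percentile (worst 1%) frames should make any stutter difference between E-cores on and off visible if it's real.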


That is actually 3W less than the stock 12600K multithreaded result TPU measured - is that undervolted? Seems wild given you've got both the P-cores and the E-cores overclocked over stock, though maybe it wasn't measured under the same workload while stress testing with Cinebench.
So I measure using HWiNFO -- I'm not sure if TPU uses a different methodology. Here is a shot during CB R23:
[screenshot attached]


^ I actually draw around 189-192W in R23 (not 187, so I was a tiny bit off). Let me know if you want me to run any before/after benches on the E-core OC. I am sure if I go full FPU load with other stress software I can push that past 200W (still not terrible).

[screenshot attached]

CB R23 full run with E-cores @ 4.3 GHz
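
If anyone wants to sanity-check the wattage the same way, this is roughly how I'd average it out of a log - a quick sketch assuming HWiNFO's CSV logging with a "CPU Package Power [W]" column (the exact header depends on your sensor layout, so adjust it):

# Quick sketch: average package power over a logged run (column name is an assumption)
import csv, statistics

COL = "CPU Package Power [W]"

with open("hwinfo_log.csv", newline="", encoding="utf-8", errors="ignore") as f:
    rows = list(csv.DictReader(f))

samples = []
for r in rows:
    try:
        samples.append(float(r[COL]))
    except (KeyError, TypeError, ValueError):
        pass   # skip summary/footer rows that aren't numeric

print(f"{len(samples)} samples, average {statistics.mean(samples):.1f} W, peak {max(samples):.1f} W")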
 
Raptor Lake should be a large improvement over Alder Lake, especially in power efficiency, and with 2x the E-cores as well as IPC uplifts it should be a really good product. But don't count AMD out. They are releasing two Zen 4 CPU classes, and for those who need massive multi-thread performance they will have Zen 4c with up to 32 cores; each 4c core will be only 10-20% weaker than a Zen 4 core, so it will obliterate Gracemont E-cores in performance. A 32C/64T Zen 4c would destroy a 13900K RL with 8 P-cores and 16 E-cores at multithreading.

Late next year will be very exciting, and you really can't go wrong IMO with either camp. I'm torn between updating my 2016 Zen 1700X system with Zen 4/RL or waiting for Zen 5/Meteor Lake and pushing the update back to 2024. Zen 5 introduces big.LITTLE, and Meteor Lake's cores bring large architectural changes and probably mark the end of the ringbus topology. Zen 5's little cores will be the 4c cores from Zen 4.
 
This seems to be the general consensus, but: for home server builders? Hell yeah, bring me all-E-core clusters, I don't care how slow they are (Xeon Phi style). For power users? Go full ham and make something with all P-cores just for the lulz; ridiculous cooling requirements are already a problem these days, so nothing changes.
Alternatively, small E-core-only (relatively performant) office systems would be welcome to help out on the efficiency side. I think that's what Zhaoxin was trying to do.

I have a few home server builds, but I don't really want a hybrid CPU in my system.
Mainly, I don't trust the scheduler to handle things right.
And the unknown performance drops / crashes when the scheduler decides to move my tasks from P-cores to E-cores are concerning.

On the other hand, I agree a "pure E-core" CPU is interesting.
A 12900K-sized 40-core CPU would be extremely handy.
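
If I do end up with one of these anyway, the fallback would probably be pinning the important stuff myself - a rough sketch, assuming a 12900K-style layout where logical CPUs 0-15 are the P-core threads and 16-23 the E-cores (verify the numbering on your own box first; this is only an illustration):

# Rough sketch: pin the current process to the P-cores so the scheduler can't
# migrate it to the E-cores. Requires psutil (pip install psutil). The CPU
# numbering below is an assumption for an 8P+8E part with HT - check your topology.
import psutil

P_CORE_CPUS = list(range(16))       # hypothetical: logical CPUs 0-15 = P-core threads

proc = psutil.Process()             # or psutil.Process(pid) for another process
proc.cpu_affinity(P_CORE_CPUS)      # restrict scheduling to the P-cores
print("affinity now:", proc.cpu_affinity())

The same can be done from the shell with start /affinity on Windows or taskset on Linux, without touching code.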
 
I have a few home server builds, but I don't really want a hybrid CPU in my system.
Mainly, I don't trust the scheduler to handle things right.
And the unknown performance drops / crashes when the scheduler decides to move my tasks from P-cores to E-cores are concerning.

On the other hand, I agree a "pure E-core" CPU is interesting.
A 12900K-sized 40-core CPU would be extremely handy.

I think that's what Zen 4c (Bergamo) is - basically scaled-down Zen 4 cores optimized for density.

Both camps are looking at density for MT applications, it seems. The thing is, I don't think AMD is planning on launching those for consumers, so a pure E-core CPU, if Intel decided to launch one, would be super interesting for people who need tons of MT.
 
Yeah, you have to balance motherboard cost, CPU cost, and power cost. Electricity in some parts of Patagonia is VERY expensive, making an Intel under intensive use a bad deal.
For gaming, the cheapest i5 or i3 are the only valid options.
 
I think E-cores are the way.

With Alder and Raptor Lake, Intel's laying a foundation for high-performance manycore processors in the future. I believe the company will focus on increasing E-cores' performance, while retaining the density advantage. With Foveros 3D packaging + densely packed E-core clusters, they may very well achieve GPU-like core counts per socket without giving up on IPC, and that's where the master stroke is.

I would not be surprised to see HEDT processors with wild configs like 16 P-cores + 128 E-cores in the future.
That's Intel's ultimate goal... imagine the multi-core performance of an 8+100 core CPU - insane.
 
That's Intel's ultimate goal... imagine the multi-core performance of an 8+100 core CPU - insane.
1 core to win ST performance benchmarks
E-cores to win MT benchmarks
And low CPU prices, to in the darkness bind them
 
That's Intel's ultimate goal... imagine the multi-core performance of an 8+100 core CPU - insane.
Not sure what all this enthusiasm for hundreds of E-cores is about. You do know what these E-cores are, don't you? You do realize that these E-cores are so efficient because they're missing many of the latest CPU core features and clock like a 10-year-old CPU. Once you add the features and clock speed back, and allow for increases to IPC (because AMD don't stand still), they will just end up being P-cores anyway.

Intel seem to be in trouble with these P cores, they need to shrink these things down dramatically to get the thermals and power under control, and Intel suck at new process nodes.

And another thing is that AMD don't seem to have a problem with 128 full-performance cores in the server line next year, and yes, they will be low clocked, but they have all the features and IPC, unlike Intel's E-cores.
 
Not sure what all this enthusiasm for hundreds of E-cores is about. You do know what these E-cores are, don't you? You do realize that these E-cores are so efficient because they're missing many of the latest CPU core features and clock like a 10-year-old CPU. Once you add the features and clock speed back, and allow for increases to IPC (because AMD don't stand still), they will just end up being P-cores anyway.

And another thing is that AMD don't seem to have a problem with 128 full-performance cores in the server line next year, and yes, they will be low clocked, but they have all the features and IPC, unlike Intel's E-cores.
There have been many complaints about the applicability of AVX-512, and that is largely the main missing feature.
SMT is the other, but given the size and possible density of E-cores, it can be mitigated by adding more cores.
64-core EPYCs run at a 2 GHz base clock (the highest SKU was 2.25 GHz IIRC). The 40-core Ice Lake Xeon runs at 2.3 GHz. That is quite a bit less than what we see E-cores in Alder Lake running at.
E-core IPC today is in the same range as Skylake or Zen+, which is not bad at all.

By the way, AMD's 128-core part is Zen 4c, whatever that exactly ends up being. Space-optimized (= smaller), they said, but it looks like it is power-optimized as well.
 
I have not -- they seem to be limited by the core voltage of the P-cores (my board uses the same voltage domain for both), so I am sure if I pushed volts above 1.32v I would be able to push them harder. The E-cores themselves never get that hot at the sensor (68C during CB), so I don't think heat is their main limitation -- also, when they crash, they crash instantly (4.5 GHz won't even boot into Windows), so stability is pretty binary. Below are some shots during/after Cinebench R23 at 4.3 GHz.

[screenshots attached]



I have actually - my issue on this board with BCLK OC is that if I touch it at all, one of my SATA drives disappears in Windows and my USB ports randomly shut off, so I just leave it at 100. It does help to dial in max ring / E-core clocks, but I don't have separate clock domains.



I want to take some time to see if I can get frame-pacing software set up to show the difference between E-cores on and off, with all my garbage that I run plus YouTube going in the background. This is what my Task Manager usually looks like when I fire up a game:
[screenshot attached]


So I measure using HWiNFO -- I'm not sure if TPU uses a different methodology. Here is a shot during CB R23:
[screenshot attached]

^ I actually draw around 189-192W in R23 (not 187, so I was a tiny bit off). Let me know if you want me to run any before/after benches on the E-core OC. I am sure if I go full FPU load with other stress software I can push that past 200W (still not terrible).

[screenshot attached]
CB R23 full run with E-cores @ 4.3 GHz
This is good info on a lot of different details. Do you think your BCLK issue is mostly a board issue or a general problem with Alder Lake? I thought Alder Lake could push individual clock domains, but maybe that's basically a board-specific situation.

What you describe sounds like PCIe getting overclocked along with BCLK - that's the thing causing issues with the SATA/USB ports tied to PCIe. Makes me think of all the classic VIA chipsets that had those same basic overclocking issues. A vicious cycle of fixed-then-broken in that regard.

The consistency of the temps on the E-cores is kind of surprising. It looks to me like the temps on the P-cores could get in the way more readily than the E-cores. The E-cores don't look overly hot, but the P-cores certainly heat up a bit more and are probably the bigger heat concern combined, or it seems that way.

Intel won't rest until they put a phone SoC in your PC, but tax it as a HEDT one...


Or maybe Intel shouldn't use toothpaste as TIM?


Preeeach!
Perhaps, or maybe they want to put a PC in a phone and tax it like Apple.
 
I have a few home server builds, but I don't really want a hybrid CPU in my system.
Mainly, I don't trust the scheduler to handle things right.
And the unknown performance drops / crashes when the scheduler decides to move my tasks from P-cores to E-cores are concerning.

On the other hand, I agree a "pure E-core" CPU is interesting.
A 12900K-sized 40-core CPU would be extremely handy.
I, too, run a home server and compile desktop programs to be distributed to other clients. Makes me wonder how much performance is lost because Windows 11 has to ensure binaries run on both the P- and E-cores. Presumably it's not as bad as i686 being the common denominator, but surely cache sizes and lines differ enough between the P- and E-cores to cause pipeline stalls.
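
For what it's worth, the ISA baseline is something I control at compile time anyway - a rough sketch of the two extremes I'd try (GCC flag names; -march=alderlake needs GCC 12+, and AVX-512 is effectively off the table on these parts, so the common denominator is AVX2-class anyway):

# Rough sketch: building the same source for a broad AVX2-era baseline vs.
# tuned for Alder Lake specifically. Source/output names are placeholders.
import subprocess

SRC, OUT = "app.c", "app"

# Portable-ish baseline: x86-64-v3 assumes AVX2/FMA, which both P- and E-cores have.
subprocess.run(["gcc", "-O2", "-march=x86-64-v3", SRC, "-o", OUT], check=True)

# Tuned for this exact hybrid part (GCC 12+); still no AVX-512, matching the hardware.
subprocess.run(["gcc", "-O2", "-march=alderlake", SRC, "-o", OUT + "_adl"], check=True)

As far as I know, the cache-line size is 64 bytes on both core types, so the differences should be more about tuning and vector width than anything i686-era.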
 
Zen 4 is going to have up to 50% perf increase with the stacked cache models. Intel isn't even on the radar.
Even if that claim is remotely true, the key here is up to.
Most new architectures are up to ~40-50% faster than their predecessors. We should expect this much.

We have to wait and see how much a massive L3 cache matters for various real world use cases.

The primary reason they went for the P/E-core config is that the current Intel ringbus architecture maxed out at 12 slots for CPU cores per ring,
as demonstrated in the Xeon E5 v4 series.
And that is exactly the reason why they went for the mesh architecture.
The ringbus vs. mesh design has to do with core layout. We've had this discussion since the quad core days, yet the ring bus is keeping up just fine. I see no reason why the ringbus would be a problem for mainstream use for even 16 cores.

Intel have to do the E-cores because their main cores are too power hungry, and they want to win multi-threaded benchmarks
Sure, synthetic benchmarks matter a lot to the enthusiast market, but you're missing the bigger picture. The main reason for the big-little design in desktops is that they have hit the clock speed "wall" and (big) core count "wall", and the big PC makers like Dell, HP, Lenovo, etc. mostly sell upgrades based on "specs".

E-core IPC today is in the same range as Skylake or Zen+, which is not bad at all.
With a shared L2, real-world performance would be quite different with load on multiple small cores. This is one of the reasons why it's important to distinguish performance and IPC.
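
A toy illustration of that distinction, with purely hypothetical numbers: per-core throughput is roughly effective IPC times clock, and the effective IPC of an E-core can itself sag once its cluster siblings start fighting over the shared L2:

# Toy numbers only: how shared-L2 contention separates "IPC" from delivered performance.
clock_ghz = 3.7          # hypothetical E-core all-core clock
ipc_alone = 1.0          # normalized IPC with the cluster's L2 to itself
ipc_contended = 0.8      # hypothetical IPC once 4 siblings share that L2

print("1 core loaded :", ipc_alone * clock_ghz, "Ginstr/s per core")
print("4 cores loaded:", ipc_contended * clock_ghz, "Ginstr/s per core")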
 
With a shared L2, real-world performance would be quite different with load on multiple small cores. This is one of the reasons why it's important to distinguish performance and IPC.

I'm unaware of any CPU made, be it AMD, Intel, IBM POWER, or ARM, that did data-sharing in "close" caches (L1). All data-sharing is in the LLC (last-level cache).
 
I, too, run a home server and compile desktop programs to be distributed to other clients. Makes me wonder how much performance is lost because Windows 11 has to ensure binaries run on both P and E cores. Provided it's not as bad as i686 being the common denominator, but surely cache sizes and lines are different between the P and E cores to cause pipe-line stalls.

Well, we would have no idea how it's going to behave unless someone tries to cross the minefield.
I have no time or resources to do that, so I would avoid these products for some types of use cases for now.

The ringbus vs. mesh design has to do with core layout. We've had this discussion since the quad core days, yet the ring bus is keeping up just fine. I see no reason why the ringbus would be a problem for mainstream use for even 16 cores.

The largest ring ever created by Intel was in the Xeon E5 v4,
with a total of 17 ring stops in the largest ring.
An Intel mainstream CPU needs 1 ring stop for each of the following: IMC, PCI-E controller, QPI link, iGPU.
That leaves 13 ring stops for cores.
Since Intel does not do odd core counts anymore, it is 12 cores max.

Maybe, just maybe, sometime they will come up with a 16-core single ringbus.
But that means a 20-stop ringbus.
Will core-to-core latency become a huge concern?
 
Ring stops do be causing Sonic a lot of speedrun latency issues. Is it possible to 3D-stack a ringbus and have the rings run in reverse order of each other to reduce latency? The least latency-critical stuff could be sandwiched in the middle. Perhaps something a bit like Apple's hybrid memory subsystem, but for the ringbus over the substrate?
 
E-cores have no place in any desktop whatsoever. Gaming or not.
Laptops, sure.
Did you not see the E-core review that W1zz did? At 4K, the E-cores perform almost as well as the P-cores. You only see a difference at lower resolutions because each frame takes less GPU power to render and completes faster. All in all, that's pretty darn good. Are they perfect? No, but given the power consumption and how many of these cores you can fit into the same area as a single P-core, they're a nice option for a lot of different workloads. Also, what good are more P-cores if you're already hitting a thermal limit? Not every machine is going to have a huge honking cooler, and not everyone wants a CPU with a power limit north of 200 watts.
 