
Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores

Atlas Fallen developers either forgot that E-Cores exist (and simply designed the game to load all cores, no matter their capability), or thought they'd be smarter than Intel

Middle of last year I rotated with a team of nothing but SDEs. They were having issues with performance on one of the new services we were spinning up. We were recording remote sessions and encoding them into video to be retrieved later.

They couldn't understand why we were burning through 192-core AMD systems and still getting poor performance. All of these guys were pretty removed from HW in general. When I explained to them that we needed to switch to our GPU compute cluster instead of using CPU threads, since the GPUs can do HW encode/decode, they were legitimately shocked.

We switched. Saved hundreds of thousands in internal costs, and it took like two weeks for them to recode for GPUs. I got a promotion out of it. I rotated off the team not understanding how they made it that far.

Sometimes these guys literally just sit in front of a game engine and check a box that says "use all available CPU cores", I swear to god. I was on another team about 8 months later. I had to explain to a TAM (thankfully not an engineer) why 10 Gb/s links on our storage offload system did NOT mean 10 gigaBYTES/s, and that the time quotes they were giving were going to be drastically off.

They got paid more than me.

Always shoot for the stars in your careers, people. Even if you don't think you can cut it. The sky is already full of some pretty dim ones.
 
More accurate to say AMD for now.

Yep, but they will aim them where they're actually needed (laptop/mobile), if I am not mistaken :p
 
AMD right about now!


OS scheduling is independent of Thread Director. I'm yet to see what TD actually does, and how much better or more efficient it is compared to the similar (but much better) software solution I posted in the other thread!

Are you sure? https://www.techpowerup.com/312237/amd-strix-point-companys-first-hybrid-processor-4p-8e-es-surfaces

If this comes to be true, is AMD in the same boat, or even worse off, since Intel got into this technology way earlier?
 
How are they dominating? They literally had to disable AVX-512 in ADL; in fact RPL could also have it, just permanently(?) disabled.

Also, I'm yet to see how/what Thread Director actually does. Does anyone actually have any benchmarks for it :wtf:
Middle of last year I rotated with a team of nothing but SDEs. They were having issues with performance on one of the new services we were spinning up. We were recording remote sessions and encoding them into video to be retrieved later.
Most of them I have worked with have no idea about the latest & greatest in hardware, not that they always need to, but you'd think they'd at least try to make themselves acquainted with something as fundamental to their work?
 
Would be interesting to run the same test on a 7950x, once with all 16 cores and once with just 8 cores enabled.

That would tell you whether this is a thread synchronization problem that ends up having so much locking overhead that adding cores makes it slower instead of faster.
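
If anyone wants to see what that failure mode looks like in isolation, here's a minimal sketch (a purely illustrative toy workload, nothing to do with Atlas Fallen's actual engine): every worker fights over one shared lock, so going from 4 to 16 workers barely helps and can even get slower.

```python
# Toy demo of lock contention: total work is fixed, only the worker count changes.
# With every increment behind one shared lock, extra workers mostly add contention.
import multiprocessing as mp
import time

TOTAL_INCREMENTS = 400_000

def worker(counter, lock, iterations):
    for _ in range(iterations):
        with lock:                  # the single shared lock serializes all "work"
            counter.value += 1

def run(n_workers):
    counter = mp.Value("i", 0)
    lock = mp.Lock()
    per_worker = TOTAL_INCREMENTS // n_workers
    procs = [mp.Process(target=worker, args=(counter, lock, per_worker))
             for _ in range(n_workers)]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (4, 8, 16):
        # with heavy contention, 16 workers is often no faster (or even slower) than 4
        print(f"{n:>2} workers: {run(n):.2f} s")
```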
 
They're a win for people running multithreaded workloads. Not because they're "efficient", but because they can squeeze more perf per sq mm (i.e. you can fit 3-4 E-cores where only 2 P-cores would fit and get better performance as a result).

E-cores are not a failure, but, like any heterogeneous design, results are not uniform anymore; they will vary with workload.
4 E-Cores = 1 P-Core
 
I skimmed through this but I don't see you mentioning if you disabled the E-cores via BIOS or just via Task Manager affinity
 
They're a win for people running multithreaded workloads. Not because they're "efficient", but because they can squeeze more perf per sq mm (i.e. you can fit 3-4 E-cores where only 2 P-cores would fit and get better performance as a result).

Using the 13600K as an example, 8 E-cores offer ~60% more performance compared to 2 P-cores, while using ~40% more power. That's roughly a ~15% gain in efficiency, which is completely irrelevant on desktop.
That's why it actually makes no sense to have a 6P+8E SKU on desktop, when you could have an 8P+0E SKU offering very similar performance and power consumption.

Desktops are always getting the same chips as laptops. They could easily make a 12-core die instead of 8P+16E, but they wouldn't do it just for desktops. Besides, it's great marketing when your top CPU has 24 cores while the competition only has 16.

I don't mind them putting E-cores into i7's and i9's to offer more than 8 cores total, but including E-cores with fewer than 8 P-cores is just IDIOTIC.

There's also a reason why Sapphire Rapids server CPUs don't have E-cores. There's no need whatsoever.
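
Quick sanity check on the ~15% figure above, using the same rough ratios from the post (illustrative numbers, not new measurements):

```python
# 8 E-cores vs 2 P-cores on a 13600K, per the rough numbers quoted above
perf_ratio  = 1.60   # ~60% more multithreaded performance
power_ratio = 1.40   # ~40% more power drawn
efficiency_gain = perf_ratio / power_ratio - 1
print(f"perf-per-watt advantage: {efficiency_gain:.0%}")   # ~14%, i.e. the "roughly ~15%" above
```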
 
Using the 13600K as an example, 8 E-cores offer ~60% more performance compared to 2 P-cores, while using ~40% more power. That's roughly a ~15% gain in efficiency, which is completely irrelevant on desktop.
That's why it actually makes no sense to have a 6P+8E SKU on desktop, when you could have an 8P+0E SKU offering very similar performance and power consumption.

Desktops are always getting the same chips as laptops. They could easily make a 12-core die instead of 8P+16E, but they wouldn't do it just for desktops. Besides, it's great marketing when your top CPU has 24 cores while the competition only has 16.

I don't mind them putting E-cores into i7's and i9's to offer more than 8 cores total, but including E-cores with fewer than 8 P-cores is just IDIOTIC.

There's also a reason why Sapphire Rapids server CPUs don't have E-cores. There's no need whatsoever.

The 13600K got priced out ultimately, but E-cores are not about efficiency, they're space saving with a little efficiency sprinkled in. Back when the 13600K cost as much as the 7700X, and with early AM5 platform prices, it was a much better value.

Nowadays not so much.
 
I think the real strategy behind it is to widen the gap between desktop PCs and workstation/server computers. As in: reduce PCIe lanes, drop AVX-512, drop ECC, increase M.2 slots, etc.
Camouflage? The last 100 or 200 MHz you can squeeze out of another 50-100 W on 8 "performance" cores.

No desktop consumer wins with E-cores. Not when you can park your cores to hit specific package power targets.
 
For those who seem confused about AMD's version of P & E cores:

AMD is using a regular core (P) and a cache-reduced core (E).
They are the SAME architecture, support the SAME instructions, and work the SAME way in computing tasks.
The only difference is the cache size, which affects the speed of data fetching, so the small cores are slower in certain tasks.
So the system can treat them as "Faster core & Slower core".
Modern OSes have dealt with that approach for years, ever since core boosting was introduced.
Therefore there should be no problem loading multiple threads of a single programme onto AMD's P & E cores simultaneously.

Intel's P & E cores are completely DIFFERENT architectures, support DIFFERENT instructions and work in DIFFERENT ways in computing tasks.
And this is the origin of all the scheduling problems we saw since ADL, and the reason behind the AVX-512 drama.
The scheduler cannot just simply treat them as "Faster and slower cores" when they are inherently different architectures.
And programmes tend to have problems when multiple threads are loaded onto them simultaneously (Except a few programmes that worked very hard on optimizing, like Cinebench).
So the approach on Intel's side of things is usually to treat them as "Faster CPU and Slower CPU".
When you need something fast, slap it onto the P cores and P cores only.
When it is a "minor task", slap it onto the E cores and E cores only.
However, no one wants to be a "minor task", so every programme requests to run on the P cores when it is loaded.
Then the OS forcefully picks what it thinks is "minor" and slaps it onto the E-cores.
Thus creating the situation of "P cores working, E cores watching" and vice versa.
And sometimes creating compatibility problems when the scheduler loads multiple threads of a single programme onto Intel's P & E cores simultaneously.
And it is tediously horrible in virtualization, and the root cause of why Intel's own SR Xeon CPUs are either "P cores only" or "E cores only", but never both.


So think twice when someone asks "Strix or not Strix?" while looking at problems introduced by Intel's P & E approach.
Those problems are mainly caused by the DIFFERENT architectures, not the different speeds.
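
To make the "Faster core & Slower core" point concrete: with a homogeneous-ISA design, software only needs to know which logical CPUs clock higher, nothing else. A minimal sketch of reading that out on Linux (assuming the cpufreq sysfs interface is exposed; exact paths depend on the kernel/driver):

```python
# List each logical CPU's maximum frequency so "fast" and "slow" cores can be told apart.
from pathlib import Path

cpus = sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*"),
              key=lambda p: int(p.name[3:]))
for cpu in cpus:
    freq_file = cpu / "cpufreq" / "cpuinfo_max_freq"
    if freq_file.exists():
        mhz = int(freq_file.read_text()) / 1000   # sysfs reports kHz
        print(f"{cpu.name}: {mhz:.0f} MHz max")
```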
 
Using the 13600K as an example, 8 E-cores offer ~60% more performance compared to 2 P-cores, while using ~40% more power. That's roughly a ~15% gain in efficiency, which is completely irrelevant on desktop.
That's why it actually makes no sense to have a 6P+8E SKU on desktop, when you could have an 8P+0E SKU offering very similar performance and power consumption.
8 E-cores take more space than 2 P-cores; in the space of 2 P-cores you can fit 6.5 or 7 E-cores. 7 E-cores would still be faster than 2 P-cores in Cinebench, but the main problem is the lack of instructions, and there are tasks which will run faster on the 2 P-cores. That is the whole problem with the E-cores: they are not always faster.
 
So is it worth it or not to choose a CPU with E-cores? If you had to choose between the 13400F and the 5700X, which is the better choice?

Can E-cores be disabled for just one specific game, for example this Atlas game?

Will Intel's 15th gen still use E-cores?
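
On the per-game question: you don't need the BIOS toggle for that, you can pin just the game's process to the P-core threads. A minimal sketch with psutil (the process name and the assumption that logical CPUs 0-15 are the P-core threads are illustrative, check your own topology first):

```python
# Restrict one game's process to the first 16 logical CPUs (8 P-cores with HT),
# leaving the E-cores free for everything else.
import psutil

GAME_EXE = "AtlasFallen.exe"        # hypothetical process name, adjust to the real one
P_CORE_THREADS = list(range(16))    # assumption: logical CPUs 0-15 map to the P-cores

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] == GAME_EXE:
        proc.cpu_affinity(P_CORE_THREADS)   # same effect as Task Manager's "Set affinity"
        print(f"pinned PID {proc.pid} to logical CPUs {P_CORE_THREADS}")
```

Task Manager can do the same thing by hand, but the setting only lasts until the game is closed, which is why people either script it or just flip the BIOS switch.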
 
Middle of last year I rotated with a team of nothing but SDEs. They were having issues with performance on one of the new services we were spinning up. We were recording remote sessions and encoding them into video to be retrieved later.

They couldn't understand why we were burning through 192-core AMD systems and still getting poor performance. All of these guys were pretty removed from HW in general. When I explained to them that we needed to switch to our GPU compute cluster instead of using CPU threads, since the GPUs can do HW encode/decode, they were legitimately shocked.

We switched. Saved hundreds of thousands in internal costs, and it took like two weeks for them to recode for GPUs. I got a promotion out of it. I rotated off the team not understanding how they made it that far.

Sometimes these guys literally just sit in front of a game engine and check a box that says "use all available CPU cores", I swear to god. I was on another team about 8 months later. I had to explain to a TAM (thankfully not an engineer) why 10 Gb/s links on our storage offload system did NOT mean 10 gigaBYTES/s, and that the time quotes they were giving were going to be drastically off.

They got paid more than me.

Always shoot for the stars in your careers, people. Even if you don't think you can cut it. The sky is already full of some pretty dim ones.

Yeah, it really is shocking how many people in tech, and even computer tech (semiconductors/etc.), have no idea about HW. Many times at work I facepalm internally. :laugh:
 
The article doesn't describe whether the E-cores were disabled or whether something like process affinity was used to limit the process to only the P-cores. If it's the former, then it's very possibly a ring bus issue: when E-cores are active, the ring bus clocks are forced considerably lower, thus lowering the performance of the P-cores.
The E-Cores were disabled through BIOS, I'll mention that in the article

Also @Battler624 for the same question
 
The E-Cores were disabled through BIOS, I'll mention that in the article

Did you try disabling the E-cores in lots of 4?

Like 8P + 4E, 8P + 8E & 8P + 12E to see what that does or no point?
 
Except a few programmes that worked very hard on optimizing, like Cinebench
Do you have a source for that? Afaik Cinebench just splits the load, without being aware of anything, which is easy to do, especially if you have "just faster and slower cores". P-Cores will create more pixels, E-Cores fewer, but all still contribute as much as they can

So is it worth it or not to choose a CPU with E-cores? If you had to choose between the 13400F and the 5700X, which is the better choice?

Can E-cores be disabled for just one specific game, for example this Atlas game?

Will Intel's 15th gen still use E-cores?
The 13400F is slightly faster than the 5700X for gaming (when not GPU limited, https://www.techpowerup.com/review/intel-core-i5-13400f/17.html)

But what you actually want is RPL with the larger cache (13600K in the same chart); it's not about the cores or the MHz, but about the cache.

Which is why 7800X3D is so good: https://www.techpowerup.com/review/amd-ryzen-7-7800x3d/19.html
 
Do you have a source for that? Afaik Cinebench just splits the load, without being aware of anything, which is easy to do, especially if you have "just faster and slower cores". P-Cores will create more pixels, E-Cores fewer, but all still contribute as much as they can
There's still work involved in splitting the load into chunks (they're not spinning off one task for each pixel, nor are they spinning off a task for a whole screen/scene). And the work to wait for all tasks to finish to put a scene back together (synchronization) still exists, even if it's probably simpler than what happens in a game engine.
Though a game engine shouldn't be that much different: a faster core can compute the updated path for 12 characters while a slower core will only handle 6, or compute the geometry for 100 objects while the other only computes it for 50...

Atlas Fallen has some internal weirdness, I mean, it requires a 6600k for FHD@30fps on low settings.
 
Not sure if Cinebench did any specific optimisations for Intel's hybrid CPUs, but I still agree with W1zzard. Even Cinebench behaved wrong until I adjusted the CPU scheduler settings in the power schemes to prefer P-cores (no issue on Win 11, thanks to its out-of-the-box Intel Thread Director support).

The devs of this game just tried to be overly clever.
 
But what you actually want is RPL with the larger cache (13600K in the same chart); it's not about the cores or the MHz, but about the cache.

MHz is just as important as cache, though. The 13600K has a 24% higher clock speed. You wouldn't get 22% more performance just from the slight cache increase. That's why the 13600K is slightly faster than the 12900K. Its clock speed is 200 MHz higher and it has more L2 cache, but the L3 cache is 20% smaller.
It's also why the 7600 is faster than the 5800X3D on average (there are exceptions in very cache-sensitive games).

But in relation to the original question, a newer 6C/12T CPU is always better than an older 8C/16T for gaming. The 13400 has better IPC than the 5700X, that's why it's faster, not because of the 4 E-cores.

Fewer cores allow you to push the frequency higher, and that has always been the most important thing for gaming.


Fun fact - Destiny 2 has small hitches when you load between different zones while traversing the world. They were usually very noticeable on my 9700K. On my 13600K @ 3.3 GHz they are still there, but much smaller and less frequent. But at 5.1 GHz they never happen at all.
I'm playing at 4K60, which means that even if a faster CPU doesn't increase your framerate, it can still help with other things like stutters and hitches.
I expect the CPU-attached NVMe drive helps as well to some degree. On the 9700K it was going through the chipset.
 
There's still work involved in splitting the load into chunks (they're not spinning off one task for each pixel, nor are they spinning off a task for a whole screen/scene).
They split it into blocks of pixels, but same thing.. this is trivial, it's just a few lines of code

And the work to wait for all tasks to finish to put a scene back together (synchronization) still exists, even if it's probably simpler than what happens in a game engine.
There is just one synchronization per run, so one after a few minutes, this isn't even worth calling "synchronization". I doubt that it submits the last chunk onto a faster core, if it's waiting for a slower core to finish that last piece
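
For reference, the "split into blocks of pixels, one join at the end" structure described here looks roughly like this (a minimal sketch with a placeholder render_tile, not Cinebench's actual code):

```python
# Tiles are handed out from a shared pool as workers free up, so faster cores simply
# end up rendering more tiles; the only synchronization point is collecting the results.
from multiprocessing import Pool
import os

WIDTH, HEIGHT, TILE = 1920, 1080, 64

def render_tile(origin):
    x0, y0 = origin
    # placeholder for the real per-tile rendering work
    return x0, y0, [[0] * TILE for _ in range(TILE)]

if __name__ == "__main__":
    tiles = [(x, y) for y in range(0, HEIGHT, TILE) for x in range(0, WIDTH, TILE)]
    with Pool(os.cpu_count()) as pool:
        # imap_unordered keeps every core busy until the tile list runs dry
        results = list(pool.imap_unordered(render_tile, tiles))
    # the single join: all tiles are back, the frame can be assembled
    print(f"rendered {len(results)} tiles")
```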
 
Do you have a source for that?
No I don't.
Maybe I should rephrase my sentence.
Except a few programmes that worked very hard on optimizing, like Cinebench
Except a few programmes for which Intel themselves optimized their Thread Director very heavily, like Cinebench.
 