Thursday, August 10th 2023

Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores

Action RPG "Atlas Fallen" joins a long line of RPGs this summer for you to grind into—Baldur's Gate 3, Diablo 4, and Starfield. We've been testing the game for our GPU performance article and found something interesting—the game isn't optimized for Intel Hybrid processors, such as the Core i9-13900K "Raptor Lake" in our bench. The game scales across all CPU cores—which is normally a good thing—until you realize that it saturates not only all eight P-cores, but also all 16 E-cores. It ends up under 80 FPS in busy gameplay at 1080p with a GeForce RTX 4090. Performance is "restored" only when the E-cores are disabled.

Normally, when a game saturates all of the E-cores, we don't interpret it as the game being "aware" of E-cores, but rather "unaware" of them. An ideal Hybrid-aware game should saturate the P-cores with its main workload and use the E-cores for errands such as processing the audio stack (DSPs from the game), the network stack (the game's unique multiplayer network component), physics, and in-flight decompression of assets from the disk, tasks that show up in Task Manager as intermittent, irregular load. "Atlas Fallen" appears to be using the E-cores for its main worker threads, which imposes a performance penalty, as we confirmed by disabling them. The penalty arises because the E-cores run at lower clock speeds than the P-cores, have much lower IPC, and are cache-starved. Frame data processed on the P-cores ends up waiting for data from the E-cores, which drags the overall framerate down.
In the Task Manager screenshot above, the game is running in the foreground; we set Task Manager to "always on top" so that capturing the screenshot wouldn't take focus away from the game and skew Thread Director's decisions. Thread Director prefers to allocate the P-cores to foreground tasks, which isn't happening here, because the developers chose to specifically put work on the E-cores.
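To illustrate what keeping main worker threads off the E-cores can look like from the developer's side, here is a minimal sketch—not anything Deck13 actually ships. It builds a process affinity mask covering only the P-cores from the per-logical-processor EfficiencyClass values that Windows reports via GetLogicalProcessorInformationEx (on hybrid Intel parts, P-cores report a higher EfficiencyClass than E-cores). The 13900K-style layout at the bottom is an assumption for the example, not queried from real hardware.

```python
def p_core_affinity_mask(efficiency_classes):
    """Build a CPU affinity bitmask that keeps only logical processors
    belonging to the highest efficiency class (the P-cores, in Windows'
    reporting convention, where a higher EfficiencyClass value means a
    more performant core type).

    efficiency_classes: one entry per logical processor.
    """
    p_class = max(efficiency_classes)
    mask = 0
    for cpu, eff in enumerate(efficiency_classes):
        if eff == p_class:
            mask |= 1 << cpu
    return mask

# Hypothetical 13900K-style enumeration: 16 P logical processors
# (8 cores with HT, EfficiencyClass 1) followed by 16 E-cores
# (EfficiencyClass 0).
layout = [1] * 16 + [0] * 16
print(hex(p_core_affinity_mask(layout)))  # 0xffff
```

On Windows, such a mask could then be handed to SetProcessAffinityMask; the newer CPU Sets API (SetThreadSelectedCpuSets) works with CPU set IDs rather than bitmasks, but the core-selection logic is the same.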

For comparison we took four screenshots, with E-cores enabled and disabled (through BIOS). We picked a "typical average" scene instead of a worst case, which is why the FPS numbers are a bit higher. As you can see, the framerates with E-cores enabled are pretty low (136 / 152 FPS), whereas turning off the E-cores instantly lifts performance right up to the engine's internal FPS cap (187 / 197 FPS).
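For what it's worth, the gains in those screenshots are easy to put a number on—this is plain arithmetic on the FPS values quoted above (the headline's 50% figure comes from the sub-80 FPS worst case, not this scene):

```python
# Percentage gain from disabling E-cores, per the two screenshot pairs.
pairs = [(136, 187), (152, 197)]  # (E-cores on, E-cores off)
for on, off in pairs:
    gain = (off / on - 1) * 100
    print(f"{on} -> {off} FPS: +{gain:.1f}%")
# 136 -> 187 FPS: +37.5%
# 152 -> 197 FPS: +29.6%
```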

With the E-cores disabled, the game is confined to what is essentially an 8-core/16-thread processor with just P-cores, which boost well above the 5.00 GHz mark, and have the full 36 MB slab of L3 cache to themselves. The framerate now shoots up to 200 FPS, which is a hard framerate limit set by the developer. Our RTX 4090 should be capable of higher framerates, and developers Deck13 Interactive should consider raising it, given that monitor refresh-rates are on the rise, and it's fairly easy to find a 240 Hz or 360 Hz monitor in the high-end segment. The game is based on the Fledge engine, and supports both DirectX 12 and Vulkan APIs. We used GeForce 536.99 WHQL in our testing. Be sure to check out our full performance review of Atlas Fallen later today.

120 Comments on Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores

#76
Crackong
W1zzardSource? There's nothing to optimize for, Cinebench simply spawns one worker thread per core and runs work on it non-stop, until done
I mean the Intel-side Thread Director actively recognized Cinebench as a high priority programme and spread the load evenly in the optimized way.
Not Cinebench side.
Posted on Reply
#77
bug
W1zzardThey split it into blocks of pixels, but same thing.. this is trivial, it's just a few lines of code


There is just one synchronization per run, so one every few minutes; this isn't even worth calling "synchronization". I doubt it submits the last chunk to a faster core if it's waiting for a slower core to finish that piece
Ah, the benchmark is just one static scene?
Posted on Reply
#78
R0H1T
CrackongExcept a few programmes which Intel themselves optimized their thread director very heavily on like Cinebench.
It's a load of BS, tell me what you gain from this? When ultimately it's the OS scheduler/governor that's making the decisions :slap:
The Intel® Thread Director supplies the behind-the-scenes magic that maximizes hybrid performance.

Built directly into the hardware, the Thread Director uses machine learning to schedule tasks on the right core at the right time (as opposed to relying on static rules). This helps ensure that Performance-cores and Efficient-cores work in concert; background tasks don’t slow you down, and you can have more apps open simultaneously.

Here’s how the Intel® Thread Director works:
  • It monitors the runtime instruction mix of each thread and the state of each core with nanosecond precision.
  • It provides runtime feedback to the OS to make the optimal decision for any workload.
  • It dynamically adapts its guidance according to the Thermal Design Point (TDP) of the system, operating conditions, and power settings.
By identifying the class of each workload and using its energy and performance core scoring mechanism, the Intel® Thread Director helps the OS schedule threads on the best core for performance or efficiency.

The end result is performance gains in many demanding gaming scenarios, such as streaming your game and recording gameplay footage at the same time. You get a smoother gaming experience with a higher FPS, your followers get a better viewing experience with higher-quality streams, and your gameplay captures look better, too.
Magic really :laugh:
Posted on Reply
#79
THU31
Thread Director doesn't do anything for apps that utilize 100% of the CPU. Cinebench is one of those apps. There's nothing to direct when all cores can and need to be used for max performance.

But if you're running an app that's using 100% of the CPU and then you perform a different task, Thread Director will decide where to allocate that task, whether it needs to be completed quickly or not. Same with performing other tasks (mainly background) while gaming.

All this is just a theory, though. I don't think it's even possible to measure the potential benefits of this technology. Personally I don't see myself ever enabling E-cores in a gaming setup and have no interest in hybrid architectures on desktop.
Posted on Reply
#80
bug
THU31Thread Director doesn't do anything for apps that utilize 100% of the CPU. Cinebench is one of those apps. There's nothing to direct when all cores can and need to be used for max performance.
If the OS decides to start some background task while you crunch, the TD may be able to hint that it needs to go to an E-core. Just guessing, since I don't know exactly what it does, where it stops and where the OS takes over.
Posted on Reply
#81
mechtech
Does this happen in Win 10 as well or only win 11??
Posted on Reply
#82
bug
mechtechDoes this happen in Win 10 as well or only win 11??
@W1zzard has spoon-fed you the answer: the game explicitly disables the detection of E-cores, treating everything the same. That's exactly what Win10 does, being unaware of E-cores.
Posted on Reply
#83
Max(IT)
Hybrid processors have been here for a while now, and it’s unacceptable for a software house not to be able to take this architecture into consideration. This is laziness and incompetence …
Posted on Reply
#84
zlobby
Max(IT)This is laziness and incompetence …
No, that is intel. Oh, wait, this is the same.
Posted on Reply
#85
Max(IT)
atomekThe main failure was to introduce P/E idea to desktop CPUs. Who need efficient cores anyway in desktop?
That’s only if you don’t understand how it works
GarrusActually Alder Lake was a disaster for months after launch. I had lots of problems. It was all fixed up, but E cores are still causing some issues.

I switched to the Ryzen 7800X3D and couldn't be happier.
I had a 12700K two months after release: never had a single issue.
Posted on Reply
#86
Crackong
Max(IT)That’s only if you don’t understand how it works
I had a 12700K two months after release: never had a single issue.
May I ask you a question.

Why P+E Hybrid architecture isn't introduced into the Xeon scalable, where the BIG moneys at ?
If P+E Hybrid architecture is so good and so efficient and lived up to its name and as advertised and flawless and "never had a single issue" .
Why can't the enterprise users enjoy a 20P + 100E = 120core CPU right now?

There are so much money and talent in the enterprise space so there shouldn't be laziness and incompetence right?
Why Intel themselves had to limit Xeon scalable to P-cores only or E-cores only?
Why not Both?
Intel don't even talk about P+E Hybrid architecture Xeon in their roadmap until 2025.

Could you answer that?
Posted on Reply
#87
bug
CrackongMay I ask you a question.

Why P+E Hybrid architecture isn't introduced into the Xeon scalable, where the BIG moneys at ?
If P+E Hybrid architecture is so good and so efficient and lived up to its name and as advertised and flawless and "never had a single issue" .
Why can't the enterprise users enjoy a 20P + 100E = 120core CPU right now?

There are so much money and talent in the enterprise space so there shouldn't be laziness and incompetence right?
Why Intel themselves had to limit Xeon scalable to P-cores only or E-cores only?
Why not Both?
Intel don't even talk about P+E Hybrid architecture Xeon in their roadmap until 2025.

Could you answer that?
That's an easy one. Desktop handles heterogeneous loads. Sometimes a game that needs a handful of fast cores, sometimes a 3D modelling software or something that processes video and images that will take as many cores as you can throw at them. By contrast, unless you are a cloud provider, your servers will see a much more homogeneous load. There's rarely a "light workload" on servers. They tend to crunch or idle, with very little in between.
Posted on Reply
#88
Crackong
bugunless you are a cloud provider
Wait so all the cloud providers purchased so many CPUs and still don't deserve some special care ?

AWS, Google, Azure....
Posted on Reply
#89
Max(IT)
CrackongMay I ask you a question.

Why P+E Hybrid architecture isn't introduced into the Xeon scalable, where the BIG moneys at ?
If P+E Hybrid architecture is so good and so efficient and lived up to its name and as advertised and flawless and "never had a single issue" .
Why can't the enterprise users enjoy a 20P + 100E = 120core CPU right now?

There are so much money and talent in the enterprise space so there shouldn't be laziness and incompetence right?
Why Intel themselves had to limit Xeon scalable to P-cores only or E-cores only?
Why not Both?
Intel don't even talk about P+E Hybrid architecture Xeon in their roadmap until 2025.

Could you answer that?
And again, if you can’t even understand the difference between a Xeon target workload and a consumer CPU target workload, how could you criticize Intel’s choices?
Posted on Reply
#90
bug
CrackongWait so all the cloud providers purchased so many CPUs and still don't deserve some special care ?

AWS, Google, Azure....
If you ever configured something in the cloud, you know each provider offers an assortment of CPUs to choose from (not limited to x86_64 either). Amazon even built their own to add to the mix.
Posted on Reply
#91
Crackong
bugIf you ever configured something in the cloud, you know each provider offers an assortment of CPUs to choose from (not limited to x86_64 either). Amazon even built their own to add to the mix.
But I don't see P+E cores in the mix.



Jokes aside.
We all know why P+E core doesn't appear in the Xeon Scalable market.
Even a company as big as VMware had a lot of trouble making P+E cores work properly in virtualization.
Max(IT)And again if can’t even understand the difference from a Xeon target workload and a consumer CPU target workload, how could you criticize Intel’s choices?
So the competitor AMD could make their CPU architecture suitable for both workloads and Intel had to differentiate themselves so to make 3 architectures co-existed for no apparent positive effects other than driving the cost up?

What a Good Choice !
Posted on Reply
#92
bug
CrackongBut I don't see P+E cores in the mix.

Because they're not needed. The cloud is not a simple desktop or a server. It works differently. You create your heterogeneous environment by mixing up the homogeneous CPUs available. It's cost-ineffective to do the mix in hardware when you can control it via software. And then there's serverless.
Posted on Reply
#93
Max(IT)
CrackongSo the competitor AMD could make their CPU architecture suitable for both workloads and Intel had to differentiate themselves so to make 3 architectures co-existed for no apparent positive effects other than driving the cost up?

What a Good Choice !
AMD and Intel chose different paths, but AMD has been stuck at the 16/32 configuration in the consumer market for a while, while Intel moved on. The hybrid solution is a compromise, but a good one. You cannot just add cores to a consumer CPU.
Posted on Reply
#94
bug
CrackongSo the competitor AMD could make their CPU architecture suitable for both workloads and Intel had to differentiate themselves so to make 3 architectures co-existed for no apparent positive effects other than driving the cost up?

What a Good Choice !
www.techpowerup.com/review/amd-ryzen-9-7950x3d/8.html

13900k will match the mighty 7950X in multithreading while having half the threads running on weaker cores. Clearly not everything about Raptor Lake is as bad as you would have us believe. The costs were driven so high, the Intel CPU can be had for ~$150 less than AMD's.
Posted on Reply
#95
Crackong
bugBecause they're not needed.
So as the never-existed Xeon W-1400 series, when Intel themselves quickly realized they are unfitted vessal for virtualization.
bug13900k will match the mighty 7950X in multithreading while having half the threads running on weaker cores. Clearly not everything about Raptor Lake is as bad as you would have us believe. The costs were driven so high, the Intel CPU can be had for ~$150 less than AMD's.
While consuming double the power consumption (140 vs 276) ?

Also I don't want you to believe Raptor Lake is bad.
Raptor Lake is GOOD when I am using it as a pure 8 core CPU with E-cores disabled, or Having pure P-core SR Xeon running VMs.
Posted on Reply
#96
bug
@Crackong I cannot follow your logic, you're just throwing out random stuff. I'm out.
Posted on Reply
#97
Max(IT)
bug@Crackong I cannot follow your logic, you're just throwing out random stuff. I'm out.
There’s no reasoning with AMD cheerleaders. They really are worse than Apple’s fans
CrackongSo as the never-existed Xeon W-1400 series, when Intel themselves quickly realized they are unfitted vessal for virtualization.



While consuming double the power consumption (140 vs 276) ?

Also I don't want you to believe Raptor Lake is bad.
Raptor Lake is GOOD when I am using it as a pure 8 core CPU with E-cores disabled, or Having pure P-core SR Xeon running VMs.
Little reality check: a 7950X under full load can consume up to 240W
Posted on Reply
#98
AnotherReader
bugwww.techpowerup.com/review/amd-ryzen-9-7950x3d/8.html

13900k will match the mighty 7950X in multithreading while having half the threads running on weaker cores. Clearly not everything about Raptor Lake is as bad as you would have us believe. The costs were driven so high, the Intel CPU can be had for ~$150 less than AMD's.
I think the difference between the two is less than $50. The 7950X is available for $599 at multiple retailers such as Best Buy and Newegg while the 13900k is being sold for $568.2 by Amazon and B&H photo video.
Posted on Reply
#99
bug
Max(IT)Little reality check: a 7950X under full load can consume up to 240W
You can even shave off 100W from ADL/RPL with like a 5% penalty. They're configured pretty badly ootb, but that doesn't mean there's not a power efficient chip in there.
AnotherReaderI think the difference between the two is less than $50. The 7950X is available for $599 at multiple retailers such as Best Buy and Newegg while the 13900k is being sold for $568.2 by Amazon and B&H photo video.
Now maybe, but 7950X launched at $750 MSRP.
Posted on Reply
#100
AnotherReader
bugYou can even shave off 100W from ADL/RPL with like a 5% penalty. They're configured pretty badly ootb, but that doesn't mean there's not a power efficient chip in there.


Now maybe, but 7950X launched at $750 MSRP.
Both AMD and Intel have configured their higher-end processors, bar the 7950X3D, pretty badly out of the box. The 7950X launched at $699. As far as this game is concerned, it seems to be a case of a game developer thinking they can outsmart the OS scheduler.
Posted on Reply