Thursday, August 10th 2023
Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores
Action RPG "Atlas Fallen" joins a long line of RPGs for you to grind through this summer, such as Baldur's Gate 3, Diablo 4, and Starfield. We've been testing the game for our GPU performance article and found something interesting: the game isn't optimized for Intel Hybrid processors, such as the Core i9-13900K "Raptor Lake" in our test bench. The game scales across all CPU cores, which is normally a good thing, until you realize that it saturates not only the 8 P-cores but also the 16 E-cores. It ends up with under 80 FPS in busy gameplay at 1080p with a GeForce RTX 4090. Performance is "restored" only when the E-cores are disabled.
Normally, when a game saturates all of the E-cores, we don't interpret it as the game being "aware" of E-cores, but rather "unaware" of them. An ideal Hybrid-aware game should saturate the P-cores with its main workload, and use the E-cores for errands such as processing the audio stack (DSPs from the game), the network stack (the game's own multiplayer network component), physics, in-flight decompression of assets from the disk, etc., which show up in Task Manager as intermittent, irregular load. "Atlas Fallen" appears to be using the E-cores for its main worker threads, and this imposes a performance penalty, as we found out by disabling the E-cores. The penalty arises because the E-cores run at lower clock speeds than the P-cores, have much lower IPC, and are cache-starved. Frame data being processed by the P-cores ends up having to wait for data from the E-cores, which brings the overall framerate down.
In the Task Manager screenshot above, the game is running in the foreground, and we set Task Manager to "always on top" so that Thread Director won't interfere with the game. Thread Director prefers to allocate the P-cores to foreground tasks, which doesn't happen here, because the developers specifically chose to put work on the E-cores.
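The waiting effect described above can be illustrated with a toy model. This is only a sketch under our own assumptions, not a measurement of how the game schedules its work: we assume each frame consists of equally sized jobs that must all finish before the frame completes, and that an E-core runs such a job at half the speed of a P-core. The job cost, the relative speed, and the function name are hypothetical values chosen for illustration.

```python
def frame_time_ms(jobs):
    """Return the frame time for a list of (cost_on_p_core_ms, relative_speed) jobs.

    All jobs run in parallel, one per core, and the frame is only done
    when the slowest job has finished.
    """
    return max(cost / speed for cost, speed in jobs)


P_SPEED = 1.0   # P-core is the baseline
E_SPEED = 0.5   # hypothetical: E-core finishes the same job in twice the time
JOB_MS = 4.0    # hypothetical cost of one job on a P-core

# Eight jobs, all placed on P-cores.
all_on_p = [(JOB_MS, P_SPEED)] * 8

# Same eight jobs, but two of them land on E-cores.
two_on_e = [(JOB_MS, P_SPEED)] * 6 + [(JOB_MS, E_SPEED)] * 2

for label, jobs in (("all on P-cores", all_on_p), ("two on E-cores", two_on_e)):
    ms = frame_time_ms(jobs)
    print(f"{label}: {ms:.1f} ms per frame, {1000.0 / ms:.0f} FPS")
```

In this model the six P-cores finish their jobs in 4 ms and then sit idle until the two E-core jobs complete at 8 ms, so the frame time doubles even though most of the work ran on fast cores.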
For comparison, we took four screenshots, with the E-cores enabled and disabled (through the BIOS). We picked a "typical average" scene instead of a worst case, which is why the FPS figures are a bit higher. As you can see, framerates with the E-cores enabled are pretty low (136 / 152 FPS), whereas turning off the E-cores instantly increases performance right up to the engine's internal FPS cap (187 / 197 FPS).
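To put the screenshot figures in relative terms, the gain can be worked out as follows. This is a small sketch that assumes the numbers pair up in the order they are listed (136 with 187, and 152 with 197); the pairing and the helper name `percent_gain` are our assumptions.

```python
def percent_gain(before_fps, after_fps):
    """Relative framerate increase, in percent, when going from before_fps to after_fps."""
    return (after_fps - before_fps) / before_fps * 100.0


# (E-cores enabled, E-cores disabled), assumed pairing of the screenshot figures
pairs = [(136, 187), (152, 197)]

for before, after in pairs:
    print(f"{before} -> {after} FPS: +{percent_gain(before, after):.1f}%")
# prints:
# 136 -> 187 FPS: +37.5%
# 152 -> 197 FPS: +29.6%
```

Because the results with the E-cores disabled sit at or near the engine's FPS cap, these percentages are best read as lower bounds for this scene.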
With the E-cores disabled, the game is confined to what is essentially an 8-core/16-thread processor with just P-cores, which boost well above the 5.00 GHz mark and have the full 36 MB slab of L3 cache to themselves. The framerate now shoots up to 200 FPS, which is a hard framerate limit set by the developer. Our RTX 4090 should be capable of higher framerates, and developer Deck13 Interactive should consider raising the limit, given that monitor refresh rates are on the rise and it's fairly easy to find a 240 Hz or 360 Hz monitor in the high-end segment. The game is based on the Fledge engine, and supports both the DirectX 12 and Vulkan APIs. We used GeForce 536.99 WHQL in our testing. Be sure to check out our full performance review of Atlas Fallen later today.
120 Comments on Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores
Not Cinebench side.
But if you're running an app that's using 100% of the CPU and then you perform a different task, Thread Director will decide where to allocate that task, whether it needs to be completed quickly or not. Same with performing other tasks (mainly background) while gaming.
All this is just a theory, though. I don't think it's even possible to measure the potential benefits of this technology. Personally I don't see myself ever enabling E-cores in a gaming setup and have no interest in hybrid architectures on desktop.
Why isn't the P+E Hybrid architecture introduced into Xeon Scalable, where the BIG money's at?
If the P+E Hybrid architecture is so good and so efficient, lives up to its name as advertised, is flawless and has "never had a single issue",
why can't enterprise users enjoy a 20P + 100E = 120-core CPU right now?
There is so much money and talent in the enterprise space, so there shouldn't be any laziness or incompetence, right?
Why did Intel itself have to limit Xeon Scalable to P-cores only or E-cores only?
Why not both?
Intel doesn't even mention a P+E Hybrid Xeon in its roadmap until 2025.
Could you answer that?
AWS, Google, Azure....
Jokes aside.
We all know why P+E cores don't appear in the Xeon Scalable market.
Even a company as big as VMware had so much trouble making P+E cores work properly in virtualization. So the competitor AMD could make its CPU architecture suitable for both workloads, while Intel had to differentiate itself by making 3 architectures coexist, for no apparent positive effect other than driving the cost up?
What a Good Choice!
The 13900K will match the mighty 7950X in multithreading while having half its threads running on weaker cores. Clearly not everything about Raptor Lake is as bad as you would have us believe. The costs were driven so high, the Intel CPU can be had for ~$150 less than AMD's.
Also, I don't want you to believe Raptor Lake is bad.
Raptor Lake is GOOD when I am using it as a pure 8-core CPU with the E-cores disabled, or when I have a pure P-core SR Xeon running VMs.