Tuesday, November 23rd 2021

PlayStation 3 Emulator Delivers Modest Speed-Ups with Disabled E-Cores on Intel Alder Lake Processors

According to testing performed by the team behind RPCS3, a free and open-source emulator for Sony's PlayStation 3, Intel's Alder Lake processors see a performance boost when their E-cores are disabled. Alder Lake processors feature a hybrid configuration with high-performance P-cores and low-power E-cores. The P-cores are based on the Golden Cove architecture and can execute AVX-512 instructions. However, AVX-512 is only available when the E-cores are disabled, as software looks at the whole package and expects every core to support the same instructions. Officially, Alder Lake processors don't support AVX-512, as the processor's smaller E-cores cannot execute AVX-512 instructions.
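As a rough illustration of the "whole package" point above, here is a minimal Python sketch that treats AVX-512 as usable only if every logical CPU reports it. It assumes Linux-style /proc/cpuinfo text, where each logical CPU has its own "flags" line and the foundation feature appears as avx512f; the function names are hypothetical and this is not how RPCS3 itself performs detection.

```python
def per_cpu_flags(cpuinfo_text: str) -> list[set[str]]:
    """Return one set of feature flags per logical CPU found in the text."""
    flag_sets = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only lines whose key is exactly "flags" list the CPU features.
        if sep and key.strip() == "flags":
            flag_sets.append(set(value.split()))
    return flag_sets


def avx512_usable(cpuinfo_text: str) -> bool:
    """True only if at least one CPU is listed and all of them report avx512f."""
    flag_sets = per_cpu_flags(cpuinfo_text)
    return bool(flag_sets) and all("avx512f" in flags for flags in flag_sets)
```

On a Linux machine this could be called as avx512_usable(open("/proc/cpuinfo").read()); other operating systems would need a different source for the flags.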

Thanks to the team behind the RPCS3 emulator, we have some information and tests that suggest that turning E-cores off boosts emulation speed and in-game FPS. With E-cores disabled and only P-cores left, the processor can execute AVX-512 and gets a higher ring ratio, which presumably means lower latency on the ring bus. The team benchmarked the Intel Core i9-12900K and Core i9-11900K processors, both clocked at 5.2 GHz, with the Alder Lake chip's E-cores disabled. In God of War: Ascension, the Rocket Lake processor produced 68 FPS, while Alder Lake produced 78 FPS, an improvement of around 15%.
This suggests that more applications could benefit from disabling E-cores, especially applications that support AVX-512 instructions, which only the P-cores can execute. It remains to be seen, through trial and error, whether more cases like this appear.
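For reference, the "around 15%" figure follows directly from the two FPS numbers quoted above. A small Python sketch of the arithmetic (the function name is just illustrative):

```python
def percent_gain(baseline_fps: float, new_fps: float) -> float:
    """Relative improvement of new_fps over baseline_fps, in percent."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Rocket Lake (68 FPS) vs. Alder Lake with E-cores disabled (78 FPS)
print(f"{percent_gain(68, 78):.1f}%")  # prints 14.7%
```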
Source: RPCS3

39 Comments on PlayStation 3 Emulator Delivers Modest Speed-Ups with Disabled E-Cores on Intel Alder Lake Processors

#1
ViperXTR
This is what I've been waiting to see. I hope others will use RPCS3 as a CPU benchmark, like they did before with Dolphin.

sample test of RPCS3 running Red dead redemption on several CPUs
Posted on Reply
#2
Chomiq
ViperXTR: This is what I've been waiting to see. I hope others will use RPCS3 as a CPU benchmark, like they did before with Dolphin.

sample test of RPCS3 running Red dead redemption on several CPUs
Yeah but you would have to retest everything with each new build of RPCS3.
Posted on Reply
#3
ViperXTR
Chomiq: Yeah but you would have to retest everything with each new build of RPCS3.
Just stick to one build then for a test, or maybe a common build for benchmark only like what Dolphin did
Posted on Reply
#4
qubit
Overclocked quantum bit
Something is badly wrong with a CPU architecture if disabling half the cores results in a performance improvement.

Say what you want about unoptimized software allegedly being the issue, but the bottom line is that we have 16 core CPU with 8 low performance cores rather than the full complement of 16 performance cores as it should be. I really don't like this hybrid design and feel that the consumer (us) is getting cheated out of a lot of performance.

AMD really need to come back with Alder Lake beating performance with all cores being performance cores, or this situation will continue.
Posted on Reply
#5
mb194dc
Pretty pointless application? You can get a PS3 off Ebay for about £50 and run anything on the native hardware, saving yourself the huge upgrade cost to Alder Lake for this purpose!
Posted on Reply
#6
lilunxm12
ViperXTR: This is what I've been waiting to see. I hope others will use RPCS3 as a CPU benchmark, like they did before with Dolphin.

sample test of RPCS3 running Red dead redemption on several CPUs
That's basically a SIMD test. The 2500, a desktop processor, has only 1/3 the performance of the 7700HQ because the former lacks AVX2 support.
qubit: Something is badly wrong with a CPU architecture if disabling half the cores results in a performance improvement.

Say what you want about unoptimized software allegedly being the issue, but the bottom line is that we have 16 core CPU with 8 low performance cores rather than the full complement of 16 performance cores as it should be. I really don't like this hybrid design and feel that the consumer (us) is getting cheated out of a lot of performance.

AMD really need to come back with Alder Lake beating performance with all cores being performance cores, or this situation will continue.
Because running P-cores only enables AVX-512, which isn't very useful outside of a few cases and may cause unexpected throttling.
Posted on Reply
#7
Vya Domus
qubit: Something is badly wrong with a CPU architecture if disabling half the cores results in a performance improvement.
There is nothing wrong with the architecture (apart from the horrid power efficiency), what is wrong is that these types of processors have no place in desktop.
qubit: AMD really need to come back with Alder Lake beating performance with all cores being performance cores, or this situation will continue.
I wish the same but this is probably a cope, AMD will likely move to big.LITTLE as well because of the better margins.
Posted on Reply
#8
Tartaros
qubit: Say what you want about unoptimized software allegedly being the issue, but the bottom line is that we have 16 core CPU with 8 low performance cores rather than the full complement of 16 performance cores as it should be. I really don't like this hybrid design and feel that the consumer (us) is getting cheated out of a lot of performance.
16 full cores for office and gaming use is just overkill. Why would most people need 16 P-cores? The problem is not having fewer P-cores but giving up features to use the E-cores; it's a bad implementation, not a bad idea.
Posted on Reply
#9
Xex360
qubit: Something is badly wrong with a CPU architecture if disabling half the cores results in a performance improvement.

Say what you want about unoptimized software allegedly being the issue, but the bottom line is that we have 16 core CPU with 8 low performance cores rather than the full complement of 16 performance cores as it should be. I really don't like this hybrid design and feel that the consumer (us) is getting cheated out of a lot of performance.

AMD really need to come back with Alder Lake beating performance with all cores being performance cores, or this situation will continue.
I agree, it doesn't make sense especially for high performance desktop CPUs, they could use this for laptops or office desktops with say 2 P cores and few E cores (as they are comparable to a 7700K core).
Posted on Reply
#10
Cobain
mb194dc: Pretty pointless application? You can get a PS3 off Ebay for about £50 and run anything on the native hardware, saving yourself the huge upgrade cost to Alder Lake for this purpose!
Not to mention you can play every PS3 game for free using a cloud service. They are all locked to 720p 30fps natively anyway. I don't believe someone has a reason to spend their time playing 20 PS3 titles. Maybe the occasional gem here and there, like Red Dead. You complete it and move on to other games.
Posted on Reply
#11
napata
qubit: Something is badly wrong with a CPU architecture if disabling half the cores results in a performance improvement.

Say what you want about unoptimized software allegedly being the issue, but the bottom line is that we have 16 core CPU with 8 low performance cores rather than the full complement of 16 performance cores as it should be. I really don't like this hybrid design and feel that the consumer (us) is getting cheated out of a lot of performance.

AMD really need to come back with Alder Lake beating performance with all cores being performance cores, or this situation will continue.
AMD is probably going to follow Intel with Big.Little as it's the best way to scale up multicore CPU performance. The complaints are just based on ignorance. It's not 16 P-cores vs 8P+8E cores but 10 P-cores vs 8P+8E cores. Hybrid brings us more performance as stuff just doesn't scale linearly.

If software wasn't an issue you'd want a CPU that is entirely made up of E-cores as they're just more efficient in die space so you always get more performance. Unfortunately a lot of software just doesn't scale well so there's a benefit to use big cores but they're just not a cost effective use of your silicon. That's also why all upcoming known Intel architectures keep 8 P-cores and scale up the E-cores. The software that P-cores are designed for doesn't really use more than 8 cores anyway atm.
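The scaling argument above can be sketched with an Amdahl's-law style model in Python. Everything numeric here is an assumption for illustration: the serial part is taken to run on a single P-core, the parallel part is spread over all cores, and each E-core is counted as a hypothetical 0.55 of a P-core. The "10 P-cores vs 8P+8E" pairing is the die-area comparison claimed in this comment, not a measured figure.

```python
def speedup(parallel_fraction: float, p_cores: int, e_cores: int,
            e_core_throughput: float = 0.55) -> float:
    """Amdahl-style speedup relative to a single P-core.

    e_core_throughput is an assumed, illustrative ratio of E-core to
    P-core throughput, not a measured value.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if p_cores < 1:
        raise ValueError("at least one P-core is required")
    # Combined throughput of all cores, measured in P-core equivalents.
    total_throughput = p_cores + e_cores * e_core_throughput
    serial_time = 1.0 - parallel_fraction            # runs on one P-core
    parallel_time = parallel_fraction / total_throughput
    return 1.0 / (serial_time + parallel_time)


for fraction in (0.50, 0.90, 0.99):
    all_p = speedup(fraction, p_cores=10, e_cores=0)
    hybrid = speedup(fraction, p_cores=8, e_cores=8)
    print(f"parallel={fraction:.2f}  10P={all_p:.2f}x  8P+8E={hybrid:.2f}x")
```

Under these assumptions the hybrid layout pulls ahead as the parallel fraction grows, while poorly scaling software sees little difference either way. A lower assumed E-core ratio would narrow or reverse the gap.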

This discussion is just a repeat of the whole single core vs multicore CPUs. Back then we also sacrificed single core performance for the sake of having more cores. Conroe just had the benefit of being a massive increase in performance.

Also to address your first sentence: In games and software with limited scaling disabling HT usually also leads to a performance increase. Or if you disable a CCD on a 5950x you generally also gain performance in software that doesn't scale, like games.
Posted on Reply
#12
Punkenjoy
The main thing is Intel should have included a downgraded version of AVX512 on the E-cores. They could run the instructions, but much slower than on the P-cores, to reduce the number of transistors used. This way they could have kept it on the P-cores and the thread director could have moved the slight AVX512 load to the P-cores.

I bet the problem is they couldn't make it without investing too many transistors or crippling performance too much. A downside of this approach is that these emulators aren't your typical AVX512 load. Generally those loads are fully multithreaded across all cores.
Posted on Reply
#13
Dr. Dro
napata: AMD is probably going to follow Intel with Big.Little as it's the best way to scale up multicore CPU performance. The complaints are just based on ignorance. It's not 16 P-cores vs 8P+8E cores but 10 P-cores vs 8P+8E cores. Hybrid brings us more performance as stuff just doesn't scale linearly.

If software wasn't an issue you'd want a CPU that is entirely made up of E-cores as they're just more efficient in die space so you always get more performance. Unfortunately a lot of software just doesn't scale well so there's a benefit to use big cores but they're just not a cost effective use of your silicon. That's also why all upcoming known Intel architectures keep 8 P-cores and scale up the E-cores. The software that P-cores are designed for doesn't really use more than 8 cores anyway atm.

This discussion is just a repeat of the whole single core vs multicore CPUs. Back then we also sacrificed single core performance for the sake of having more cores. Conroe just had the benefit of being a massive increase in performance.

Also to address your first sentence: In games and software with limited scaling disabling HT usually also leads to a performance increase. Or if you disable a CCD on a 5950x you generally also gain performance in software that doesn't scale, like games.
This used to be particularly true with the Zen 2 design, but it is not as much of a problem on the 5950X as it was on the 3950X, as Zen 3 has a single CCX per CCD and full access to the processor's resources at any given moment, I can't think of any given case where restricting applications to a single CCX actually mattered, if anything any potential performance increase would be from higher power allowance and more aggressive clock speeds attained by keeping one of the dies mostly or completely unloaded and that's in situations where threads interleaving between both dies wouldn't benefit to begin with.

I sympathize in general with qubit's line of thought (and why I plan on buying the 3D upgrade for my processor), but I also see the appeal in Alder Lake's hybrid design and where Intel wants to go with it... I just think the Windows ecosystem is not exactly ready for such advanced technology yet. A few friends of mine have upgraded to the i7-12700K and seem to be very pleased with the result, they do pull their own weight for gaming, although something that struck me as odd is that there are games that answer better to being run on the big cores and others that run better on the little cores, meaning that it isn't the big cores that always invariably win the race. That leads to some inconsistency, and to be really frank, my 5950X is plenty fast as it is, I'm only upgrading because of some accounting magic, I give this to my brother, upgrade, sell the 3900XT I left with him last year, 2 people upgrading for less than 1 CPU's full price... everyone wins, that's the idea at least. Hopefully the pricing will be sensible, or even that will not be worth it.
Posted on Reply
#14
TheDeeGee
So far I don't regret not waiting and getting an 11700 instead.

While the future of CPUs is looking good, it's clear this new architecture has to mature first.
Posted on Reply
#15
DuxCro
mb194dc: Pretty pointless application? You can get a PS3 off Ebay for about £50 and run anything on the native hardware, saving yourself the huge upgrade cost to Alder Lake for this purpose!
I actually bought a PS3 Super Slim 500GB this summer. Never had a PS3 before. Some games clearly look amazing, like Resistance 3 and Killzone 3. However, the low resolution prevents them from shining. I played Legend of Zelda: BOTW on CEMU in 4K/60fps and it's a game changer.
Posted on Reply
#16
Vayra86
I'm not touching big.LITTLE until Windows 12, that's for sure.

Probably won't take longer than a year or two, knowing MS and its strategic outlook :D
mb194dc: Pretty pointless application? You can get a PS3 off Ebay for about £50 and run anything on the native hardware, saving yourself the huge upgrade cost to Alder Lake for this purpose!
Sure, but you can't emulate on a PS3, it's slow as shit for some games (Heavenly Sword could easily run sub 20 FPS on a PS3, and it's no exception), hot and loud for others, you have storage media limitations or a failing BR lens, PS Network is no real added use anymore, should I go on?

And let's not begin about the content itself. It's not like they get released any longer.

Even emulating PS2 on a PC is 10x better than the OG console. Even if only just for save states.
Posted on Reply
#17
ViperXTR
I was hoping for more emulation architecture discussion, but instead it became more of a CPU discussion >_>
Posted on Reply
#18
windwhirl
ViperXTR: Just stick to one build then for a test, or maybe a common build for benchmark only like what Dolphin did
There can be multiple builds in one day. And any one you pick won't necessarily be stable on all hardware.

Also, RPCS3 is alpha-level software. You don't want to benchmark with that.
mb194dc: Pretty pointless application? You can get a PS3 off Ebay for about £50 and run anything on the native hardware, saving yourself the huge upgrade cost to Alder Lake for this purpose!
There's a limited number of PS3 consoles in the world. And they're dwindling every day as they fail or break. Never mind that their current owners might not be willing to let go of them for some time yet.
Posted on Reply
#19
Dr. Dro
ViperXTR: I was hoping for more emulation architecture discussion, but instead it became more of a CPU discussion >_>
I mean, these are all x86-64 processors, and at the end of the day an emulator's job is to translate the emulated system's native machine code into something that your processor can execute. That can be done through an interpreter (which tends to be slow, but more accurate and universally predictable) or a dynamic JIT (just-in-time compilation, or batched translation of bytecode at runtime, allowing for target-architecture-specific optimizations), among other methods, those two being the most common... This will be the case regardless of whether you run Sandy Bridge or Zen 3, or if you're running an NES emulator or a PlayStation 3 emulator.

RPCS3 was always particularly Intel-friendly, but that's because the emulator's sensitive to a few things that Intel chips currently do better and that their CPUs have historically had a bit of an advantage as far as instruction sets go. This emulator in particular has pioneered use of all of these instructions, like TSX, 256-bit and 512-bit AVX, etc, I wouldn't fault any of its developers for preferring Intel for their development machines. I applaud their use of pioneering instruction sets, even if unsupported by many modern CPU microarchitectures.

The lukewarm reaction in this thread is probably expected, RPCS3 is hardly representative of any real-world or meaningful advantage of the Intel architecture vs. the AMD one, as it has always traditionally been Intel-biased. It's not a bad thing, there are other places where Ryzen will shine particularly bright, as well. :)
Posted on Reply
#20
Punkenjoy
Dr. Dro: This used to be particularly true with the Zen 2 design, but it is not as much of a problem on the 5950X as it was on the 3950X, as Zen 3 has a single CCX per CCD and full access to the processor's resources at any given moment, I can't think of any given case where restricting applications to a single CCX actually mattered, if anything any potential performance increase would be from higher power allowance and more aggressive clock speeds attained by keeping one of the dies mostly or completely unloaded and that's in situations where threads interleaving between both dies wouldn't benefit to begin with.
The thing is, on Zen 2, communication between CCXs had to go through the I/O die. The Infinity Fabric could become saturated by all those accesses, and it had to compete with memory and I/O access too. And this round trip to the I/O die was costly in latency and power usage.

On Zen 3, all cores within a CCD can communicate directly with each other, but communication between CCDs still has to go through the I/O die via Infinity Fabric, and this has a latency impact. There are applications that are faster on the 5800X than on the 5900X because they are affected by that latency. For example:



But those are rare and generally, the higher frequency compensates for the latency problem. It's true that the OS should just use the 5950X as a single CCD but it's harder to implement in real life than in theory. It's more up to the application to establish that.
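As a rough idea of what "up to the application" could look like, here is a minimal Python sketch that pins the current process to a subset of CPUs using os.sched_setaffinity, which is available on Linux only. The helper names are hypothetical, and it simply assumes that the lowest-numbered logical CPUs sit on the same CCD, which is not guaranteed and should be checked against the actual topology.

```python
import os


def pick_cpus(allowed: set[int], count: int) -> set[int]:
    """Return the `count` lowest-numbered CPU ids from `allowed`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return set(sorted(allowed)[:count])


def restrict_to_first_cpus(count: int) -> set[int]:
    """Pin the current process to its first `count` allowed CPUs.

    Assumption: those CPUs share one CCD; verify the topology first.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    chosen = pick_cpus(os.sched_getaffinity(0), count)
    os.sched_setaffinity(0, chosen)  # pid 0 means the calling process
    return chosen
```

On a 5950X with SMT enabled, something like restrict_to_first_cpus(16) might cover one CCD, but only if the OS enumerates the logical CPUs that way.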
Posted on Reply
#21
Dr. Dro
Punkenjoy: The thing is, on Zen 2, communication between CCXs had to go through the I/O die. The Infinity Fabric could become saturated by all those accesses, and it had to compete with memory and I/O access too. And this round trip to the I/O die was costly in latency and power usage.

On Zen 3, all cores within a CCD can communicate directly with each other, but communication between CCDs still has to go through the I/O die via Infinity Fabric, and this has a latency impact. There are applications that are faster on the 5800X than on the 5900X because they are affected by that latency. For example:

But those are rare and generally, the higher frequency compensates for the latency problem. It's true that the OS should just use the 5950X as a single CCD but it's harder to implement in real life than in theory. It's more up to the application to establish that.
I mean, with the 5950X, you have two complete dies inside the processor, so if you run into this very specific fringe scenario you've mentioned, you can disable one of the CCDs and by all intents have a fully functional 5800X in there. The latency impact is more than likely one of the things AMD hopes to mitigate with the 3D cache, and one of the reasons I believe that it will work. I wonder how would chiplet technology end up affecting a GPU (for graphics rendering purposes), I really do. Aldebaran's more of a compute processor than anything.
Posted on Reply
#22
Punkenjoy
Dr. Dro: I mean, with the 5950X, you have two complete dies inside the processor, so if you run into this very specific fringe scenario you've mentioned, you can disable one of the CCDs and by all intents have a fully functional 5800X in there. The latency impact is more than likely one of the things AMD hopes to mitigate with the 3D cache, and one of the reasons I believe that it will work. I wonder how would chiplet technology end up affecting a GPU (for graphics rendering purposes), I really do. Aldebaran's more of a compute processor than anything.
A larger cache won't help reduce latency, as data that was processed by another CCD will remain there; each CCD will just be able to hold more.

CCD to CCD isn't much faster than memory access, so it won't really help there. What AMD could do with a larger interposer is add Infinity Fabric links between CCDs. This should cut the CCD to CCD latency by half at least.

As for GPUs, again it will depend on the kind of code they run. If it has to do a lot of syncing, it won't be beneficial to have two chiplets instead of one big die. If all the data is very contained and has a very high level of parallelism, it won't matter much (like how Zen 2 still does great on video encoding, 3D rendering, etc.).
Posted on Reply
#23
chrcoluk
mb194dc: Pretty pointless application? You can get a PS3 off Ebay for about £50 and run anything on the native hardware, saving yourself the huge upgrade cost to Alder Lake for this purpose!
Post processing AA
Save states
Massively less cable clutter, and room required to house console.
Ability to memory hack games.

Definite advantages.
Posted on Reply
#24
Ferrum Master
More cases like this will not appear.

Where else would you need to mimic the instruction sets of a super complex CELL CPU? Also, the raw 5 GHz+ single-core boost contributes a lot, not only AVX512. The added performance correlates more with the frequency gap.

Actually the emulator is usable, I have played Metal Gear 4 on it. Occasional freezing is more of an issue than the lack of CPU power. It is limited to 30 FPS in-game either way, so what's the fuss?

The LLVM needs a lot of work... and it still has poor multithreading. They experiment a lot in certain ways, but it often lacks the desired result.
Posted on Reply
#25
windwhirl
Punkenjoy: There are applications that are faster on the 5800X than on the 5900X because they are affected by that latency.
I'll add to this that there are reports of 5800X splitting the cores over two dies instead of the single one. Not sure if those are true (so much can go wrong if the testing isn't meticulous), but it's a possibility.

Ah, nevermind, the second chiplet is always disabled
Posted on Reply