
Intel 0x114 Microcode Could be the Magic Gaming Performance Fix for "Arrow Lake"

And that design sucks. Why break what wasn't broken? It worked so much better on Alder Lake and Raptor Lake, where the P-cores sat together and the E-cores off to the side, with still-good latency even when going to an E-core (not as good as P-to-P, but fine). That way the P-cores were prioritized and the E-cores, as great secondary auxiliary threads, kicked in when needed, and it worked very well. And for stuff that scales to near-infinite threads, the layout didn't matter anyway.

Yes, I know the E-cores on Arrow Lake, being Skymont, are a lot stronger, but they are still not as strong as Intel claimed (LMAO, "Raptor Cove IPC"... well, maybe in cherry-picked IPC tests, but not as an all-rounder). So the E-cores are still not as good, and cramming their clusters in the middle of the P-cores was a bad design choice.



And I do not think this update will help gaming that much. The tile-based latency is real, and it's not going to magically get better, or even reach parity with Raptor Lake, which is a monolithic die with a faster ring clock and P-cores that have better FP IPC and are not far off in integer IPC, at equal or faster clocks.
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency
 
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

How much better, between the IMC and the CPU die? And how does it compare to Zen 4's latency?

Isn't Zen 5 a step back in latency compared to Zen 4?

And even so, the core layout and topology on Arrow Lake sucks big time.

Zen 3 through Zen 5 have good core-to-core latency within a single 8-core CCX on a single CCD. Once traffic leaves the CCD/CCX, yikes, it's bad.

Alder Lake and Raptor Lake had much better latency between CPU cores on their monolithic dies. Arrow Lake broke that, and while its packed approach has a bit better latency than AMD's, it still borked what Intel had going for it big time.
 
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency
Copy pasting what I wrote in another thread since it seems fitting:

Ryzen has been widely plagued by this cross-CCD issue, especially in games, in everything from the chiplet designs with multiple CCDs to the monolithic designs with different CCXes (which are still a thing in AMD's latest hybrid-core devices). This is a fact; otherwise stuff like core parking, special chipset drivers to work around scheduler issues and whatnot wouldn't exist as the "solution" of keeping tasks pinned to a single set of CPUs to avoid that latency hit, so that those tasks consistently see a communication latency of 25~50 ns instead of 75~80 ns.

Now for Intel's case: as you can see in the link you posted, ALL the P-cores have high latency when talking to one another, with the best case being the last two P-cores (7 and 8) talking to each other (a 57 ns penalty).
So applications that require high performance and get pinned to the 8 P-cores will all be seeing a latency of 57~87 ns, whereas on an AMD system you'd pin such an application to the same 8 cores of one CCD, keeping their latency down to 25~50 ns.

Of course, for applications that are not sensitive to cross-core communication this is irrelevant, and those tasks are also often the ones that can scale to multiple cores without issues (so the multi-CCD latency or that ring bus limitation stops being an issue), but other applications (like games, as said before) are really sensitive to that, and it does lead to worse performance.
Couple that with Windows' shitty scheduler, and it doesn't do any good.

One simple example of that: some applications performed better pinned solely to the E-cores - which have lower latency when communicating among themselves, and also lower performance, because duh - than pinned to the P-cores.
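The CCD pinning described above can be sketched on Linux with `os.sched_setaffinity`. A minimal sketch, assuming the first CCD/CCX exposes logical CPUs 0..7 (true for single-CCX-per-CCD parts like a 7700X, but an assumption you should verify against lscpu or /sys on your own box):

```python
import os

def pin_to_first_ccd(pid=0, cores_per_ccd=8):
    """Pin a process (0 = the current one) to the first CCD's cores.

    ASSUMPTION: the first CCD holds logical CPUs 0..cores_per_ccd-1.
    Verify the real numbering with lscpu before relying on this.
    """
    wanted = set(range(cores_per_ccd))
    allowed = os.sched_getaffinity(pid)   # CPUs the process may currently use
    target = wanted & allowed             # stay within any cgroup restrictions
    if target:
        os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)
```

This is roughly what the "core parking"/driver workarounds do for you automatically: the game's threads never leave one CCD, so they never pay the cross-CCD latency.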

Isn't Zen 5 a step back in latency compared to Zen 4?
Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)
 
AMD also promised an uplift from microcode updates for underperforming Zen 5 processors; it raised performance by 1~2%. I wouldn't expect much more from Arrow Lake.
 
Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)

Well, good intra-CCX. But also good intra-CCD, because all Zen archs from Zen 3 through Zen 5 (who the heck knows what Zen 6 is gonna be) have one 8-core CCX per CCD, so core-to-core latency within a CCD is good as such.
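On Linux you can check for yourself which cores share an L3 (i.e. sit in the same CCX) via the kernel's cacheinfo files in sysfs. A sketch: the sysfs path is the standard kernel interface, and the parser handles the kernel's "0-7,16-23"-style cpulist format:

```python
import glob

def parse_cpulist(s):
    """Parse a kernel cpulist string like '0-7,16-23' into a sorted tuple."""
    cpus = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return tuple(sorted(cpus))

def l3_domains():
    """Group logical CPUs by the L3 they share - one group per CCX."""
    groups = set()
    for path in glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list"
    ):
        with open(path) as f:
            groups.add(parse_cpulist(f.read()))
    return sorted(groups)
```

On a single-CCX-per-CCD Zen 3/4/5 part you'd expect one 8-core (16-thread) group per CCD; two groups per CCD would indicate a split-CCX layout like Zen 2.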
 
Intel does not seem to understand how the world has changed.
 
Well, good intra-CCX. But also good intra-CCD, because all Zen archs from Zen 3 through Zen 5 (who the heck knows what Zen 6 is gonna be) have one 8-core CCX per CCD, so core-to-core latency within a CCD is good as such.
Yeah, good catch. Zen 6c is supposed to have 2x CCXes per CCD once again, so we shall see how it behaves if something like that lands on the consumer side.
Also, I noticed that I mixed up the intra/inter terms, sorry for that.
 
Uh oh. You went and brought facts into the hate train.

Too bad the source material (more like lack thereof) for the article is two ROG forum posts talking about the microcode naming and nothing else. There's zero substance; how is this a news article anyway?
 
Does anyone here actually think people are purchasing this platform solely for gaming? It's a decent route to go if you want to mix productivity with gaming while not breaking the bank. Not to mention the power usage on these CPUs is far better than the previous 13th/14th gen CPUs, and this socket runs CUDIMMs straight out of the box.
 
Intel advertised Arrow Lake as being "on par" with 13th/14th gen for gaming, but that turned out not to be the case; for productivity plus gaming, Zen 4 and Zen 5 are still better while using even less power.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.
 
Does anyone here actually think people are purchasing this platform solely for gaming? It's a decent route to go if you want to mix productivity with gaming while not breaking the bank. Not to mention the power usage on these CPUs is far better than the previous 13th/14th gen CPUs, and this socket runs CUDIMMs straight out of the box.


Well, it has to at least reach Raptor Lake's gaming level and have a reasonable topology layout, rather than the weird screw-up it is.

If it had the exact same gaming performance across the board as Raptor Lake (1% and 0.1% lows included), with reduced power consumption and the same topology of P- and E-cores rather than P-cores in the middle of E-core clusters, I would have wanted a 265K. But sadly it does not.

Raptor Lake is simply a better product than Arrow Lake, if it were reliable and did not have degradation issues. If (big if) the microcode update truly fixed them for the long haul, it's easily Raptor Lake over Arrow Lake any day.

Intel advertised Arrow Lake as being "on par" with 13th/14th gen for gaming, but that turned out not to be the case; for productivity plus gaming, Zen 4 and Zen 5 are still better while using even less power.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.


Exactly true. Can the microcode update fix it and, truly across the board, make it on par with or better than RPL?
 
One simple example of that: some applications performed better pinned solely to the E-cores - which have lower latency when communicating among themselves, and also lower performance, because duh - than pinned to the P-cores.


That's because the L3 is shared among each 4-core cluster of E-cores as a victim cache; data tends to stay in there a lot.

Intel could also drop unused instruction sets on the P-cores, but since it's a slice of L3 per P-core instead of one whole slice shared among the P-cores, there is latency when moving between the L3 slices. Changing the way the P-cores are sitting/arranged on the tile and centralizing one large shared L3 slice would be a way to fix it.

At the same time, they could just use less L3/L2 cache, as Arrow Lake seems to be bandwidth-starved: just looking at the CUDIMM results at 9200 MT/s, it gains a lot more than Zen 5 does from high RAM speed.
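For reference, the peak-bandwidth arithmetic behind those numbers, assuming a standard dual-channel desktop setup with 64 data bits per channel (assumptions, not measurements):

```python
def ddr5_peak_gbps(mt_per_s, channels=2, bus_bits=64):
    """Theoretical peak bandwidth: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * (bus_bits // 8) * channels / 1e9

# DDR5-9200 dual channel: 9200e6 * 8 B * 2 = 147.2 GB/s theoretical peak,
# versus DDR5-6000:       6000e6 * 8 B * 2 =  96.0 GB/s.
```

Real sustained bandwidth lands well below these peaks, but the ~53% headroom difference is the rough reason a bandwidth-starved chip scales so well with CUDIMM speeds.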
 
Arrow Lake will definitely get better after patches, but catching up with AMD and beating them? Not even in a dream... The 14900K(S) is still the better gaming CPU, and Hardware Unboxed even posted a new video showing that the 9800X3D was 18% faster on average at 1080p in gaming vs the 14900K (and most CPU-intensive games got around 30%+ more performance on the 9800X3D), so I really doubt Intel could get a 20 to 40% performance uplift via a patch!
 
Exactly true. Can the microcode update fix it and, truly across the board, make it on par with or better than RPL?
This is where Arrow Lake shines ... CUDIMM and power efficiency.

With CUDIMM:
[chart: memscaling-cinebench-multi.png]

Without:
[chart: cinebench-multi.png]

[chart: power-applications.png]

[chart: power-games.png]

[chart: cpu-temperature-blender.png]
 
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency
You are confusing things; "latency" on its own is a generic word. When people talk about Arrow Lake's latency being bad, they are talking about system memory latency, i.e. the latency for a core to access a completely random part of memory.

What you linked is core-to-core latency, which is the latency to synchronize data between cores, such as thread-synchronization atomics (e.g. mutexes, semaphores, etc.) or any shared data really. If a workload, such as gaming, requires a good amount of those, then for Zen processors it would ideally have all its threads put into a single cluster (what AMD calls a CCD).

For Zen 2 that was an issue, because a CCX was a 4-core cluster and games often wanted to use more cores than that; but with the 8-core clusters of Zen 3 and later it isn't really an issue, except in the rare case of very heavily parallel CPU games like BeamNG.

This is also why a lot of 'optimization' for gaming on Zen amounts to turning off a CCD/CCX or pinning threads down, and also why the -950X SKU often isn't better at gaming than the -700X SKU.
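The "two threads synchronizing through shared data" measurement described above can be sketched as a ping-pong between two threads. Illustration only: real core-to-core latency tools (like the one behind the linked chipsandcheese charts) pin each thread to a fixed core and spin on an atomic cache line in native code; Python's GIL and Event wakeups add large overhead, so this shows the protocol, not true hardware latency:

```python
import threading
import time

def pingpong_roundtrip_ns(iters=2000):
    """Average round-trip time for two threads handing a token back and forth.

    NOT a real hardware measurement - see the note above. The alternation is
    strict: main sets ping, partner consumes it and answers with pong.
    """
    ping, pong = threading.Event(), threading.Event()

    def partner():
        for _ in range(iters):
            ping.wait()    # wait for the ping...
            ping.clear()
            pong.set()     # ...answer with a pong

    t = threading.Thread(target=partner)
    t.start()
    start = time.perf_counter_ns()
    for _ in range(iters):
        ping.set()         # ping
        pong.wait()        # wait for the pong
        pong.clear()
    elapsed = time.perf_counter_ns() - start
    t.join()
    return elapsed / iters  # ns per round trip
```

In a native version you would replace the Events with a spin on one atomic variable and pin the two threads to the specific cores under test; sweeping all core pairs yields exactly the latency matrix from the linked article.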
 
This is where Arrow Lake shines ... CUDIMM and power efficiency.

CUDIMM will be enabled on some AMD motherboards with X870(E) chipsets, so I'm curious to see how Zen 5 and Zen 5 X3D will take advantage of it too!
 
The Ultra series is a very overpriced CPU.
Better to buy a 13th/14th series or an AMD CPU.

Arrow Lake will definitely get better after patches, but catching up with AMD and beating them? Not even in a dream... The 14900K(S) is still the better gaming CPU, and Hardware Unboxed even posted a new video showing that the 9800X3D was 18% faster on average at 1080p in gaming vs the 14900K (and most CPU-intensive games got around 30%+ more performance on the 9800X3D), so I really doubt Intel could get a 20 to 40% performance uplift via a patch!

It depends on what GPU people use.
A 14900K can be much faster than a 9800X3D if the Intel user is on an RTX 4090 and the AMD user on a 7900 XTX.

Also, not many high-end gamers play at 1080p, so the difference is not so big in reality.

If we tested at 480p we would see bigger differences, and using the upcoming 5090 at 240p we would see even bigger ones. My point is, if a test is not close to reality, it's worthless.
 

It is not fully supported yet, but it probably will be with BIOS updates in several months.

A 14900K can be much faster than a 9800X3D if the Intel user is on an RTX 4090 and the AMD user on a 7900 XTX.

14900K much faster? I think you meant the 9800X3D, right? Because every unbiased IT channel or website will tell you that the 9800X3D is currently the best gaming CPU on the planet!
And I'm pretty sure that once the 5090 is out, the 9800X3D will widen the gap even more!

I personally have a 4K 240Hz monitor with a 4090, so I definitely try to play at 4K as much as possible. If the framerate is not acceptable, then I play with DLSS Quality (1440p rendered) and sometimes with Frame Generation, like in Black Myth: Wukong or Cyberpunk 2077 with Path Tracing...

But even at 4K, and depending on the game, the 1% and 0.1% lows can be very different depending on which CPU you are using!
 
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency
"Lion Cove P-Cores however don’t do so well. Worst case latency between P-Cores can approach cross-CCD latency on AMD’s chiplet designs."

That is quite bad. On a 16-core AMD CPU you can just lock a low-concurrency app such as a game onto one CCD.

But Arrow Lake has these delays between all of its P-cores?
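For what it's worth, that CCD-locking is just an affinity bitmask, which you can hand to Windows' `start /affinity <hexmask>` or to `SetProcessAffinityMask`. A sketch; the "first CCD = logical CPUs 0-15" numbering (16 threads from 8 SMT cores) is an assumption to verify on your own system:

```python
def affinity_mask(cpus):
    """Build a bitmask with one bit set per logical CPU.

    This is the format Windows' `start /affinity <hexmask>` and the
    SetProcessAffinityMask API expect.
    """
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return mask

# First CCD of a 16-core Ryzen with SMT, ASSUMING it maps to logical
# CPUs 0-15 (check Task Manager or your topology tool of choice):
#   affinity_mask(range(16)) == 0xFFFF
# which you would use as:  start /affinity FFFF game.exe
```

The same mask idea applies on Linux via `taskset` or `os.sched_setaffinity`, just expressed as a CPU set instead of a hex value.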
 


Yes, I wonder the same thing.

Of course Arrow Lake's latency beyond 8 cores, across the whole 24-core die, will be better than any Zen 3 through Zen 5 part, because all cores are on a single tile/die, whereas AMD beyond 8 cores has to cross CCX/CCD boundaries and go through Infinity Fabric, and ouch.

But how is Arrow Lake's latency to the IMC tile and across its other interconnects, like the ring bus, compared to Zen 3 through Zen 5's latency from a CCD/CCX to the Infinity Fabric and so on?

Of course Raptor Lake's and Alder Lake's 10nm monolithic dies kick both of their butts on the path to the IMC and ring, but we have no monolithic die that isn't two years old - and oh, it has degradation issues, and who knows how much the microcode update really fixes them long term?
 

It is not fully supported yet, but it probably will be with BIOS updates in several months.

Yeah, so no new information. The 9000 series can boot CUDIMMs in bypass mode and nothing else. Almost no chance we will be seeing an actual implementation until Zen 6.
 
In the past, people were burned at the stake for magic (as it is written in the title), and today I cast spells (microcode) on CPUs; magic is a power in advanced technology :)
 
Yeah, so no new information. The 9000 series can boot CUDIMMs in bypass mode and nothing else. Almost no chance we will be seeing an actual implementation until Zen 6.
CUDIMM is still new, so we're years away anyway! But they might make the X870 motherboards compatible and working as intended when paired with Zen 6 CPUs! (Which I will probably get, since it's still on AM5 and they might have 12c/24t per CCD + dual V-Cache for dual CCDs.)
 