Monday, August 14th 2023

Intel Arrow Lake-S to Feature 3 MB of L2 Cache per Performance Core

Intel's next-generation designs are nearing launch, and information about the upcoming generations is already trickling out. Today, we learn that Intel's Arrow Lake-S, the desktop/client implementation of the Arrow Lake family, will feature as much as 3 MB of level-two (L2) cache per performance core. Currently, Intel's latest 13th-generation Raptor Lake and 14th-generation Raptor Lake Refresh parts feature 2 MB of L2 cache per performance core. However, the 15th-generation Arrow Lake, scheduled for launch in 2024, will bump that up by 50% to 3 MB. Given that the P-cores are getting a capacity boost, we expect the E-cores to get one as well, albeit a smaller one.

Arrow Lake will utilize the Lion Cove P-core microarchitecture, while the E-core design will be based on Skymont. Intel plans to use its 20A node for this CPU, and more details will be presented next year.
Source: via VideoCardz

36 Comments on Intel Arrow Lake-S to Feature 3 MB of L2 Cache per Performance Core

#1
P4-630
Arrow Lake will possibly come without HT, yet with +40% increased multithreaded performance, I was reading somewhere.
Posted on Reply
#2
Space Lynx
Astronaut
Not X3D levels of cache, but they are definitely pulling from that playbook some and melding it with their traditional playbook of higher clocks and IPC.


8 Arrow Lake P-cores with the E-cores turned off... That might be my next CPU. All depends on benchmarks vs. the 8800X3D, though. Will be an interesting matchup next year.
Posted on Reply
#3
AnarchoPrimitiv
Definitely a sign that AMD is steering the x86 ship... first chiplets, now cache. Once is a coincidence, but two makes it true, haha
P4-630: Arrow Lake will possibly come without HT, yet with +40% increased multithreaded performance, I was reading somewhere.
40% with the same core count? Or 40% more performance with 30% more cores?
Posted on Reply
#4
P4-630
AnarchoPrimitiv: 40% with the same core count? Or 40% more performance with 30% more cores?
It is assumed that Arrow Lake will be up to 40% faster in multithreaded performance, based on comparisons between mid-range (6+8) processors of this generation, as leaked by MLID. Furthermore, Intel could make impressive progress with Beast Lake (possibly a working title) in 2026. Beast Lake is expected to boost the count to 10 performance cores, as opposed to the 8 performance cores Intel has stuck with since moving to a hybrid design with Alder Lake due to power consumption issues.

www.igorslab.de/en/intels-schnellere-raptor-lake-refresh-cpus-koennten-bald-auf-den-markt-kommen/#:~:text=It%20is%20assumed%20that%20Arrow,a%20working%20title)%20in%202026.
Space Lynx: 8 Arrow Lake P-cores with the E-cores turned off... That might be my next CPU.
You shouldn't be buying Intel CPUs...
Posted on Reply
#5
Space Lynx
Astronaut
P4-630: It is assumed that Arrow Lake will be up to 40% faster in multithreaded performance, based on comparisons between mid-range (6+8) processors of this generation, as leaked by MLID. Furthermore, Intel could make impressive progress with Beast Lake (possibly a working title) in 2026. Beast Lake is expected to boost the count to 10 performance cores, as opposed to the 8 performance cores Intel has stuck with since moving to a hybrid design with Alder Lake due to power consumption issues.

www.igorslab.de/en/intels-schnellere-raptor-lake-refresh-cpus-koennten-bald-auf-den-markt-kommen/#:~:text=It%20is%20assumed%20that%20Arrow,a%20working%20title)%20in%202026.



You shouldn't be buying Intel CPUs...
I like my chips running cold, no need for E-cores.
Posted on Reply
#6
dj-electric
Space Lynx: Not X3D levels of cache, but they are definitely pulling from that playbook some and melding it with their traditional playbook of higher clocks and IPC.

8 Arrow Lake P-cores with the E-cores turned off... That might be my next CPU. All depends on benchmarks vs. the 8800X3D, though. Will be an interesting matchup next year.
Don't confuse L2 and L3 caches; they are vastly different.
Posted on Reply
#7
dyonoctis
Space Lynx: I like my chips running cold, no need for E-cores.
Then get an 8-core X3D chip. You can cool that with a low-profile cooler and still get max performance in gaming. I don't know who started the rumor that the E-cores generate the bulk of the heat, but it's false. The P-cores at full throttle are just going to be hot; they still need 190 W to reach their max clock speed (I've tried 125 and 150 W PL2 and they don't reach max turbo).

And it's not like a 13700K gets hot in games anyway. If you do stuff that stresses the CPU that much, chances are you'll need the E-cores. Otherwise, you might as well just get a Ryzen 9 7900.
Posted on Reply
#8
persondb
I am curious about how much the latency is going to increase; hopefully Intel is able to make this change without adding too many cycles.
Posted on Reply
#9
Space Lynx
Astronaut
dyonoctis: Then get an 8-core X3D chip. You can cool that with a low-profile cooler and still get max performance in gaming. I don't know who started the rumor that the E-cores generate the bulk of the heat, but it's false. The P-cores at full throttle are just going to be hot; they still need 190 W to reach their max clock speed (I've tried 125 and 150 W PL2 and they don't reach max turbo).

And it's not like a 13700K gets hot in games anyway. If you do stuff that stresses the CPU that much, chances are you'll need the E-cores. Otherwise, you might as well just get a Ryzen 9 7900.
I most likely will upgrade to an 8800X3D or 9900X3D. Yikes, those temps scare me! I don't break 54 °C in a lot of games with my 5600X3D. It's so cold, I love it
Posted on Reply
#10
bug
My first PC had 2 MB of system RAM. And that was above average at the time; most came with only 1 MB. I guess things have progressed a little.
Posted on Reply
#11
Punkenjoy
The larger the cache, the longer it takes to look it up. That's why, for example, the largest cache is the L3 for both Intel and AMD.

This may lead to increased performance, but maybe not as much as people think it will. We will see. They might have improved the lookup mechanism a lot to reduce the performance hit of a larger cache.

Also, since cache does not scale very well with smaller nodes, I wonder if it will be worth the die space.
Posted on Reply
#12
bug
Punkenjoy: The larger the cache, the longer it takes to look it up. That's why, for example, the largest cache is the L3 for both Intel and AMD.

This may lead to increased performance, but maybe not as much as people think it will. We will see. They might have improved the lookup mechanism a lot to reduce the performance hit of a larger cache.

Also, since cache does not scale very well with smaller nodes, I wonder if it will be worth the die space.
Cache is always a game of fine balancing. Believe it or not, x86 did not have any cache until its 4th generation, 12 years after the 8086.
And yes, larger caches mean more time to look things up, but engineers are well aware of that. A cache is only increased when it can be built so that the added cache hits mitigate most or all of the latency hit, while also not burning through unsustainable amounts of power. A lot of simulation and real-world data goes into the decision to increase a particular cache or add another level. That's why we don't have 4 MB first-level caches or 16 cache levels already.
Posted on Reply
#13
chrcoluk
Looking at the history of CPU design, increasing cache fairly consistently bumps up performance. It may slightly slow down cache lookups, but that is easily outweighed by the gains from avoiding misses.
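This trade-off can be sketched with the standard average memory access time (AMAT) formula. The cycle counts and miss rates below are made-up, illustrative numbers, not measurements of any real chip:

```python
# AMAT = hit_time + miss_rate * miss_penalty (a standard first-order model)
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in cycles."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical L2 caches: the bigger one costs 2 extra cycles per lookup,
# but its lower miss rate more than pays for it on average.
small_l2 = amat(hit_time=12, miss_rate=0.20, miss_penalty=40)  # 12 + 8.0 = 20.0
large_l2 = amat(hit_time=14, miss_rate=0.12, miss_penalty=40)  # 14 + 4.8 = 18.8
print(small_l2, large_l2)
```

With these assumed numbers, the slower but larger cache still wins on average, which is the point being made above.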
Posted on Reply
#14
Space Lynx
Astronaut
P4-630: It is assumed that Arrow Lake will be up to 40% faster in multithreaded performance, based on comparisons between mid-range (6+8) processors of this generation, as leaked by MLID. Furthermore, Intel could make impressive progress with Beast Lake (possibly a working title) in 2026. Beast Lake is expected to boost the count to 10 performance cores, as opposed to the 8 performance cores Intel has stuck with since moving to a hybrid design with Alder Lake due to power consumption issues.

www.igorslab.de/en/intels-schnellere-raptor-lake-refresh-cpus-koennten-bald-auf-den-markt-kommen/#:~:text=It%20is%20assumed%20that%20Arrow,a%20working%20title)%20in%202026.



You shouldn't be buying Intel CPUs...
www.techpowerup.com/review/atlas-fallen-benchmark-test-performance-analysis/6.html

E-cores activating in this new game, according to the bottom of the review, ruining the experience... and how many more games out there that simply aren't played have the same issue? E-cores are a terrible concept, and if I buy Arrow Lake, I will be turning them off.
Posted on Reply
#15
Noci
Assuming Intel increases the L1 and/or L2 caches and overcomes whatever technical challenges that brings: I always heard that this kind of cache, besides being blazing fast, also costs a lot of money.

That won't do much good to the consumer prices.
Posted on Reply
#16
Toothless
Tech, Games, and TPU!
Space Lynx: www.techpowerup.com/review/atlas-fallen-benchmark-test-performance-analysis/6.html

E-cores activating in this new game, according to the bottom of the review, ruining the experience... and how many more games out there that simply aren't played have the same issue? E-cores are a terrible concept, and if I buy Arrow Lake, I will be turning them off.
It's literally a bug, as mentioned in that review, if you read it.

You keep saying "I'm getting this CPU" and changing your mind the next post, or throwing in "I'm disabling E-cores" without ever understanding why they're there.

Like pick a side already jfc.
Posted on Reply
#17
claes
That, and rumors that Ryzen will have E-cores, suggest they're here to stay
Posted on Reply
#19
Minus Infinity
No one's mentioned that Arrow Lake will not have SMT. Yes, that's right: no Hyper-Threading. They may bring it back later, but apparently the new architecture, chiplets, FPGA substrate, etc. are making it all too hard to get working properly.

Let's hope Intel delivers on their claims of IPC uplifts in both Meteor and Arrow Lake. They'll need to, as Zen 5 is looking very good and may be out by April 2024.

Looking forward to Zen 5 vs. Arrow Lake, but I hope Meteor Lake gives us a good idea of how Intel is moving beyond Raptor Lake, and whether they really have made large gains in efficiency too. If they have, Meteor Lake mobile will be a strong alternative to Phoenix, at least on the iGPU front.
Posted on Reply
#20
LFaWolf
Punkenjoy: The larger the cache, the longer it takes to look it up. That's why, for example, the largest cache is the L3 for both Intel and AMD.

This may lead to increased performance, but maybe not as much as people think it will. We will see. They might have improved the lookup mechanism a lot to reduce the performance hit of a larger cache.

Also, since cache does not scale very well with smaller nodes, I wonder if it will be worth the die space.
Longer compared to what, no cache? If the data is not in the cache, the CPU has to get it from memory, which is many times slower than fetching it from cache. The more cache you have, the more frequently accessed data can be stored in it, and the faster data fetches and CPU operations can happen.
Posted on Reply
#21
skates
So, a 10% increase in performance vs. the last generation, as per usual for decades now? I'm not being sarcastic; I've not delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
Posted on Reply
#22
Punkenjoy
LFaWolf: Longer compared to what, no cache? If the data is not in the cache, the CPU has to get it from memory, which is many times slower than fetching it from cache. The more cache you have, the more frequently accessed data can be stored in it, and the faster data fetches and CPU operations can happen.
How does the CPU know what is in the cache? Magic?

No, it has to do a lookup, and a simple way to put it is: the larger the cache, the longer it takes to look it up.

But to detail this a bit: caches do not cache arbitrary data; that is a misconception. The CPU at that level is only aware of instructions and memory addresses. The way the cache works is by caching memory regions.

An ultra-fast lookup would be to just cache one contiguous memory region. The lookup would just be: is that memory address in this region? Yes/no, done. The thing is, caching a single 3 MB region would have a disastrous cache hit ratio, so it wouldn't make sense to do it. Instead, they cache smaller regions of memory, and this is where the trade-off happens.

The smaller the regions, the higher the hit ratio, but at the same time, the longer it takes to see if the data is in there. Working with cache isn't just "more is better"; it's a balance you strike when you design a CPU. For example, AMD frequently went with larger L1 and L2 caches but slower cache speeds. And, for example, the Core 2 Duo had 3 MB of L2 per core (merged into a 6 MB shared L2), so 3 MB of L2 isn't new. But at that time, there was no L3.

The thing with cache is that you have to look it up every time. If you get an L1 hit, perfect: that was the only lookup you needed. If you miss at every level, you still had to check whether the data was in L1, then L2, then L3, and only then do you access memory. There is a cache-miss penalty compared to having no cache at all, but if you design it properly, it can far outperform accessing memory all the time. The way you do it can greatly impact performance; you need to find the right balance for your architecture.

Generally, the L1 is ultra fast, very low latency, and very close to the core itself. The L2 is generally dedicated to a core and contains a fair bit of the data that core needs, and the L3 is shared across all cores and is generally a victim cache (it contains the data that got evicted from L2). That setup has worked well.

Intel isn't stupid, so they must think the new core needs a larger cache to be fed. It's possible they are mitigating the larger cache size with a longer pipeline or other techniques. In the end, it takes transistors, and the more you put in, the larger your chip is and the more it costs.

Designing is always about trade-offs, and here I was just wondering if this was the way to go. In the past, architectures near their last redesign frequently had larger caches than their successors, because at that point, adding cache was the best thing to do without a full redesign.
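A toy model of the region-based (set-associative) lookup described above. The sizes are hypothetical (a 2 MiB, 16-way cache with 64-byte lines) and eviction is deliberately simplified; real caches use LRU-like replacement policies:

```python
LINE_SIZE = 64    # bytes per cached region (cache line)
NUM_SETS = 2048   # 2 MiB / 64 B per line / 16 ways = 2048 sets
NUM_WAYS = 16     # only these 16 tags must be compared per lookup

# The cache is a list of sets; each set holds up to NUM_WAYS tags.
cache = [set() for _ in range(NUM_SETS)]

def split_address(addr):
    """Split a byte address into (tag, set index, offset within the line)."""
    offset = addr % LINE_SIZE
    set_index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    return tag, set_index, offset

def access(addr):
    """Return True on a hit; on a miss, fill the line and return False."""
    tag, set_index, _ = split_address(addr)
    ways = cache[set_index]
    if tag in ways:
        return True     # hit: only NUM_WAYS tags were checked, not the whole cache
    if len(ways) >= NUM_WAYS:
        ways.pop()      # evict an arbitrary way (a simplification)
    ways.add(tag)       # fill the line on a miss
    return False
```

The first access to an address misses and fills its line; a later access to any byte in the same 64-byte line hits, and the lookup only ever compares the handful of tags in one set.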
Posted on Reply
#23
LFaWolf
Punkenjoy: How does the CPU know what is in the cache? Magic?

No, it has to do a lookup, and a simple way to put it is: the larger the cache, the longer it takes to look it up.

But to detail this a bit: caches do not cache arbitrary data; that is a misconception. The CPU at that level is only aware of instructions and memory addresses. The way the cache works is by caching memory regions.

An ultra-fast lookup would be to just cache one contiguous memory region. The lookup would just be: is that memory address in this region? Yes/no, done. The thing is, caching a single 3 MB region would have a disastrous cache hit ratio, so it wouldn't make sense to do it. Instead, they cache smaller regions of memory, and this is where the trade-off happens.

The smaller the regions, the higher the hit ratio, but at the same time, the longer it takes to see if the data is in there. Working with cache isn't just "more is better"; it's a balance you strike when you design a CPU. For example, AMD frequently went with larger L1 and L2 caches but slower cache speeds. And, for example, the Core 2 Duo had 3 MB of L2 per core (merged into a 6 MB shared L2), so 3 MB of L2 isn't new. But at that time, there was no L3.

The thing with cache is that you have to look it up every time. If you get an L1 hit, perfect: that was the only lookup you needed. If you miss at every level, you still had to check whether the data was in L1, then L2, then L3, and only then do you access memory. There is a cache-miss penalty compared to having no cache at all, but if you design it properly, it can far outperform accessing memory all the time. The way you do it can greatly impact performance; you need to find the right balance for your architecture.

Generally, the L1 is ultra fast, very low latency, and very close to the core itself. The L2 is generally dedicated to a core and contains a fair bit of the data that core needs, and the L3 is shared across all cores and is generally a victim cache (it contains the data that got evicted from L2). That setup has worked well.

Intel isn't stupid, so they must think the new core needs a larger cache to be fed. It's possible they are mitigating the larger cache size with a longer pipeline or other techniques. In the end, it takes transistors, and the more you put in, the larger your chip is and the more it costs.

Designing is always about trade-offs, and here I was just wondering if this was the way to go. In the past, architectures near their last redesign frequently had larger caches than their successors, because at that point, adding cache was the best thing to do without a full redesign.
Oh, you are talking about being slower on a cache miss. Yes, it is a trade-off, but studies have shown that cache hit rates are above 80%, sometimes above 90%. The speed gained from hits, especially with a larger cache, is worth it on average given a high hit rate. Certainly, there could come a point of diminishing returns, but we are not there yet, as engineers have shown by improving the cache placement algorithms to improve performance. It is not just a linear search of all the addresses; they break the cache down into regions or sets, which is where the name "set-associative cache" comes from.
Posted on Reply
#24
Eskimonster
Space Lynx: I most likely will upgrade to an 8800X3D or 9900X3D. Yikes, those temps scare me! I don't break 54 °C in a lot of games with my 5600X3D. It's so cold, I love it
I wonder how much you paid for the broken 5800X3D :)
Posted on Reply
#25
Toothless
Tech, Games, and TPU!
Eskimonster: I wonder how much you paid for the broken 5800X3D :)
Pricing for it is actually fair.
Posted on Reply
