
Intel Arrow Lake-S to Feature 3 MB of L2 Cache per Performance Core

The 5800X3D was just on sale for $269 at Amazon, that's why I ask. I paid $200 after tax because I had a coupon for Microcenter. So I saved 75 bucks and it runs cooler. I have no complaints.
 
The 5800X3D was just on sale for $269 at Amazon, that's why I ask. I paid $200 after tax because I had a coupon for Microcenter. So I saved 75 bucks and it runs cooler. I have no complaints.
That's very cheap for such a new CPU. I would also be happy with it.
 
So, a 10% increase in performance vs. the last generation, as per usual for decades now? I'm not being sarcastic; I haven't delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
I think you'll be very surprised by Arrow Lake vs. Raptor Lake, not just in performance but in power efficiency. Time will tell, but I would expect Arrow Lake to thrash Raptor Lake. It'll need to, as Zen 5 is looking very strong with >20% IPC and a lot of architectural changes. Anyway, we'll get an idea soon with Meteor Lake, even if it's mobile only, as Arrow Lake is a much more refined and performant version of that.
 
So, a 10% increase in performance vs. the last generation, as per usual for decades now? I'm not being sarcastic; I haven't delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
10% more has not been the pattern for decades; in fact, even last decade Intel couldn't get 10% more for five generations before they switched from SKL and its derivatives to RKL. The same goes for AMD with Zen -> Zen+, although that was just a minor shrink.
 
The larger the cache, the longer it takes to look it up. That's why, for example, the largest cache is the L3 on both Intel and AMD.
It depends on many factors.
Caches are usually organized in banks, which increases bandwidth substantially and offsets latency, but decreases overall cache efficiency (per cache line).
New node improvements may also lead to latency decreases.
And so on.
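To make the banking point concrete, here's a minimal C sketch of how a controller might split an address for a hypothetical 2 MB, 16-way cache with 64-byte lines and 4 banks. The geometry and the bank-selection rule here are made up for illustration; real designs vary and often hash the bits differently:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical cache geometry -- real designs vary. */
#define LINE_SIZE  64           /* bytes per cache line */
#define CACHE_SIZE (2u << 20)   /* 2 MB total           */
#define WAYS       16           /* associativity        */
#define BANKS      4            /* independent banks    */
#define SETS       (CACHE_SIZE / (LINE_SIZE * WAYS))    /* = 2048 */

int main(void)
{
    uint64_t addr   = 0x7ffd12345678ull;                   /* example address      */
    uint64_t offset = addr % LINE_SIZE;                    /* byte within the line */
    uint64_t set    = (addr / LINE_SIZE) % SETS;           /* which set to probe   */
    uint64_t tag    = addr / (LINE_SIZE * (uint64_t)SETS); /* stored tag           */
    uint64_t bank   = set % BANKS;                         /* low set bits pick a bank */

    printf("offset=%llu set=%llu bank=%llu tag=0x%llx\n",
           (unsigned long long)offset, (unsigned long long)set,
           (unsigned long long)bank, (unsigned long long)tag);
    return 0;
}
```

Because consecutive sets land in different banks, back-to-back accesses to different lines can often proceed in parallel, which is where the bandwidth gain comes from.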

...The smaller the region, the higher the hit ratio will be, but at the same time, the longer it will take to see if the data is in there. Working with cache isn't just "more is better"; it's a balance you strike when you design a CPU. For example, AMD frequently went with larger L1 and L2 but had slower cache speeds. And for example, the Core 2 Duo had 3 MB of L2 per core (merged into a 6 MB shared L2), so 3 MB of L2 isn't new. But at that time they didn't have an L3.
Both AMD and Intel have increased and decreased their L1D/L1I and L2 caches over various generations; it all depends on the cache design and priorities of the architecture. Comparing caches across CPU architectures solely based on size is nearly pointless. And as I often say, performance is what ultimately matters.

Pretty much all current CPU architectures cache memory in regions of the same size, called a "cache line"; currently that's 64 bytes on most x86 and ARM architectures.
I do expect them to move to 128 bytes eventually, as this would greatly benefit dense data accesses (which is where you have good hit rates anyway). Implementing e.g. 3 MB of L2 with 128 B cache lines would not cost anywhere near 50% more die space than 2 MB of L2 with 64 B cache lines, and it would have approximately the same latency, so it's a huge win in hit rates. This would also allow for 1024-bit SIMD, which is probably coming "soon".
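If anyone wants to see the 64-byte line granularity on their own machine, a crude (and noisy) way is to walk a big buffer at growing strides and time it: the cost per access roughly rises until the stride reaches the line size and then flattens, since past that point every access misses a fresh line. A rough POSIX C sketch; the numbers will vary wildly per system:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64UL * 1024 * 1024)   /* 64 MB, far larger than any cache */

int main(void)
{
    volatile char *buf = malloc(N);
    if (!buf) return 1;
    for (size_t i = 0; i < N; i++) buf[i] = (char)i;   /* fault pages in */

    for (size_t stride = 16; stride <= 512; stride *= 2) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i += stride)
            buf[i]++;                  /* one access every 'stride' bytes */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("stride %4zu: %.2f ns/access\n", stride, ns / (N / stride));
    }
    free((void *)buf);
    return 0;
}
```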
 
It depends on many factors.
Caches are usually organized in banks, which increases bandwidth substantially and offsets latency, but decreases overall cache efficiency (per cache line).
New node improvements may also lead to latency decreases.
And so on.


Both AMD and Intel have increased and decreased their L1D/L1I and L2 caches over various generations; it all depends on the cache design and priorities of the architecture. Comparing caches across CPU architectures solely based on size is nearly pointless. And as I often say, performance is what ultimately matters.

Pretty much all current CPU architectures cache memory in regions of the same size, called a "cache line"; currently that's 64 bytes on most x86 and ARM architectures.
I do expect them to move to 128 bytes eventually, as this would greatly benefit dense data accesses (which is where you have good hit rates anyway). Implementing e.g. 3 MB of L2 with 128 B cache lines would not cost anywhere near 50% more die space than 2 MB of L2 with 64 B cache lines, and it would have approximately the same latency, so it's a huge win in hit rates. This would also allow for 1024-bit SIMD, which is probably coming "soon".
The data arrays won't decrease in size with a larger line size, but the tag arrays would be smaller: you would only need 3*1024*1024/128 tags versus 2*1024*1024/64, i.e. 24K vs. 32K. Also note that the Pentium 4 had 128-byte lines for L2 while the L1 stayed at 64 bytes.
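A quick sanity check of that arithmetic (this only counts tag entries and ignores the status/LRU bits that sit alongside them):

```c
#include <stdio.h>

int main(void)
{
    long tags_3mb_128 = 3L * 1024 * 1024 / 128;  /* 3 MB cache, 128 B lines */
    long tags_2mb_64  = 2L * 1024 * 1024 / 64;   /* 2 MB cache,  64 B lines */
    printf("3 MB / 128 B lines: %ld tags\n", tags_3mb_128);  /* 24576 = 24K */
    printf("2 MB /  64 B lines: %ld tags\n", tags_2mb_64);   /* 32768 = 32K */
    return 0;
}
```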
 
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
 
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
POWER8 and some SKUs of Haswell, Broadwell, and Skylake had L4 caches. For workloads with very large memory footprints it can make sense, but for most workloads a large L3 is better.
 
More cache and wider lanes please.
 
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
To explain this, we first need to address how L3 works.
As you might already know, L3 is a spillover (victim) cache, which means it only contains cache lines evicted from L2. L3 is also accessible across cores, which is why it has some effect on multithreaded workloads. There is a tremendous amount of data constantly flowing through the caches, including lots of prefetched data that ultimately goes unused. In terms of cache lines, the largest volume is data and a smaller volume is instructions, but the chance of a single cache line being needed again before it is evicted from L3 is much higher for instructions, especially from other cores. (The chance of another core needing lots of the same data within nanoseconds is slim, except with explicit synchronization.) This is why CPUs need very large L3 caches before it starts to matter; in most cases where we see sensitivity to L3, it's due to instruction cache lines being shared, not data.

But we usually don't see significant gains from huge L3 caches in most computationally intense tasks, even though they churn through large amounts of data. That is because those applications are cache optimized, which is one of the most important types of low-level optimization. As any low-level programmer can tell you, sensitivity to L3 usually means the code is too large, bloated and unpredictable, which is why the CPU evicts it from cache.
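For anyone wondering what "cache optimized" means in practice, the textbook example is traversing a 2D array in the order it's laid out in memory, so each loaded 64-byte line is fully used before it's evicted. A minimal sketch (the sizes are arbitrary, just large enough to blow past the caches):

```c
#include <stdio.h>
#include <stdlib.h>

#define ROWS 4096
#define COLS 4096   /* 4096*4096 ints = 64 MB, well beyond L3 */

/* Row-major walk: consecutive accesses hit the same 64 B line,
   so all 16 ints per line are used -- near-perfect locality. */
long sum_row_major(const int *a)
{
    long s = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            s += a[r * COLS + c];
    return s;
}

/* Column-major walk: each access lands on a different line,
   which is often evicted before the other 15 ints are needed. */
long sum_col_major(const int *a)
{
    long s = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            s += a[r * COLS + c];
    return s;
}

int main(void)
{
    int *a = calloc((size_t)ROWS * COLS, sizeof *a);
    if (!a) return 1;
    printf("%ld %ld\n", sum_row_major(a), sum_col_major(a));
    free(a);
    return 0;
}
```

Same work, same data, but the second traversal can easily be several times slower on a big enough array; that's the kind of win well-written software gets from the cache without needing a huge L3.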

Even though huge L3s make an appreciable difference in some games and select applications, I don't believe it's a good direction for CPU development. It costs a tremendous amount of die space and doesn't yield meaningful gains for most heavy workloads. That die space and development effort could be spent on much more useful improvements that would benefit most workloads. But I guess this is what we get when people are more focused on synthetic benchmarks than real-world results. Just think about it: slapping a whole extra cache die on the CPU makes less of a difference than a minor architectural upgrade (~10% IPC gains). That's a crude brute-force approach to extract very little overall. And this is why I'm not for L4; its usefulness would be even smaller, especially alongside a larger L3. I do believe there is one way L3 could become more cost-effective though: splitting instructions and data. Then a much smaller L3 pool could have the same effect as 100 MB or so, at a small cost.

I'm much more excited about real architectural improvements, such as much wider execution. The difference between well-written and poorly written software will only become clearer over time, as well-written software will continue to scale.

More cache and wider lanes please.
PCIe lanes?
 