Sunday, January 16th 2022

Intel "Raptor Lake" Rumored to Feature Massive Cache Size Increases

Large on-die caches are expected to be a major contributor to IPC and gaming performance. The upcoming AMD Ryzen 7 5800X3D processor triples its on-die last-level cache using 3D Vertical Cache technology, to catch up with Intel's "Alder Lake-S" processors in gaming while using the existing "Zen 3" IP. Intel realizes this and is planning a massive increase in on-die cache sizes, although spread across the cache hierarchy. The next-generation "Raptor Lake-S" desktop processor the company plans to launch in the second half of 2022 is rumored to feature 68 MB of "total cache" (that's AMD lingo for L2 + L3 caches), according to a highly plausible theory by PC enthusiast OneRaichu on Twitter, illustrated by Olrak29_.

The "Raptor Lake-S" silicon is expected to feature eight "Raptor Cove" P-cores and four "Gracemont" E-core clusters (each cluster contains four cores). The "Raptor Cove" core is expected to feature 2 MB of dedicated L2 cache, an increase over the 1.25 MB L2 cache per "Golden Cove" P-core of "Alder Lake-S." In a "Gracemont" E-core cluster, four CPU cores share an L2 cache. Intel is looking to double this E-core cluster L2 cache size from 2 MB per cluster on "Alder Lake" to 4 MB per cluster. The shared L3 cache increases from 30 MB on "Alder Lake-S" (C0 silicon) to 36 MB on "Raptor Lake-S." The L2 + L3 caches hence add up to 68 MB. All eyes are now on "Zen 4," and whether AMD gives its L2 caches an increase from the 512 KB per-core size it has consistently maintained since the first "Zen."
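The rumored totals work out as follows; a quick sketch, where the Alder Lake-S figures assume the full C0 die (two E-core clusters, 30 MB L3, as on the Core i9-12900K):

```python
# "Total cache" in the AMD sense: sum of all L2 plus the shared L3, in MB.
def total_cache(p_cores, l2_per_p, e_clusters, l2_per_cluster, l3_shared):
    """L2 + L3 total, the 'total cache' figure quoted in the article."""
    return p_cores * l2_per_p + e_clusters * l2_per_cluster + l3_shared

# Alder Lake-S C0 die: 8 P-cores x 1.25 MB, 2 E-clusters x 2 MB, 30 MB L3.
alder_lake = total_cache(p_cores=8, l2_per_p=1.25, e_clusters=2, l2_per_cluster=2, l3_shared=30)

# Rumored Raptor Lake-S: 8 P-cores x 2 MB, 4 E-clusters x 4 MB, 36 MB L3.
raptor_lake = total_cache(p_cores=8, l2_per_p=2.0, e_clusters=4, l2_per_cluster=4, l3_shared=36)

print(alder_lake)   # 44.0 MB
print(raptor_lake)  # 68.0 MB
```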
Sources: OneRaichu (Twitter), Olrack (Twitter), HotHardware

66 Comments on Intel "Raptor Lake" Rumored to Feature Massive Cache Size Increases

#26
Punkenjoy
I've got a few remarks on this:

- It would be great if lower SKUs still kept the full 36 MB L3 cache, but that doesn't seem to be the case.
- In this model, it really looks like the L3 is there for core-to-core communication.
- I wonder if the L2-to-L3 relationship is exclusive, or if the L3 duplicates the contents of each L2.
- I wonder how fast that L2 will be. They need a fast L2 to feed the cores, but if the latency is too high, the cores might starve and lose a lot of cycles.
- I feel the small cores might have an easier time talking to the main cores with that design, if the L3 cache is fully connected.

We will have to see. It's good that both Zen 4 and Meteor Lake look promising.
#27
Crackong
Good

Let them fight

Consumer gets better products
#28
Prima.Vera
What does that mean in plain English? I thought more cache means higher latency? Or how is more cache beneficial?
#29
thestryker6
ncrs: Alder Lake has already increased cache latency in comparison to Rocket Lake. If they go even further we might arrive in a situation where Zen 3 will have almost half the cache latency of Raptor Lake. But in the end we'll have to wait for benchmarks, and even then it is going to be workload-dependent.
FWIW the Golden Cove cores being used in Sapphire Rapids already have 2MB of L2 so we ought to see if there's added latency as soon as someone benches one of those. Intel didn't say anything about increased latency over the ones in Alder Lake, but that doesn't mean there isn't.
#30
watzupken
Seems like the trend is no longer to focus on clock speed and core count, because those have been pushed hard, but to increase cache sizes. I think AMD has been very aggressive in this aspect, but I guess at some point we will run into diminishing returns, especially for most consumers.
#31
efikkan
Wirko: If you need to give just one number, 68 MB makes more sense than 36 MB.
For what purpose?
The total cache tells little of the relative performance of CPUs. Performance matters, not specs, especially pointless specs.
Prima.Vera: What does that mean in plain English? I thought more cache means higher latency? Or how is more cache beneficial?
Very little.
The largest changes are in the E-cores, which have little impact on most workloads. The extra L3 cache is also shared with more cores, so it's not likely to offer a substantial improvement in general. And judging by performance scaling on Xeons, having extra L3 with more cores doesn't offer a significant change.

Whether more cache adds more latency is implementation-specific. In this case they are adding more blocks of L3, which at least increases latency to the banks farthest away, although that is still small compared to RAM, of course.
#32
1d10t
Have to admit, Intel engineers are really good at putting LEGO together.
#33
jesdals
I'll guess Intel is about to cash in on LGA1700.
#34
bug
Prima.Vera: What does that mean in plain English? I thought more cache means higher latency? Or how is more cache beneficial?
It's a tug-of-war game. If you can increase the cache with minimum added latency, some data that previously required a trip to the main RAM, suddenly doesn't need that anymore -> faster performance. And then the workloads change. Rinse and repeat.
It's also a fab process game, cache is expensive both from a power and a die area point of view.

Also keep in mind cache latency (like RAM latency) is usually given in clock cycles. 1-2 more clock cycles can be masked by increasing the frequency accordingly (it's a bit more complicated than that, really, but that's the gist of it.)
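That cycles-versus-frequency point can be sketched with made-up numbers; the latencies and clocks below are purely illustrative, not measured Raptor Lake figures:

```python
# Cache latency is quoted in core clock cycles; the wall-clock cost depends
# on frequency, so extra cycles can be partly masked by a higher clock.
def latency_ns(cycles, freq_ghz):
    """Convert a latency in clock cycles to nanoseconds at a given clock."""
    return cycles / freq_ghz

old = latency_ns(cycles=14, freq_ghz=5.0)  # 2.80 ns (hypothetical old design)
new = latency_ns(cycles=16, freq_ghz=5.8)  # ~2.76 ns: 2 extra cycles, hidden by a clock bump
print(old, new)
```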
#36
Wirko
efikkan: For what purpose?
The total cache tells little of the relative performance of CPUs. Performance matters, not specs, especially pointless specs.
I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
#37
TheoneandonlyMrK
Massive compared to Alder Lake, perhaps, but I don't know if I would call this a massive cache in these times.
#38
bug
TheoneandonlyMrK: Massive compared to Alder Lake, perhaps, but I don't know if I would call this a massive cache in these times.
Massive increase, for those inclined to read ;)
#39
TheoneandonlyMrK
bug: Massive increase, for those inclined to read ;)
Yeah, I did. I meant it's not a massive increase; I appreciate it didn't sound like that, though.
#40
bug
TheoneandonlyMrK: Yeah, I did. I meant it's not a massive increase; I appreciate it didn't sound like that, though.
1.25 MB → 2 MB per core is pretty massive for L2 cache. IIRC it doesn't usually grow that fast.
Though TBH absolute size is usually meaningless. The cache size is tightly coupled with the underlying architecture (i.e. 2 MB/core wouldn't have made a difference for NetBurst), so size alone doesn't tell much.
#41
lexluthermiester
bug: It's a tug-of-war game. If you can increase the cache with minimum added latency, some data that previously required a trip to the main RAM, suddenly doesn't need that anymore -> faster performance. And then the workloads change. Rinse and repeat.
It's also a fab process game, cache is expensive both from a power and a die area point of view.

Also keep in mind cache latency (like RAM latency) is usually given in clock cycles. 1-2 more clock cycles can be masked by increasing the frequency accordingly (it's a bit more complicated than that, really, but that's the gist of it.)
This. And when the improvements in the structure and functionality of the cache are factored in, the sum total is an overall gain in performance.
Wirko: I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
I think you misunderstand how cache is used and why it exists. Caches exist to minimize, as much as possible, how often the CPU needs to fetch data from system RAM, which is drastically slower. The less frequently that needs to happen, the better the performance. 99.9% of the time, programs and executing code are completely cache-agnostic. This means that programs are generally optimized to run in minimal amounts of cache. However, the more the merrier: if a program has more room to use, it will use it, and the CPU will fit it in. That said, all programs will benefit from more cache unless they are so small that they fit entirely into L2 or a couple of MB of L3, which is rare.
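That "fewer trips to system RAM" effect can be seen with a toy LRU cache model; the capacities and access pattern below are invented purely for illustration:

```python
# Toy LRU cache: the same access stream sees far fewer misses ("trips to RAM")
# once the working set fits in the cache. Sizes here are arbitrary line counts.
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache of the given capacity."""
    cache, hits = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)

# A working set of 48 "lines", cycled repeatedly:
stream = list(range(48)) * 100
small = hit_rate(stream, capacity=32)  # working set doesn't fit: LRU thrashes, 0% hits
large = hit_rate(stream, capacity=64)  # fits entirely: ~99% hits after the first pass
print(small, large)
```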
#42
ModEl4
Although Alder Lake greatly improved gaming performance vs Rocket Lake, when you look back at the 2020 roadmap slides/bullet points, Intel didn't emphasize the gaming performance improvement of the Alder Lake CPU design, in contrast with the mention of Raptor Lake's gaming prowess (based on the redesigned cache). That clearly suggests we will see at least the same jump in gaming performance as we had between Rocket and Alder (or am I reading too much into this slide?)

cdn.videocardz.com/1/2021/03/Intel-Raptor-Lake-VideoCardz.jpg
#43
efikkan
Wirko: I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
Many have the misconception that CPU caches contain the most important data, when in reality they only contain the most recently used (or prefetched) data; caches are streaming buffers. While it's possible for an application to give the CPU hints about prefetching and discarding cache lines, it's ultimately controlled by the CPU, and there are no guarantees. The code doesn't see the caches; they are transparent. To the code, these are just normal memory accesses that turn out to be very fast. When we do cache optimization, it's about making things easier for the CPU: denser code and data, fewer function calls, less branching, using SIMD, etc. Still, the application doesn't control what's in cache, or whether the application "fits in cache", because this will "never" be the case anyway.

Even for a core with a large 2 MB L2 cache, that only makes up 32768 cache lines. And if you consider that the CPU can prefetch multiple cache lines per clock, not to mention that it prefetches a lot of data which is never used before eviction, even an L3 cache 10-20x this size will probably be overwritten within a few thousand clock cycles (don't forget other threads are competing over the L3). So, if you want the code of an application to "stay in cache", you pretty much have to make sure all the (relevant) code is executed every few thousand clock cycles, otherwise other cache lines will get it evicted. Also keep in mind that data cache lines usually greatly outnumber code cache lines, so the more data the application churns through, the more often it needs to access the code cache lines for them to remain in cache. Don't forget that other threads and things like system calls will also pull code and data into the caches, competing with your application. In practice, the entire cache is usually overwritten every few microseconds, with the possible exception of some super-dense code running constantly and exclusively on that core.

So if you have a demanding application, it's not the entire application that sits in cache; it's probably only the heavy algorithm in use at that moment, and possibly only a small part of a larger algorithm, that fits in cache at the time.
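As a sanity check on the numbers above, here is the line-count arithmetic, plus a rough turnover estimate; the fill rate and clock speed are illustrative assumptions, not measured figures:

```python
# A 2 MB L2 holds 32768 64-byte lines, and at an assumed sustained fill rate
# the whole thing turns over in a few thousand cycles (a few microseconds).
LINE_BYTES = 64                              # x86 cache-line size

l2_lines = (2 * 1024 * 1024) // LINE_BYTES   # lines in a 2 MB L2
print(l2_lines)                              # 32768

fills_per_cycle = 2                          # assumed sustained prefetch/fill rate
cycles_to_turn_over = l2_lines / fills_per_cycle
microseconds = cycles_to_turn_over / 5.0e3   # at an assumed 5 GHz clock
print(cycles_to_turn_over, round(microseconds, 1))  # 16384.0 cycles, ~3.3 us
```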
#44
Richards
This will help it dominate in gaming... L2 is way faster than L3 cache anyway.
#45
mechtech
The real question is how much cash is that cache gonna cost me???
#46
Chrispy_
The best thing about Raptor Lake isn't the cache, it's the additional 8 E-cores.

8 P-cores is enough for the moment, and based on historic trends, enough for a decade or more.

If something is truly multi-threaded the problem is IPC/Watt and IPC/die-area. There's only so much power you can pump into a motherboard socket, and only so much cooling something can handle before it becomes too difficult for mainstream consumer use. E-cores vastly outperform P-cores in terms of power efficiency and area efficiency, so it's a no-brainer to just throw more of them at heavily-threaded workloads.
#47
stimpy88
Chrispy_: 8 P-cores is enough for the moment, and based on historic trends, enough for a decade or more.
So was 640 KB.
#48
lexluthermiester
Chrispy_: 8 P-cores is enough for the moment, and based on historic trends
This, yes...
Chrispy_: enough for a decade or more.
...but this? Might be stretching things a bit. Though to be fair, the 6-core Socket 1366 CPUs are over a decade old and are still holding their own, so who knows...
#49
Chrispy_
stimpy88: So was 640 KB.
I said a decade or more, not indefinitely.

Quad core CPUs were launched in 2008 and were good up until at least 2018. Arguably a 4C/8T is still decent enough today but definitely no longer in its prime.
#50
lexluthermiester
Chrispy_: Quad core CPUs were launched in 2008 and were good up until at least 2018.
That depends greatly on the quad core being discussed. The original Core 2 Quads are not good for much on a modern level, and they were struggling even in 2018. The second-gen C2Q line held up much better because of the higher FSB and performance. There are users here in the forums who are still running them. The early AMD quads didn't fare so well; they were irrelevant by 2015.

Bring that forward, and the 6+4-core and above models will likely last 8 to 10 years, barring a massive breakthrough in IC substrate materials.