
Intel "Raptor Lake" Rumored to Feature Massive Cache Size Increases

I have a few remarks on this:

- It would be great if the lower SKUs still kept the full 36 MB L3 cache, but that doesn't seem to be the case.
- In this model, it really looks like the L3 is there for core-to-core communication.
- I wonder whether the L2 and L3 are exclusive of each other, or whether each 3 MB L3 slice also holds a copy of the 2 MB L2.
- I wonder how fast that L2 will be. They need a fast L2 to feed the cores, but if the latency is too high, the cores might starve and lose a lot of cycles.
- I feel the small cores might have an easier time talking to the main cores with that design, if the L3 cache is fully connected.

We will have to see. It's good that both Zen 4 and Meteor Lake look promising.
 
Good

Let them fight

Consumer gets better products
 
What does that mean in plain English? I thought more cache means bigger latency? Or how is more cache beneficial?
 
Alder Lake has already increased cache latency in comparison to Rocket Lake. If they go even further we might arrive in a situation where Zen 3 will have almost half the cache latency of Raptor Lake. But in the end we'll have to wait for benchmarks, and even then it is going to be workload-dependent.
FWIW, the Golden Cove cores used in Sapphire Rapids already have 2 MB of L2, so we ought to see whether there's added latency as soon as someone benches one of those. Intel didn't say anything about increased latency over the ones in Alder Lake, but that doesn't mean there isn't any.
 
Seems like the trend is no longer focusing on clock speed and core count, since those have been pushed hard, but on increasing cache sizes. I think AMD has been very aggressive in this aspect, but I guess at some point we will run into diminishing returns, especially for most consumers.
 
If you need to give just one number, 68 MB makes more sense than 36 MB.
For what purpose?
The total cache tells little of the relative performance of CPUs. Performance matters, not specs, especially pointless specs.

What does that mean in plain English? I thought more cache means bigger latency? Or how is more cache beneficial?
Very little.
The largest changes are in the E-cores, which have little impact on most workloads. The extra L3 cache is also shared with more cores, so it's not likely to offer a substantial improvement in general. And judging by performance scaling on Xeons, having extra L3 with more cores doesn't offer a significant change.

Whether more cache adds more latency is implementation-specific. In this case they are adding more blocks of L3, which at least increases latency to the banks farthest away, although that's small compared to RAM, of course.
 
Have to admit, Intel engineers are really good at putting LEGO together.
 
I'll guess Intel is about to cash in on LGA1700.
 
What does that mean in plain English? I thought more cache means bigger latency? Or how is more cache beneficial?
It's a tug-of-war game. If you can increase the cache with minimal added latency, some data that previously required a trip to main RAM suddenly doesn't need it anymore -> faster performance. And then the workloads change. Rinse and repeat.
It's also a fab process game, cache is expensive both from a power and a die area point of view.

Also keep in mind cache latency (like RAM latency) is usually given in clock cycles. 1-2 more clock cycles can be masked by increasing the frequency accordingly (it's a bit more complicated than that, really, but that's the gist of it.)
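To make the cycles-vs-frequency point concrete, here's a tiny sketch. The cycle counts and clock speeds are made-up illustrative numbers, not measured Raptor Lake figures:

```python
def latency_ns(cycles, freq_ghz):
    """Convert a latency given in clock cycles to absolute nanoseconds."""
    return cycles / freq_ghz

# A cache that gains 2 cycles of latency but runs at a higher clock can
# end up with the same absolute latency (illustrative numbers only):
old = latency_ns(36, 3.6)   # ~10 ns
new = latency_ns(38, 3.8)   # ~10 ns, despite 2 extra cycles
```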
 
Thanks AMD
 
For what purpose?
The total cache tells little of the relative performance of CPUs. Performance matters, not specs, especially pointless specs.
I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
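For scale, a quick back-of-the-envelope check against the rumored top-SKU figures from the article: 36 MB of shared L3 plus roughly 32 MB of combined L2 (8 P-cores at 2 MB each plus 16 MB across the E-core clusters). These are rumored numbers, not confirmed specs:

```python
MB = 2**20
L3_TOTAL = 36 * MB
L2_TOTAL = 8 * 2 * MB + 16 * MB   # P-core L2 + E-core cluster L2 (rumored)

def fits_in_cache(working_set_mb):
    """Rough guess at which cache level could hold the whole working set."""
    ws = working_set_mb * MB
    if ws <= L3_TOTAL:
        return "fits in L3 alone"
    if ws <= L3_TOTAL + L2_TOTAL:
        return "fits in L2+L3 combined"
    return "spills to RAM"

# e.g. a 50 MB dataset wouldn't fit in L3 alone, but could fit in L2+L3:
print(fits_in_cache(50))   # fits in L2+L3 combined
```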
 
Massive compared to Alder Lake, perhaps, but I don't know if I would call this a massive cache these days.
 
Massive compared to Alder Lake, perhaps, but I don't know if I would call this a massive cache these days.
Massive increase, for those inclined to read ;)
 
Massive increase, for those inclined to read ;)
Yeah, I did. I meant it's not a massive increase; I appreciate it didn't sound like that, though.
 
Yeah, I did. I meant it's not a massive increase; I appreciate it didn't sound like that, though.
1.25 MB -> 2 MB per core is pretty massive for L2 cache. IIRC it doesn't usually grow that fast.
Though to be honest, absolute size is usually meaningless. Cache size is tightly coupled with the underlying architecture (e.g. 2 MB/core wouldn't have made a difference for NetBurst); size alone doesn't tell you much.
 
It's a tug-of-war game. If you can increase the cache with minimal added latency, some data that previously required a trip to main RAM suddenly doesn't need it anymore -> faster performance. And then the workloads change. Rinse and repeat.
It's also a fab process game, cache is expensive both from a power and a die area point of view.

Also keep in mind cache latency (like RAM latency) is usually given in clock cycles. 1-2 more clock cycles can be masked by increasing the frequency accordingly (it's a bit more complicated than that, really, but that's the gist of it.)
This. And when the improvements in the structure and functionality of the cache are factored in, the sum total is an overall gain in performance.

I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
I think you misunderstand how cache is used and why it exists. Caches exist to minimize, as much as possible, how many times the CPU needs to fetch data from system RAM, which is drastically slower. The less frequently that needs to happen, the better the performance. 99.9% of the time, programs and executing code are completely cache-agnostic. This means that programs are generally optimized to run in minimal amounts of cache. However, the more the merrier: if a program has more room to use, it will use it, and the CPU will fit it in. That said, all programs will benefit from more cache unless they are so small that they fit into L2 or a couple of MB of L3, which is rare.
 
Although Alder Lake greatly improved gaming performance vs Rocket Lake, if you look back at the 2020 roadmap slides and bullet points, Intel didn't emphasize the gaming performance improvement of the Alder Lake design. That contrasts with the explicit mention of Raptor Lake's gaming prowess (based on the redesigned cache), which suggests we will see at least the same jump in gaming performance as we had from Rocket Lake to Alder Lake. Or am I reading too much into this slide?

 
I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
Many have the misconception that CPU caches contain the most important data, when in reality they only contain the most recently used (or prefetched) data; caches are streaming buffers. While it's possible for an application to give the CPU hints about prefetching and discarding cache lines, it's ultimately controlled by the CPU, and there are no guarantees. The code doesn't see the caches; they are transparent. To the code, it's just normal memory accesses that turn out to be very fast. When we do cache optimization, it's about making things easier for the CPU: denser code and data, fewer function calls, less branching, using SIMD, etc. Still, the application doesn't control what's in cache, or whether the application "fits in cache", because that will "never" be the case anyway.

Even for a core with a large 2 MB L2 cache, that only amounts to 32768 cache lines. And if you consider that the CPU can prefetch multiple cache lines per clock, not to mention that it prefetches a lot of data which is never used before eviction, even an L3 cache 10-20x this size will probably be overwritten within a few thousand clock cycles (don't forget other threads are competing over the L3). So, if you want the code of an application to "stay in cache", you pretty much have to make sure all the (relevant) code is executed every few thousand clock cycles, otherwise other cache lines will get it evicted. Also keep in mind that data cache lines usually greatly outnumber code cache lines, so the more data the application churns through, the more often it needs to access the code cache lines for them to remain in cache. And other threads and things like system calls will also pull their own code and data into the caches, competing with your application. In practice, the entire cache is usually overwritten every few microseconds, with the possible exception of some super-dense code that runs constantly and exclusively on that core.

So if you have a demanding application, it's not the entire application in cache, it's probably only the heavy algorithm you use at that moment in time, and possibly only a small part of a larger algorithm, that fits in cache at the time.
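The arithmetic above can be sketched in a few lines. The refill rate here is an assumed illustrative figure, not a measured one:

```python
LINE_BYTES = 64                      # cache line size on current x86 CPUs
l2_lines = 2 * 2**20 // LINE_BYTES   # a 2 MB L2 holds this many lines
print(l2_lines)                      # 32768

# Assume the core streams in 2 new lines per clock at 4 GHz (made-up but
# plausible rate). Time to replace every line in the L2:
clocks_to_churn = l2_lines / 2       # on the order of 16k clock cycles
seconds = clocks_to_churn / 4e9      # a few microseconds at these rates
```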
 
The real question is how much cash is that cache gonna cost me???
 
The best thing about Raptor Lake isn't the cache, it's the additional 8 E-cores.

8 P-cores is enough for the moment, and based on historic trends, enough for a decade or more.

If something is truly multi-threaded the problem is IPC/Watt and IPC/die-area. There's only so much power you can pump into a motherboard socket, and only so much cooling something can handle before it becomes too difficult for mainstream consumer use. E-cores vastly outperform P-cores in terms of power efficiency and area efficiency, so it's a no-brainer to just throw more of them at heavily-threaded workloads.
 
So was 640 KB.

I said a decade or more, not indefinitely.

Quad-core CPUs were launched in 2008 and were good up until at least 2018. Arguably a 4C/8T is still decent enough today, but definitely no longer in its prime.
 