Tuesday, October 23rd 2018

Intel Increases L1D and L2 Cache Sizes with "Ice Lake"

Intel's next major CPU microarchitecture, designed for the 10 nm silicon fabrication process and codenamed "Ice Lake," could introduce the first major core redesign in over three years. Keen observers of Geekbench database submissions of dual-core "Ice Lake" processor engineering samples noticed something curious: Intel has increased its L1 and L2 cache sizes over previous generations.

The L1 data cache has been enlarged to 48 KB from the 32 KB of current-generation "Coffee Lake," and more interestingly, the L2 cache has been doubled in size to 512 KB, from 256 KB. The L1 instruction cache is still 32 KB in size, while the shared L3 cache for this dual-core chip is 4 MB. The "Ice Lake" chip in question is still a "mainstream" rendition of the microarchitecture, not an enterprise version. Enterprise versions have had a "re-balanced" cache hierarchy since "Skylake-X," which combined large 1 MB L2 caches with relatively smaller shared L3 caches.
Source: Geekbench Database

12 Comments on Intel Increases L1D and L2 Cache Sizes with "Ice Lake"

#1
Xx Tek Tip xX
Moving cache back up? Interesting since the HEDT got hit harder, the cache total on the 6950x was more than the 7980xe.
#2
dont whant to set it"'
Good, or is it? If it's a new CPU architecture I can't argue on any basis whatsoever. More cache is "more better," but is it as fast? What are the benefits? Can the chain of micro-ops handle it, and is the scheduler up to the task? But then again, I'm no expert.
#3
agent_x007
In general, having a bigger cache is good. However, latency is also important.
Programs that can fit in smaller cache should execute faster on older tech, if latency on bigger cache is higher.

Also, the last L1 bump on a "consumer grade" platform was with Conroe (from Netburst), and we have had 256 kB of L2 since Nehalem (first-gen Core i series).
Intel never released large per-core L3 caches on LGA11xx platforms (always 2 MB/core max.).
#4
efikkan
We still don't know the details of Ice Lake, but the new details look like this in comparison with existing architectures:

(based on info around the web, may not be 100% accurate)

One thing I consider interesting is that Intel seems to prioritize L1 data cache while AMD prioritizes L1 instruction cache.

Xx Tek Tip xX said:
Moving cache back up? Interesting since the HEDT got hit harder, the cache total on the 6950x was more than the 7980xe.
What?
The L3 cache on Skylake-X works differently. Prior generations had an inclusive L3 cache, meaning L2 is duplicated in L3, so the effective L3 cache size of older generations is 1.75 MB per core. Skylake-X also quadrupled the L2 cache, leading to an effective increase in cache per core, but more importantly, a more efficient cache.
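
The arithmetic behind that comparison can be sketched roughly as follows. The 2 MB inclusive L3 slice and 256 kB L2 are the per-core figures implied by the 1.75 MB number above; the Skylake-X figures (1 MB L2, 1.375 MB L3 per core) are Intel's published sizes. The function name is made up for illustration, and treating a non-inclusive L3 as holding no duplicates at all is a simplifying assumption, so the Skylake-X result is an upper bound.

```python
def unique_capacity_kb(l2_kb, l3_kb, inclusive):
    """Return (unique L3 capacity, total unique L2+L3 capacity) per core, in KB."""
    # An inclusive L3 keeps a copy of every L2 line, so that part holds nothing new.
    unique_l3 = l3_kb - l2_kb if inclusive else l3_kb
    return unique_l3, l2_kb + unique_l3

# Older design as described in the comment: 256 kB L2, 2 MB inclusive L3 per core.
older = unique_capacity_kb(l2_kb=256, l3_kb=2048, inclusive=True)
# Skylake-X: 1 MB L2, 1.375 MB non-inclusive L3 per core (assumed duplicate-free).
skylake_x = unique_capacity_kb(l2_kb=1024, l3_kb=1408, inclusive=False)

print(older)      # (1792, 2048) -> 1.75 MB unique L3, 2 MB total
print(skylake_x)  # (1408, 2432) -> 1.375 MB unique L3, 2.375 MB total
```

Under these assumptions the raw L3 number drops while the total unique capacity per core goes up.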

dont whant to set it"' said:
Good, or is it? If it's a new CPU architecture I can't argue on any basis whatsoever. More cache is "more better," but is it as fast? What are the benefits? Can the chain of micro-ops handle it, and is the scheduler up to the task? But then again, I'm no expert.
Caches have always been more complex than just "more is better".
I believe even the old 80486 supported something like 512 kB of off-chip L2 cache.
For cache it comes down to latency, throughput and die space. Fewer banks may give higher cache efficiency, but lower bandwidth and higher complexity. More banks are simpler and give higher bandwidth, but sacrifice cache efficiency. Latency is even tougher; it depends on the implementation.
#5
srsbsns
Will need to wait for an actual product. Right now Intel 10nm is vaporware/rumormill at best.
#7
Midland Dog
birdie said:
Here's a comparison with a similarly clocked Kaby Lake CPU (SkyLake uArch):

https://browser.geekbench.com/v4/cpu/compare/9473563?baseline=10445533

I can't say Ice Lake is impressive - there are some gains but overall it's a minimal advantage.
1st gen 10nm won't beat 14nm++++. They should be chasing IPC, not clocks, with 14nm, since they have an extremely mature node.
#8
darkangel0504
The Ice Lake sample is running 16 GB in dual channel and on Linux. Other i3-7130U laptops may have been running in single channel and on Windows.
#9
hat
Enthusiast
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
#10
First Strike
hat said:
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
Nope... Cache is still a must. No matter how fast the memory/IO may be, internal CPU pipelines will always be way faster.

efikkan said:
Cache on Skylake-X works differently. Prior generations had an inclusive L3 cache, meaning L2 will be duplicated in L3, so effectively the L3 cache size of older generations is 1.75 MB. Skylake-X also quadrupled the L2 cache, leading to an effective increase in cache per core, but more importantly, a more efficient cache.
I agree with your opinion, except that SKL-SP's victim cache is not necessarily more efficient. The relative efficiency of a victim cache and an inclusive cache depends on the workload.
And by the way, Ryzen uses a victim cache too, similar to SKL-SP.
#11
Xajel
hat said:
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
The main idea of the memory subsystem is to scale with how soon the data is going to be processed: the sooner the data is needed, the closer it sits to the processor, where it is handled faster and with much lower latency. The L1 cache handles the actual data being processed at that exact time; that's why it consists of L1 Data and L1 Instruction, as the processor uses the L1i to process the data currently in the L1d. The L2 cache contains data to be processed next, or the rest of the data that the L1d can't hold. And you guessed it, the L3 contains data of the next level. Then comes the RAM, and finally the rest is on the HDD.

Whenever a higher-priority cache/memory is not enough, the system uses the next one available. So if the L1d is not enough, the next step is L2; when that is full, L3 comes next (if available); and when there's an L4 cache, it will be the next level as well. If not, RAM will be used, and so on.

Do you remember how the system became very slow when you ran heavy applications with little RAM, and how upgrading the RAM sped it up noticeably? Or when you finally upgraded to an SSD and saw a huge jump in responsiveness and speed? This is what happens when a higher-level cache/memory becomes too small and the system is forced to go to the next, "slower" one.


When Intel first released the Celeron, they experimented with a model without L2 cache to cut costs. It did cost less to make, but it performed horribly. They quickly scrapped that, and the next update came with L2.

So why not have more and more cache? There are several things to consider:
1- Cache is expensive: it requires a lot of die area and consumes power.
2- More cache brings latency: the larger the cache is, the more time it takes to actually find the data you need, and latency is crucial here.
3- Performance gain with more cache is not linear.
4- Architecture favouring: due to the second and third points, and to how the architecture actually handles data and how its cache hierarchy works, there will be an optimal cache size for each level that brings the most performance at the best power/cost. Adding more might raise the power/cost too much for little performance boost, or might actually bring performance down a little for some latency-critical applications.
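
Points 2 to 4 can be illustrated with the standard average memory access time (AMAT) calculation. This is only a sketch: every latency and hit rate below is a made-up number chosen for illustration, not a measurement of any real CPU, and the function name is hypothetical.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, hit_rate), ordered from L1 outward.
    Accesses that miss every level pay memory_latency on top.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency

# Hypothetical small, fast L1 (4 cycles, 95% hits) in front of a 12-cycle L2.
small_l1 = amat([(4, 0.95), (12, 0.80)], memory_latency=200)
# Hypothetical larger L1 that costs one extra cycle but barely improves the hit rate.
big_l1_little_gain = amat([(5, 0.96), (12, 0.80)], memory_latency=200)
# The same larger L1 on a workload whose hit rate improves a lot.
big_l1_large_gain = amat([(5, 0.99), (12, 0.80)], memory_latency=200)

print(f"{small_l1:.2f}")            # 6.60
print(f"{big_l1_little_gain:.2f}")  # 7.08
print(f"{big_l1_large_gain:.2f}")   # 5.52
```

With these invented numbers the bigger cache loses when the working set already fit in the smaller one and wins when it did not, which is why the best size depends on both the architecture and the workload.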
#12
efikkan
Xajel said:
The main idea of the memory subsystem is to scale with how soon the data is going to be processed: the sooner the data is needed, the closer it sits to the processor, where it is handled faster and with much lower latency. The L1 cache handles the actual data being processed at that exact time; that's why it consists of L1 Data and L1 Instruction, as the processor uses the L1i to process the data currently in the L1d. The L2 cache contains data to be processed next, or the rest of the data that the L1d can't hold. And you guessed it, the L3 contains data of the next level. Then comes the RAM, and finally the rest is on the HDD.
The purpose of the cache is to hide latency.
To be precise, L1 is still a cache, the actual data being processed are in registers.

Even some introductory CS books describe the cache hierarchy incorrectly. L1…L3 is just a streaming buffer; it contains code and data that is likely to be used or has been recently used. Many mistakenly think that the most important stuff is stored in L1, then L2 and so on. These caches are overwritten thousands of times per second; no data ever stays there for long. And it's not like your running program can fit in there, or your most important variables in code.

Modern CPUs do aggressive prefetching, which means they preload data you might need. Each bank in the cache is usually a Least Recently Used (LRU) queue, which means that any time one cache line is written, the oldest one is discarded. So caching things that are not needed may actually replace useful data. Depending on the workload, the cache may at times be mostly wasted, but it's of course still better than no cache.

Xajel said:
Do you remember how the system became very slow when you ran heavy applications with little RAM, and how upgrading the RAM sped it up noticeably? Or when you finally upgraded to an SSD and saw a huge jump in responsiveness and speed? This is what happens when a higher-level cache/memory becomes too small and the system is forced to go to the next, "slower" one.
SSDs do wonders for file operations, but only affect responsiveness when the OS is swapping heavily, and by that point the system is too sluggish anyway. There is a lot of placebo tied to the benefits of SSDs. Don't get me wrong, SSDs are good, but they don't make code faster.