
Intel Arrow Lake-S to Feature 3 MB of L2 Cache per Performance Core

Toothless

Tech, Games, and TPU!
Supporter
Joined
Mar 26, 2014
Messages
9,312 (2.51/day)
Location
Washington, USA
System Name Veral
Processor 5950x
Motherboard MSI MEG x570 Ace
Cooling Corsair H150i RGB Elite
Memory 4x16GB G.Skill TridentZ
Video Card(s) Powercolor 7900XTX Red Devil
Storage Crucial P5 Plus 1TB, Samsung 980 1TB, Teamgroup MP34 4TB
Display(s) Acer Nitro XZ342CK Pbmiiphx + 2x AOC 2425W
Case Fractal Design Meshify Lite 2
Audio Device(s) Blue Yeti + SteelSeries Arctis 5 / Samsung HW-T550
Power Supply Corsair HX850
Mouse Corsair Nightsword
Keyboard Corsair K55
VR HMD HP Reverb G2
Software Windows 11 Professional
Benchmark Scores PEBCAK
Joined
Mar 24, 2019
Messages
620 (0.33/day)
Location
Denmark - Aarhus
System Name Iglo
Processor 5800X3D
Motherboard TUF GAMING B550-PLUS WIFI II
Cooling Arctic Liquid Freezer II 360
Memory 32 gigs - 3600hz
Video Card(s) EVGA GeForce GTX 1080 SC2 GAMING
Storage NvmE x2 + SSD + spinning rust
Display(s) BenQ XL2420Z - lenovo both 27" and 1080p 144/60
Case Fractal Design Meshify C TG Black
Audio Device(s) Logitech Z-2300 2.1 200w Speaker /w 8 inch subwoofer
Power Supply Seasonic Prime Ultra Platinum 550w
Mouse Logitech G900
Keyboard Corsair k100 Air Wireless RGB Cherry MX
Software win 10
Benchmark Scores Super-PI 1M T: 7,993 s :CinebR20: 5755 point GeekB: 2097 S-11398-M 3D :TS 7674/12260

Space Lynx

Astronaut
Joined
Oct 17, 2014
Messages
16,365 (4.68/day)
Location
Kepler-186f
Processor Ryzen 7800X3D -20 uv
Motherboard AsRock Steel Legend B650
Cooling MSI C360 AIO
Memory 32gb 6000 CL 30-36-36-76
Video Card(s) MERC310 7900 XT -50 uv
Display(s) NZXT Canvas IPS 1440p 165hz 27"
Case NZXT H710 (Red/Black)
Audio Device(s) HD58X, Asgard 2, Modi 3
Power Supply Corsair RM850W
The 5800X3D was just on sale for $269 at Amazon, that's why I ask. I paid $200 after tax because I had a coupon for Micro Center. So I saved 75 bucks and it runs colder. I have no complaints.
 
Joined
Mar 24, 2019
Messages
620 (0.33/day)
Location
Denmark - Aarhus
System Name Iglo
Processor 5800X3D
Motherboard TUF GAMING B550-PLUS WIFI II
Cooling Arctic Liquid Freezer II 360
Memory 32 gigs - 3600hz
Video Card(s) EVGA GeForce GTX 1080 SC2 GAMING
Storage NvmE x2 + SSD + spinning rust
Display(s) BenQ XL2420Z - lenovo both 27" and 1080p 144/60
Case Fractal Design Meshify C TG Black
Audio Device(s) Logitech Z-2300 2.1 200w Speaker /w 8 inch subwoofer
Power Supply Seasonic Prime Ultra Platinum 550w
Mouse Logitech G900
Keyboard Corsair k100 Air Wireless RGB Cherry MX
Software win 10
Benchmark Scores Super-PI 1M T: 7,993 s :CinebR20: 5755 point GeekB: 2097 S-11398-M 3D :TS 7674/12260
The 5800X3D was just on sale for $269 at Amazon, that's why I ask. I paid $200 after tax because I had a coupon for Micro Center. So I saved 75 bucks and it runs colder. I have no complaints.
That's very cheap for such a new CPU. I would also be happy with it.
 
Joined
May 3, 2018
Messages
2,347 (1.07/day)
So, a 10% increase in performance vs. last generation, as per usual for decades now? I'm not being sarcastic as I've not delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
I think you'll be very surprised by Arrow Lake vs Raptor Lake, not just in performance but also in power efficiency. Time will tell, but I would expect Arrow Lake to thrash Raptor Lake. It'll need to, as Zen 5 is looking very strong with > 20% IPC and a lot of architectural changes. Anyway, we'll get an idea soon of how it goes with Meteor Lake, even if it's only mobile, as Arrow Lake is a much more refined and performant version of that.
 
Joined
Apr 12, 2013
Messages
6,769 (1.67/day)
So, a 10% increase in performance vs. last generation, as per usual for decades now? I'm not being sarcastic as I've not delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
10% more has not been a pattern for decades. In fact, even last decade Intel couldn't get 10% more for 5 gens before they switched from SKL & its derivatives to RKL. The same goes for AMD with Zen -> Zen+, although that was just a minor shrink.
 
Joined
Jun 10, 2014
Messages
2,905 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The larger the cache, the longer it takes to look something up. That's why, for example, the largest cache is the L3 for both Intel and AMD.
It depends on many factors.
Caches are usually organized in banks, which increases bandwidth substantially and offsets latency, but decreases overall cache efficiency (per cache line).
New node improvements may also lead to latency decreases.
And so on.

...The smaller the region, the higher the hit ratio will be, but at the same time, the longer it will take to see if the data is in there. Working with cache isn't just a case of more is better; it's a balance you strike when you design a CPU. For example, AMD frequently went with larger L1 and L2 but had slower cache speed. And, for example, Core 2 Duo had 3 MB of L2 per core (merged into a 6 MB shared L2). So 3 MB of L2 isn't new. But at that time they didn't have L3.
Both AMD and Intel have increased and decreased their L1D/L1I and L2 caches over various generations; it all depends on the cache design and priorities of the architecture. Comparing a cache across CPU architectures solely based on size is nearly pointless. And as I often say, performance is what ultimately matters.

Pretty much all current CPU architectures cache memory in regions of the same size, called a "cache line", which is currently 64 bytes on most x86 and ARM architectures.
I do expect them to move to 128 bytes eventually, as this would greatly benefit dense data accesses (which is where you have good hit rates anyway), and implementing e.g. 3 MB of L2 (128-byte cache lines) would not cost anywhere near 50% more die space than 2 MB of L2 (64-byte cache lines), and would have approximately the same latency, so a huge win in hit rates. This would also allow for 1024-bit SIMD, which is probably coming "soon".
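
A minimal sketch of the dense-access point above, assuming a plain contiguous buffer and nothing else about the cache. The helper `lines_touched` is a hypothetical name used only for illustration; it counts how many cache lines a byte range spans for a given line size.

```python
def lines_touched(start: int, length: int, line_bytes: int) -> int:
    """Count the cache lines spanned by the byte range [start, start + length)."""
    if length <= 0 or line_bytes <= 0:
        raise ValueError("length and line_bytes must be positive")
    first_line = start // line_bytes
    last_line = (start + length - 1) // line_bytes
    return last_line - first_line + 1


# A dense 4 KiB scan starting at an aligned address (illustrative numbers).
lines_64 = lines_touched(0, 4096, 64)    # 64 lines to fetch
lines_128 = lines_touched(0, 4096, 128)  # 32 lines to fetch

# A 1024-bit SIMD register is 128 bytes wide, i.e. exactly one 128-byte line.
SIMD_1024_BYTES = 1024 // 8
```

For a dense scan the number of line fills halves with 128-byte lines. The sketch says nothing about sparse accesses, where a larger line can fetch bytes that are never used.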
 
Joined
Nov 26, 2021
Messages
1,372 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
It depends on many factors.
Caches are usually organized in banks, which increases bandwidth substantially and offsets latency, but decreases overall cache efficiency (per cache line).
New node improvements may also lead to latency decreases.
And so on.


Both AMD and Intel have increased and decreased their L1D/L1I and L2 caches over various generations; it all depends on the cache design and priorities of the architecture. Comparing a cache across CPU architectures solely based on size is nearly pointless. And as I often say, performance is what ultimately matters.

Pretty much all current CPU architectures cache memory in regions of the same size, called a "cache line", which is currently 64 bytes on most x86 and ARM architectures.
I do expect them to move to 128 bytes eventually, as this would greatly benefit dense data accesses (which is where you have good hit rates anyway), and implementing e.g. 3 MB of L2 (128-byte cache lines) would not cost anywhere near 50% more die space than 2 MB of L2 (64-byte cache lines), and would have approximately the same latency, so a huge win in hit rates. This would also allow for 1024-bit SIMD, which is probably coming "soon".
The data arrays won't decrease in size due to a larger line size, but the tag arrays would be smaller, as you would need only 3*1024*1024/128 tags versus 2*1024*1024/64, i.e. 24K vs 32K. Also note that the Pentium 4 had 128-byte lines for L2 while the L1 stayed at 64 bytes.
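
The tag arithmetic above can be checked with a short sketch. It assumes one tag entry per cache line and ignores associativity and tag width; `tag_entries` is a hypothetical helper name, not anything from a real tool.

```python
MIB = 1024 * 1024


def tag_entries(cache_bytes: int, line_bytes: int) -> int:
    """Number of cache lines (and so tag entries) in a cache of the given size."""
    if cache_bytes % line_bytes != 0:
        raise ValueError("cache size must be a multiple of the line size")
    return cache_bytes // line_bytes


tags_3mb_128 = tag_entries(3 * MIB, 128)  # 24576, i.e. 24K
tags_2mb_64 = tag_entries(2 * MIB, 64)    # 32768, i.e. 32K

# The larger cache needs only three quarters as many tag entries.
ratio = tags_3mb_128 / tags_2mb_64        # 0.75
```

So the 3 MB cache with 128-byte lines holds 50% more data but tracks 25% fewer lines than the 2 MB cache with 64-byte lines.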
 
Joined
Dec 29, 2022
Messages
222 (0.44/day)
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
 
Joined
Nov 26, 2021
Messages
1,372 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
POWER8 and some SKUs of Haswell, Broadwell, and Skylake had L4 caches. For workloads with very large memory footprints it can make sense, but for most workloads a large L3 is better.
 
Joined
Jul 16, 2013
Messages
205 (0.05/day)
System Name latest-greatest
Processor i7 12700K
Motherboard Z690 Rog Strix-E
Cooling Lian Li Galahad 360
Memory corsair vengeance Ddr5 4800
Video Card(s) 2080ti
Storage 980 pro gen4
Display(s) LG C1 4K 120Mhz
Case fractal meshify2
Audio Device(s) Realtec 4080
Power Supply Corsair rm1000x
More cache and wider lanes please.
 
Joined
Jun 10, 2014
Messages
2,905 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
To explain this, we first need to address how L3 works.
As you might already know, L3 is a spillover cache, which means it only contains cache lines discarded from L2. L3 is also accessible across cores, which is why it has some effect on multithreaded workloads. There is a tremendous amount of data flowing constantly through the caches, including lots of prefetched data which was ultimately unnecessary. In terms of cache lines, the largest volume is data, while a smaller volume is instructions, but the chance of a single cache line being needed before it is evicted from L3 is much higher for instructions, especially from other cores. (The chance of another core needing lots of the same data within nanoseconds is slim, except for explicit synchronization.) This is why CPUs need such large L3 caches before it starts to matter; in most cases where we see sensitivity to L3, it's due to instruction cache lines being shared, not data. But we usually don't see significant gains from huge L3 caches in most computationally intense tasks, even though they churn through large amounts of data. This is due to the application being cache optimized, which is one of the most important types of low-level optimization. As any low-level programmer can tell you, sensitivity to L3 usually means the code is too large, bloated and unpredictable, which is why the CPU evicts it from cache.
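
A toy sketch of the "spillover" behaviour described above, assuming a strictly exclusive arrangement with LRU replacement at both levels. The class `VictimHierarchy`, its capacities and the single-letter line names are all hypothetical, and real CPUs differ in their inclusion policy and replacement details.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy L2 + L3 model where L3 is filled only by lines evicted from L2."""

    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines
        self.l2: OrderedDict = OrderedDict()  # oldest entry first (LRU order)
        self.l3: OrderedDict = OrderedDict()

    def access(self, line: str) -> str:
        """Touch a cache line and report where it was found."""
        if line in self.l2:
            self.l2.move_to_end(line)  # refresh its LRU position
            return "L2"
        if line in self.l3:
            del self.l3[line]          # exclusive: the line moves back to L2
            source = "L3"
        else:
            source = "memory"
        self._fill_l2(line)
        return source

    def _fill_l2(self, line: str) -> None:
        self.l2[line] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # evict the LRU line from L2
            self.l3[victim] = True                   # and spill it into L3
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)          # the oldest L3 line is dropped


hierarchy = VictimHierarchy(l2_lines=2, l3_lines=2)
trace = [hierarchy.access(line) for line in "ABCA"]
```

In this trace the first three accesses come from memory, and the second access to `A` is served from L3 because `A` was pushed out of L2 when `C` arrived.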

Even though huge L3s make an appreciable difference in some games and select applications, I don't believe it's a good direction to go for CPU development. It costs a tremendous amount of die space, and doesn't yield any meaningful gain for most heavy workloads. This die space and development effort could be spent on much more useful improvements, which would benefit most workloads. But I guess this is what we get when people are more focused on synthetic benchmarks than real-world results. Just think about it; slapping a whole extra cache die on the CPU makes less of a difference than a minor architectural upgrade (~10% IPC gains). That's a crude brute-force approach to extract very little overall. And this is why I'm not for L4; the usefulness of L4 would be even less, especially with a larger L3. But I do believe there is one way L3 could become more cost-effective: splitting instructions and data. Then a much smaller L3 pool could have the same effect as 100 MB or so, at a small cost.

I'm much more excited about real architectural improvements, such as much wider execution. The difference between well written and poorly written software will only become more clear over time, as well written software will continue to scale.

More cache and wider lanes please.
PCIe lanes?
 