
Intel Arrow Lake-S to Feature 3 MB of L2 Cache per Performance Core

The 5800X3D was just on sale for $269 at Amazon, that's why I ask. I paid $200 after tax because I had a coupon for Microcenter. So I saved 75 bucks and it runs cooler. I have no complaints.
 
The 5800X3D was just on sale for $269 at Amazon, that's why I ask. I paid $200 after tax because I had a coupon for Microcenter. So I saved 75 bucks and it runs cooler. I have no complaints.
That's very cheap for such a new CPU. I would also be happy with it.
 
So, a 10% increase in performance vs. the last generation, as per usual for decades now? I'm not being sarcastic; I haven't delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
I think you'll be very surprised by Arrow Lake vs. Raptor Lake, not just in performance but in power efficiency. Time will tell, but I would expect Arrow Lake to thrash Raptor Lake. It'll need to, as Zen 5 is looking very strong with >20% IPC and a lot of architectural changes. Anyway, we'll get an idea soon with Meteor Lake, even if it's mobile only, as Arrow Lake is a much more refined and performant version of that.
 
So, a 10% increase in performance vs. the last generation, as per usual for decades now? I'm not being sarcastic; I haven't delved into the numbers and just assume more of the same I've grown accustomed to over the years from Intel.
10% more has not been the pattern for decades; in fact, even last decade Intel couldn't get 10% more for five generations before they switched from SKL and its derivatives to RKL. The same goes for AMD with Zen -> Zen+, although that was just a minor shrink.
 
The larger the cache, the longer it takes to look it up. That's why, for example, the largest cache is the L3 on both Intel and AMD.
It depends on many factors.
Caches are usually organized in banks, which increases bandwidth substantially and offsets latency, but decreases overall cache efficiency (per cache line).
New node improvements may also lead to latency decreases.
And so on.
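To make the banking point concrete, here's a minimal C sketch of how a controller might split an address for a hypothetical 2 MB, 16-way cache with 64-byte lines and 4 banks. The geometry and the bank-selection rule here are made up for illustration; real designs vary and often hash the bits differently:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical cache geometry -- real designs vary. */
#define LINE_SIZE  64           /* bytes per cache line */
#define CACHE_SIZE (2u << 20)   /* 2 MB total           */
#define WAYS       16           /* associativity        */
#define BANKS      4            /* independent banks    */
#define SETS       (CACHE_SIZE / (LINE_SIZE * WAYS))    /* = 2048 */

int main(void)
{
    uint64_t addr   = 0x7ffd12345678ull;                   /* example address      */
    uint64_t offset = addr % LINE_SIZE;                    /* byte within the line */
    uint64_t set    = (addr / LINE_SIZE) % SETS;           /* which set to probe   */
    uint64_t tag    = addr / (LINE_SIZE * (uint64_t)SETS); /* stored tag           */
    uint64_t bank   = set % BANKS;                         /* low set bits pick a bank */

    printf("offset=%llu set=%llu bank=%llu tag=0x%llx\n",
           (unsigned long long)offset, (unsigned long long)set,
           (unsigned long long)bank, (unsigned long long)tag);
    return 0;
}
```

Because consecutive sets land in different banks, back-to-back accesses to different lines can often proceed in parallel, which is where the bandwidth gain comes from.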

...The smaller the region, the higher the hit ratio will be, but at the same time, the longer it will take to see if the data is in there. Working with cache isn't just "more is better"; it's a balance you strike when you design a CPU. For example, AMD frequently went with larger L1 and L2 but had slower cache speeds. And for example, the Core 2 Duo had 3 MB of L2 per core (merged into a 6 MB shared L2), so 3 MB of L2 isn't new. But at that time they didn't have an L3.
Both AMD and Intel have increased and decreased their L1D/L1I and L2 caches over various generations; it all depends on the cache design and priorities of the architecture. Comparing caches across CPU architectures solely based on size is nearly pointless. And as I often say, performance is what ultimately matters.

Pretty much all current CPU architectures cache memory in regions of the same size, called a "cache line"; currently that's 64 bytes on most x86 and ARM architectures.
I do expect them to move to 128 bytes eventually, as this would greatly benefit dense data accesses (which is where you have good hit rates anyway). Implementing e.g. 3 MB of L2 with 128 B cache lines would not cost anywhere near 50% more die space than 2 MB of L2 with 64 B cache lines, and it would have approximately the same latency, so it's a huge win in hit rates. This would also allow for 1024-bit SIMD, which is probably coming "soon".
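If anyone wants to see the 64-byte line granularity on their own machine, a crude (and noisy) way is to walk a big buffer at growing strides and time it: the cost per access roughly rises until the stride reaches the line size and then flattens, since past that point every access misses a fresh line. A rough POSIX C sketch; the numbers will vary wildly per system:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64UL * 1024 * 1024)   /* 64 MB, far larger than any cache */

int main(void)
{
    volatile char *buf = malloc(N);
    if (!buf) return 1;
    for (size_t i = 0; i < N; i++) buf[i] = (char)i;   /* fault pages in */

    for (size_t stride = 16; stride <= 512; stride *= 2) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i += stride)
            buf[i]++;                  /* one access every 'stride' bytes */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("stride %4zu: %.2f ns/access\n", stride, ns / (N / stride));
    }
    free((void *)buf);
    return 0;
}
```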
 
It depends on many factors.
Caches are usually organized in banks, which increases bandwidth substantially and offsets latency, but decreases overall cache efficiency (per cache line).
New node improvements may also lead to latency decreases.
And so on.


Both AMD and Intel have increased and decreased their L1D/L1I and L2 caches over various generations; it all depends on the cache design and priorities of the architecture. Comparing caches across CPU architectures solely based on size is nearly pointless. And as I often say, performance is what ultimately matters.

Pretty much all current CPU architectures cache memory in regions of the same size, called a "cache line"; currently that's 64 bytes on most x86 and ARM architectures.
I do expect them to move to 128 bytes eventually, as this would greatly benefit dense data accesses (which is where you have good hit rates anyway). Implementing e.g. 3 MB of L2 with 128 B cache lines would not cost anywhere near 50% more die space than 2 MB of L2 with 64 B cache lines, and it would have approximately the same latency, so it's a huge win in hit rates. This would also allow for 1024-bit SIMD, which is probably coming "soon".
The data arrays won't decrease in size with a larger line size, but the tag arrays would be smaller: you would only need 3*1024*1024/128 tags versus 2*1024*1024/64, i.e. 24K vs. 32K. Also note that the Pentium 4 had 128-byte lines for L2 while the L1 stayed at 64 bytes.
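A quick sanity check of that arithmetic (this only counts tag entries and ignores the status/LRU bits that sit alongside them):

```c
#include <stdio.h>

int main(void)
{
    long tags_3mb_128 = 3L * 1024 * 1024 / 128;  /* 3 MB cache, 128 B lines */
    long tags_2mb_64  = 2L * 1024 * 1024 / 64;   /* 2 MB cache,  64 B lines */
    printf("3 MB / 128 B lines: %ld tags\n", tags_3mb_128);  /* 24576 = 24K */
    printf("2 MB /  64 B lines: %ld tags\n", tags_2mb_64);   /* 32768 = 32K */
    return 0;
}
```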
 
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
 
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
POWER8 and some SKUs of Haswell, Broadwell, and Skylake had L4 caches. For workloads with very large memory footprints it can make sense, but for most workloads a large L3 is better.
 
More cache and wider lanes please.
 
Would like to see CPUs with L4 cache. So... how come that's not a thing? No performance gains? Design challenges? Unjustified prices?
To explain this, we first need to address how L3 works.
As you might already know, L3 is a spillover (victim) cache, which means it only contains cache lines evicted from L2. L3 is also accessible across cores, which is why it has some effect on multithreaded workloads. There is a tremendous amount of data constantly flowing through the caches, including lots of prefetched data that ultimately goes unused. In terms of cache lines, the largest volume is data and a smaller volume is instructions, but the chance of a single cache line being needed again before it is evicted from L3 is much higher for instructions, especially from other cores. (The chance of another core needing lots of the same data within nanoseconds is slim, except with explicit synchronization.) This is why CPUs need very large L3 caches before it starts to matter; in most cases where we see sensitivity to L3, it's due to instruction cache lines being shared, not data.

But we usually don't see significant gains from huge L3 caches in most computationally intense tasks, even though they churn through large amounts of data. That is because those applications are cache optimized, which is one of the most important types of low-level optimization. As any low-level programmer can tell you, sensitivity to L3 usually means the code is too large, bloated and unpredictable, which is why the CPU evicts it from cache.
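For anyone wondering what "cache optimized" means in practice, the textbook example is traversing a 2D array in the order it's laid out in memory, so each loaded 64-byte line is fully used before it's evicted. A minimal sketch (the sizes are arbitrary, just large enough to blow past the caches):

```c
#include <stdio.h>
#include <stdlib.h>

#define ROWS 4096
#define COLS 4096   /* 4096*4096 ints = 64 MB, well beyond L3 */

/* Row-major walk: consecutive accesses hit the same 64 B line,
   so all 16 ints per line are used -- near-perfect locality. */
long sum_row_major(const int *a)
{
    long s = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            s += a[r * COLS + c];
    return s;
}

/* Column-major walk: each access lands on a different line,
   which is often evicted before the other 15 ints are needed. */
long sum_col_major(const int *a)
{
    long s = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            s += a[r * COLS + c];
    return s;
}

int main(void)
{
    int *a = calloc((size_t)ROWS * COLS, sizeof *a);
    if (!a) return 1;
    printf("%ld %ld\n", sum_row_major(a), sum_col_major(a));
    free(a);
    return 0;
}
```

Same work, same data, but the second traversal can easily be several times slower on a big enough array; that's the kind of win well-written software gets from the cache without needing a huge L3.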

Even though huge L3s make an appreciable difference in some games and select applications, I don't believe it's a good direction for CPU development. It costs a tremendous amount of die space and doesn't yield meaningful gains for most heavy workloads. That die space and development effort could be spent on much more useful improvements that would benefit most workloads. But I guess this is what we get when people are more focused on synthetic benchmarks than real-world results. Just think about it: slapping a whole extra cache die on the CPU makes less of a difference than a minor architectural upgrade (~10% IPC gains). That's a crude brute-force approach to extract very little overall. And this is why I'm not for L4; its usefulness would be even smaller, especially alongside a larger L3. I do believe there is one way L3 could become more cost-effective though: splitting instructions and data. Then a much smaller L3 pool could have the same effect as 100 MB or so, at a small cost.

I'm much more excited about real architectural improvements, such as much wider execution. The difference between well-written and poorly written software will only become clearer over time, as well-written software will continue to scale.

More cache and wider lanes please.
PCIe lanes?
 