
Intel "Raptor Lake" Rumored to Feature Massive Cache Size Increases

I have a few remarks on this:

- It would be great if the lower SKUs still kept the full 36 MB L3 cache, but that doesn't seem to be the case.
- In this model, it really looks like the L3 is there for core-to-core communication.
- I wonder whether the L2 and L3 are exclusive of each other, or whether each 3 MB L3 slice also holds a copy of the 2 MB L2.
- I wonder how fast that L2 will be. They need a fast L2 to feed the cores, but if the latency is too high, the cores might starve and lose a lot of cycles.
- I feel the small cores might have an easier time talking to the main cores with that design, if the L3 cache is fully connected.

We will have to see. It's good that both Zen 4 and Meteor Lake look promising.
 
Good

Let them fight

Consumer gets better products
 
What does that mean in plain English? I thought more cache means bigger latency? Or how is more cache beneficial?
 
Alder Lake has already increased cache latency in comparison to Rocket Lake. If they go even further we might arrive in a situation where Zen 3 will have almost half the cache latency of Raptor Lake. But in the end we'll have to wait for benchmarks, and even then it is going to be workload-dependent.
FWIW, the Golden Cove cores used in Sapphire Rapids already have 2 MB of L2, so we ought to see whether there's added latency as soon as someone benches one of those. Intel didn't say anything about increased latency over the ones in Alder Lake, but that doesn't mean there isn't any.
 
Seems like the trend is no longer focusing on clock speed and core count, since those have been pushed hard, but on increasing cache sizes. I think AMD has been very aggressive in this aspect, but I guess at some point we will run into diminishing returns, especially for most consumers.
 
If you need to give just one number, 68 MB makes more sense than 36 MB.
For what purpose?
The total cache tells little of the relative performance of CPUs. Performance matters, not specs, especially pointless specs.

What does that mean in plain English? I thought more cache means bigger latency? Or how is more cache beneficial?
Very little.
The largest changes are in the E-cores, which have little impact on most workloads. The extra L3 cache is also shared with more cores, so it's not likely to offer a substantial improvement in general. And judging by performance scaling on Xeons, having extra L3 with more cores doesn't offer a significant change.

Whether more cache adds more latency is implementation-specific. In this case they are adding more blocks of L3, which at least increases latency to the banks farthest away, although that's small compared to RAM, of course.
 
Have to admit, Intel engineers are really good at putting LEGO together.
 
I'll guess Intel is about to cash in on LGA1700.
 
What does that mean in plain English? I thought more cache means bigger latency? Or how is more cache beneficial?
It's a tug-of-war game. If you can increase the cache with minimal added latency, some data that previously required a trip to main RAM suddenly doesn't need it anymore -> faster performance. And then the workloads change. Rinse and repeat.
It's also a fab process game, cache is expensive both from a power and a die area point of view.

Also keep in mind cache latency (like RAM latency) is usually given in clock cycles. 1-2 more clock cycles can be masked by increasing the frequency accordingly (it's a bit more complicated than that, really, but that's the gist of it.)
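To make the cycles-vs-frequency point concrete, here's a tiny sketch. The cycle counts and clock speeds are made-up illustrative numbers, not measured Raptor Lake figures:

```python
def latency_ns(cycles, freq_ghz):
    """Convert a latency given in clock cycles to absolute nanoseconds."""
    return cycles / freq_ghz

# A cache that gains 2 cycles of latency but runs at a higher clock can
# end up with the same absolute latency (illustrative numbers only):
old = latency_ns(36, 3.6)   # ~10 ns
new = latency_ns(38, 3.8)   # ~10 ns, despite 2 extra cycles
```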
 
Thanks AMD
 
For what purpose?
The total cache tells little of the relative performance of CPUs. Performance matters, not specs, especially pointless specs.
I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
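For scale, a quick back-of-the-envelope check against the rumored top-SKU figures from the article: 36 MB of shared L3 plus roughly 32 MB of combined L2 (8 P-cores at 2 MB each plus 16 MB across the E-core clusters). These are rumored numbers, not confirmed specs:

```python
MB = 2**20
L3_TOTAL = 36 * MB
L2_TOTAL = 8 * 2 * MB + 16 * MB   # P-core L2 + E-core cluster L2 (rumored)

def fits_in_cache(working_set_mb):
    """Rough guess at which cache level could hold the whole working set."""
    ws = working_set_mb * MB
    if ws <= L3_TOTAL:
        return "fits in L3 alone"
    if ws <= L3_TOTAL + L2_TOTAL:
        return "fits in L2+L3 combined"
    return "spills to RAM"

# e.g. a 50 MB dataset wouldn't fit in L3 alone, but could fit in L2+L3:
print(fits_in_cache(50))   # fits in L2+L3 combined
```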
 
Massive compared to Alder Lake, perhaps, but I don't know if I would call this a massive cache these days.
 
Massive compared to Alder Lake, perhaps, but I don't know if I would call this a massive cache these days.
Massive increase, for those inclined to read ;)
 
Massive increase, for those inclined to read ;)
Yeah, I did. I meant it's not a massive increase; I appreciate it didn't sound like that, though.
 
Yeah, I did. I meant it's not a massive increase; I appreciate it didn't sound like that, though.
1.25 MB -> 2 MB per core is pretty massive for L2 cache. IIRC it doesn't usually grow that fast.
Though to be honest, absolute size is usually meaningless. Cache size is tightly coupled with the underlying architecture (e.g. 2 MB/core wouldn't have made a difference for NetBurst); size alone doesn't tell you much.
 
It's a tug-of-war game. If you can increase the cache with minimal added latency, some data that previously required a trip to main RAM suddenly doesn't need it anymore -> faster performance. And then the workloads change. Rinse and repeat.
It's also a fab process game, cache is expensive both from a power and a die area point of view.

Also keep in mind cache latency (like RAM latency) is usually given in clock cycles. 1-2 more clock cycles can be masked by increasing the frequency accordingly (it's a bit more complicated than that, really, but that's the gist of it.)
This. And when the improvements in the structure and functionality of the cache are factored in, the sum total is an overall gain in performance.

I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
I think you misunderstand how cache is used and why it exists. Caches exist to minimize, as much as possible, how many times the CPU needs to fetch data from system RAM, which is drastically slower. The less frequently that needs to happen, the better the performance. 99.9% of the time, programs and executing code are completely cache-agnostic. This means that programs are generally optimized to run in minimal amounts of cache. However, the more the merrier: if a program has more room to use, it will use it, and the CPU will fit it in. That said, all programs will benefit from more cache unless they are so small that they fit into L2 or a couple of MB of L3, which is rare.
 
Although Alder Lake greatly improved gaming performance vs Rocket Lake, if you look back at the 2020 roadmap slides and bullet points, Intel didn't emphasize the gaming performance improvement of the Alder Lake design. That contrasts with the explicit mention of Raptor Lake's gaming prowess (based on the redesigned cache), which suggests we will see at least the same jump in gaming performance as we had from Rocket Lake to Alder Lake. Or am I reading too much into this slide?

 
I'm sure it matters for applications that can fit their dataset into L2+L3, but not L3 alone, while using all cores. I admit this is more of a HPC territory, where you have a single application running and you can predict (and maybe adjust) cache utilisation. For desktop use ... not sure, but total cache size could affect things like transcoding or image processing a lot.
Many have the misconception that CPU caches contain the most important data, when in reality they only contain the most recently used (or prefetched) data; caches are streaming buffers. While it's possible for an application to give the CPU hints about prefetching and discarding cache lines, it's ultimately controlled by the CPU, and there are no guarantees. The code doesn't see the caches; they are transparent. To the code, it's just normal memory accesses that turn out to be very fast. When we do cache optimization, it's about making things easier for the CPU: denser code and data, fewer function calls, less branching, using SIMD, etc. Still, the application doesn't control what's in cache, or whether the application "fits in cache", because that will "never" be the case anyway.

Even for a core with a large 2 MB L2 cache, that only amounts to 32768 cache lines. And if you consider that the CPU can prefetch multiple cache lines per clock, not to mention that it prefetches a lot of data which is never used before eviction, even an L3 cache 10-20x this size will probably be overwritten within a few thousand clock cycles (don't forget other threads are competing over the L3). So, if you want the code of an application to "stay in cache", you pretty much have to make sure all the (relevant) code is executed every few thousand clock cycles, otherwise other cache lines will get it evicted. Also keep in mind that data cache lines usually greatly outnumber code cache lines, so the more data the application churns through, the more often it needs to access the code cache lines for them to remain in cache. And other threads and things like system calls will also pull their own code and data into the caches, competing with your application. In practice, the entire cache is usually overwritten every few microseconds, with the possible exception of some super-dense code that runs constantly and exclusively on that core.

So if you have a demanding application, it's not the entire application in cache, it's probably only the heavy algorithm you use at that moment in time, and possibly only a small part of a larger algorithm, that fits in cache at the time.
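The arithmetic above can be sketched in a few lines. The refill rate here is an assumed illustrative figure, not a measured one:

```python
LINE_BYTES = 64                      # cache line size on current x86 CPUs
l2_lines = 2 * 2**20 // LINE_BYTES   # a 2 MB L2 holds this many lines
print(l2_lines)                      # 32768

# Assume the core streams in 2 new lines per clock at 4 GHz (made-up but
# plausible rate). Time to replace every line in the L2:
clocks_to_churn = l2_lines / 2       # on the order of 16k clock cycles
seconds = clocks_to_churn / 4e9      # a few microseconds at these rates
```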
 
The real question is how much cash is that cache gonna cost me???
 
The best thing about Raptor Lake isn't the cache, it's the additional 8 E-cores.

8 P-cores is enough for the moment, and based on historic trends, enough for a decade or more.

If something is truly multi-threaded the problem is IPC/Watt and IPC/die-area. There's only so much power you can pump into a motherboard socket, and only so much cooling something can handle before it becomes too difficult for mainstream consumer use. E-cores vastly outperform P-cores in terms of power efficiency and area efficiency, so it's a no-brainer to just throw more of them at heavily-threaded workloads.
 
So was 640 KB.

I said a decade or more, not indefinitely.

Quad-core CPUs were launched in 2008 and were good up until at least 2018. Arguably a 4C/8T is still decent enough today, but definitely no longer in its prime.
 