
Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops

btarunr

Editor & Senior Moderator
Intel is coming around to the idea of large last-level caches on its processors. Florian Maislinger, a tech communications manager at Intel, revealed in an interview with Der8auer and Bens Hardware that the company is working on augmenting its processors with large shared L3 caches; however, it will begin doing so only with its server processors. The company is working on a new server/workstation processor for 2025 that comes with cache tiles augmenting the shared L3 cache, so that it excels in the kind of workloads AMD's EPYC "Genoa-X" and upcoming "Turin-X" processors excel at: technical computing. On "Genoa-X" processors, each of the up to 12 "Zen 4" CCDs comes with stacked 3D V-Cache, which is found to have a profound impact on performance in cache-sensitive applications such as the Ansys suite, OpenFOAM, etc.

The interview reveals that the server processor with a large last-level cache should come out in 2025; however, there is no such effort on the horizon for the company's client processors, such as the Core Ultra "Arrow Lake-S," at least not in 2025. The company's recently launched "Arrow Lake-S" desktop processors do not provide a generational gaming performance uplift over the 14th Gen Core "Raptor Lake Refresh." However, Intel claims to have identified certain correctable reasons for the gaming performance falling below expectations, and is hoping to release updates for the processor (possibly in the form of new microcode, or something at the OS-vendor level). This, the company claims, should improve the gaming performance of "Arrow Lake-S."



View at TechPowerUp Main Site | Source
 
Yeah - Intel needs to be careful not to sell too many CPUs, or they might not be able to fab enough to meet all that demand. Makes sense.
 
Better late than never. :D
 
GLWS

They still need to figure out how to lower their CPUs' power draw; the wattage is just crazy.

Anyways, a lost opportunity for Intel.
 
I do not agree with that. Intel already had such a processor with extra "cache": the i7-5775C.


Again, the CPU includes 6MB of L3 cache and 128MB of eDRAM.


It's up for discussion. I see the 7800X3D's cache as a 4th-level one, like the eDRAM cache of the i7-5775C.
 
Imitation is the best form of flattery.
It's up for discussion. I see the 7800X3D's cache as a 4th-level one, like the eDRAM cache of the i7-5775C.

Not really the same.

64MB of SRAM for L3 vs 128MB of eDRAM for L4.
 
I do not agree with that. Intel already had such a processor with extra "cache": the i7-5775C.


Again, the CPU includes 6MB of L3 cache and 128MB of eDRAM.


It's up for discussion. I see the 7800X3D's cache as a 4th-level one, like the eDRAM cache of the i7-5775C.

It's not really the same though, as the eDRAM in those chips had far higher latency than what you'd get from an L3 cache. It was on-package, but not as tightly integrated as the L3 on X3D chips, so again, not the same. Better than system RAM, but not as good as L3 cache.

Otherwise you may as well argue that systems with soldered RAM are better than those with removable RAM, since it's attached to the system. Just being attached doesn't make it better or higher-performing.

I hope Intel does provide chips with far larger L3 caches. It should give us some competition in the gaming CPU world, and may encourage more game devs and other workloads to be designed to take advantage of it even more.
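If anyone wants to see those latency tiers for themselves, here's a rough pointer-chase sketch in C (my own toy, not from any of the linked articles; it assumes Linux/glibc, and the 64-byte line size plus the sweep range are guesses you'd tune per chip). The ns-per-hop figure should step up each time the working set outgrows L1, L2, and then L3:

```c
/* Pointer-chase sketch: walk a shuffled ring of cache lines and time it.
 * As the working set outgrows each cache level, ns/hop jumps to the next
 * latency tier. Assumes Linux/glibc and a C99 compiler. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE 64  /* assumed cache-line size in bytes */

static double chase(size_t bytes, size_t hops)
{
    size_t n = bytes / LINE;
    char *buf = malloc(n * LINE);
    size_t *order = malloc(n * sizeof *order);
    for (size_t i = 0; i < n; i++) order[i] = i;
    for (size_t i = n - 1; i > 0; i--) {         /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    /* Link every line to the next line in shuffled order, forming a ring. */
    for (size_t i = 0; i < n; i++)
        *(void **)(buf + order[i] * LINE) = buf + order[(i + 1) % n] * LINE;

    void *p = buf;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < hops; i++) p = *(void **)p;  /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    free(order);
    if (p == NULL) puts("");  /* use p so the loop isn't optimized away */
    free(buf);
    return ns / hops;
}

int main(void)
{
    /* Sweep from well inside L1 to well past a 96 MB L3. */
    for (size_t kb = 16; kb <= 256 * 1024; kb *= 2)
        printf("%8zu KiB: %6.2f ns/hop\n",
               kb, chase(kb * 1024, 20 * 1000 * 1000));
    return 0;
}
```

On an X3D part the last big step should land out past the 96 MB mark; on a Broadwell with eDRAM you'd expect an extra plateau between L3 and DRAM, which is exactly the latency tier being argued about here.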
 
It's not really the same though, as the eDRAM in those chips had far higher latency than what you'd get from an L3 cache. It was on-package, but not as tightly integrated as the L3 on X3D chips, so again, not the same. Better than system RAM, but not as good as L3 cache.

Otherwise you may as well argue that systems with soldered RAM are better than those with removable RAM, since it's attached to the system. Just being attached doesn't make it better or higher-performing.
I was about to write something along those lines, so instead I'll link Chips and Cheese's recent analysis of the Broadwell and Skylake L4 implementations ;)
 
64MB of SRAM for L3 vs 128MB of eDRAM for L4.

Well, I can't argue with that, as AMD doesn't really seem to know what to write themselves.

I always try to use the datasheet or specifications from the manufacturer:

L1 Cache: 512 KB
L2 Cache: 8 MB
L3 Cache: 96 MB

The same stupidity for the 9800X3D; see: https://www.amd.com/en/products/processors/desktops/ryzen/9000-series/amd-ryzen-7-9800x3d.html

I cannot take seriously any manufacturer that is unable to provide proper specifications on its specifications page.
Something like:
32 MiB cache, type X
64 MiB cache, type Y

Which also raises the next question: MB or MiB units?

I think the units are wrong, too. They should be MiB. From what I've read about microcontrollers and such, it's always binary, not the human base 10. It's base 2.

E.g., I've seen over the past months that more and more software on GNU/Linux already uses the correct units, base 2 or base 10.
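For anyone wondering what the difference actually amounts to, a quick C sketch (just arithmetic, nothing vendor-specific):

```c
/* Tiny sketch: the gap between 96 MB (base 10) and 96 MiB (base 2). */
#include <stdio.h>

int main(void)
{
    unsigned long long mb  = 96ULL * 1000 * 1000;  /* 96 MB  =  96,000,000 bytes */
    unsigned long long mib = 96ULL * 1024 * 1024;  /* 96 MiB = 100,663,296 bytes */
    printf("96 MB  = %llu bytes\n", mb);
    printf("96 MiB = %llu bytes (%.1f%% more)\n",
           mib, 100.0 * (mib - mb) / mb);
    return 0;
}
```

So a spec page that says "96 MB" undersells a binary 96 MiB by about 4.9%. Since cache sizes come in powers of two, the MiB reading is almost certainly what's meant.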

--

It's up for discussion what a cache level is, and how you define a Level 1, Level 2, Level 3, and Level 4 cache.
I do not think the latency of a Level 4 cache is an argument. I do agree AMD may be the first with a 3D-stacked cache module for a Level 4 cache, but maybe not the first to use a Level 4 cache at all.
 
It's not an L4 cache; it's not addressed as an L4 cache by any code. It is a low-latency L3 cache with performance characteristics that clearly distinguish it from eDRAM or any other on-die or off-die memory option.

Cache bandwidth and latency are two key reasons why the L1/L2/L3 caches are so impactful to performance. So the high latency of the eDRAM was a massive disadvantage, and why it was referred to as an L4-style memory.

If they were similar, we'd have seen a far bigger performance impact from that older Intel chip, and we'd see worse performance on X3D vs. non-X3D for code that fits in the smaller L3, if the extra L3 cache on X3D didn't perform as well as standard L3 cache.
 
If they pair this up with their previous idea from the Xeon Max lineup (which had HBM on the package), it would make for a killer HPC/CFD processor. Throw in as many memory channels as possible and it could beat AMD's current offerings for that use case.

It's up for discussion what a cache level is, and how you define a Level 1, Level 2, Level 3, and Level 4 cache.
I'd say latency and speed (both of which correlate with proximity to the cores), as well as how many of those levels you stack.

Tbh it's just a matter of memory hierarchy, and at this point we could add storage and the different memory levels as well. What counts as "cache", "memory", or "storage" ends up moot under this view, since what matters is their speed/latency.
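FWIW, the kernel already publishes that hierarchy, so "what counts as a level" is at least partly just what the OS reports. A small C sketch that dumps what Linux exposes for cpu0 (assumes the usual sysfs layout; paths can differ on exotic platforms):

```c
/* Sketch: print the cache hierarchy the Linux kernel reports for cpu0
 * via sysfs - level, type, and size for each cache index. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char path[160], buf[64];
    const char *fields[] = { "level", "type", "size" };

    for (int idx = 0; ; idx++) {
        /* Probe the 'level' file first; if it's missing, we're done. */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        FILE *probe = fopen(path, "r");
        if (!probe) break;
        fclose(probe);

        printf("index%d:", idx);
        for (int f = 0; f < 3; f++) {
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/%s",
                     idx, fields[f]);
            FILE *fp = fopen(path, "r");
            if (fp && fgets(buf, sizeof buf, fp)) {
                buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
                printf("  %s=%s", fields[f], buf);
            }
            if (fp) fclose(fp);
        }
        printf("\n");
    }
    return 0;
}
```

Each index line carries a level, type, and size, which is about as close to a formal definition of "Level N cache" as software gets.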
 
Well, I can't argue with that, as AMD doesn't really seem to know what to write themselves.

They write 96MB because, while physically it is 32+64MB, the cache lines are shared (linked by through-silicon vias), so there is no logical difference between the 32MB built into the core die and the 64MB on the 3D cache die. When L2 flushes to L3 cache lines, it does not require any extra special loads or stores to interface with the 3D cache, because at the logical level it is identical to the rest of the L3.
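And that logical unification is visible from software, too. A tiny check via glibc's sysconf (a glibc extension, so Linux-only; other libcs may return 0 or -1 here):

```c
/* Sketch: ask glibc for the L3 size the OS sees. On a 7800X3D this
 * should report one unified 100663296 bytes (96 MiB), not a 32+64 split.
 * _SC_LEVEL3_CACHE_SIZE is a glibc extension. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long l3 = sysconf(_SC_LEVEL3_CACHE_SIZE);
    printf("L3 reported: %ld bytes (%.0f MiB)\n", l3, l3 / (1024.0 * 1024.0));
    return 0;
}
```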
 
Intel should plan to copy AMD's 3D V-cache performance and efficiency, just for desktops.
AMD has clearly better efficiency, but it's not due to the large L3 cache.
But it's hard to find something more deserving of the title "waste of sand" than throwing a bunch of L3 cache on a die, as only a tiny subset of very poorly optimized code benefits significantly from it, namely certain outliers among applications, and games running at unrealistically low GPU load. It would be much better to have a CPU with 5% more computational power, especially down the road, as future games are likely to become more demanding, so the bottleneck will be computational performance, not "artificial" scenarios running games at hundreds of frames per second.
For CPUs to advance, they should stop focusing on gimmicks and make actual architectural advancements instead. Large L3 caches are a waste of precious development resources as well as production capacity.
 
For CPUs to advance, they should stop focusing on gimmicks and make actual architectural advancements instead. Large L3 caches are a waste of precious development resources as well as production capacity.
Not sure I see it as a gimmick.

You can only go so far with shrinking nodes, and they are at the mercy of TSMC in that respect. Secondly, I don't think EPYC processors aimed at workloads that benefit greatly from cache are a bad idea; client desktop doesn't really drive anything, it's all enterprise. Better to deal with the low-hanging fruit as you continue to address overall processor improvement than to try to hit the ball out of the park every single launch. I consider that a better use of development and production capacity: as core counts go up and processors get faster, feeding the cores will always be an issue. If you can help with that via caches rather than having to add more memory channels, etc., I consider that a win.
 
Intel should plan to copy AMD's 3D V-cache performance and efficiency, just for desktops.

It's not AMD's 3D V-Cache, it's TSMC's.
 
But it's hard to find something more deserving of the title "waste of sand" than throwing a bunch of L3 cache on a die, as only a tiny subset of very poorly optimized code benefits significantly from it, namely certain outliers among applications, and games running at unrealistically low GPU load.

Even well-optimized games and workloads can benefit if the highly utilized code and data can be contained in the cache, as it has higher bandwidth and lower latency than waiting on a trip to system RAM. Even Factorio, which is an extremely well-optimized game, benefits massively from this, as do many other workloads. You may as well say computers don't need more than 64K of RAM, and any application that does is poorly optimized.

Extra CPU cycles don't do anything if the CPU is waiting for data from memory. That's why the 1% lows benefit massively, and performance is better even at a lower frequency. I have a very high-end GPU, yet I noticed the difference when playing at 4K. Lows, frame-pacing consistency, etc. all benefit, and games that used to have very periodic stutters have none compared to my 5 GHz 9900K.
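You can make that stall visible with a toy loop: same instruction stream, only the table size changes. A C sketch (the 8 MiB vs 512 MiB sizes are arbitrary picks for "fits in a big L3" vs "spills to RAM"):

```c
/* Sketch: identical lookup loop, only the table size changes. When the
 * table fits in cache the loop flies; when it spills to RAM the same
 * instructions crawl, because the core is stalled on loads. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double run(size_t table_bytes, size_t lookups)
{
    size_t n = table_bytes / sizeof(unsigned);
    unsigned *table = malloc(n * sizeof *table);
    if (!table) { perror("malloc"); exit(1); }
    for (size_t i = 0; i < n; i++) table[i] = (unsigned)i;

    unsigned idx = 12345u, sum = 0u;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < lookups; i++) {
        idx = idx * 1664525u + 1013904223u;  /* cheap LCG index */
        sum += table[idx % n];               /* scattered load */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    if (sum == 42u) puts("");  /* keep the loop from being optimized out */
    free(table);
    return ns / (double)lookups;
}

int main(void)
{
    size_t lookups = 50 * 1000 * 1000;
    printf("  8 MiB table: %6.2f ns/lookup\n", run((size_t)8 << 20, lookups));
    printf("512 MiB table: %6.2f ns/lookup\n", run((size_t)512 << 20, lookups));
    return 0;
}
```

The big-table run executes the exact same instructions but spends most of its time waiting on loads, which is precisely the stall a large L3 avoids.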
 
Even well-optimized games and workloads can benefit if the highly utilized code and data can be contained in the cache, as it has higher bandwidth and lower latency than waiting on a trip to system RAM. Even Factorio, which is an extremely well-optimized game, benefits massively from this, as do many other workloads. You may as well say computers don't need more than 64K of RAM, and any application that does is poorly optimized.

Extra CPU cycles don't do anything if the CPU is waiting for data from memory. That's why the 1% lows benefit massively, and performance is better even at a lower frequency. I have a very high-end GPU, yet I noticed the difference when playing at 4K. Lows, frame-pacing consistency, etc. all benefit, and games that used to have very periodic stutters have none compared to my 5 GHz 9900K.
Thank you. I have been arguing this with people posting 4K Ultra benchmarks. CPU performance matters in games, and X3D has changed the world.
 
Looks like gluing cores together was the way to go, and gluing cache onto them too, in the end.
 
Thank you. I have been arguing this with people posting 4K Ultra benchmarks. CPU performance matters in games, and X3D has changed the world.
I honestly don't understand those people. Benchmarks don't show the periodic stutters you can get in some games, for example, and those are fully eliminated for me. Plus, you do see the better lows and general performance in benchmarks. If CPUs didn't make a difference, we'd all have 4090s paired with ancient processors.

I have my 7800X3D at 40-60 W providing a much better, much more consistent experience with the same GPU and screen than my 9900K that was eating 150 W.
 
Weird statement.
They are concerned for Intel…

Back to the days of minor performance increases, where a new board is required for a 2% performance uplift…

But good on Intel, doing what they would sue AMD for: copying a good idea. I wonder if they will be paying royalties?
 