
Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops

It's not AMD's 3D vcache, it's TSMC's
So it's not Intel's CPU anymore either, then. Neat!

Next time I want a patch for an application I'll write some random factory in China, too.
 
What a shame! Intel's desktop lineup could really use such a boost.
Desktop is by far the least important lineup. Server and mobile are what matter. Desktop is so far behind either of them that it's laughable. They are getting crushed in server but doing OK in mobile.
 
There was a Broadwell chip with 60MB L3 cache. They aren't new to big L3. Sapphire Rapids has around 110MB L3 and also optionally a huge L4. More cache is just the natural progression for all these companies because the problems to solve are the same as ever.
 
It is not; TSMC owns the 3D cache packaging. It is not an AMD design. AMD simply took advantage of a service that TSMC offered (3D cache) and tried it out on their processors.

TSMC's 3D Stacked SoIC Packaging Making Quick Progress, Eyeing Ultra-Dense 3μm Pitch In 2027

And you have this deck from TSMC back in 2021 regarding 3d stacking: Advanced Technology Leadership
It was based on AMD's interposer technology for the first HBM stacks in 2015, which Intel also copied, and Nvidia too.

 
Give us HEDT CPUs with the cache and ECC memory and I'll forget about the desktop. Deal?
 
Been sayin' that X3D is good for more than gaming...

Intel will have an issue though: For all but its highest-billing most demanding customers, adding extra cache will 'extend' the usable life of the platform.
I wholly expect hardware-level platform locking, and a non-existent 2nd hand market (in years to come).
 
Even well optimized games and workloads can benefit if the highly utilized code can be contained in the cache, as it is higher bandwidth and lower latency than waiting to go to system RAM. Even factorio, which is an extremely well optimized game, massively benefits from this, as do many other workloads.
You don't grasp the difference between L2 and L3 caches. L3 only contains data recently discarded by L2, so it's cache lines that have either been very recently used or, more likely, pre-fetched and then never used at all. The most data- and computationally-intensive workloads see no benefit beyond a decent L3 cache, because the program is what we call cache optimized, which is a requirement for any performant piece of software. For any such heavy workload, the chances of a hit in L3 on a data cache line are extremely low, except for the few times cores are synced. This means the few hits that you actually get are likely instruction cache lines, and the rest is just meaningless garbage streaming through the L3. Sensitivity to L3 cache is mainly known as an indicator of bloat in software optimization, and the solution is to reduce said bloat and make the code more computationally dense.

As heavy workloads move more and more towards SIMD (e.g. AVX-512), the amount of data streaming through memory->L2->L3 is greater than ever, and the chances of a hit in the L3 data cache are getting slimmer and slimmer. (Which should be obvious, as the workload needs to be cache optimized, for both instructions and data, otherwise the pipeline would stall.) Data cache lines greatly outnumber instruction cache lines, which is why AMD needed so much of it in order to make a tiny difference.

While instruction cache lines are comparatively "few" in number and not bottlenecked by memory bandwidth, the cache hierarchy for data cache lines behaves like a "streaming buffer": a continuous stream of data flowing from memory->L2->L3, with all the data being overwritten every few thousand clock cycles, so the bottleneck here would not be L3 bandwidth but rather memory bandwidth.

It's no accident that CPUs over the past decade or so have continuously increased the bandwidth of both memory and caches, especially for heavy AVX workloads, even prioritizing bandwidth over latency, while the cache sizes (L1I, L1D, L2, L3) have remained comparatively stable until the arrival of 3D V-cache (except for L3 growing proportionally with core count); otherwise you might have expected a 1GB L2 cache by now. This "discrepancy" is due to misconceptions about how caches work; as said, the caches are an extremely efficient streaming buffer to keep the execution ports fed (with staggering amounts of data flowing through them), not a hierarchy of data based on "importance". :)
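
If anyone wants to poke at that "streaming buffer" behaviour themselves, here's a rough sketch of my own (not from AMD/Intel or anyone in this thread; the buffer sizes and the 64-pass count are just numbers I picked): it re-walks working sets of different sizes, and effective bandwidth should fall off once the set stops fitting in L2/L3 and has to stream from RAM.

```c
/* Rough sketch of my own (nothing from this thread): re-walk working sets
 * of different sizes and report effective bandwidth. Once the set no longer
 * fits in L2/L3, every pass has to stream from RAM and the number drops.
 * Buffer sizes and the 64-pass count are arbitrary picks; adjust for your CPU.
 * Build (Linux): gcc -O2 streambench.c -o streambench */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    /* 256 KB fits in L2, 4 MB in most L3s, 32 MB in a big L3, 256 MB in none. */
    size_t sizes[] = {256u << 10, 4u << 20, 32u << 20, 256u << 20};
    for (int s = 0; s < 4; s++) {
        size_t n = sizes[s] / sizeof(long);
        long *buf = malloc(n * sizeof(long));
        for (size_t i = 0; i < n; i++) buf[i] = (long)i;

        long sum = 0;
        double t0 = now_sec();
        for (int pass = 0; pass < 64; pass++)          /* re-walk the same buffer */
            for (size_t i = 0; i < n; i++) sum += buf[i];
        double dt = now_sec() - t0;

        double bytes = 64.0 * (double)n * sizeof(long);
        printf("%8zu KB working set: %6.2f GB/s (checksum %ld)\n",
               sizes[s] >> 10, bytes / dt / 1e9, sum);
        free(buf);
    }
    return 0;
}
```

On a typical desktop chip you'd expect the 256 KB and 4 MB cases to come out several times faster than the 256 MB one; where the 32 MB case lands is exactly what extra L3 changes.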

You may as well say computers don't need more than 64k of RAM and any applications that do are poorly optimized.
Nice attempt at a straw man argument there, but you are in fact just grasping at straws.
 
It's not AMD's 3D vcache, it's TSMC's
It was engineered by AMD and manufactured by TSMC.

Intel's taking a similar approach, but will call it something else.
 
AMD has clearly better efficiency, but it's not due to the large L3 cache.
But it's hard to find something more deserving of the title "waste of sand" than throwing a bunch of L3 cache on a die, as it's only a tiny subset of very poorly optimized code that significantly benefits from it, namely certain outliers in applications, and games running at unrealistically low GPU load. It would be much better to have a CPU with 5% more computational power, especially down the road, as future games are likely to become more demanding, so the bottleneck will be computational performance, not "artificial" scenarios running games at hundreds of frames per second.
For CPUs to advance, they should stop focusing on gimmicks and make actual architectural advancements instead. Large L3 caches are a waste of precious development resources as well as production capacity.

Unless your architectural advancements are bottlenecked by memory bandwidth and latency, in which case that waste of sand turns into out-of-stock products that everyone who runs games wants....
 
I must be the only person who wants to see AMD try Foveros for dual-CCD CPUs....
Oh well.
 
Caches are usually defined by cycle latencies, not by size or preference.

L1 1ns - 4 cycles
L2 3ns - 14 cycles
L3 10ns - 50 cycles
L4/eDRAM 36ns - 140 cycles
DRAM 60-100ns - MANY cycles

Guess where X3D stands
Now the L4 has what? 50-100GB/sec bandwidth?
Just for comparison the first gen X3D can hit 600GB/sec with 47 cycles latency.
So it has 6x bandwidth and 3x faster access times....which is the same as most L3 caches.
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
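
For anyone wondering where those nanosecond figures come from, you can get surprisingly close with a dependent pointer chase. This is a hedged sketch of my own (the 64 MB buffer, the 50M steps and the use of rand() are arbitrary choices, nothing official); because each load depends on the previous one, the average time per step roughly tracks the latency of whichever level currently holds the working set.

```c
/* Hedged sketch of my own: approximate average load latency with a dependent
 * pointer chase. Every load's address comes from the previous load, so the
 * level of the hierarchy that holds the working set dictates the ns/step.
 * Build (Linux): gcc -O2 chase.c -o chase */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    size_t bytes = 64u << 20;                 /* 64 MB: bigger than most L3s */
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(bytes);

    /* Sattolo's algorithm: a random permutation with exactly one cycle,
     * so the chase visits every element and the prefetcher can't help. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    srand(1);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    size_t p = 0;
    const size_t steps = 50u * 1000 * 1000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++) p = next[p];   /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load latency: %.1f ns (checksum %zu)\n", ns / steps, p);
    free(next);
    return 0;
}
```

Shrink the buffer to something that fits in L2 or L3 and you should see the number drop into the single-digit/tens-of-ns range from the list above.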

I think the reason L1/L2 caches haven't been increasing is that they're part of the cores: doubling the size means greater area and bigger dies, which means higher latencies. Only recently, thanks to EUV, has density improved enough (die shrinks used to provide 2-3x density) that we saw some improvement.

In fact, both L1 and L2 have increased in the last few generations, after 20 years of staying between 256-512KB (not counting halo products like the FX, or the shared L2... but that's a different FX), all without increasing latencies.

L3 is just easier to increase or move onto its own stacked die; there are even rumours that AMD plans to have the next Zen architecture with the L3 cache moved completely onto a stacked die.
 
Copying could open the door for litigation.
 
I don't understand those people honestly. Benchmarks don't show periodic stutters which you can get in some games for example, and those are fully eliminated for me. Plus, you do see the better lows and general performance in benchmarks. If CPUs didn't make a difference we'd all have 4090s paired with ancient processors.

I have my 7800X3D at 40-60W providing a much better, much more consistent experience with the same GPU and screen than my 9900k that was eating 150W.
CPU is only important now because it's AMD, right?

Your PC must have something wrong, maybe slow RAM?
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs my main system with a 7800X3D and the same GPU.
Even at 1440p there are no big differences.

But if I use a GPU like a 4060, then I will see a difference right away, not because of the CPU but because of the slow GPU.

AMD has good CPUs, but in the real world the GPU is much more important.
Both Intel and AMD, even with older CPUs, can do gaming just fine.

People are just hyped by the extra % they see in 1080p benchmarks with a 4090.

It was engineered by AMD and manufactured by TSMC.

Intel's taking a similar approach, but will call it something else.
it was engineered by TSMC not AMD
 
It looks like (another secret) agreement between Intel and AMD, dividing up which of the two gets which market.
 
it was engineered by TSMC not AMD

(attached image: 3d.jpg)

TM says it all. TSMC invention licensed to AMD, I guess, for their use. I don't think it was originally for memory stacking, was it? AMD just used it that way.


Can't wait to see how Intel does it, surely they can't copy, unless they get a secret license from TSMC to use it
 
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
That's just plainly wrong.
Most core AVX operations are within 1-5 cycles on recent architectures. Haswell and Skylake did a lot to improve AVX throughput, but there have been several improvements since then too. E.g. add operations are now down from 4 to 2 cycles on Alder Lake and Sapphire Rapids. Shift operations are down to a single cycle. This is as fast as simple integer operations. And FYI, all floating point operations go through the vector units; whether it's a scalar operation, SSE or AVX, the latency will be the same. ;)
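
To make the latency vs. throughput point concrete, here's a small sketch of my own (plain AVX2 intrinsics; the iteration count and the choice of four independent chains are just my assumptions): one long dependent chain of vector adds is limited by the add latency, while independent chains let the pipelined vector ports overlap work and run much closer to throughput speed.

```c
/* Sketch of my own (not from this thread): latency vs. throughput of
 * 256-bit vector adds. The dependent chain is bound by add latency;
 * four independent chains can overlap in the pipelined vector ports.
 * Build (Linux): gcc -O2 -mavx2 addlat.c -o addlat */
#include <immintrin.h>
#include <stdio.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    const long iters = 200 * 1000 * 1000;   /* total number of vector adds per test */
    __m256d one = _mm256_set1_pd(1.0);

    /* One dependent chain: each add has to wait for the previous result. */
    __m256d a = _mm256_setzero_pd();
    double t0 = now_sec();
    for (long i = 0; i < iters; i++)
        a = _mm256_add_pd(a, one);
    double t_dep = now_sec() - t0;

    /* Four independent chains: same number of adds, but the core can
     * keep several of them in flight at once. */
    __m256d b0 = _mm256_setzero_pd(), b1 = b0, b2 = b0, b3 = b0;
    t0 = now_sec();
    for (long i = 0; i < iters; i += 4) {
        b0 = _mm256_add_pd(b0, one);
        b1 = _mm256_add_pd(b1, one);
        b2 = _mm256_add_pd(b2, one);
        b3 = _mm256_add_pd(b3, one);
    }
    double t_ind = now_sec() - t0;

    /* Keep the results alive so the compiler cannot drop the loops. */
    double out[4];
    _mm256_storeu_pd(out, _mm256_add_pd(_mm256_add_pd(a, b0),
                     _mm256_add_pd(b1, _mm256_add_pd(b2, b3))));
    printf("dependent:   %.2f ns per add\n", t_dep / iters * 1e9);
    printf("independent: %.2f ns per add (check %.0f)\n", t_ind / iters * 1e9, out[0]);
    return 0;
}
```

The dependent loop should come out noticeably slower per add than the independent one, which is the whole latency-vs-throughput distinction in a nutshell.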
 
Putting a large cache tile on top of the CPU cores was the idea of one person on the AMD team.

TSMC's 3D technology was used to manufacture that idea, and they decided to further improve the technology.

Don't mix up the general 3D manufacturing process with that tile of extra cache in X3D CPUs.
 
I do not agree with that. Intel already had such a processor with extra "cache": the i7-5775C.


Again, the CPU includes 6MB of L3 cache and 128MB of eDRAM.


It's up for discussion. I see the 7800X3D's cache as a 4th-level one, like the eDRAM cache of the i7-5775C.
I always thought the eDRAM in those Intel processors was for the iGPU...
 
TM says it all. TSMC invention licensed to AMD, I guess, for their use. I don't think it was originally for memory stacking, was it? AMD just used it that way.


Can't wait to see how Intel does it, surely they can't copy, unless they get a secret license from TSMC to use it
Well, the chips are made by TSMC, so really it's not an AMD Ryzen but a TSMC Ryzen, right?


Yes, the fabrication technology was researched by TSMC and they are the ones doing it. But guess what, that is normal, as they are the ones making those chips for AMD. AMD is not a fab.


But AMD is the only one right now using that technology, because this isn't just a box you tick when you order some wafers from TSMC. It's not "I'll take X chips with more cache". You still have to design a chip that will be able to communicate with the cache die, deliver power to it, etc.

The physical portion of 3D V-cache is a TSMC technology. This is expected, as AMD is fabless.
The logical portion of 3D V-cache is an AMD technology. This is expected, as TSMC does not design chips.

In the end, it's a collaboration between both companies.

Also, the added die is indeed L3. There is no separate lookup for that die when checking whether data is in the L3 cache; the whole 96 MB is looked up at once, and there is no penalty for accessing data that sits in the 3D V-cache die.
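
For what it's worth, you can also see that from software: the OS just reports one big unified L3 and no extra cache level for the stacked die. Here's a minimal Linux-only sketch of my own (the sysfs cache-topology files are standard kernel interfaces; everything else is my assumption) that dumps what the kernel exposes for cpu0:

```c
/* Linux-only sketch of my own: dump the cache topology the kernel exposes
 * for cpu0. On an X3D part the stacked SRAM does not show up as an extra
 * level, just as a larger level-3 "Unified" cache.
 * Build: gcc -O2 caches.c -o caches */
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *fields[] = {"level", "type", "size"};
    for (int idx = 0; idx < 8; idx++) {
        char line[128] = "";
        int missing = 0;
        for (int f = 0; f < 3; f++) {
            char path[128], buf[64];
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/%s",
                     idx, fields[f]);
            FILE *fp = fopen(path, "r");
            if (!fp || !fgets(buf, sizeof buf, fp)) {
                if (fp) fclose(fp);
                missing = 1;
                break;
            }
            fclose(fp);
            buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
            strcat(line, buf);
            strcat(line, " ");
        }
        if (missing) break;                   /* no more cache indices */
        printf("index%d: %s\n", idx, line);
    }
    return 0;
}
```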
 