
Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops

It's not AMD's 3D vcache, it's TSMC's
So it's not Intel's CPU anymore either, then. Neat!

Next time I want a patch for an application I'll write some random factory in China, too.
 
What a shame! Intel's desktop lineup could really use such a boost.
Desktop is by far the least important lineup. Server and mobile are what matter. Desktop is so far behind either of them that it's laughable. They are getting crushed in server but doing OK in mobile.
 
There was a Broadwell chip with 60MB L3 cache. They aren't new to big L3. Sapphire Rapids has around 110MB L3 and also optionally a huge L4. More cache is just the natural progression for all these companies because the problems to solve are the same as ever.
 
It is not; TSMC owns the 3D cache packaging. It is not an AMD design. AMD simply took advantage of a service that TSMC offered (3D cache) and tried it out on their processors.

TSMC's 3D Stacked SoIC Packaging Making Quick Progress, Eyeing Ultra-Dense 3μm Pitch In 2027

And you have this deck from TSMC back in 2021 regarding 3d stacking: Advanced Technology Leadership
It was based on AMD's interposer technology for the first HBM stacks in 2015, which Intel also copied, and Nvidia too.

 
Give us HEDT CPUs with the cache and ECC memory and I'll forget about the desktop. Deal?
 
Been sayin' that X3D is good for more than gaming...

Intel will have an issue though: For all but its highest-billing most demanding customers, adding extra cache will 'extend' the usable life of the platform.
I wholly expect hardware-level platform locking, and a non-existent 2nd hand market (in years to come).
 
Even well optimized games and workloads can benefit if the highly utilized code can be contained in the cache, as it is higher bandwidth and lower latency than waiting to go to system RAM. Even factorio, which is an extremely well optimized game, massively benefits from this, as do many other workloads.
You don't grasp the difference between L2 and L3 caches. L3 only contains data recently discarded by L2, so it's cache lines that have either been very recently used or, more likely, pre-fetched and then never used at all. The most data- and computationally-intensive workloads see no benefit beyond a decent L3 cache, because the program is what we call cache optimized, which is a requirement for any performant piece of software. For any such heavy workload, the chances of a hit in L3 on a data cache line are extremely low, except for the few times cores are synced. This means the few hits that you actually get are likely instruction cache lines, and the rest is just meaningless garbage streaming through the L3. Sensitivity to L3 cache is mainly known as an indicator of bloat in software optimization, and the solution is to reduce said bloat and make the code more computationally dense.

As heavy workloads move more and more towards SIMD (e.g. AVX-512), the amount of data streaming through memory->L2->L3 is greater than ever, and the chances of a hit in the L3 data cache are getting slimmer and slimmer. (Which should be obvious, as the workload needs to be cache optimized, for both instructions and data, otherwise the pipeline would stall.) Data cache lines greatly outnumber instruction cache lines, which is why AMD needed so much of it in order to make a tiny difference.

While instruction cache lines are comparatively "few" in number and not bottlenecked by memory bandwidth, the cache hierarchy for data cache lines behaves like a "streaming buffer": a continuous stream of data flowing from memory->L2->L3, with all the data being overwritten every few thousand clock cycles, so the bottleneck here would not be L3 bandwidth but rather memory bandwidth.

It's no accident that CPUs over the past decade or so have continuously increased the bandwidth of both memory and caches, especially for heavy AVX workloads, even prioritizing bandwidth over latency, while the cache sizes (L1I, L1D, L2, L3) have remained comparatively stable until the arrival of 3D V-cache (except for L3 growing proportionally with core count); otherwise you might have expected a 1GB L2 cache by now. This "discrepancy" is due to misconceptions about how caches work; as said, the caches are an extremely efficient streaming buffer to keep the execution ports fed (with staggering amounts of data flowing through them), not a hierarchy of data based on "importance". :)
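
If anyone wants to poke at that "streaming buffer" behaviour themselves, here's a rough sketch of my own (not from AMD/Intel or anyone in this thread; the buffer sizes and the 64-pass count are just numbers I picked): it re-walks working sets of different sizes, and effective bandwidth should fall off once the set stops fitting in L2/L3 and has to stream from RAM.

```c
/* Rough sketch of my own (nothing from this thread): re-walk working sets
 * of different sizes and report effective bandwidth. Once the set no longer
 * fits in L2/L3, every pass has to stream from RAM and the number drops.
 * Buffer sizes and the 64-pass count are arbitrary picks; adjust for your CPU.
 * Build (Linux): gcc -O2 streambench.c -o streambench */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    /* 256 KB fits in L2, 4 MB in most L3s, 32 MB in a big L3, 256 MB in none. */
    size_t sizes[] = {256u << 10, 4u << 20, 32u << 20, 256u << 20};
    for (int s = 0; s < 4; s++) {
        size_t n = sizes[s] / sizeof(long);
        long *buf = malloc(n * sizeof(long));
        for (size_t i = 0; i < n; i++) buf[i] = (long)i;

        long sum = 0;
        double t0 = now_sec();
        for (int pass = 0; pass < 64; pass++)          /* re-walk the same buffer */
            for (size_t i = 0; i < n; i++) sum += buf[i];
        double dt = now_sec() - t0;

        double bytes = 64.0 * (double)n * sizeof(long);
        printf("%8zu KB working set: %6.2f GB/s (checksum %ld)\n",
               sizes[s] >> 10, bytes / dt / 1e9, sum);
        free(buf);
    }
    return 0;
}
```

On a typical desktop chip you'd expect the 256 KB and 4 MB cases to come out several times faster than the 256 MB one; where the 32 MB case lands is exactly what extra L3 changes.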

You may as well say computers don't need more than 64k of RAM and any applications that do are poorly optimized.
Nice attempt at a straw man argument there, but you are in fact just grasping at straws.
 
It's not AMD's 3D vcache, it's TSMC's
It was engineered by AMD and manufactured by TSMC.

Intel's taking a similar approach, but will call it something else.
 
AMD has clearly better efficiency, but it's not due to the large L3 cache.
But it's hard to find something more deserving of the title "waste of sand" than throwing a bunch of L3 cache on a die, as it's only a tiny subset of very poorly optimized code that significantly benefits from it, namely certain outliers in applications, and games running at unrealistically low GPU load. It would be much better to have a CPU with 5% more computational power, especially down the road, as future games are likely to become more demanding, so the bottleneck will be computational performance, not "artificial" scenarios running games at hundreds of frames per second.
For CPUs to advance, they should stop focusing on gimmicks and make actual architectural advancements instead. Large L3 caches are a waste of precious development resources as well as production capacity.

Unless your architectural advancements are bottlenecked by memory bandwidth and latency, in which case that waste of sand turns into out-of-stock products that everyone who runs games wants....
 
I must be the only person who wants to see AMD try Foveros for dual-CCD CPUs....
Oh well.
 
Caches are usually defined by cycle latencies, not by size or preference.

L1 1ns - 4 cycles
L2 3ns - 14 cycles
L3 10ns - 50 cycles
L4/eDRAM 36ns - 140 cycles
DRAM 60-100ns - MANY cycles

Guess where X3D stands
Now the L4 has what? 50-100GB/sec bandwidth?
Just for comparison the first gen X3D can hit 600GB/sec with 47 cycles latency.
So it has 6x bandwidth and 3x faster access times....which is the same as most L3 caches.
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
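
For anyone wondering where those nanosecond figures come from, you can get surprisingly close with a dependent pointer chase. This is a hedged sketch of my own (the 64 MB buffer, the 50M steps and the use of rand() are arbitrary choices, nothing official); because each load depends on the previous one, the average time per step roughly tracks the latency of whichever level currently holds the working set.

```c
/* Hedged sketch of my own: approximate average load latency with a dependent
 * pointer chase. Every load's address comes from the previous load, so the
 * level of the hierarchy that holds the working set dictates the ns/step.
 * Build (Linux): gcc -O2 chase.c -o chase */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    size_t bytes = 64u << 20;                 /* 64 MB: bigger than most L3s */
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(bytes);

    /* Sattolo's algorithm: a random permutation with exactly one cycle,
     * so the chase visits every element and the prefetcher can't help. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    srand(1);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    size_t p = 0;
    const size_t steps = 50u * 1000 * 1000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++) p = next[p];   /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load latency: %.1f ns (checksum %zu)\n", ns / steps, p);
    free(next);
    return 0;
}
```

Shrink the buffer to something that fits in L2 or L3 and you should see the number drop into the single-digit/tens-of-ns range from the list above.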

I think the reason L1/L2 caches haven't been increasing is that they're part of the cores: doubling the size means greater area and bigger dies, which means higher latencies. Only recently, thanks to EUV, has density improved enough (die shrinks used to provide 2-3x density) that we saw some improvement.

In fact, both L1 and L2 have increased in the last few generations, after 20 years of staying between 256-512KB (not counting halo products like the FX, or the shared L2... but that's a different FX), all without increasing latencies.

L3 is just easier to increase or move onto its own stacked die; there are even rumours that AMD plans to have the next Zen architecture with the L3 cache moved completely onto a stacked die.
 
Copying could open the door for litigation.
 
I don't understand those people honestly. Benchmarks don't show periodic stutters which you can get in some games for example, and those are fully eliminated for me. Plus, you do see the better lows and general performance in benchmarks. If CPUs didn't make a difference we'd all have 4090s paired with ancient processors.

I have my 7800X3D at 40-60W providing a much better, much more consistent experience with the same GPU and screen than my 9900k that was eating 150W.
CPU is only important now because it's AMD, right?

Your PC must have something wrong, maybe slow RAM?
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs my main system with a 7800X3D and the same GPU.
Even at 1440p there are no big differences.

But if I use a GPU like a 4060, then I will see a difference right away, not because of the CPU but because of the slow GPU.

AMD has good CPUs, but in the real world the GPU is much more important.
Both Intel and AMD, even with older CPUs, can do gaming just fine.

People are just hyped by the extra % they see in 1080p benchmarks with a 4090.

It was engineered by AMD and manufactured by TSMC.

Intel's taking a similar approach, but will call it something else.
it was engineered by TSMC not AMD
 
It looks like (another secret) agreement between Intel and AMD, dividing up which of the two gets which market.
 
it was engineered by TSMC not AMD

(attached image: 3d.jpg)

TM says it all. TSMC invention licensed to AMD, I guess, for their use. I don't think it was originally for memory stacking, was it? AMD just used it that way.


Can't wait to see how Intel does it, surely they can't copy, unless they get a secret license from TSMC to use it
 
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
That's just plainly wrong.
Most core AVX operations are within 1-5 cycles on recent architectures. Haswell and Skylake did a lot to improve AVX throughput, but there have been several improvements since then too. E.g. add operations are now down from 4 to 2 cycles on Alder Lake and Sapphire Rapids. Shift operations are down to a single cycle. This is as fast as simple integer operations. And FYI, all floating point operations go through the vector units; whether it's a scalar operation, SSE or AVX, the latency will be the same. ;)
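
To make the latency vs. throughput point concrete, here's a small sketch of my own (plain AVX2 intrinsics; the iteration count and the choice of four independent chains are just my assumptions): one long dependent chain of vector adds is limited by the add latency, while independent chains let the pipelined vector ports overlap work and run much closer to throughput speed.

```c
/* Sketch of my own (not from this thread): latency vs. throughput of
 * 256-bit vector adds. The dependent chain is bound by add latency;
 * four independent chains can overlap in the pipelined vector ports.
 * Build (Linux): gcc -O2 -mavx2 addlat.c -o addlat */
#include <immintrin.h>
#include <stdio.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    const long iters = 200 * 1000 * 1000;   /* total number of vector adds per test */
    __m256d one = _mm256_set1_pd(1.0);

    /* One dependent chain: each add has to wait for the previous result. */
    __m256d a = _mm256_setzero_pd();
    double t0 = now_sec();
    for (long i = 0; i < iters; i++)
        a = _mm256_add_pd(a, one);
    double t_dep = now_sec() - t0;

    /* Four independent chains: same number of adds, but the core can
     * keep several of them in flight at once. */
    __m256d b0 = _mm256_setzero_pd(), b1 = b0, b2 = b0, b3 = b0;
    t0 = now_sec();
    for (long i = 0; i < iters; i += 4) {
        b0 = _mm256_add_pd(b0, one);
        b1 = _mm256_add_pd(b1, one);
        b2 = _mm256_add_pd(b2, one);
        b3 = _mm256_add_pd(b3, one);
    }
    double t_ind = now_sec() - t0;

    /* Keep the results alive so the compiler cannot drop the loops. */
    double out[4];
    _mm256_storeu_pd(out, _mm256_add_pd(_mm256_add_pd(a, b0),
                     _mm256_add_pd(b1, _mm256_add_pd(b2, b3))));
    printf("dependent:   %.2f ns per add\n", t_dep / iters * 1e9);
    printf("independent: %.2f ns per add (check %.0f)\n", t_ind / iters * 1e9, out[0]);
    return 0;
}
```

The dependent loop should come out noticeably slower per add than the independent one, which is the whole latency-vs-throughput distinction in a nutshell.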
 
Putting a large cache tile on top of the CPU cores was the idea of one person on the AMD team.

TSMC's 3D technology was used to manufacture that idea, and they decided to further improve the technology.

Don't mix up the general 3D manufacturing process with that tile of extra cache in X3D CPUs.
 
I do not agree with that. Intel already had such a processor with extra "cache": the i7-5775C.


Again, the CPU includes 6MB of L3 cache and 128MB of eDRAM.


It's up for discussion. I see the 7800X3D's cache as a 4th-level one, like the eDRAM cache of the i7-5775C.
I always thought the eDRAM in those Intel processors was for the iGPU...
 
TM says it all. TSMC invention licensed to AMD, I guess, for their use. I don't think it was originally for memory stacking, was it? AMD just used it that way.


Can't wait to see how Intel does it, surely they can't copy, unless they get a secret license from TSMC to use it
Well, the chips are made by TSMC, so really it's not an AMD Ryzen but a TSMC Ryzen, right?


Yes, the fabrication technology was researched by TSMC and they are the ones doing it. But guess what, that is normal, as they are the ones making those chips for AMD. AMD is not a fab.


But AMD is the only one right now using that technology, because this isn't just a box you tick when you order some wafers from TSMC. It's not "I'll take X chips with more cache". You still have to design a chip that will be able to communicate with the cache die, deliver power to it, etc.

The physical portion of 3D V-cache is a TSMC technology. This is expected, as AMD is fabless.
The logical portion of 3D V-cache is an AMD technology. This is expected, as TSMC does not design chips.

In the end, it's a collaboration between both companies.

Also, the added die is indeed L3. There is no separate lookup for that die when checking whether data is in the L3 cache; the whole 96 MB is looked up at once, and there is no penalty for accessing data that sits in the 3D V-cache die.
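
For what it's worth, you can also see that from software: the OS just reports one big unified L3 and no extra cache level for the stacked die. Here's a minimal Linux-only sketch of my own (the sysfs cache-topology files are standard kernel interfaces; everything else is my assumption) that dumps what the kernel exposes for cpu0:

```c
/* Linux-only sketch of my own: dump the cache topology the kernel exposes
 * for cpu0. On an X3D part the stacked SRAM does not show up as an extra
 * level, just as a larger level-3 "Unified" cache.
 * Build: gcc -O2 caches.c -o caches */
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *fields[] = {"level", "type", "size"};
    for (int idx = 0; idx < 8; idx++) {
        char line[128] = "";
        int missing = 0;
        for (int f = 0; f < 3; f++) {
            char path[128], buf[64];
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/%s",
                     idx, fields[f]);
            FILE *fp = fopen(path, "r");
            if (!fp || !fgets(buf, sizeof buf, fp)) {
                if (fp) fclose(fp);
                missing = 1;
                break;
            }
            fclose(fp);
            buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
            strcat(line, buf);
            strcat(line, " ");
        }
        if (missing) break;                   /* no more cache indices */
        printf("index%d: %s\n", idx, line);
    }
    return 0;
}
```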
 