
AMD Ryzen 7 9800X3D Has the CCD on Top of the 3D V-cache Die, Not Under it

Most of the people buying high core count chips aren't doing it for gaming

We know this isn't true, given that Intel has marketed high core count chips toward gamers for two generations.

You are vastly underestimating the number of people who want a chip that can do both gaming and core-heavy tasks. I for one would have purchased a 7950X3D if it had matched a 7800X3D in gaming, and I might purchase a 9950X3D if it matches the 9800X3D in gaming performance. For people buying in this price bracket, it's a no-brainer to spend a little more to get a system that can do it all.

the X3D chips perform worse in most productivity and creative tasks where high core count matters.

You are conflating things. X3D chips perform worse in certain frequency-sensitive applications that don't benefit from cache. In core-heavy workloads they are 100% equal to their non-X3D counterparts.

Mind you, if AMD stacks the CCD above the cache as the article implies they may do, that negative disappears.

X3D makes much more sense for six and eight core chips than 16 core chips.

We know this is false because AMD themselves have stated that X3D was designed for servers. That it came to consumer products is thanks to a side experiment by an AMD employee who wanted to see if there was a benefit in everyday workloads.

Yes, because of the added cache. The added cache produces gains in some areas, but the limits it imposes cause losses in others. There's nothing wrong with that. It's great tech with a very specific focus, and trade-offs like this have always existed. It's a lateral move that targets a specific area.

Read the article; it specifically states that AMD may be getting rid of these limitations.
 
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?
 
You are vastly underestimating the number of people who want a chip that can do both gaming and core-heavy tasks.
I think we all forget/ignore people telling us that a product is just for one thing from time to time.

As if you're expected to have one 7800X3D for games and a 7950X for work. Strictly thinking inside the box and turning it into law, lol.

Or, people who won't stop bitching about why gaming laptops won't/shouldn't have cameras.. yeah you're supposed to buy another laptop for that, or a separate camera..

/end of rant
 
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
 
Latency? Light travels .3 meters in 1 ns. Latency isn't an issue.

Light does, but electricity doesn't. Since AMD hasn't moved to photonic computing, your comment is not very relevant.

Though indeed there shouldn't be any difference, it's still all in the same package and whatnot
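For a rough sense of why propagation delay within a package is a non-issue, here's a back-of-envelope sketch. The ~0.5c signal speed is an assumed ballpark figure for electrical signals in on-package interconnect, not an AMD specification:

```python
# Back-of-envelope: signal propagation delay inside a CPU package.
# Assumes an effective signal speed of ~0.5c for electrical traces,
# a rough ballpark, not a measured figure for any specific package.

C = 299_792_458          # speed of light in vacuum, m/s
SIGNAL_SPEED = 0.5 * C   # assumed effective speed in interconnect

def propagation_delay_ns(distance_mm: float) -> float:
    """Time for a signal to cross distance_mm millimeters, in ns."""
    return (distance_mm / 1000) / SIGNAL_SPEED * 1e9

# A stacked die sits tens of micrometers from the CCD; even a full
# 10 mm trip across the package is well under a nanosecond.
print(f"{propagation_delay_ns(10):.3f} ns for 10 mm")
print(f"{propagation_delay_ns(0.05):.6f} ns for 50 um")
```

At sub-nanosecond scales, the distance between stacked dies is dwarfed by the cache's own access latency, which is the point being made above.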
 
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer; also, CPUs have been flip chips for a while.
 
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?
No, not a thing.
 
Exactly; and it's going to launch with no competition.
What's funny about that is that since Meteor Lake Intel has put their cores on top of another die. Before Meteor Lake came out, there were rumors that it was going to have an L4 cache in the base tile. It seems like Arrow Lake is pretty close to having the same CPU-stacked-over-cache technology if Intel wanted it to.
Interesting idea, but I'm still cagey about it. If a lot of the improvement on Zen 5 X3D is just due to higher power targets, it loses what I like about the X3D chips in the first place: amazing gaming performance at low-ish power. If it's about the same as the 7800X3D, I'm just gonna get the 7800X3D, unless they whoopsie a new IOD on these with more Gen 5 lanes and CKD support.
That's an interesting concern. X3D had to be lower power, but now it won't need to be. But the 9000 series is a little more efficient than the 7000 series, and in other chips more cache usually does translate to power savings even at the same frequency.
Most of the people buying high core count chips aren't doing it for gaming and the X3D chips perform worse in most productivity and creative tasks where high core count matters. X3D makes much more sense for six and eight core chips than 16 core chips.
Theoretically, with the v-cache no longer sitting between the CPU and the cooler, the X3D chips will be the same speed or faster than the regular chips in every use case. And since many people want one CPU both for productivity and gaming, there will still be demand for the higher core count chips.
 
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
Yes
 
Hard disagree, AMD has X3D cache in chips all the way down to the 5600X3D.

Having 2 cache chiplets on $700 - $750 parts is likewise absolutely possible.

Even if the uplift is a mere 3%, every little bit matters at the high end. Particularly when it could make the 9950X3D reach gaming parity with the 9800X3D, it would upsell a lot of people to the more expensive processor.

Thing is, you're sacrificing productivity by 3% as well since the other CCD won't clock as high. So the overall picture might be slightly different but I agree with the fact that matching 9800X3D while taking a 6% hit in productivity vs 9950X is better than being 3% slower while getting a 3% hit in productivity.
 
People might be able to get the CPU running at the same speed as the 9700X, nice.
 
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?

This is already becoming true for the "TLB", translation lookaside buffer.

A "page" has been 4096 bytes since the 1980s (even ARM systems page at 4K). Zen 5 has a 4096-entry TLB, meaning 4096 entries × 4096 bytes per entry (with default pages) == 16 MB of RAM indexed in the virtual page table before the CPU core runs out of entries.

That's smaller than the Zen 5 X3D L3 cache. In fact, this curious slowdown has been true for quite a few generations (and is likely a reason why AMD upgraded from a 3072-entry to a 4096-entry TLB between Zen 4 and Zen 5).

--------

Modern computers can theoretically use "huge pages" (2 MB or 1 GB in size). Servers are configured to use them, but consumer hardware has so many backwards-compatibility issues with Windows and Linux that the default page size remains 4K in practice. Still, if you play with the right settings, using these larger page sizes leads to 10%+ improvements, as more data effectively fits within the TLB's reach (a lookup necessary before the real cache is hit).
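The arithmetic above can be sketched in a few lines, using the entry counts and page sizes quoted in the post:

```python
# TLB reach: how much address space a TLB can map before running
# out of entries. Figures taken from the post above.

def tlb_reach_mb(entries: int, page_bytes: int) -> float:
    """Address space (MiB) covered before TLB entries run out."""
    return entries * page_bytes / (1 << 20)

KB, MB = 1 << 10, 1 << 20

print(tlb_reach_mb(4096, 4 * KB))   # 4096-entry TLB, 4 KiB pages -> 16 MiB
print(tlb_reach_mb(3072, 4 * KB))   # older 3072-entry TLB        -> 12 MiB
print(tlb_reach_mb(4096, 2 * MB))   # same TLB with 2 MiB huge pages
```

With default 4 KiB pages the reach (16 MiB) is well below a 96 MiB X3D L3, while 2 MiB huge pages push it into the gigabytes, which is why huge pages help cache-heavy workloads.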
 
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
It generally isn't, but we turned it into a bit of fun :)
 
I know where AMD is aiming with this ...

Since node shrinking will continue to get harder (smaller nodes mean lower yields, higher heat density, etc.), AMD wants to make room for bigger CCDs even at 4 nm or 3 nm. The L3 cache takes up roughly the area of four Zen 5 cores. Putting that cache below the cores would allow not only more cores per CCD, but also expanded L3 and other caches. This way AMD could easily reach 10-12 cores per CCD with 96+ MB of cache in regular non-X3D processors.

Putting the cache below the CCD also allows for a significant core clock boost, basically the same clocks as you'd get with non-X3D CPUs.

One may start to wonder whether this is the beginning of the end of X3D processors as we know them.
 
I don't think that's needed when the cache is this large, and all cores are connected to all the cache anyway. I'm talking ONE SINGLE V-cache chip for ALL cores.

I haven't heard about such a thing; it sounds like a really bad idea. AMD just moved the V-cache in order to cool the CCD properly. That would be one step forward, three steps backwards.
That would be a bad choice, as it would make things even slower.
Searching through memory takes time, and the bigger it is, the more time it takes.

Giving more cores access to the same memory also racks up penalties:
each core will only have limited time to read and write to the memory, and coordinating everything becomes even harder.

Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
As for virtualisation, the shared L3 is nothing but a security risk.

It's a joke that refers to https://www.imdb.com/title/tt0105929/
 
That would be a bad choice, as it would make things even slower.
Searching through memory takes time, and the bigger it is, the more time it takes.

Giving more cores access to the same memory also racks up penalties:
each core will only have limited time to read and write to the memory, and coordinating everything becomes even harder.
Well yeah, that's what I meant with "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit currently is for doing this.
Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
Not sure why you're telling me this lol, I never said it makes everything faster. You're jumping to conclusions here.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
I've never said that. Also, that's not the only reason for doing it.
 
Well yeah, that's what I meant with "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit currently is for doing this.

Not sure why you're telling me this lol, I never said it makes everything faster. You're jumping to conclusions here.

I've never said that. Also, that's not the only reason for doing it.
I think I forgot to write down that it probably wasn't worth the extra cost it would involve, given the aforementioned points, which is why I listed them...
 
I think I forgot to write down that it probably wasn't worth the extra cost it would involve, given the aforementioned points, which is why I listed them...
My point is in the post before.

The 16-core with all V-cache isn't necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work, but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but maybe not feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores are recommended (both AMD and Intel) on Steam.
 
My point is in the post before.

The 16-core with all V-cache isn't necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work, but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but maybe not feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores are recommended (both AMD and Intel) on Steam.

Unified double (or even multiple, in the case of Epyc) V-cache is the future. But to achieve this, they must first overcome the internal fabric bottleneck so that accessing data across any chiplet or part of the chip is effectively seamless. This will probably happen when they move from 2.5D packaging (the current chiplet system) to a fully 3D system like Intel's Foveros 3D tiling. That physical closeness should allow an ultra-high-bandwidth link that will make such a thing possible.
 
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer; also, CPUs have been flip chips for a while.
It's possible if the bottom die has contact pads on both sides. TSVs (through-silicon vias) make that possible.
 
Interesting idea, but I'm still cagey about it. If a lot of the improvement on Zen 5 X3D is just due to higher power targets, it loses what I like about the X3D chips in the first place: amazing gaming performance at low-ish power. If it's about the same as the 7800X3D, I'm just gonna get the 7800X3D, unless they whoopsie a new IOD on these with more Gen 5 lanes and CKD support.
You do realize you can undervolt and underclock it as you need in order to hit YOUR power efficiency targets? Why should your goal hamper others' ambition to go fast?
 
Interesting idea, but I'm still cagey about it. If a lot of the improvement on Zen 5 X3D is just due to higher power targets
No, they're the same, 120 W TDP.

It's just that the 9800X3D can actually make use of it, so that's not really a drawback. Just change it if you're not happy with it.
 
Thing is, you're sacrificing productivity by 3% as well since the other CCD won't clock as high. So the overall picture might be slightly different but I agree with the fact that matching 9800X3D while taking a 6% hit in productivity vs 9950X is better than being 3% slower while getting a 3% hit in productivity.

Again another person who didn't read the article or simply doesn't understand.

No. If what's stated ends up being correct, in that the thermal issue is solved and clocks are the same between the X3D and non-X3D parts, productivity performance will be equal to or better than that of non-X3D parts. It would eliminate the downside to X3D chips.
 
Again another person who didn't read the article or simply doesn't understand.

No. If what's stated ends up being correct, in that the thermal issue is solved and clocks are the same between the X3D and non-X3D parts, productivity performance will be equal to or better than that of non-X3D parts. It would eliminate the downside to X3D chips.

I read it, and it's really not hard to understand the article, but the part about not losing clocks is pure speculation. Turns out they were incorrect anyway: looking at the boost clocks of the 9700X and 9800X3D, there's still a hit to clocks, albeit less than before.

So yeah, adding L3 to both CCDs would reduce productivity for a minor gain in gaming performance. What's worse is that it would boost performance in unwanted situations, which they would want to mitigate through drivers anyway, because ideally you want the gaming threads pinned to one CCD. In situations where they jump to the other, it won't match the 9800X3D's performance, simply because of the latency incurred jumping between CCDs.

So you're looking at a slight benefit for games in edge cases and a slight hit to productivity for a CPU that costs more. Pretty sure AMD said the same during the 7950X3D launch when they did the math. Whether that changes remains to be seen.
 
This is already becoming true for the "TLB", translation lookaside buffer.

A "page" has been 4096 bytes since the 1980s (even ARM systems page at 4K). Zen 5 has a 4096-entry TLB, meaning 4096 entries × 4096 bytes per entry (with default pages) == 16 MB of RAM indexed in the virtual page table before the CPU core runs out of entries.

That's smaller than the Zen 5 X3D L3 cache. In fact, this curious slowdown has been true for quite a few generations (and is likely a reason why AMD upgraded from a 3072-entry to a 4096-entry TLB between Zen 4 and Zen 5).

--------

Modern computers can theoretically use "huge pages" (2 MB or 1 GB in size). Servers are configured to use them, but consumer hardware has so many backwards-compatibility issues with Windows and Linux that the default page size remains 4K in practice. Still, if you play with the right settings, using these larger page sizes leads to 10%+ improvements, as more data effectively fits within the TLB's reach (a lookup necessary before the real cache is hit).
That's just the TLB for data. In addition, there's a 2048-entry L2 TLB for instructions. Zen CPUs can also coalesce 4 consecutive pages into one TLB entry, so one Zen 5 core can cover 64 MB of cache with the L2 data TLB.

Zen 4 also has page-coalescing capability. There weren't specifics on whether this mechanism changed in Zen 4, though performance counter unit mask descriptions indicate it's still present. Assuming Zen 4 can coalesce up to four consecutive 4K pages like Zen 2 and 3, the 3072-entry L2 DTLB can cover up to 48 MB, which is great news. While Zen 2/3's 2048-entry L2 DTLB already performed reasonably well, more is always better.
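The coalesced-reach figures quoted above follow directly from the entry counts; a quick sketch, assuming best-case coalescing of four consecutive 4 KiB pages per entry:

```python
# Best-case TLB reach with Zen-style page coalescing: up to 4
# consecutive 4 KiB pages can share a single TLB entry, per the
# figures quoted above.

def coalesced_reach_mb(entries: int, page_bytes: int, coalesce: int) -> float:
    """Address space (MiB) covered when every TLB entry maps
    `coalesce` consecutive pages (best case)."""
    return entries * coalesce * page_bytes / (1 << 20)

PAGE = 4096  # default 4 KiB page

print(coalesced_reach_mb(4096, PAGE, 4))  # Zen 5 L2 DTLB  -> 64 MiB
print(coalesced_reach_mb(3072, PAGE, 4))  # Zen 4 L2 DTLB  -> 48 MiB
print(coalesced_reach_mb(2048, PAGE, 4))  # Zen 2/3 L2 DTLB -> 32 MiB
```

Note this is the best case: it only holds when the working set happens to consist of runs of four consecutive, identically-mapped pages.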
 