
AMD Ryzen 7 9800X3D Has the CCD on Top of the 3D V-cache Die, Not Under it

Most of the people buying high core count chips aren't doing it for gaming

We know this isn't true, given that Intel has marketed high core count chips toward gamers for two generations.

You are vastly underestimating the number of people who want a chip that can do both gaming and core-heavy tasks. I for one would have purchased a 7950X3D if it had matched a 7800X3D in gaming, and I might purchase a 9950X3D if it matches the 9800X3D in gaming performance. For people buying in this price bracket, it's a no-brainer to spend a little more to get a system that can do it all.

the X3D chips perform worse in most productivity and creative tasks where high core count matters.

You are conflating things. X3D chips perform worse in certain frequency-sensitive applications that don't benefit from cache. In core-heavy workloads they are 100% equal to their non-X3D counterparts.

Mind you, if AMD stacks the CCD above the cache as the article implies they may do, that negative disappears.

X3D makes much more sense for six and eight core chips than 16 core chips.

We know this is false because AMD themselves have stated that X3D was designed for servers. That it came to consumer products is thanks to a side experiment by an AMD employee who wanted to see if there was a benefit in everyday workloads.

Yes, because of the added cache. The added cache produces gains in some areas, but the limits it imposes cause losses in others. There's nothing wrong with that. It's great tech with a very specific focus, and trade-offs like this have always existed. It's a lateral move that targets a specific area.

Read the article; it specifically states that AMD may be getting rid of these limitations.
 
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?
 
You are vastly underestimating the number of people who want a chip that can do both gaming and core-heavy tasks.
I think we all forget/ignore people telling us that a product is just for one thing from time to time.

As if you're expected to have one 7800X3D for games and a 7950X for work. Strictly thinking inside the box and turning it into law, lol.

Or, people who won't stop bitching about why gaming laptops won't/shouldn't have cameras.. yeah you're supposed to buy another laptop for that, or a separate camera..

/end of rant
 
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
 
Latency? Light travels .3 meters in 1 ns. Latency isn't an issue.

Light does, but electricity doesn't. Since AMD hasn't moved to photonic computing, your comment is not very relevant.

Though indeed there shouldn't be any difference, it's still all in the same package and whatnot
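For a rough sense of why propagation delay within a package is a non-issue, here's a back-of-envelope sketch. The ~0.5c signal speed is an assumed ballpark figure for electrical signals in on-package interconnect, not an AMD specification:

```python
# Back-of-envelope: signal propagation delay inside a CPU package.
# Assumes an effective signal speed of ~0.5c for electrical traces,
# a rough ballpark, not a measured figure for any specific package.

C = 299_792_458          # speed of light in vacuum, m/s
SIGNAL_SPEED = 0.5 * C   # assumed effective speed in interconnect

def propagation_delay_ns(distance_mm: float) -> float:
    """Time for a signal to cross distance_mm millimeters, in ns."""
    return (distance_mm / 1000) / SIGNAL_SPEED * 1e9

# A stacked die sits tens of micrometers from the CCD; even a full
# 10 mm trip across the package is well under a nanosecond.
print(f"{propagation_delay_ns(10):.3f} ns for 10 mm")
print(f"{propagation_delay_ns(0.05):.6f} ns for 50 um")
```

At sub-nanosecond scales, the distance between stacked dies is dwarfed by the cache's own access latency, which is the point being made above.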
 
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer; also, CPUs have been flip chips for a while.
 
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?
No, not a thing.
 
Exactly; and it's going to launch with no competition.
What's funny about that is that since Meteor Lake Intel has put their cores on top of another die. Before Meteor Lake came out, there were rumors that it was going to have an L4 cache in the base tile. It seems like Arrow Lake is pretty close to having the same CPU-stacked-over-cache technology if Intel wanted it to.
Interesting idea, but I'm still cagey about it. If a lot of the improvement on Zen 5 X3D is just due to higher power targets, it loses what I like about the X3D chips in the first place: amazing gaming performance at low-ish power. If it's about the same as the 7800X3D, I'm just gonna get the 7800X3D, unless they whoopsie a new IOD on these with more Gen 5 lanes and CKD support.
That's an interesting concern. X3D had to be lower power, but now it won't need to be. But the 9000 series is a little more efficient than the 7000 series, and in other chips more cache usually does translate to power savings even at the same frequency.
Most of the people buying high core count chips aren't doing it for gaming and the X3D chips perform worse in most productivity and creative tasks where high core count matters. X3D makes much more sense for six and eight core chips than 16 core chips.
Theoretically, with the v-cache no longer sitting between the CPU and the cooler, the X3D chips will be the same speed or faster than the regular chips in every use case. And since many people want one CPU both for productivity and gaming, there will still be demand for the higher core count chips.
 
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
Yes
 
Hard disagree, AMD has X3D cache in chips all the way down to the 5600X3D.

Having 2 cache chiplets on $700 - $750 parts is likewise absolutely possible.

Even if the uplift is a mere 3%, every little bit matters at the high end. Particularly when it could make the 9950X3D reach gaming parity with the 9800X3D, it would upsell a lot of people to the more expensive processor.

Thing is, you're sacrificing productivity by 3% as well since the other CCD won't clock as high. So the overall picture might be slightly different but I agree with the fact that matching 9800X3D while taking a 6% hit in productivity vs 9950X is better than being 3% slower while getting a 3% hit in productivity.
 
People might be able to get the CPU running at the same speed as the 9700X, nice.
 
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?

This is already becoming true for the "TLB", translation lookaside buffer.

A "page" has been 4096 bytes since the 1980s (even ARM systems page at 4K). Zen 5 has a 4096-entry TLB, meaning 4096 entries × 4096 bytes per entry (with default pages) == 16 MB of RAM indexed in the virtual page table before the CPU core runs out of entries.

That's smaller than the Zen 5 X3D L3 cache. In fact, this curious slowdown has been true for quite a few generations (and is likely a reason why AMD upgraded from a 3072-entry to a 4096-entry TLB between Zen 4 and Zen 5).

--------

Modern computers can theoretically use "huge pages" (2 MB or 1 GB in size). Servers are configured to use them, but consumer hardware has so many backwards-compatibility issues with Windows and Linux that the default page size remains 4K in practice. Still, if you play with the right settings, using these larger page sizes leads to 10%+ improvements, as more data effectively fits within the TLB's reach (a lookup necessary before the real cache is hit).
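The arithmetic above can be sketched in a few lines, using the entry counts and page sizes quoted in the post:

```python
# TLB reach: how much address space a TLB can map before running
# out of entries. Figures taken from the post above.

def tlb_reach_mb(entries: int, page_bytes: int) -> float:
    """Address space (MiB) covered before TLB entries run out."""
    return entries * page_bytes / (1 << 20)

KB, MB = 1 << 10, 1 << 20

print(tlb_reach_mb(4096, 4 * KB))   # 4096-entry TLB, 4 KiB pages -> 16 MiB
print(tlb_reach_mb(3072, 4 * KB))   # older 3072-entry TLB        -> 12 MiB
print(tlb_reach_mb(4096, 2 * MB))   # same TLB with 2 MiB huge pages
```

With default 4 KiB pages the reach (16 MiB) is well below a 96 MiB X3D L3, while 2 MiB huge pages push it into the gigabytes, which is why huge pages help cache-heavy workloads.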
 
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
It generally isn't, but we turned it into a bit of fun :)
 
I know where AMD is aiming with this ...

Since node shrinking will continue to get harder (smaller nodes mean lower yields, higher heat density, etc.), AMD wants to make room for bigger CCDs even at 4 nm or 3 nm. The L3 cache takes up roughly the area of four Zen 5 cores. Putting that cache below the cores would allow not only more cores per CCD, but also expanded L3 and other caches. This way AMD could easily reach 10-12 cores per CCD with 96+ MB of cache in regular non-X3D processors.

Putting the cache below the CCD also allows for a significant core clock boost, basically the same clocks as you'd get with non-X3D CPUs.

One may start to wonder whether this is the beginning of the end of X3D processors as we know them.
 
I don't think that's needed when the cache is this large, and all cores are connected to all the cache anyway. I'm talking ONE SINGLE V-cache chip for ALL cores.

I haven't heard about such a thing; it sounds like a really bad idea. AMD just moved the V-cache in order to cool the CCD properly. That would be one step forward, three steps backwards.
That would be a bad choice, as it would make things even slower.
Searching through memory takes time, and the bigger it is, the more time it takes.

Giving more cores access to the same memory also racks up penalties:
each core will only have limited time to read and write to the memory, and coordinating everything becomes even harder.

Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
As for virtualisation, the shared L3 is nothing but a security risk.

It's a joke that refers to https://www.imdb.com/title/tt0105929/
 
That would be a bad choice, as it would make things even slower.
Searching through memory takes time, and the bigger it is, the more time it takes.

Giving more cores access to the same memory also racks up penalties:
each core will only have limited time to read and write to the memory, and coordinating everything becomes even harder.
Well yeah, that's what I meant with "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit currently is for doing this.
Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
Not sure why you're telling me this lol, I never said it makes everything faster. You're jumping to conclusions here.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
I've never said that. Also, that's not the only reason for doing it.
 
Well yeah, that's what I meant with "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit currently is for doing this.

Not sure why you're telling me this lol, I never said it makes everything faster. You're jumping to conclusions here.

I've never said that. Also, that's not the only reason for doing it.
I think I forgot to write down that it probably wasn't worth the extra cost it would involve, given the aforementioned points, which is why I listed them...
 
I think I forgot to write down that it probably wasn't worth the extra cost it would involve, given the aforementioned points, which is why I listed them...
My point is in the post before.

The 16-core with all V-cache isn't necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work, but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but maybe not feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores are recommended (both AMD and Intel) on Steam.
 
My point is in the post before.

The 16-core with all V-cache isn't necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work, but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but maybe not feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores are recommended (both AMD and Intel) on Steam.

Unified double (or even multiple, in the case of Epyc) V-cache is the future. But to achieve this, they must first overcome the internal fabric bottleneck so that accessing data across any chiplet or part of the chip is effectively seamless. This will probably happen when they move from 2.5D packaging (the current chiplet system) to a fully 3D system like Intel's Foveros 3D tiling. That physical closeness should allow an ultra-high-bandwidth link that will make such a thing possible.
 
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer; also, CPUs have been flip chips for a while.
It's possible if the bottom die has contact pads on both sides. TSVs (through-silicon vias) make that possible.
 
Interesting idea, but I'm still cagey about it. If a lot of the improvement on Zen 5 X3D is just due to higher power targets, it loses what I like about the X3D chips in the first place: amazing gaming performance at low-ish power. If it's about the same as the 7800X3D, I'm just gonna get the 7800X3D, unless they whoopsie a new IOD on these with more Gen 5 lanes and CKD support.
You do realize you can undervolt and underclock it as you need in order to hit YOUR power efficiency targets? Why should your goal hamper others' ambition to go fast?
 
Interesting idea, but I'm still cagey about it. If a lot of the improvement on Zen 5 X3D is just due to higher power targets
No, they're the same, 120 W TDP.

It's just that the 9800X3D can actually make use of it, so that's not really a drawback. Just change it if you're not happy with it.
 
Thing is, you're sacrificing productivity by 3% as well since the other CCD won't clock as high. So the overall picture might be slightly different but I agree with the fact that matching 9800X3D while taking a 6% hit in productivity vs 9950X is better than being 3% slower while getting a 3% hit in productivity.

Again another person who didn't read the article or simply doesn't understand.

No. If what's stated ends up being correct, in that the thermal issue is solved and clocks are the same between the X3D and non-X3D parts, productivity performance will be equal to or better than that of non-X3D parts. It would eliminate the downside to X3D chips.
 
Again another person who didn't read the article or simply doesn't understand.

No. If what's stated ends up being correct, in that the thermal issue is solved and clocks are the same between the X3D and non-X3D parts, productivity performance will be equal to or better than that of non-X3D parts. It would eliminate the downside to X3D chips.

I read it, and it's really not hard to understand the article, but the part about not losing clocks is pure speculation. Turns out they were incorrect anyway: looking at the boost clocks of the 9700X and 9800X3D, there's still a hit to clocks, albeit less than before.

So yeah, adding L3 to both CCDs would reduce productivity for a minor gain in gaming performance. What's worse is that it would boost performance in unwanted situations, which they would want to mitigate through drivers anyway, because ideally you want the gaming threads pinned to one CCD. In situations where they jump to the other, it won't match the 9800X3D's performance, simply because of the latency incurred jumping between CCDs.

So you're looking at a slight benefit for games in edge cases and a slight hit to productivity for a CPU that costs more. Pretty sure AMD said the same during the 7950X3D launch when they did the math. Whether that changes remains to be seen.
 
This is already becoming true for the "TLB", translation lookaside buffer.

A "page" has been 4096 bytes since the 1980s (even ARM systems page at 4K). Zen 5 has a 4096-entry TLB, meaning 4096 entries × 4096 bytes per entry (with default pages) == 16 MB of RAM indexed in the virtual page table before the CPU core runs out of entries.

That's smaller than the Zen 5 X3D L3 cache. In fact, this curious slowdown has been true for quite a few generations (and is likely a reason why AMD upgraded from a 3072-entry to a 4096-entry TLB between Zen 4 and Zen 5).

--------

Modern computers can theoretically use "huge pages" (2 MB or 1 GB in size). Servers are configured to use them, but consumer hardware has so many backwards-compatibility issues with Windows and Linux that the default page size remains 4K in practice. Still, if you play with the right settings, using these larger page sizes leads to 10%+ improvements, as more data effectively fits within the TLB's reach (a lookup necessary before the real cache is hit).
That's just the TLB for data. In addition, there's a 2048-entry L2 TLB for instructions. Zen CPUs can also coalesce 4 consecutive pages into one TLB entry, so one Zen 5 core can cover 64 MB of cache with the L2 data TLB.

Zen 4 also has page-coalescing capability. There weren't specifics on whether this mechanism changed in Zen 4, though performance counter unit mask descriptions indicate it's still present. Assuming Zen 4 can coalesce up to four consecutive 4K pages like Zen 2 and 3, the 3072-entry L2 DTLB can cover up to 48 MB, which is great news. While Zen 2/3's 2048-entry L2 DTLB already performed reasonably well, more is always better.
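The coalesced-reach figures quoted above follow directly from the entry counts; a quick sketch, assuming best-case coalescing of four consecutive 4 KiB pages per entry:

```python
# Best-case TLB reach with Zen-style page coalescing: up to 4
# consecutive 4 KiB pages can share a single TLB entry, per the
# figures quoted above.

def coalesced_reach_mb(entries: int, page_bytes: int, coalesce: int) -> float:
    """Address space (MiB) covered when every TLB entry maps
    `coalesce` consecutive pages (best case)."""
    return entries * coalesce * page_bytes / (1 << 20)

PAGE = 4096  # default 4 KiB page

print(coalesced_reach_mb(4096, PAGE, 4))  # Zen 5 L2 DTLB  -> 64 MiB
print(coalesced_reach_mb(3072, PAGE, 4))  # Zen 4 L2 DTLB  -> 48 MiB
print(coalesced_reach_mb(2048, PAGE, 4))  # Zen 2/3 L2 DTLB -> 32 MiB
```

Note this is the best case: it only holds when the working set happens to consist of runs of four consecutive, identically-mapped pages.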
 