Friday, September 27th 2024

AMD Ryzen 9 9950X3D and 9900X3D to Feature 3D V-cache on Both CCD Chiplets

Earlier this week, we got rumors that AMD is rushing the Ryzen 7 9800X3D 8-core/16-thread "Zen 5" processor with 3D V-cache toward a late-October debut. The 9800X3D succeeds the popular 7800X3D, and AMD probably hopes to have a competitive gaming processor in time for Intel's Core Ultra 2-series "Arrow Lake-S" launch. The previous article reported that the higher core-count 9000X3D series models, the Ryzen 9 9950X3D and Ryzen 9 9900X3D, would arrive some time in Q1 2025, reportedly because the chips have certain "new features" compared to their predecessors, the 7950X3D and 7900X3D. At the time, we even explored the possibility of AMD giving both 8-core CCDs on the processor 3D V-cache. Turns out, this is where things are headed.

A new report by Benchlife.info claims that the higher core-count 9950X3D and 9900X3D will implement 3D V-cache on both CCD chiplets, giving these processors an impressive 192 MB of L3 cache (96 MB per CCD), and 208 MB (9950X3D) or 204 MB (9900X3D) of "total cache" (L2+L3). The report also says that AMD is planning a Ryzen 5 9600X3D chip, its second attempt at taking on Intel's Core i5 lineup, following its very recent release of the Ryzen 5 7600X3D, which ended up 1-3% short of the Core i5-14600K in gaming workloads. There's no word on whether the 9600X3D will launch in October alongside the 9800X3D, or in Q1 2025 with the Ryzen 9 9000X3D series.
Documentation indicates that the maximum 3D V-cache is still 64 MB per CCD, for a total of 96 MB of L3 per CCD.
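The cache figures quoted in the report can be reproduced with some quick arithmetic. The sketch below is only an illustration: it assumes 1 MB of L2 per "Zen 5" core and 32 MB of on-die L3 per CCD (which, with the 64 MB stacked die, gives the 96 MB per CCD mentioned above), and the function and constant names are made up for this example.

```python
# Assumed per-unit cache sizes (in MB); not taken from the Benchlife report itself.
L2_PER_CORE_MB = 1       # assumption: 1 MB L2 per "Zen 5" core
BASE_L3_PER_CCD_MB = 32  # assumption: on-die L3 per CCD
VCACHE_PER_CCD_MB = 64   # stacked 3D V-cache per CCD, as stated in the article


def cache_totals(cores, ccds=2, stacked_ccds=2):
    """Return L2, L3 and combined cache (MB) for a hypothetical configuration."""
    l2 = cores * L2_PER_CORE_MB
    l3 = ccds * BASE_L3_PER_CCD_MB + stacked_ccds * VCACHE_PER_CCD_MB
    return {"l2_mb": l2, "l3_mb": l3, "total_mb": l2 + l3}


if __name__ == "__main__":
    print("16-core, V-cache on both CCDs:", cache_totals(16))
    print("12-core, V-cache on both CCDs:", cache_totals(12))
    print("16-core, V-cache on one CCD:  ", cache_totals(16, stacked_ccds=1))
```

Under these assumptions the 16-core and 12-core parts come out at 208 MB and 204 MB respectively, matching the "total cache" numbers in the report.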
The introduction of 3D V-cache on both CCDs of the 9950X3D and 9900X3D could be interesting, as both chiplets will be capable of gaming workloads at a uniform performance level. On the 7950X3D and 7900X3D, OS scheduler-level QoS logic ensures gaming workloads are scheduled to the CCD with the 3D V-cache, while multithreaded productivity workloads are allowed to spread across both CCDs.
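To make the idea of keeping a workload on one CCD concrete, here is a rough manual sketch for Linux. It is not how AMD's driver or the Windows scheduler does it; it simply restricts a process to the logical CPUs of one CCD. The CPU numbering is an assumption (cores of CCD 0 first, then CCD 1, with SMT siblings numbered after all first threads), so check the real topology on a given system before relying on it. The helper names are hypothetical.

```python
import os


def ccd_logical_cpus(ccd_index, cores_per_ccd=8, ccds=2, smt=True):
    """Logical CPU IDs of one CCD under the ASSUMED numbering described above."""
    if not 0 <= ccd_index < ccds:
        raise ValueError("ccd_index out of range")
    first = ccd_index * cores_per_ccd
    cpus = set(range(first, first + cores_per_ccd))
    if smt:
        total_cores = cores_per_ccd * ccds
        # Assumption: the SMT sibling of core c is logical CPU c + total_cores.
        cpus |= {c + total_cores for c in range(first, first + cores_per_ccd)}
    return cpus


def pin_to_ccd(pid, ccd_index):
    """Restrict a process to one CCD (Linux only; pid 0 means the calling process)."""
    os.sched_setaffinity(pid, ccd_logical_cpus(ccd_index))


if __name__ == "__main__":
    print("CCD 0 logical CPUs:", sorted(ccd_logical_cpus(0)))
    print("CCD 1 logical CPUs:", sorted(ccd_logical_cpus(1)))
```

Calling `pin_to_ccd(0, 0)` would confine the current process to the first CCD on a 16-core part that follows the assumed numbering; on a chip with V-cache on both CCDs, either CCD would be an equally valid target.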
Source: Benchlife.info

126 Comments on AMD Ryzen 9 9950X3D and 9900X3D to Feature 3D V-cache on Both CCD Chiplets

#51
Chrispy_
I'm unsure if this is a net win or not. For sure, it will add to the manufacturing cost which is almost certainly going to translate to higher MSRPs on these two CPUs.

Games are the single biggest beneficiary of the extra cache, but they need that cache to be unified. For productivity applications, the reduced clockspeed of the CCD with v-cache is actually a downside that hinders performance.

So, if games can't take advantage because the cache isn't unified between both CCDs, and the extra cache hurts application performance in everything else because of the reduction in clockspeeds, I do not think there is any point including v-cache on both CCDs. It's just added cost that seems like a lose-lose scenario.

If AMD have found a way to link the cache so that any core can access v-cache on either die, then AMD have truly made a winner and I suspect the 7800X3D's gaming performance crown is about to be blown away by the 9950X3D. We can hope, right?
#52
Makaveli
AnotherReader: I'm not sure if inter die requests were ever a factor. If they were, then EPYC X would suffer much more than a hypothetical 5950X3D with 192 MB of L3 cache.
EPYC X workload usually isn't gaming though.
#53
FoulOnWhite
AMD's AM5 saviour has arisen, just as with AM4 (kind of). These will most likely sell a ton.
#54
biffzinker
RogueSix: It is quite the contrary to what you say. Scheduling will now be even more important to the point it becomes SUPER-DUPER-MEGA-EXTRA-important with cache on both CCDs. For this dual cache setup to work correctly, games/apps (via the scheduler) always need to request the cached data from the "correct" cache on the "correct" CCD or else you will suffer latencies from hell if/when data needs to be fetched from the cache across the CCDs because e.g. Core 3 requests data that was previously stored to the cache by Core 14 on the other CCD. Can't have a scenario like that. Ever.

So, both the scheduler and the CPU always need to "know" exactly "who" (which core) cached something (what) and where it was cached to avoid the dreaded inter-CCD and inter-cache latencies. This is definitely going to be a challenge and very complex on the level of correct scheduling and correct CCD assignment etc.

AMD does not exactly have the best track record when it comes to these scheduling and core assignment shenanigans so I would be quite surprised if they get this to work flawlessly out of the gate.
Personally, I have avoided multi CCD CPUs like the plague due to the Xbox GameBar and 'GameMode On' requirements (I have a PC and not a console, you muppets). It will be interesting to see if the GameBar requirement will be dropped now(?) since core parking will no longer be required.

We'll have to wait and see how well this is gonna work in practice. I would expect some growing pains, to say the least...
You do realize cache snooping has existed since there were two-socket and four-socket Pentium Pro boards? It would have also been active for AMD's AthlonXP MP chips on a two-socket Socket A board. A similar occurrence happened on Intel's side when they took a shortcut to a quad-core chip by using two Penryn dual-core dies with the Core 2 Quad 6600, and Core 2 Quad 9000 series. The cache snooping had to go out over the shared FSB.
#55
rv8000
AusWolf: It's the usual "X3D if you game, normal if you don't" narrative again. Personally, I don't mind. Those higher clocks don't give you that much more performance anyway - only more power consumed and heat.
Except now you’ll be losing more non gaming performance due to both chiplets being potentially limited on frequency, plus the additional cost. Price to performance will tank, glad we’re all gaming at 720p though with 4090s.
#56
ThomasK
igormp: Most consumers will get no benefit from such extra cache.
Sure, you're the expert.
#57
Ruru
S.T.A.R.S.
Prima.Vera: What happened to the 8xxx series? Why the jump from 7 to 9??
There are APUs in the 8k series lineup. Though they skipped the 6k series on desktop entirely.
#58
AusWolf
rv8000: Except now you’ll be losing more non gaming performance due to both chiplets being potentially limited on frequency, plus the additional cost. Price to performance will tank, glad we’re all gaming at 720p though with 4090s.
I upgraded to a 7800X3D from a 7700X, and I can tell you, the performance difference between the two is negligible. Circa 18k vs 19k points in Cinebench R23. That's what? 5 percent? Except that the 7800 can do it with 80 W, while the 7700 needs to max out its 142 W limit to achieve the higher number. With all this in mind, plus with a better gaming performance, I don't care about that 5% at all.
#59
xacid
AusWolf: I upgraded to a 7800X3D from a 7700X, and I can tell you, the performance difference between the two is negligible. Circa 18k vs 19k points in Cinebench R23. That's what? 5 percent? Except that the 7800 can do it with 80 W, while the 7700 needs to max out its 142 W limit to achieve the higher number. With all this in mind, plus with a better gaming performance, I don't care about that 5% at all.
You're really not going to see an improvement from the 3D cache in Cinebench. Thing that benefits the most is gaming.
#60
AnotherReader
Makaveli: EPYC X workload usually isn't gaming though.
That's right, but EPYC X workloads are far more likely than gaming to utilize more than 8 cores. With 11 more CCDs, rather than 1 additional CCD, the likelihood of a miss hitting in another CCD's cache should be higher for many of these cases. In addition, AMD has had snoop filters for a long time. We don't know if the Ryzen IOD includes this, but given that the IO die is using a more advanced process without any increase in die size, there might be enough space for one.
#61
kapone32
dgianstefani: laptop APUs
What about the 8700G? Is that a laptop APU?

Now the narrative can be satisfied. I guess we will see what happens as AMD said it made no difference so time to see if the AMD engineers are wrong.
#62
thesmokingman
kapone32: Now the narrative can be satisfied. I guess we will see what happens as AMD said it made no difference so time to see if the AMD engineers are wrong.
I haven't stayed at a Holiday Inn recently so I'll just wait for the real info to come out. /s
#63
mkppo
What is the point of this? Games shouldn't cross CCD's and will ideally be pinned to one. What is the other CCD with cache going to do?

Their previous approach worked just fine and with scheduler improvements would've improved further. This will literally not do anything other than make uninformed people happy that there's extra victim cache on both CCD's which will do nothing in reality.

Don't think this is going to happen. AMD couldn't care less about the people wanting extra cache on both CCD's thinking it'll magically improve performance. It won't.
#64
Neo_Morpheus
usiname: Finally, now the people will see that the 3D cache on both dies is useless and will stop crying for this
Maybe something changed, but I do recall AMD stating that it didn't make any difference having both CCDs with 3D cache.
But since everyone keeps trashing Zen 5...
AusWolf: I upgraded to a 7800X3D from a 7700X, and I can tell you, the performance difference between the two is negligible. Circa 18k vs 19k points in Cinebench R23. That's what? 5 percent? Except that the 7800 can do it with 80 W, while the 7700 needs to max out its 142 W limit to achieve the higher number. With all this in mind, plus with a better gaming performance, I don't care about that 5% at all.
Something that has been the norm since the first X3D chip came out, but since the norm everywhere is to trash AMD regardless, you end up with the current narrative used everywhere that Zen 5 is absolute trash because it's not a good gaming CPU. But continuing below...
xacid: Thing that benefits the most is gaming.
Please do repeat that to all the haters that are trashing the current Zen 5 CPUs for not being good in... gaming...

I am curious to see if this time, adding 3D cache on both CCD's works. And saving on popcorn, for when the price is announced...:roll:
#65
rv8000
AusWolf: I upgraded to a 7800X3D from a 7700X, and I can tell you, the performance difference between the two is negligible. Circa 18k vs 19k points in Cinebench R23. That's what? 5 percent? Except that the 7800 can do it with 80 W, while the 7700 needs to max out its 142 W limit to achieve the higher number. With all this in mind, plus with a better gaming performance, I don't care about that 5% at all.
Slightly different story with a dual-CCD SKU, of which we don’t know what the frequencies will be and whether or not crossing the CCX is still going to cause performance drops as long as scheduling is left up to software/Windows. That's primarily why a hardware scheduler would’ve been a better solution, as gaming software is largely limited by console hw specs and isn't utilizing more than 8 cores anyway. A proper implementation of your “best of both worlds” dig would absolutely be a better solution when it comes to price performance, even providing better efficiency with a hybridized design, but I’ll leave you to your concession.
#66
igormp
ThomasK: Sure, you're the expert.
I mean, feel free to point out any use that's relevant to consumers other than the already mentioned ones (gaming, hpc and cfd). You linked a DC scenario, which doesn't reflect consumer usage at all.
AnotherReader: That's right, but EPYC X workloads are far more likely than gaming to utilize more than 8 cores. With 11 more CCDs, rather than 1 additional CCD, the likelihood of a miss hitting in another CCD's cache should be higher for many of these cases. In addition, AMD has had snoop filters for a long time. We don't know if the Ryzen IOD includes this, but given that the IO die is using a more advanced process without any increase in die size, there might be enough space for one.
EPYC X workloads should also be embarrassingly parallel and not need to do such cross-core requests. Reminder that, even with improvements, the cross-CCD latency has the same cost as actually accessing RAM.
I'm pretty sure the L3 cache on Zen is coherent, but it's a victim cache, and as far as I know its cross-CCD costs aren't worth caring about that much.
#67
ThomasK
igormp: (gaming, hpc and cfd).
You replied to my comment on someone else's post, which I'll repost below.
usiname: Finally, now the people will see that the 3D cache on both dies is useless and will stop crying for this
Now you answered it yourself.

Thanks.
#68
AnotherReader
igormp: I mean, feel free to point out any use that's relevant to consumers other than the already mentioned ones (gaming, hpc and cfd). You linked a DC scenario, which doesn't reflect consumer usage at all.

EPYC X workloads should also be embarrassingly parallel and not need to do such cross-core requests. Reminder that, even with improvements, the cross-CCD latency has the same cost as actually accessing RAM.
I'm pretty sure the L3 cache on Zen is coherent, but it's a victim cache, and as far as I know its cross-CCD costs aren't worth caring about that much.
I agree that the latency of getting a line from a remote L3 cache is in the same ballpark as a hit in better than JEDEC DRAM. EPYC X workloads are much more parallel than gaming, but I doubt that inter core latency is hampering games right now, especially considering Intel's cores handle gaming well and in their case, only the 8 P cores have decent inter core latency. The vast majority of games don't need more than 8 cores so the issue of inter core latency is expected to be inconsequential.
#69
igormp
ThomasK: You replied to my comment on someone else's post, which I'll repost below.


Now you answered it yourself.

Thanks.
But I do agree with the poster you originally replied to lol
I don't see the point of a 2x CCD v-cache CPU for games; the 9800X3D is likely still going to be faster, so no real benefit. I may be wrong on that, of course, so we need to wait for the product to become a thing and benchmarks to come out.

I don't even think CFD and HPC are really significant here because, as I said above, those are better suited to platforms with way more channels.
AnotherReader: The vast majority of games don't need more than 8 cores so the issue of inter core latency is expected to be inconsequential.
I agree with that, but then it means that having the 2 CCDs with the extra cache is kinda pointless to begin with.
#70
Dr. Dro
RogueSix: It is quite the contrary to what you say. Scheduling will now be even more important to the point it becomes SUPER-DUPER-MEGA-EXTRA-important with cache on both CCDs. For this dual cache setup to work correctly, games/apps (via the scheduler) always need to request the cached data from the "correct" cache on the "correct" CCD or else you will suffer latencies from hell if/when data needs to be fetched from the cache across the CCDs because e.g. Core 3 requests data that was previously stored to the cache by Core 14 on the other CCD. Can't have a scenario like that. Ever.

So, both the scheduler and the CPU always need to "know" exactly "who" (which core) cached something (what) and where it was cached to avoid the dreaded inter-CCD and inter-cache latencies. This is definitely going to be a challenge and very complex on the level of correct scheduling and correct CCD assignment etc.

AMD does not exactly have the best track record when it comes to these scheduling and core assignment shenanigans so I would be quite surprised if they get this to work flawlessly out of the gate.
Personally, I have avoided multi CCD CPUs like the plague due to the Xbox GameBar and 'GameMode On' requirements (I have a PC and not a console, you muppets). It will be interesting to see if the GameBar requirement will be dropped now(?) since core parking will no longer be required.

We'll have to wait and see how well this is gonna work in practice. I would expect some growing pains, to say the least...
Nothing that doesn't already occur with every regular Ryzen 9 since the 3950X! This approach is MUCH, MUCH preferred over the hybrid garbage. It's really tempting me! Although my money would be better spent on a RTX 5090...
#71
DeathtoGnomes
Chrispy_: I suspect the 7800X3D's gaming performance crown is about to be blown away by the 9950X3D. We can hope, right?
YOU TAKE THAT BACK! I want my 7800X3D to wear the crown one more generation. :D
#72
wNotyarD
DeathtoGnomes: YOU TAKE THAT BACK! I want my 7800X3D to wear the crown one more generation. :D
Huh... Won't at least the 9800X3D take the crown anyway?
#73
TomWeng
Finally, the 9950X3D is the most powerful gaming processor in the world.
#74
freeagent
I bet they will tune it so it kicks axe.

X3D on dual CCD probably wasn't the best of ideas on previous models, though none of us really know that for certain... I am willing to bet they will make it make a difference.

Gaming on Ryzen 9 is not nearly as bad as some make it out to be lol..
#75
Dr. Dro
wNotyarD: Huh... Won't at least the 9800X3D take the crown anyway?
Yes
freeagent: Gaming on Ryzen 9 is not nearly as bad as some make it out to be lol..
Doesn't make sense from a cost perspective, even with the prices as of late. It's primarily the 7900X3D, and to a lesser extent the other 6+6 models of current and prior generations.