
Ryzen 3000 memory controller has "half the performance" on single CCD CPUs

TheLostSwede

News Editor
This is an interesting observation that I had missed until now.
It would appear AMD has "cheaped out" on their memory controller a bit and it only has "half the performance" on CPUs with only one CCD in them, during write operations.
Seemingly it has little effect in most applications, but if you're doing something that involves a lot of intensive memory writes, you might want to consider getting a dual CCD CPU.
It does seem to have a small effect on memory latency, though.
Just a heads up, as it was not something that was particularly clear from AMD's side.



Source: https://www.guru3d.com/articles_pages/amd_ryzen_7_3700x_ryzen_9_3900x_review,21.html
 
It would appear AMD has "cheaped out" on their memory controller a bit and it only runs at "half speed" (16 vs 32-bit) on CPUs with only one CCD in them
It's 64Bytes/Cycle when reading from memory but drops down to 32Bytes/Cycle for writes. A review somewhere mentioned it.
 
I guess the source link got it slightly wrong, but yeah, still half speed for writes on single CCD CPUs.
 
32 bytes × 1666 MHz ≈ 53.3 GB/s

As long as each arrow is not a sum of 16B/cycle in each direction, a single CCD should have enough bandwidth to handle dual channel throughput.
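A quick way to sanity-check this arithmetic is to multiply a link width in bytes per cycle by the fabric clock and compare the result with peak dual-channel DDR4 bandwidth. The sketch below assumes FCLK runs 1:1 with the memory clock (half the DDR4 transfer rate, here DDR4-3333) and 8-byte-wide channels; these are illustrative assumptions, not measured figures.

```python
# Sketch: peak Infinity Fabric link bandwidth vs. peak dual-channel DDR4 bandwidth.
# Assumptions (for illustration): FCLK runs 1:1 with the memory clock, which is
# half the DDR4 transfer rate, and each DDR4 channel is 8 bytes (64 bits) wide.


def fabric_bandwidth_gbs(bytes_per_cycle: int, fclk_mhz: float) -> float:
    """Peak bandwidth of one fabric link in GB/s (1 GB = 10**9 bytes)."""
    return bytes_per_cycle * fclk_mhz * 1e6 / 1e9


def ddr4_bandwidth_gbs(transfer_rate_mts: float, channels: int = 2) -> float:
    """Peak DDR4 bandwidth in GB/s: 8 bytes per transfer per channel."""
    return 8 * transfer_rate_mts * 1e6 * channels / 1e9


transfer_rate = 3333          # DDR4-3333 (assumed)
fclk = transfer_rate / 2      # 1666.5 MHz at a 1:1 divider (assumed)
print(f"32 B/cycle link at {fclk} MHz: {fabric_bandwidth_gbs(32, fclk):.1f} GB/s")
print(f"16 B/cycle link at {fclk} MHz: {fabric_bandwidth_gbs(16, fclk):.1f} GB/s")
print(f"Dual-channel DDR4-{transfer_rate}: {ddr4_bandwidth_gbs(transfer_rate):.1f} GB/s")
```

Under these assumptions a 32 B/cycle link and dual-channel DDR4 come out to the same peak figure, while a 16 B/cycle path gives exactly half of it.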

My educated guess is that for some reason in this test the IF divider got set to 1:2.
 
DRAM:FSB ratio on AIDA screenshots is 54:3 for both, should that reflect the divider?
If they swapped different CPUs around, leaving the divider at that setting could happen by mistake, I suppose.
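To see what a ratio like 54:3 implies, it can be decoded into a memory clock. This sketch assumes AIDA64 reports the ratio against the usual 100 MHz reference clock; the function name is hypothetical. Under that assumption the ratio only gives the memory clock, so it would not by itself reveal a 1:2 fabric divider.

```python
# Sketch: decode a "DRAM:FSB" ratio string into the implied memory clock.
# Assumption: the ratio is relative to a 100 MHz reference clock.


def memory_clock_mhz(ratio: str, ref_clock_mhz: float = 100.0) -> float:
    """Memory clock (MCLK) implied by a ratio string such as '54:3' (hypothetical helper)."""
    dram, fsb = (int(part) for part in ratio.split(":"))
    return dram / fsb * ref_clock_mhz


mclk = memory_clock_mhz("54:3")
print(f"MCLK = {mclk:.0f} MHz, i.e. DDR4-{2 * mclk:.0f}")
# Assumed relationship: a 1:2 divider would run the fabric at half the memory clock.
print(f"FCLK at 1:1 = {mclk:.0f} MHz, at 1:2 = {mclk / 2:.0f} MHz")
```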
 
The reason given for the 32B/cycle writes was that writes happen less often than reads from memory.

Anyone up for a deep dive into the Zen 2 core?
 
Which sort of makes sense, but it also makes the single CCD CPUs look "bad" in some synthetic benchmarks. It doesn't seem to make much of a real-world difference, though, and even the latency difference seems to be a mostly moot issue, if there even is a difference, since I've seen other tests that show none.
 
The Zen2 L3 Cache Latency is up compared to Zen+, and memory latency is a touch higher.
Attached latency charts: lat3900log.png (Ryzen 9 3900X) and lat2700log.png (Ryzen 7 2700X)

Anandtech said:
In terms of the DRAM latency, it seems that the new Ryzen 3900X has regressed by around 10ns when compared to the 2700X (Note: Just take into the leading edge of the “Structural Estimate” figures as the better estimate) with ~74-75.5ns versus ~65.7ns.

It also looks like Zen2’s L3 cache has also gained a few cycles: A change from ~7.5ns at 4.3GHz to ~8.1ns at 4.6GHz would mean a regression from ~32 cycles to ~37 cycles.
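The cycle counts in the quote come from multiplying latency in nanoseconds by clock speed in GHz. This minimal sketch reproduces that conversion for the two L3 figures quoted above; it assumes the latency is measured at the stated core clock.

```python
# Sketch: convert a latency in nanoseconds to core clock cycles (ns x GHz = cycles).
# Assumption: the latency was measured with the core running at the stated clock.


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Number of core clock cycles that elapse during latency_ns."""
    return latency_ns * clock_ghz


zen_plus = ns_to_cycles(7.5, 4.3)  # 2700X L3 figure from the quote
zen2 = ns_to_cycles(8.1, 4.6)      # 3900X L3 figure from the quote
print(f"Zen+: ~{zen_plus:.0f} cycles, Zen 2: ~{zen2:.0f} cycles")
```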
 
L1 and L2 are pretty much even. L3 Cache latency is slightly up but there is also twice as much L3 Cache. Memory latency is simply an inevitable tradeoff of chiplet design.
Overall it is still a noticeable improvement.
 
What is a CCD CPU?
 
It would appear AMD has "cheaped out" on their memory controller a bit and it only runs at "half speed" on CPUs with only one CCD in them, during write operations.
It halves performance, not speed. They run at the same speed, but with one less CCD doing 16B/cycle writes, that's half the writes, and coincidentally half the reads too, since one CCD can't read as much as two.
 
Right, yes. Edited the titles to make that more obvious.
 
We caught that in testing and it was in the review. I don't think it matters much, however.

Hearing "CCD" brings me back to Catholic school/baptism/confirmation days... lol
 

It doesn't seem to matter much in 99% of applications, that's for sure, at least judging by the benchmarks. It was just one of those things I really hadn't realised they'd done.
Admittedly it's right there in their presentations (if you compare 1x CCD vs 2x CCD CPUs), but they obviously didn't highlight it, for reasons.
It's just something worth getting out there for those 1% scenarios people might run, where they might be surprised that performance suffers.
 
Can the IO Die be overclocked?
 
This is basically like an old "dual-cpu + northbridge" design inside a small box, with bus speeds updated to modern day.

I remember an old dual Pentium III system that I used many years ago; it suffered from exactly the same thing: removing one CPU would halve memory bandwidth, even though the memory was still connected to the same northbridge. The "FSB" from a single CPU simply couldn't keep up.
Old = New
Pentium "Core" = CCD
Intel FSB = AMD IF
I/O die = Northbridge (memory controller) + half of Southbridge

Tbh, as long as application performance is fine, this is a non-issue.
 

Doubt it. Everything on a chip that is outside the core/cache/MC usually runs at vastly different speeds and requires tight timing so that it can communicate with other chips and buses.
 
It's a bit complicated... lol

I own a site and work for another, much larger site (YHPM).
Now now, no need to be so shy, I'm sure there won't be any hard feelings if you share with everyone...
 
Hexus.net has a similar explanation to Anandtech and Overclockers.uk:
AMD says that this is a calculated design choice for Zen 2, due to most client workloads not writing as much. Halving the data link write speed between CCD and cIOD saves area, improves power, and has ancillary knock-on benefits, too. The downside is half-write speed because of the slowness of the data fabric in that direction.

I'm more curious about the max bandwidth of IF at 1800MHz or 1900MHz (for capable silicon) and what this means for all data (especially PCIe) that needs to go through the IF. I'm sure AMD has calculated it to be sufficient for all or most needs, but does anyone know any numbers? Thanks.
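Theoretical peak numbers can be estimated from link width times FCLK. This sketch assumes per-CCD link widths of 32 B/cycle for reads and 16 B/cycle for writes, the figures commonly reported for Zen 2 (other posts in this thread quote different widths, so treat these as assumptions). It gives peak CCD-to-IO-die bandwidth only; PCIe and other traffic through the IO die is not modelled.

```python
# Sketch: theoretical peak CCD-to-IO-die bandwidth at a given FCLK.
# Assumptions: 32 B/cycle read and 16 B/cycle write per CCD link; peak figures only,
# PCIe and other IO-die traffic is not modelled.
from typing import Tuple

READ_BYTES_PER_CYCLE = 32
WRITE_BYTES_PER_CYCLE = 16


def ccd_link_bandwidth_gbs(fclk_mhz: float, ccds: int) -> Tuple[float, float]:
    """Aggregate (read, write) bandwidth in GB/s across all CCD links."""
    read = READ_BYTES_PER_CYCLE * fclk_mhz * ccds / 1000
    write = WRITE_BYTES_PER_CYCLE * fclk_mhz * ccds / 1000
    return read, write


for fclk in (1800, 1900):
    for ccds in (1, 2):
        read, write = ccd_link_bandwidth_gbs(fclk, ccds)
        print(f"FCLK {fclk} MHz, {ccds} CCD(s): read {read:.1f} GB/s, write {write:.1f} GB/s")
```

Because the write width is per CCD, doubling the CCD count doubles aggregate write bandwidth, which is the single vs dual CCD gap discussed earlier in the thread.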
 
This is the exact discussion I've been desperately searching for over the past few days... I'm a VFX artist by day, indie filmmaker by night who needs to build a new PC for a personal project. I've already bought all parts except the CPU and need to complete the build ASAP, but I'm really worried about this whole "half the memory for writes" thing on the new Ryzens, because I suspect CG animation filmmaking workflows will be a 1% sort of thing compared with who these processors seem to be built for (gamers, etc.).

I animate in 3D (Maya) and 2D (After Effects), and edit in Premiere (though I may switch to Resolve soon). A huge part of my workflow is being able to play back previews of animation in real time (the fewer dropped frames, the better). I'm not terribly tech savvy and only build a workstation every 5-7 years, so it's hard for me to discern what involves writing to RAM vs. reading from it, but it seems to me that what we call "RAM previews" in AE would be writing to RAM, and I do that a lot and really need it to be fast. Render/export speed is less important to me, as I do that far less often and can let my computer render while I'm doing other things. You guys seem to really understand a lot about this issue (I've had trouble finding people who do), so I'd appreciate it very much if you could tell me whether you think my workflow would improve or suffer if I went with, say, a 3600X vs. a 2600X, etc.? Thanks!
 