
AMD's Ryzen Cache Analyzed - Improvements; Improveable; CCX Compromises

I am going to wait to see Ryzen 5 in action. We have no concrete information about those chips and how they OC. Overclocking 8 cores is a different animal than overclocking 4 cores... historically, at least. And I don't think the limit in OC is entirely the architecture, but we will find out.

I assume the arch is fine. It's the LPP process that wasn't intended for such clocks.
 
Is this some beta download? My AIDA64 isn't offering an update from 5.08.40 at the moment. Thanks!

And "2 x Octal core" doesn't seem right. When a 4790k says "quadcore" so maybe it needs some more fixing. :)
 
That's a bit cheeky to not put it out for Extreme!

Thanks Dave.
 
The thing is, I'm perfectly up for beta testing (and do some in other areas too); maybe they should add an opt-in for that.

In other news, my Asrock X370 (pro gaming) ships today.
 
Question: I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see whether the issue is greater for GPUs on the chipset PCIe lanes vs. GPUs on the CPU's embedded PCIe lanes?

(EDIT: fixed "CPU lanes" to "PCIe lanes")
 
Question: I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see whether the issue is greater for GPUs on the chipset PCIe lanes vs. GPUs on the CPU's embedded PCIe lanes?

That would be a cool thing to test!
 
This is Windows load balancing working like it did on Nehalem and the first-gen Skylakes.

Basically, Windows treats Ryzen as a massive 16-core CPU instead of an 8c/16t one.

The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.
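
For what it's worth, an application can approximate that today by pinning its threads to one CCX. A minimal sketch on Linux, assuming logical CPUs 0-7 map to CCX0 (4 cores x 2 SMT threads; verify the numbering with lscpu before trusting it):

```cpp
// A thread pinned to CCX0 keeps its working set in one L3 slice.
// Assumption: logical CPUs 0-7 belong to CCX0; check your own topology.
// build: g++ -O2 -pthread pin.cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 8; ++cpu)   // 4 cores x 2 SMT threads on CCX0
        CPU_SET(cpu, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        fprintf(stderr, "failed to set affinity\n");
        return 1;
    }
    puts("pinned to CCX0; this thread now avoids cross-CCX L3 traffic");
    return 0;
}
```

A scheduler doing that automatically per process is essentially what treating each CCX as its own node would buy you.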
 
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

I think NUMA would require a separate memory controller for each CCX, and the controller is shared between CCXs on Ryzen. But yeah, something of a hybrid approach would be the real deal. For now, let's hope that 4000 MHz memory support gets there...
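
Right, and the OS already exposes that layout. A quick libnuma sketch (Linux, link with -lnuma); on a single-socket Ryzen 7 it should report just one node, since both CCXs sit behind the same memory controller:

```cpp
#include <numa.h>    // libnuma; build: g++ numa_check.cpp -lnuma
#include <cstdio>

int main() {
    if (numa_available() < 0) {
        puts("NUMA not available on this system");
        return 1;
    }
    // One shared memory controller => one node on Ryzen 7,
    // even though the L3 is physically split between two CCXs.
    printf("NUMA nodes: %d\n", numa_max_node() + 1);
    printf("CPU 0 lives on node %d\n", numa_node_of_cpu(0));
    return 0;
}
```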
 
Question: I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see whether the issue is greater for GPUs on the chipset PCIe lanes vs. GPUs on the CPU's embedded PCIe lanes?

(EDIT: fixed "CPU lanes" to "PCIe lanes")

GPUs can only use the lanes on the Ryzen CPU; they don't connect to the southbridge. So x16, or x8/x8, off the CPU.
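
If anyone wants to verify which lanes a card is actually on, Linux exposes the negotiated link straight through sysfs. A small sketch, using a hypothetical device address 0000:01:00.0 (substitute your GPU's address from lspci); a card hanging off the chipset will typically report a narrower width such as x4:

```cpp
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Hypothetical PCI address; take yours from `lspci`.
    const std::string dev = "/sys/bus/pci/devices/0000:01:00.0/";
    for (const char* attr : {"current_link_speed", "current_link_width"}) {
        std::ifstream f(dev + attr);
        std::string value;
        if (std::getline(f, value))
            std::cout << attr << ": " << value << '\n';
        else
            std::cout << attr << ": (unreadable; check the address)\n";
    }
    return 0;
}
```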

The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

I'm wondering if the higher speed of copy operations on the L3 was specifically tweaked to speed up copies between the two L3s, allowing both CCXs to work from the same data after copying it over... if that would even help. But it looks like the new version of AIDA makes this whole CCX intercommunication "bug" a non-issue.

Naples has a ton of PCIe lanes connecting the two sockets together in dual-socket configs. Somewhere at AMD there must have been people who worked on intercommunication between the two CCXs. I don't buy the theory that AMD simply dropped the ball and put out a chip with a glaring architectural flaw. If there are limitations in Ryzen, I expect to find compromises that were made after intense discussion. Although they don't have a foundry, they do have the ability to do limited production in-house for testing and research purposes. It really feels like people are way underestimating AMD and the quality of their product.
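
The inter-CCX cost people keep arguing about is also easy to measure directly. A minimal core-to-core "ping-pong" sketch, assuming logical CPUs 0 and 8 sit on different CCXs (check your own topology first); rerun it with two same-CCX cores to see the difference:

```cpp
// Bounce a cache line between two pinned threads and time the round trip.
// Assumption: CPUs 0 and 8 are on different CCXs; adjust for your system.
// build: g++ -O2 -std=c++14 -pthread pingpong.cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

static std::atomic<int> flag{0};
constexpr int kIters = 1'000'000;

static void pin(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    std::thread pong([] {
        pin(8);                                   // second CCX (assumed)
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) {}
            flag.store(0, std::memory_order_release);
        }
    });
    pin(0);                                       // first CCX
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);
        while (flag.load(std::memory_order_acquire) != 0) {}
    }
    auto t1 = std::chrono::steady_clock::now();
    pong.join();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    printf("average round trip: %.0f ns\n", ns / kIters);
    return 0;
}
```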
 
I wonder if that is accurate (about the PCIe lanes). I'm in conversation on IRC with several people using passthrough (for virtualization), and they are explicitly talking about the issues they have between GPUs on the CPU-based bus and ones in a chipset-hosted PCIe slot. It seems some boards have crappy IOMMU groupings, causing weirdness with GPUs.
 
I wonder if that is accurate (about the PCIe lanes). I'm in conversation on IRC with several people using passthrough (for virtualization), and they are explicitly talking about the issues they have between GPUs on the CPU-based bus and ones in a chipset-hosted PCIe slot. It seems some boards have crappy IOMMU groupings, causing weirdness with GPUs.

edit: Sorry, I didn't read your post carefully enough. I'll leave the pic up though; maybe someone will find it useful. But yeah, I have no idea what those guys on IRC are talking about. Aren't they mistaken in thinking that one of their GPUs is running off the chipset?

[Image: block diagram of the AM4 platform's PCIe lane allocation]

Taken from
https://rog.asus.com/articles/techn...platform-and-its-x370-b350-and-a320-chipsets/
 
AMD will tighten up this L3 latency. It will get better and better.
 
edit: Sorry, I didn't read your post carefully enough. I'll leave the pic up though; maybe someone will find it useful. But yeah, I have no idea what those guys on IRC are talking about. Aren't they mistaken in thinking that one of their GPUs is running off the chipset?

No, they are not mistaken; you could, for example, have 3 GPUs in there:

two from the CPU and one from the chipset (with the associated latency).

In fact, what a lot of the folks using VMs want is all 3 cards in separate IOMMU groups, so you can e.g. have one card for your host OS and the others each dedicated to a VM.
If the groups/UEFI are right, you could have a slower card off the chipset serving as the host OS's card (boot graphics), and then two powerful cards connected to the VMs, or whatever.
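
For anyone wanting to sanity-check a board's groupings before committing to a passthrough build, Linux lists them under /sys/kernel/iommu_groups. A short sketch (C++17 for std::filesystem); devices sharing a group generally have to be handed to the same VM, which is exactly why the grouping matters:

```cpp
// List each IOMMU group and the PCI devices inside it.
// Cross-reference the addresses with `lspci` to find your GPUs.
// build: g++ -std=c++17 iommu_groups.cpp
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    const fs::path root = "/sys/kernel/iommu_groups";
    if (!fs::exists(root)) {
        std::cout << "IOMMU disabled or not supported on this system\n";
        return 1;
    }
    for (const auto& group : fs::directory_iterator(root)) {
        std::cout << "group " << group.path().filename().string() << ":\n";
        for (const auto& dev : fs::directory_iterator(group.path() / "devices"))
            std::cout << "  " << dev.path().filename().string() << '\n';
    }
    return 0;
}
```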
 
The day there is 4 GHz RAM, a 4 GHz chip, and a nice high capacity (64 GB sounds nice), I will be throwing cash at AMD.
 
The day there is 4 GHz RAM, a 4 GHz chip, and a nice high capacity (64 GB sounds nice), I will be throwing cash at AMD.
Seeing how RAM speed makes a huge performance difference on Ryzen, yes, agreed.
 
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

What an interesting suggestion.

Your paradigm of splitting, for coding purposes, the 8 cores into discrete 4-core CCX plus 8 MB L3 cache blocks, and then minimising interaction between them, could speed up some apps considerably.

I am a newb, but I mused similarly in the context of a poor man's Vega Pro SSG (a 16 GB, $5000+ Vega with an onboard 4x 960 Pro RAID array).

If you install an affordable 8-lane Vega and an 8-lane 2x NVMe adapter, so both link to the same 16 lanes (as a single 16-lane card does), then the GPU and the 2x NVMe RAID array may be able to talk very directly and ~share the same 8 MB CPU L3 cache. It doesn't bypass the shared PCIe bus like the Vega SSG does, but it could be minimal latency, and enhanced by specialised large-block-size formatting for swapping, workspace, temp files and graphics.

Vega 56/64, of course, have a dedicated HBCC subsystem for exactly this kind of GPU cache extension using NVMe arrays. Done right, it promises a pretty good illusion of ~unlimited GPU memory/address space. Cool indeed.

As you can see, this is a belated post from me. We now have evidence in the performance figures of the single-CCX Zen/Vega APUs. Yes, the inter-CCX interconnect has dragged Ryzen's ~IPC down.
 
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

Devs probably won't have a choice. It's only a matter of time before Intel announces their copy of Ryzen.
 