
AMD Could Solve Memory Bottlenecks of its MCM CPUs by Disintegrating the Northbridge

Wouldn't it be much better to just make the memory controller modular? Just thinking out loud.

I'm just saying this because I'm not sure if more than one memory controller is beneficial at all when you have a multi-CPU setup...

I know... it's a bit out of the box, but yeah.
 
Wouldn't it be much better to just make the memory controller modular? Just thinking out loud.

I'm just saying this because I'm not sure if more than one memory controller is beneficial at all when you have a multi-CPU setup...
What do you mean by "modular"?
For reference, check out the last paragraph here for an overview of the current implementation: https://en.wikichip.org/wiki/amd/infinity_fabric
 
What do you mean by "modular"?
For reference, check out the last paragraph here for an overview of the current implementation: https://en.wikichip.org/wiki/amd/infinity_fabric

Ah... so memory controllers stack their performance/bandwidth.
Well... I thought it might be a better idea to just combine the memory controllers into one big die. One you can upgrade, the same way you can with CPUs.
 
Ah... so memory controllers stack their performance/bandwidth.
Well... I thought it might be a better idea to just combine the memory controllers into one big die. One you can upgrade, the same way you can with CPUs.
Once you run the traces out to the board / another socket, the latency goes through the roof.
 
Ah... so memory controllers stack their performance/bandwidth.
Well... I thought it might be a better idea to just combine the memory controllers into one big die. One you can upgrade, the same way you can with CPUs.
Well, you're on to something. That's how things worked before Athlon64 and Core: the memory controller was in the so-called northbridge, a standalone chip sitting on the motherboard. While obviously a more flexible design, it turns out it doesn't cut it anymore in modern systems.

Btw, welcome to TPU ;)
 
Once you run the traces out to the board / another socket, the latency goes through the roof.

Well, you're on to something. That's how things worked before Athlon64 and Core: the memory controller was in the so-called northbridge, a standalone chip sitting on the motherboard. While obviously a more flexible design, it turns out it doesn't cut it anymore in modern systems.

Btw, welcome to TPU ;)



Yeah, I know... but wouldn't it be much easier to reserve PCIe lanes this way?
I'm not saying that this is the solution; it's just that by thinking out of the box, one might find new ways to improve the product.

And thanks, bug
 
For everyone here saying this is a bad solution or that it will create more problems, here is some educational material for you: do you think the people working on these designs are ignorant or something? Obviously this new design will resolve many problems!
 
Imho, this type of connectivity between CCXs is only meant for the next EPYC and Threadripper. And for this type of usage it is excellent and ingenious indeed. For desktop Ryzens, my opinion is that they will just improve the already existing connectivity. It is more than enough. And with an 8C/16T CCX, most Ryzens will have just one CCX, which means no added latency from the IF.
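If anyone wants to check how the CCXs are laid out on their own chip, here's a minimal sketch (Linux only, and it assumes the usual sysfs cache layout where index3 is the L3): cores that report the same shared_cpu_list share an L3, and on Zen that means they sit in the same CCX.

/* Minimal sketch (Linux only): list which logical CPUs share an L3 slice.
 * Assumes the common sysfs layout where cache/index3 is the L3. On Zen,
 * cores that share an L3 belong to the same CCX.
 * Compile: gcc -O2 ccx_map.c */
#include <stdio.h>
#include <string.h>

int main(void) {
    char path[128], list[256];
    for (int cpu = 0; cpu < 256; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                          /* no such CPU: we're done */
        if (fgets(list, sizeof list, f)) {
            list[strcspn(list, "\n")] = '\0';
            printf("cpu%-3d shares L3 with: %s\n", cpu, list);
        }
        fclose(f);
    }
    return 0;
}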
 
Imho, this type of connectivity between CCXs is only meant for the next EPYC and Threadripper. And for this type of usage it is excellent and ingenious indeed. For desktop Ryzens, my opinion is that they will just improve the already existing connectivity. It is more than enough. And with an 8C/16T CCX, most Ryzens will have just one CCX, which means no added latency from the IF.
Ideally, AMD will want a design that scales across product lines. Otherwise they have to keep redesigning the CCX. But there's no telling which solution they'll choose.
 
Ideally, AMD will want a design that scales across product lines. Otherwise they have to keep redesigning the CCX. But there's no telling which solution they'll choose.
Since EPYC and TR are already made separately from desktop Ryzen CPUs and they are making money from that, it is very viable to continue doing that, especially when they will raise the game by adding many more cores and decreasing latency for the market segments where they are needed most.
 
Since EPYC and TR are already made separately from desktop Ryzen CPUs and they are making money from that, it is very viable to continue doing that, especially when they will raise the game by adding many more cores and decreasing latency for the market segments where they are needed most.
What do you mean "separately"? Aren't they all just the same CCXs in different layouts?
 
Hmmmmm, maybe this gives some MERIT to the HardOCP forum post which details that Zen 2 has some "newish" IF implementation...
 
Hmmmmm, maybe this gives some MERIT to the HardOCP forum post which details that Zen 2 has some "newish" IF implementation...
Believe it or not, IF is the Achilles' heel for Zen. It was bound to be reworked in future incarnations.
 
Fabric solutions always create more problems than they solve once things get this complex; the ring bus approach may be simpler and offer more throughput and lower latency if they can get it wide or fast enough.

AMD brought most of this on themselves: cache and memory latency has never truly been solved across Zen, Bulldozer, and other designs, and "add more cores" has always been the answer. They need to build a memory controller for an 8-core part that can be scaled up to these insane core and thread counts, where a little added latency in a server workload can be masked by software that schedules threads with the penalties in mind.
 
But how does this affect minimum latency? Right now, with the current approach, there is a somewhat wide delta between min and max latency depending on which core is communicating with what. When an app is running locally on a CCX, the latency is excellent; when both CCXs are needed, the latency increases slightly; and lastly, when one workload needs to connect to other chips on the module, latency maxes out. This central northbridge might lower that max latency and make the gap between min and max much smaller; however, from a high level, one can expect min latency to take a big hit and increase drastically.
I don't think core-to-core communication between threads is the problem, but rather memory and cache accesses. The impact is greater than just taking the extra jump through the other CCX, it also "borrows" memory bandwidth from that CCX, which can lead to additional bottlenecks.

Most applications are very sensitive to memory latency, so redesigning this approach in future Zen iterations seems like a very good idea. Keeping cache and memory controllers as efficient and low latency as possible is one of the keys to increasing IPC.
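If you want to see that latency delta for yourself, here's a rough cache-line ping-pong sketch (Linux + pthreads; the two core numbers are placeholders, so pick a same-CCX pair and then a cross-CCX pair from your own topology and compare the round-trip times, which should show the extra IF hop directly):

/* Rough sketch of a core-to-core "ping-pong" latency test (Linux, x86).
 * Pins one thread to each of two cores and bounces a cache line between
 * them. The core numbers are placeholders; choose them from your own
 * topology. Compile: gcc -O2 -pthread pingpong.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000000
static atomic_int flag;                     /* the bounced cache line */

static void pin(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

static void *pong(void *arg) {
    pin(*(int *)arg);
    for (int i = 0; i < ITERS; i++) {
        while (atomic_load(&flag) != 1) ;   /* wait for ping */
        atomic_store(&flag, 0);             /* send pong */
    }
    return NULL;
}

int main(void) {
    int cpu_a = 0, cpu_b = 4;               /* placeholders: try same vs. other CCX */
    pthread_t t;
    struct timespec t0, t1;

    pthread_create(&t, NULL, pong, &cpu_b);
    pin(cpu_a);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        atomic_store(&flag, 1);             /* send ping */
        while (atomic_load(&flag) != 0) ;   /* wait for pong */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("round trip: %.1f ns\n", ns / ITERS);
    return 0;
}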
 
What do you mean "separately"? Aren't they all just the same CCXs in different layouts?
By separately, I mean they may have different packaging and layout. And that's exactly the difference between the supposed new layout of the upcoming EPYC and a normal Ryzen, if the latter keeps its current layout.
 
By separately, I mean they may have different packaging and layout. And that's exactly the difference between the supposed new layout of the upcoming EPYC and a normal Ryzen, if the latter keeps its current layout.
But you were suggesting different IF implementations between Ryzen and Epyc. That would not mean simply a different layout, but also different CCXs. Which, as I said, would add to the costs. Unless I misunderstood something.
 
The cache needs to be low latency, therefore it has to be on the same die.



It's going to be less actually, on average.



If the communication between the cores is hampered as you say, how would that affect the single-thread performance? It's the exact opposite of what you are describing; leaving only the cores and cache on each die would allow for higher clocks and therefore higher single-thread performance and higher performance in general.

There are two different situations. First, inter-core communication between cores on different dies will require a third die in between. Second, single-threaded performance would be lower because the memory controller won't be on-die; that is why AMD implemented the new Dynamic Local Mode.
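From what I understand, Dynamic Local Mode just migrates the busiest threads onto the die that has the local memory. As a rough hand-rolled sketch of the same idea (not AMD's implementation), this is what it looks like with libnuma on Linux, with node 0 standing in for the memory-attached die:

/* Minimal sketch of keeping a thread and its memory on the same die,
 * roughly what Dynamic Local Mode automates on Threadripper. Assumes
 * Linux with libnuma installed. Compile: gcc -O2 local.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    int node = 0;                        /* placeholder: the die with local DRAM */
    numa_run_on_node(node);              /* pin this thread to that node's cores */
    size_t len = 64 << 20;               /* 64 MiB */
    char *buf = numa_alloc_onnode(len, node);  /* back it with node-local pages */
    if (!buf)
        return 1;
    memset(buf, 0, len);                 /* touch the pages so they're committed */
    printf("thread and 64 MiB buffer both on node %d\n", node);
    numa_free(buf, len);
    return 0;
}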
 
Separated I/O for Zen 2 has been in the leaks for at least a month already...
Even in May there were already rumours of a similar idea being passed around at Intel; however, as we all know, Intel is very far behind on the whole MCM architecture, and as such it will be at least a year before any of their offerings are even doing the rounds being sampled ahead of their retail release.

This is the way forward for the high-end CPU market, and anyone who says it isn't is just impossibly deluded...

I hope to see AMD continue their competitive streak in the high end; they have set high targets, but I am pretty sure they will be achieved. I also hope they put a bit more time into refining the 8-core and lower chips to be more competitive on the gaming side.
 
Second, single-threaded performance would be lower because the memory controller won't be on-die

Just a blanket statement; nobody has a clue whether that's going to have any impact whatsoever. Chances are it won't, if the leaks are true.
 
So you've passed judgement on a (publicly) undisclosed design based on some of your assumptions and TR2; what about waiting for evidence or results?

I agree, but that applies both ways.
 