Tuesday, September 7th 2021

"Zen 3" Chiplet Uses a Ringbus, AMD May Need to Transition to Mesh for Core-Count Growth

AMD's "Zen 3" CCD, or core complex die, the physical building block of both its client and enterprise processors, possibly has a core-count limitation owing to the way its various bandwidth-heavy on-die components are interconnected, says an AnandTech report. The report cites what are possibly the first insights AMD has provided on the CCD's switching fabric, which confirm the presence of a Ring Bus topology. More specifically, the "Zen 3" CCD uses a bi-directional Ring Bus to connect the eight CPU cores with the 32 MB of shared L3 cache and other key components of the CCD, such as the IFOP (Infinity Fabric On-Package) interface that lets the CCD talk to the I/O die (IOD).

Imagine a literal bus driving around a city block, picking up and dropping off people between four buildings. The "bus" here resembles a strobe, the buildings resemble components (cores, uncore, etc.), while the bus-stops are ring-stops. Each component has its own ring-stops. To disable components (e.g., for product-stack segmentation), SKU designers simply disable ring-stops, making the component inaccessible. A bi-directional Ring Bus would see two "vehicles" driving in opposite directions around the city block. The Ring Bus topology comes with limitations of scale, mainly resulting from the latency added by too many ring-stops. This is precisely why coaxial ring topology faded out in networking.
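
To put rough numbers on the ring-stop argument, here is a minimal Python sketch that counts how many ring-stops a message passes on average. It assumes every stop is equally likely to talk to every other stop and treats each hop as one unit of latency; the function names are made up for illustration, and this is not a model of AMD's actual fabric.

```python
def ring_hops(src: int, dst: int, stops: int, bidirectional: bool = True) -> int:
    """Number of hops a message travels from ring-stop src to ring-stop dst."""
    clockwise = (dst - src) % stops
    if not bidirectional:
        # Only one "vehicle": the message must go all the way round clockwise.
        return clockwise
    # Two "vehicles": take whichever direction is shorter.
    return min(clockwise, stops - clockwise)


def average_hops(stops: int, bidirectional: bool = True) -> float:
    """Mean hop count over all ordered pairs of distinct ring-stops."""
    total = sum(
        ring_hops(s, d, stops, bidirectional)
        for s in range(stops)
        for d in range(stops)
        if s != d
    )
    return total / (stops * (stops - 1))


if __name__ == "__main__":
    print("stops  one-way  bi-directional")
    for stops in (4, 8, 16, 32):
        one_way = average_hops(stops, bidirectional=False)
        two_way = average_hops(stops, bidirectional=True)
        print(f"{stops:5d}  {one_way:7.2f}  {two_way:14.2f}")
```

Under these assumptions the average distance grows linearly with the number of stops: a bi-directional ring roughly halves the trip of a one-way ring, but doubling the stops still roughly doubles it.
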
Intel realized in the early 2010s that it could not scale up CPU core counts on its monolithic processor dies beyond a point using Ring Bus, and had to develop the Mesh topology. The Mesh is a more advanced Ring Bus with additional points of connectivity between components, making it a halfway point between a Ring Bus and full interconnectivity (in which each component is directly connected to every other, an impractical solution at scale). AMD's recipe for extreme core-count processors, such as the 64-core EPYC, is to use 8-core CCDs (each with an internal bi-directional Ring Bus) that are networked at the sIOD (server I/O die).
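
The trade-off between the three topologies can be sketched with a simple hop count. The Python snippet below compares a bi-directional ring, a square 2D mesh and full interconnectivity by average hop count and by number of links. It is an idealized sketch: hops stand in for latency, traffic is assumed uniform, and the grid shapes and function names are illustrative assumptions rather than anything Intel or AMD has published.

```python
from itertools import product


def ring_avg_hops(n: int) -> float:
    """Average hops between distinct stops on a bi-directional ring of n stops."""
    # By symmetry it is enough to measure distances from a single stop.
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


def mesh_avg_hops(rows: int, cols: int) -> float:
    """Average Manhattan distance between distinct nodes of a rows x cols mesh."""
    nodes = list(product(range(rows), range(cols)))
    total = sum(
        abs(r1 - r2) + abs(c1 - c2)
        for (r1, c1) in nodes
        for (r2, c2) in nodes
    )
    n = len(nodes)
    return total / (n * (n - 1))  # same-node pairs add 0 and are excluded here


def ring_links(n: int) -> int:
    """Links in a ring: one per stop (valid for n >= 3)."""
    return n


def mesh_links(rows: int, cols: int) -> int:
    """Links in a 2D mesh: horizontal plus vertical neighbour connections."""
    return rows * (cols - 1) + cols * (rows - 1)


def full_links(n: int) -> int:
    """Links needed to connect every component directly to every other."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for side in (2, 4, 8):
        n = side * side
        print(
            f"{n:2d} nodes | ring: {ring_avg_hops(n):5.2f} hops, {ring_links(n):4d} links"
            f" | mesh: {mesh_avg_hops(side, side):5.2f} hops, {mesh_links(side, side):4d} links"
            f" | full: 1.00 hops, {full_links(n):4d} links"
        )
```

With four components, full interconnectivity needs only six links, which is manageable. At 64 components it would need 2,016, while an 8x8 mesh gets by with 112 and still averages about 5.3 hops, against roughly 16.3 for a 64-stop ring.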

It's interesting to note here that AMD didn't always use a Ring Bus on its CCDs. Older "Zen 2" chiplets with 4-core CCXs (CPU complexes) used full interconnectivity between four components (i.e., four CPU cores and their slices of the shared L3 cache). This is further illustrated by the slide, in which AMD mentions "same latency" for a core to access every L3 slice (which wouldn't quite be possible even with a bi-directional Ring Bus). This begins to explain AMD's rationale behind the 4-core CCX. Eventually, the performance benefit of a monolithic 8-core CCX interconnected with a bi-directional Ring Bus won out, so AMD went with this approach for "Zen 3."

For the future, AMD might need to let go of the Ring Bus to scale beyond a certain number of CPU cores per CCD, AnandTech postulates. This is for the same reason Intel ditched the Ring Bus for high core-count processors, namely latency. The CCD of the future could be made up of three distinct dies stacked up: the topmost die could be made up of cache, the middle die of the CPU cores, and the bottom die of a Mesh interconnect. The next logical step would be to scale this interconnect layer into a silicon interposer with several CPU+cache dies stacked on top.
Source: AnandTech

26 Comments on "Zen 3" Chiplet Uses a Ringbus, AMD May Need to Transition to Mesh for Core-Count Growth

#1
Fouquin
"Intel realized in the early 2010s that it could not scale up CPU core counts on its monolithic processor dies beyond a point using Ring Bus"

But then they did, anyway.

Yes, ring busses stop scaling very quickly beyond a few cores. Intel's mesh has one ugly side effect, and that is that cores get "stranded" in no man's land much too far away from the memory landings. The mesh has to be insanely fast to compensate, and that means a metric ton of overhead to power, and increased latency. (Look at how Skylake's HCC even with 10 cores disabled overpowers the LCC with only two cores disabled, both running at the same TB3.0 frequencies.)
#2
eidairaman1
The Exiled Airman
Fouquin said: "Intel realized in the early 2010s that it could not scale up CPU core counts on its monolithic processor dies beyond a point using Ring Bus"

But then they did, anyway.

Yes, ring busses stop scaling very quickly beyond a few cores. Intel's mesh has one ugly side effect, and that is that cores get "stranded" in no man's land much too far away from the memory landings. The mesh has to be insanely fast to compensate, and that means a metric ton of overhead to power, and increased latency. (Look at how Skylake's HCC even with 10 cores disabled overpowers the LCC with only two cores disabled, both running at the same TB3.0 frequencies.)
I recall a Radeon gpu using ring bus topology, perhaps they will do a hybrid...
#3
lynx29
This is easy to fix. 6 or 12 cores is the only amount games need, it's gpu's that slow us down in games at this point.

so focus on increased clocks for those 6 to 12 cores and stop acting like a ***** about more cores.
#4
Darksider92
lynx29 said: This is easy to fix. 6 or 12 cores is the only amount games need, it's gpu's that slow us down in games at this point.

so focus on increased clocks for those 6 to 12 cores and stop acting like a ***** about more cores.
Do you really think that AMD will only focus on making gaming centric CPUs?? That's super dumb tbh
Their Epyc/threadripper lineup will be affected if they can't scale up the core count. Which is one of their strong points against Intel....
Watch Dr Ian's video maybe you will get a better idea.
#5
eidairaman1
The Exiled Airman
Darksider92 said: Do you really think that AMD will only focus on making gaming centric CPUs?? That's super dumb tbh
Their Epyc/threadripper lineup will be affected if they can't scale up the core count. Which is one of their strong points against Intel....
Watch Dr Ian's video maybe you will get a better idea.
Yup bread and butter is in Servers/Workstations.
#6
lynx29
Darksider92 said: Do you really think that AMD will only focus on making gaming centric CPUs?? That's super dumb tbh
Their Epyc/threadripper lineup will be affected if they can't scale up the core count. Which is one of their strong points against Intel....
Watch Dr Ian's video maybe you will get a better idea.
these things are separate now. why can't they remain separate? AMD has plenty of money and resources now.
#8
Darksider92
lynx29 said: these things are separate now. why can't they remain separate? AMD has plenty of money and resources now.
Nope, their entire Zen microarchitecture is built around manufacturing efficiency. Their Epyc cores are the same as their desktop ones (to a point) making it easy to scale and develop. Yes their desktop CPUs have slightly different use cases but they are still counting on the efficiency and core count advantage.
#9
lynx29
Darksider92 said: Nope, their entire Zen microarchitecture is built around manufacturing efficiency. Their Epyc cores are the same as their desktop ones (to a point) making it easy to scale and develop. Yes their desktop CPUs have slightly different use cases but they are still counting on the efficiency and core count advantage.
hmm, I guess I don't understand then, all 3 CPU's you just mentioned are still different sockets, so the factories still have to change their process for each one. but you are correct, i think that is just out of my conceptual field of understanding.

meh. its all good. i just hope i get a next gen cpu and gpu Fall of 2022. :D then i am retiring for prob 10 years. maybe more.
#10
Darksider92
lynx29 said: hmm, I guess I don't understand then, all 3 CPU's you just mentioned are still different sockets, so the factories still have to change their process for each one. but you are correct, i think that is just out of my conceptual field of understanding.

meh. its all good. i just hope i get a next gen cpu and gpu Fall of 2022. :D then i am retiring for prob 10 years. maybe more.
Different sockets yes but still the same design and the same foundry. They are all Zen cores. Literally the same from top to bottom. Just different configurations. So trust me core count does matter for AMD. And i am happy with it because I was sick and tired of getting the same quad core from Intel for 7 generations. I also still have my 2600K on my other rig as a reminder of what we have been thru. And how innovation stalled in the lack of competition.
#11
dyonoctis
lynx29 said: these things are separate now. why can't they remain separate? AMD has plenty of money and resources now.
Separate? The only difference between ryzen and epyc is the i/o die and memory controller, otherwise it's the same core...
Nvidia and Intel are both bigger than AMD by a fair margin, and they both understood that ML, A.I, data center is where you can make bank. AMD's resources need to be there, the competition is fierce, and their pockets are still smaller than the competition.

Don't forget that we got DLSS because data center technology found its way into a consumer product. Intel is taking the same approach.

You can compare this to how motor sport technology ends up making better consumer cars. Gaming doesn't have to exist in a bubble to get better; shared development can be good.
#12
DeathtoGnomes
So AMD only needs this 'mesh' (or glue if you ask Intel) for Epyc/Tweedripper to advance, desktops are fine with 16 cores or less.
#14
Vayra86
lynx29 said: these things are separate now. why can't they remain separate? AMD has plenty of money and resources now.
For gaming all they need is control over what cores to use (the lowest latency ones), and then it's down to binning the high frequency chips as overpriced gamur chips.

It's an illusion Intel or AMD will ever design a gaming chip, why would they, CPUs slaughter gaming loads in realtime already. Any half decent midrange CPU saturates any GPU on the market right now. Let alone the fact there is no market even without that fact. CPU is general purpose for MSDT.
#15
Fouquin
eidairaman1 said: I recall a Radeon gpu using ring bus topology, perhaps they will do a hybrid...
Terascale used a ring bus PHY back in 2007. It's a cheap and easy way to connect high speed memory to data hungry execution blocks.
#16
Haile Selassie
Ring bus is pretty common in GPU memory bus design. Most cards have two ring buses running in opposite direction (one read, the other write).
#17
Crackong
Isn't the Zen4 genoa rumored to pack 50% more cores just by adding 50% more chiplets ?

Unless the software side really demands "More than 8 cores" + "ultra low latency between cores" so desperately, otherwise AMD won't have to redesign >8 cores CCX in the near future....
#18
Wirko
Crackong said: Isn't the Zen4 genoa rumored to pack 50% more cores just by adding 50% more chiplets ?

Unless the software side really demands "More than 8 cores" + "ultra low latency between cores" so desperately, otherwise AMD won't have to redesign >8 cores CCX in the near future....
Lower latency matters a lot or not at all, depending on the (server or HPC) application.

Apart from the latency issue, AMD probably has things like 144-core CPU in its plans, which can be achieved with 18 chiplets x 8 cores, or 12x12, or 9x16. Imagine the monster interconnect, of any topology, on the I/O die, that needs to efficiently connect 18 chiplets to each other, as well as to memory, PCIe and other stuff. In that regard, 12 or 9 certainly is better than 18.
#19
Chrispy_
This article from Ian Cutress is just speculation and conjecture; AMD will do whatever is best for AMD and regardless of the compromises involved we will buy it as long as it's better than Intel's equivalent.

Scaling core connectivity is and always has been a compromise, not sure why Ian Cutress dragged this out today, especially since I had to double take that this was a current article and not one from Zen3's launch last year.
#20
Tardian
thesmokingman said: Must be a slow news day for Cutress.
Oxford, not rogues. AnandTech is the other site where one might find technical information that can usually be relied upon.:cool:
Wirko said: Lower latency matters a lot or not at all, depending on the (server or HPC) application.

Apart from the latency issue, AMD probably has things like 144-core CPU in its plans, which can be achieved with 18 chiplets x 8 cores, or 12x12, or 9x16. Imagine the monster interconnect, of any topology, on the I/O die, that needs to efficiently connect 18 chiplets to each other, as well as to memory, PCIe and other stuff. In that regard, 12 or 9 certainly is better than 18.
Imagine the defect rate.
#21
TheoneandonlyMrK
eidairaman1 said: I recall a Radeon gpu using ring bus topology, perhaps they will do a hybrid...
Exactly this, it is not a ring, testing proves it, it's already a hybrid ring with rings in rings, not A Ring so not a ring bus.
#22
Makaveli
lynx29 said: This is easy to fix. 6 or 12 cores is the only amount games need, it's gpu's that slow us down in games at this point.

so focus on increased clocks for those 6 to 12 cores and stop acting like a ***** about more cores.
Some people do more than play games on their PCs. Gaming is but a small part of the industry and the enterprise market is what drives core counts up, and we are basically using the hand-me-downs and leftover parts in the consumer space.
#23
mechtech
Chrispy_ said: This article from Ian Cutress is just speculation and conjecture; AMD will do whatever is best for AMD and regardless of the compromises involved we will buy it as long as it's better than Intel's equivalent.

Scaling core connectivity is and always has been a compromise, not sure why Ian Cutress dragged this out today, especially since I had to double take that this was a current article and not one from Zen3's launch last year.
Indeed and Agreed
#25
thesmokingman
Chrispy_ said: This article from Ian Cutress is just speculation and conjecture; AMD will do whatever is best for AMD and regardless of the compromises involved we will buy it as long as it's better than Intel's equivalent.

Scaling core connectivity is and always has been a compromise, not sure why Ian Cutress dragged this out today, especially since I had to double take that this was a current article and not one from Zen3's launch last year.
Adored covered this topic 3 years ago too. Anandtech tries to put on airs sometimes. Just because you interview an engineer doesn't make you an engineer, smh. I'm pretty sure as you suggested, AMD knows wtf they are doing... hell they're the pioneers of chiplets lmao.