
AMD Granite Ridge "Zen 5" Processor Annotated

Very interesting. It seems like they optimized the dies very well this time; the only problem is the aged and slow way they are connected. Why are the two dies so far from each other when it would seem faster and more efficient to place them close together? On TR/Epyc it's acceptable because of the heat and the much more capable IO die.
Density of wires. You can see part of the complexity in the pic that @Tek-Check attached. Here you see one layer but the wires are spread across several layers.

That's also the probable reason why AMD couldn't move the IOD farther to the edge, and CCDs closer to the centre. There are just too many wires for signals running from the IOD to the contacts at the other side of the substrate. The 28 PCIe lanes take four wires each, for example.
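A quick sketch of the wire count, using only the figures from the post (28 PCIe lanes, four wires per lane, i.e. one differential pair each for TX and RX):

```python
# Rough wire-count illustration for signals routed from the IOD across the
# substrate. Figures are from the post; this is just the arithmetic.

PCIE_LANES = 28      # PCIe lanes exposed by the client IOD
WIRES_PER_LANE = 4   # one differential pair each for TX and RX

pcie_wires = PCIE_LANES * WIRES_PER_LANE
print(pcie_wires)  # 112 wires for PCIe alone, before DDR5, USB, display, etc.
```

And that is only PCIe; the memory channels and the rest of the IO fan out from the same die, which is why routing density constrains where the IOD can sit.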
 
They created the cIOD so it spares them development costs for the uncore for at least 2 generations (worked for Ryzen 3000 and Ryzen 5000).

So, if they stick with AM5 for Zen 6, they might develop a new cIOD. Maybe switch to N5, give it an RDNA 3.5 iGPU, faster memory controllers, and maybe even an NPU.
Strix Halo is getting a new cIOD made on 3 nm. One would expect Zen 6 to do so too. Zen 6 is about fixing all the failings of the current design, including high CCD-to-CCD core latency.
 
Strix Halo is getting a new cIOD made on 3 nm. One would expect Zen 6 to do so too. Zen 6 is about fixing all the failings of the current design, including high CCD-to-CCD core latency.
Strix Halo is arguably closer to a respectably capable GPU that happens to have CPU chiplets hanging off the bus than to a regular CPU. Zen 5's CCD-to-CCD latency regression is said to have been fixed back to Zen 4 levels in the latest firmware, though it remains to be seen whether Zen 6 will do better without trade-offs elsewhere.

They could well implement some IBM-like evict-to-other-chiplet virtual-L4-cache scheme, if they could do significantly better than memory latency with that. DRAM latency is only going to get worse.
 
Strix Halo is getting a new cIOD made on 3 nm. One would expect Zen 6 to do so too. Zen 6 is about fixing all the failings of the current design, including high CCD-to-CCD core latency.

CCD-to-CCD latency doesn't really matter, though, and is always something you want to avoid anyway; CCD-to-IOD latency does matter. The former is already fixed and back to Zen 4 numbers: it was a simple power-saving feature that they turned off.

Zen 6 will get a new IOD which will help the IO bottlenecks.
 
In the image of the article I count 7 cores annotated. Is that… correct?

To me it doesn't make sense. I don't think I've ever seen a 7-core SKU, and even if they eventually release one, it will surely be an 8-core CCD with one core disabled.
 
In the image of the article I count 7 cores annotated. Is that… correct?

To me it doesn't make sense. I don't think I've ever seen a 7-core SKU, and even if they eventually release one, it will surely be an 8-core CCD with one core disabled.
It's 8 cores, with the first core annotated. Here you go:
[attachment: annotated CCD die shot]
 
CCD-to-CCD latency doesn't really matter, though, and is always something you want to avoid anyway; CCD-to-IOD latency does matter. The former is already fixed and back to Zen 4 numbers: it was a simple power-saving feature that they turned off.
CCD to CCD latency is fixed but still huge. It's really hard to understand where those ~80 ns come from. There must be some very complex switching logic and cache coherency logic on the IOD, or something.
CCD to IOD latency can't be observed directly but CCD to RAM latency seems fine, no issues here (68 ns in AIDA64 on launch day review).
Zen 6 will get a new IOD which will help the IO bottlenecks.
Hopefully. That should be high on AMD's priority list. Another possibility for improvement would be a direct CCD to CCD connection in addition to the existing ones.
 
CCD to CCD latency is fixed but still huge. It's really hard to understand where those ~80 ns come from. There must be some very complex switching logic and cache coherency logic on the IOD, or something.
CCD to IOD latency can't be observed directly but CCD to RAM latency seems fine, no issues here (68 ns in AIDA64 on launch day review).
I assume that the second interconnect on the CCD is only used on Epyc, but not on Ryzen. So, inter-CCD communication on Ryzen is still done via the IO die.
 
I assume that the second interconnect on the CCD is only used on Epyc, but not on Ryzen. So, inter-CCD communication on Ryzen is still done via the IO die.
There are no direct CCD-to-CCD links in any of the AMD processors.
 
There are no direct CCD-to-CCD links in any of the AMD processors.
So what's this?

[attachment: annotated CCD die shot]


As I understand it, there's one "IFOP PHY" responsible for communication with the IO die, and the other one is inactive on Ryzen. Or is it some kind of double-wide communication bus with the Epyc IO die?
 
EPYC processors with 4 or fewer chiplets use both GMI links (wide GMI) to increase the bandwidth from 36 GB/s to 72 GB/s (page 11 of the file attached). By analogy, that is the case for Ryzen processors too. On the image below, both wide GMI3 links on both chiplets connect to two GMI ports on IOD, two links (wide GMI) from chiplet 1 to GMI3 port 0 and another two links (wide GMI) from chiplet 2 to GMI port 1 on IOD. We can see four clusters of links.
That IFOP or GMI is still a bit of a mystery, with too little documentation available (some is here). May I ask you to do a tek-check of the data I compiled, calculated and listed here below?

A single (not wide) IFOP (GMI) interface in the Zen 4 architecture has:

- a 32-bit wide bus in each direction (single-ended, 1 wire per bit)
- 3000 MHz default clock in Ryzen CPUs (2250 MHz in Epyc CPUs)
- quad data rate transfers, which calculates to...
- 12 GT/s in Ryzen (9 GT/s in Epyc)
- 48 GB/s per direction in Ryzen (36 GB/s in Epyc)
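As a quick sanity check of those figures (the 32-bit width, default clocks, and quad data rate are the compiled assumptions above, not confirmed AMD specs), the arithmetic works out:

```python
# Sanity check of the listed single (not wide) IFOP/GMI numbers:
# bandwidth = bus width x transfer rate, with quad data rate transfers.
# All input figures are the post's compiled assumptions.

BUS_WIDTH_BITS = 32  # per direction, single-ended, 1 wire per bit
QDR = 4              # quad data rate: 4 transfers per clock cycle

def ifop_bandwidth(clock_mhz):
    """Return (GT/s, GB/s per direction) for one IFOP (GMI) interface."""
    gts = clock_mhz * QDR / 1000    # mega-transfers/s -> giga-transfers/s
    gbs = gts * BUS_WIDTH_BITS / 8  # bits -> bytes
    return gts, gbs

print(ifop_bandwidth(3000))  # Ryzen: (12.0, 48.0)
print(ifop_bandwidth(2250))  # Epyc:  (9.0, 36.0)
```

So the 12 GT/s / 48 GB/s (Ryzen) and 9 GT/s / 36 GB/s (Epyc) lines are internally consistent with the stated width and clocks.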
 
So what's this?

[attachment: annotated CCD die shot]

As I understand it, there's one "IFOP PHY" responsible for communication with the IO die, and the other one is inactive on Ryzen. Or is it some kind of double-wide communication bus with the Epyc IO die?
Indeed it is so. AFAIK client Ryzen does not use wide GMI, only one IFOP per die; I haven't seen any documentation that says it uses wide GMI.

There are no direct chiplet-to-chiplet interconnects, that is correct. Everything goes through IF/IOD. I should have been more explicit in wording, but replied quickly, on-the-go.

EPYC processors with 4 or fewer chiplets use both GMI links (wide GMI) to increase the bandwidth from 36 GB/s to 72 GB/s (page 11 of the file attached). By analogy, that is the case for Ryzen processors too. On the image below, both wide GMI3 links on both chiplets connect to two GMI ports on IOD, two links (wide GMI) from chiplet 1 to GMI3 port 0 and another two links (wide GMI) from chiplet 2 to GMI port 1 on IOD. We can see four clusters of links.

We do not have a shot of a single chiplet CPU that exposes GMI link, but the principle should be the same, aka IF bandwidth should be 72 GB/s, like on EPYCs with four and fewer chiplets, and not 36 GB/s.

* from page 11
INTERNAL INFINITY FABRIC INTERFACES connect the I/O die with each CPU die using 36 Gb/s Infinity Fabric links. (This is known internally as the Global Memory Interface [GMI] and is labeled this way in many figures.) In EPYC 9004 and 8004 Series processors with four or fewer CPU dies, two links connect to each CPU die for up to 72 Gb/s of connectivity
I don't think client Ryzen uses wide GMI; that's reserved only for certain EPYCs.
 
CCD to CCD latency is fixed but still huge. It's really hard to understand where those ~80 ns come from. There must be some very complex switching logic and cache coherency logic on the IOD, or something.
CCD to IOD latency can't be observed directly but CCD to RAM latency seems fine, no issues here (68 ns in AIDA64 on launch day review).

Hopefully. That should be high on AMD's priority list. Another possibility for improvement would be a direct CCD to CCD connection in addition to the existing ones.
How long is 80 nanoseconds? How many nanoseconds are in one second?
 
How long is 80 nanoseconds? How many nanos are 1 second?
It's 80 billionths of a second. It's apparently enough time for some folks to make breakfast or something.
 
I assume that the second interconnect on the CCD is only used on Epyc, but not on Ryzen. So, inter-CCD communication on Ryzen is still done via the IO die.
As I understand it, there's one "IFOP PHY" responsible for communication with the IO die, and the other one is inactive on Ryzen. Or is it some kind of double-wide communication bus with the Epyc IO die?
Indeed it is so. AFAIK client Ryzen does not use wide GMI, only one IFOP per die; I haven't seen any documentation that says it uses wide GMI.
I don't think client Ryzen uses wide GMI; that's reserved only for certain EPYCs.
Let's try to get to the bottom of this by focusing on what we know, what is visible on Ryzen die and what could be inferred.
[images: Zen 5 die shot (left), Zen 2 die shot (right)]
- first of all, it looks like High Yields overlaid an old layer of Zen 2 communication lanes onto the Zen 5 photo
- we can clearly see this at the CCD level, as the GMIs were placed in the middle of the CCD, between the two 4-core CCXs, on Zen 2
- so the left image is not a genuine Zen 5 communication diagram from the video; the actual lanes must be positioned differently
- with that out of the way, let's move on

- what do you mean by "wide GMI"? Both GMI ports?
- each CCD has two GMI ports, and the IOD has the same two GMI ports. Each GMI port is '9-wide', which means each GMI PHY has nine logic areas.
- each of the nine logic areas within a GMI port translates into a PHY that could get one or two communication lanes. This is visible.
- what we do not know is whether all those IF lanes are wired to one GMI port only; it is not visible in the image, and topology documentation is scarce
- it'd be great to see the EPYC topology of IF lanes on CPUs that have 4 or fewer CCDs; this would bring us closer to the answer
That IFOP or GMI is still a bit of a mystery, with too little documentation available (some is here). May I ask you to do a tek-check of the data I compiled, calculated and listed here below? A single (not wide) IFOP (GMI) interface in the Zen 4 architecture has:

- a 32-bit wide bus in each direction (single-ended, 1 wire per bit)
- 3000 MHz default clock in Ryzen CPUs (2250 MHz in Epyc CPUs)
- quad data rate transfers, which calculates to...
- 12 GT/s in Ryzen (9 GT/s in Epyc)
- 48 GB/s per direction in Ryzen (36 GB/s in Epyc)
- Zen 4 is 32B/cycle read, 16B/cycle write
- more on this here: https://chipsandcheese.com/p/amds-zen-4-part-3-system-level-stuff-and-igpu
- Gigabyte leak: https://chipsandcheese.com/p/details-on-the-gigabyte-leak
- read speeds are much faster than write speeds over IF
- read bandwidth does not increase much when two CCDs operate
- write bandwidth almost doubles with two CCDs
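One way the ~63 GB/s read figure could fall out of the 32B/cycle read and 16B/cycle write widths above is simple width x FCLK arithmetic. The FCLK = 2000 MHz value here is my assumption (a common setting with DDR5-6000), not something stated in the measurements:

```python
# Sketch: theoretical per-CCD IF bandwidth from the port widths listed above.
# FCLK = 2000 MHz is an assumed value (typical with DDR5-6000), not measured.

FCLK_MHZ = 2000
READ_BYTES_PER_CYCLE = 32   # Zen 4 IF link: 32B/cycle read
WRITE_BYTES_PER_CYCLE = 16  # Zen 4 IF link: 16B/cycle write

read_gbs = FCLK_MHZ * READ_BYTES_PER_CYCLE / 1000    # GB/s theoretical read
write_gbs = FCLK_MHZ * WRITE_BYTES_PER_CYCLE / 1000  # GB/s theoretical write
print(read_gbs, write_gbs)  # 64.0 32.0
```

Under that assumption, the measured ~63 GB/s read from one CCD sits just below the 64 GB/s ceiling, and write bandwidth nearly doubling with two CCDs fits a per-CCD 16B/cycle write limit.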

[screenshots: Infinity Fabric bandwidth charts from Chips and Cheese, "AMD's Zen 4 Part 3: System Level Stuff and iGPU"]
 
it'd be great to see the EPYC topology of IF lanes on CPUs that have 4 or fewer CCDs; this would bring us closer to the answer
You mean this slide, or are you looking for something more detailed?

[image: EPYC internal Infinity Fabric interfaces slide]

(taken from Tom's)

Also, you've mentioned 36 GB/s and 72 GB/s before, and 9 bits here. It's obvious that the 9 bits include a parity bit, but I don't understand what numbers AMD took to calculate 36 and 72 GB/s, unless those include parity too.
 
You mean this slide, or are you looking for something more detailed?
I have this slide and the entire presentation as a PDF. The ideal image or diagram would be an EPYC with 4 or fewer CCDs.
I am looking for those detailed die shots where we can see the physical wiring, or a diagram that shows it all.
This would allow us to see how they wire the two GMI ports from each CCD.
When AMD says "GMI-Narrow", do they mean one GMI port only? And does "GMI-Wide" mean both GMI ports? It would make sense.
The next question is whether they wire all nine logic parts of a single GMI port to Infinity Fabric. If so, how many single wires, and how many double ones?
I do not know.

Of course, AMD does not want to give up some secrets about their IF sauce and wiring, such as the speed of the fabric itself and how they wire it to CCDs and IOD. This is beyond my pay grade.

What we know from AMD:
1. CCD on >32C EPYC Zen4 was configured for 36 Gbps throughput per link on one GMI port to IF ("GMI-Narrow"). 36 Gbps = 4.5 GB/s per link
2. CCD on ≤32C EPYC Zen4 was configured for 36x2 Gbps throughput per link on two GMI ports to IF ("GMI-Wide"). 72 Gbps = 9 GB/s per dual link

The answer we need here is how many IF links one GMI port provides. Is it 9? There are 9 pieces of logic on the die per GMI port. Are they all used at the PHY level? If 9 links are used, the throughput would be 9x36 Gbps = 324 Gbps = 40.5 GB/s for one CCD, and 648 Gbps = 81 GB/s for "GMI-Wide".
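Those scenarios as plain arithmetic (the 9-links-per-port count is exactly the assumption being tested here, not a confirmed figure):

```python
# Scenario arithmetic for GMI port throughput. The per-link 36 Gbps figure is
# from AMD's document; the 9-links-per-port count is an unconfirmed assumption.

GBPS_PER_LINK = 36

def total_gbs(links, wide=False):
    """Total GB/s for a given link count; 'wide' doubles each link (GMI-Wide)."""
    per_link_gbps = GBPS_PER_LINK * (2 if wide else 1)
    return links * per_link_gbps / 8  # Gbps -> GB/s

print(total_gbs(9))             # 40.5 GB/s, one GMI port ("GMI-Narrow")
print(total_gbs(9, wide=True))  # 81.0 GB/s, both ports   ("GMI-Wide")
```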

Chips&Cheese testing of IF on Ryzen:
3. one CCD on Ryzen Zen4 has throughput of ~63 GB/s towards DDR5-6000 memory via IF; two CCDs ~77 GB/s
This shows the speed of 504 Gbps for one CCD and 616 Gbps for two CCDs.

- if only one GMI port is used on Ryzen and we assumed there are 9 links in each GMI port, this gives us 9x36 Gbps = 324 Gbps = 40.5 GB/s.
- the measured throughput by C&C was 63 GB/s on read speed, so more links would be needed on one CCD to achieve this throughput
- it seems physically impossible to use both GMI ports on Ryzen CCD and connect those to one GMI port on IOD.
- therefore, it could be the case that IF was configured to run faster on Ryzen CPUs

How does this sit?
 
You mean this slide, or are you looking for something more detailed?
We now have a confirmation from EPYC Zen 5 file:
"INTERNAL INFINITY FABRIC INTERFACES connect the I/O die with each CPU die using a total of 16 36 Gb/s Infinity Fabric links."

Going back to our previous considerations. What we know from AMD about theoretical bandwidth:
1. CCD on >32C EPYC Zen4 was configured for 36 Gbps throughput per link on one GMI port to IF ("GMI-Narrow"). 36 Gbps = 4.5 GB/s per link
- 16 links x 36 Gbps = 576 Gbps = 72 GB/s
2. CCD on ≤32C EPYC Zen4 was configured for 36x2 Gbps throughput per link on two GMI ports to IF ("GMI-Wide"). 72 Gbps = 9 GB/s per dual link
- 16 links x 72 Gbps = 1152 Gbps = 144 GB/s

3. one CCD on Ryzen Zen4 has throughput of ~63 GB/s towards DDR5-6000 memory via IF; two CCDs ~77 GB/s
So yes, it looks like one GMI port is used.
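Running the 16-link figure from the EPYC Zen 5 document through the same arithmetic (only the quoted numbers are used here):

```python
# Totals with 16 links per CCD at 36 Gbps each, per the EPYC Zen 5 document.

LINKS = 16
narrow = LINKS * 36 / 8  # GB/s, one link per lane  ("GMI-Narrow")
wide = LINKS * 72 / 8    # GB/s, doubled links      ("GMI-Wide")

measured_one_ccd = 63    # GB/s, Chips and Cheese read bandwidth, one Ryzen CCD
print(narrow, wide, measured_one_ccd)  # 72.0 144.0 63
```

The measured ~63 GB/s fits under the 72 GB/s narrow ceiling but well short of the 144 GB/s wide one, which is consistent with a single GMI port being used on Ryzen.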
 