Friday, March 17th 2017

AMD Ryzen Infinity Fabric Ticks at Memory Speed

Memory clock speeds will go a long way in improving the performance of an AMD Ryzen processor, according to new information from the company. It reveals that Infinity Fabric, the high-bandwidth interconnect that links the two quad-core complexes (CCXs) on 6-core and 8-core Ryzen processors with other uncore components, such as the PCIe root complex and the integrated southbridge, is synced with the memory clock. AMD made this revelation in response to a question posed by Reddit user CataclysmZA.

Infinity Fabric, a successor to HyperTransport, is AMD's latest interconnect technology that connects the various components on the Ryzen "Summit Ridge" processor, and on the upcoming "Vega" GPU family. According to AMD, it is a 256-bit wide bi-directional crossbar. Think of it as a town square for the chip, where tagged data and instructions change hands between the various components. Within the CCX, the L3 cache performs some inter-core connectivity. The speed of the Infinity Fabric crossbar on a "Summit Ridge" Ryzen processor is determined by the memory clock. When paired with DDR4-2133 memory, for example, the crossbar ticks at 1066 MHz (SDR, actual clock). Using faster memory therefore has, according to AMD, a direct impact on the bandwidth of this interconnect.
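
To put rough numbers on that relationship, here is a minimal Python sketch. It rests on two assumptions that AMD's statement does not spell out: that the fabric clock is simply half the DDR4 transfer rate (as in the DDR4-2133 example above), and that the 256-bit crossbar moves one full-width transfer per clock in each direction. The function names are hypothetical, and the bandwidth figures are theoretical peaks, not measurements.

```python
def fabric_clock_mhz(ddr_rating_mts: float) -> float:
    """Actual (SDR) memory clock in MHz for a DDR rating given in MT/s.

    DDR memory transfers twice per clock, so the real clock is half the rating.
    """
    return ddr_rating_mts / 2.0


def peak_bandwidth_gbs(clock_mhz: float, width_bits: int = 256) -> float:
    """Theoretical peak bandwidth per direction in GB/s (10^9 bytes/s).

    ASSUMPTION: one full-width transfer per fabric clock. The article only
    gives the width (256-bit) and the clock source, not the transfer rate.
    """
    bytes_per_clock = width_bits // 8
    return clock_mhz * 1e6 * bytes_per_clock / 1e9


if __name__ == "__main__":
    # Common DDR4 ratings, chosen here purely for illustration.
    for rating in (2133, 2400, 2666, 3200):
        clock = fabric_clock_mhz(rating)
        bandwidth = peak_bandwidth_gbs(clock)
        print(f"DDR4-{rating}: fabric ~{clock:.0f} MHz, "
              f"~{bandwidth:.1f} GB/s peak per direction")
```

Under these assumptions the peak scales linearly with the memory rating, which is the effect AMD describes. Real-world gains will depend on the workload and on how much traffic actually crosses between the CCXs.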
Source: CataclysmZA on Reddit

95 Comments on AMD Ryzen Infinity Fabric Ticks at Memory Speed

#76
cadaveca
My name is Dave
bug
Think about it: even if the whole L3 cache was left in there, the only way to address it would be over InfinityFabric. Same as reading from RAM.
Exactly. If they kill 2 cores on each CCX, 1/2 the L3 would effectively be an L4 cache over the infinity fabric. Which might not be a bad thing. :p



With this picture, you can see that the L3 cache is surrounded by 4 cores. Each core (with SMT) is effectively its own device with its own L3, and we have 16 blocks of L3 in each CCX.
Posted on Reply
#77
L'Eliminateur
wow this is incredibly SHITTY, no matter how you spin it, at the tech level it's terribly bad, let me expand:

first you have a non-monolithic-ish CPU design -not quite MCM but you could call it "MCM on die"- whose parts communicate with each other, as well as with the other uncore parts, over a 256-bit bus in a crossbar configuration. that by itself already kills your performance as the inter-CCX comms use this slow-ass bus; as PC Perspective's benchmarks showed, it wreaks havoc on cache coherency and L3 access beyond local cache (intel does not have this issue).

And then you compound that by tying the bus to EXTERNAL RAM speed by making the memory controllers the "bus master" essentially, that's beyond bad design, it's appallingly bad

Intel with their Xeon HCC (high core count) does something similar, as their ring bus maxes out at 16 cores, so for 22 cores it has 2 ring buses that connect to each other with bus bridges that have a very small impact on performance, and each ring has a dedicated memory controller. in BIOS you can enable "cluster on die" mode which turns the single chip (it's still a monolithic core) into a numa-node for performance reasons (to stop the right ring from accessing the left ring's ram space over the bridges). THAT is what you call an elegant and sophisticated design, the ring bus does not depend on the RAM nor core clocks

on one hand, AMD touts ryzen as the expensive intel killer... but they need the most expensive un-buyable memory to actually perform better?, top kek there AMD...

so Intel has not only a very big IPC lead with KBL, but they can maintain that IPC regardless of whatever shitty cheap ram you throw into the system.

Also remember that this is going into naples, and server ECC RDIMM memory tops out at 2400, plus massive fabric overhead, as naples will be an MCM (maybe even an interposer to route the massive 256-bit bus) of 4 ryzen dies, each die provides 2 channels of ram so inter-chip ram/cache will be slow as molasses.
Posted on Reply
#78
bug
cadaveca
Exactly. If they kill 2 cores on each CCX, 1/2 the L3 would effectively be an L4 cache over the infinity fabric. Which might not be a bad thing. :p



With this picture, you can see that the L3 cache is surrounded by 4 cores. Each core (with SMT) is effectively its own device with its own L3, and we have 16 blocks of L3 in each CCX.
Well, that "L4 cache" would be accessible at the same speed as RAM (because of InfinityFabric), so it would be pretty pointless. And that's on top of the fact that no OS or application is L4 cache aware in order to do anything useful with it.
Posted on Reply
#79
cdawall
where the hell are my stars
L'Eliminateur
so Intel has not only a very big IPC lead with KBL, but they can maintain that IPC regardless of whatever shitty cheap ram you throw into the system.
No they don't.

bug
Think about it: even if the whole L3 cache was left in there, the only way to address it would be over InfinityFabric. Same as reading from RAM.
I don't disagree, I am just letting you know how it is listed. It could be used as a fallover cache. Fill the L3 and use the crap over Infinity Fabric, similar to Nvidia's 3.5 GB + 512 MB 970.
Posted on Reply
#80
L'Eliminateur
cdawall
No they don't.
yes they do, KBL IPC is far far higher than ryzen, ryzen IPC compares somewhat to broadwell and KBL is 2 gens above and all benchs do sustain that.
Posted on Reply
#81
cdawall
where the hell are my stars
L'Eliminateur
yes they do, KBL IPC is far far higher than ryzen, ryzen IPC compares somewhat to broadwell and KBL is 2 gens above and all benchs do sustain that.
It's not surprising - but it is a bit shocking to see in this form: Kaby Lake truly does offer zero to the consumer in terms of clock for clock performance. (In fact, a couple of the results show it slower than the Skylake, but these are within the margin of error.) Enthusiasts and analysts have often lamented the "slow" progression of IPC changes on Intel's Core architecture since the introduction of Sandy Bridge, increasing just 3-6% on the product release cadence.
source

There isn't even a 10% difference. When one says "far far higher", that is like saying the IPC is "far far higher" between Bulldozer and Skylake, something on the order of 50%. Single-digit differences are not "far far higher".
Posted on Reply
#82
newtekie1
Semi-Retired Folder
erek
Pretty sad about 2x 2-core CCX modules to make up a 4-core :( Just want rid of Infinite Fabric slowness / limitations :(
The only real limitation is when a core on one CCX has to access cache on the other CCX. In that case, the L3 on the other CCX acts more like an L4, but it is still faster than accessing system RAM. There is no getting rid of the Infinity Fabric; it is an interconnect between the CPU cores and the rest of the system.
Posted on Reply
#83
Super XP
They need to somehow increase the speed of Infinity Fabric, hence faster DDR4 RAM. I am wondering why AMD didn't base Infinity Fabric's speed on the CPU frequency, and why they chose the RAM instead.
Posted on Reply
#86
__isomorph__
L'Eliminateur
wow this is incredibly SHITTY, no matter how you spin it, at the tech level it's terribly bad, let me expand:

first you have a non-monolithic-ish CPU design -not quite MCM but you could call it "MCM on die"- whose parts communicate with each other, as well as with the other uncore parts, over a 256-bit bus in a crossbar configuration. that by itself already kills your performance as the inter-CCX comms use this slow-ass bus; as PC Perspective's benchmarks showed, it wreaks havoc on cache coherency and L3 access beyond local cache (intel does not have this issue).

And then you compound that by tying the bus to EXTERNAL RAM speed by making the memory controllers the "bus master" essentially, that's beyond bad design, it's appallingly bad

Intel with their Xeon HCC (high core count) does something similar, as their ring bus maxes out at 16 cores, so for 22 cores it has 2 ring buses that connect to each other with bus bridges that have a very small impact on performance, and each ring has a dedicated memory controller. in BIOS you can enable "cluster on die" mode which turns the single chip (it's still a monolithic core) into a numa-node for performance reasons (to stop the right ring from accessing the left ring's ram space over the bridges). THAT is what you call an elegant and sophisticated design, the ring bus does not depend on the RAM nor core clocks

on one hand, AMD touts ryzen as the expensive intel killer... but they need the most expensive un-buyable memory to actually perform better?, top kek there AMD...

so Intel has not only a very big IPC lead with KBL, but they can maintain that IPC regardless of whatever shitty cheap ram you throw into the system.

Also remember that this is going into naples, and server ECC RDIMM memory tops out at 2400, plus massive fabric overhead, as naples will be an MCM (maybe even an interposer to route the massive 256-bit bus) of 4 ryzen dies, each die provides 2 channels of ram so inter-chip ram/cache will be slow as molasses.
christ!!! thanks for the post. i read enough: not buying Ryzen to do my 1st build ever. even though i am a noob, i could feel something was amiss as i pored over countless reviews, articles, and forum posts re Ryzen. smells like smoke and the bird is crashing down in flames, it seems. perhaps AMD will codename v2 'Phoenix'. that'd be funny.

how i was hoping though to get something sensible to kick Intel's ass at a lower price point. damn, what a downer.
Posted on Reply
#87
__isomorph__
i now hang my head in shame. everybody, cancel my last post. after watching this :

the fog of war has lifted and it now seems clear Ryzen 7, especially 1700, but also 1800X, is the superior CPU.

watch the stats of Ryzen vs 7700K running BF1 starting @ 12:57. as the reviewer says, the Intel chip has no headroom left and is at its limit, and with optimization! whereas Ryzen, without any optimization, is barely breaking a sweat with plenty of headroom left. i predict that once the optims start rolling in, carnage and mayhem will ensue leaving the broken carcass of the Intel 7700K on the floor like the rotten corpse that it is.
Posted on Reply
#88
__isomorph__
L'Eliminateur
wow this is incredibly SHITTY, no matter how you spin it, at the tech level it's terribly bad, let me expand:
i take back my previous comment. check out the reviewer's youtube vid i posted above.
Posted on Reply
#89
ratirt
__isomorph__
i now hang my head in shame. everybody, cancel my last post. after watching this :

the fog of war has lifted and it now seems clear Ryzen 7, especially 1700, but also 1800X, is the superior CPU.

watch the stats of Ryzen vs 7700K running BF1 starting @ 12:57. as the reviewer says, the Intel chip has no headroom left and is at its limit, and with optimization! whereas Ryzen, without any optimization, is barely breaking a sweat with plenty of headroom left. i predict that once the optims start rolling in, carnage and mayhem will ensue leaving the broken carcass of the Intel 7700K on the floor like the rotten corpse that it is.
That was a great video. And I like that dude's Scottish accent :) Anyway, that explains everything. It is really nice that somebody managed to make such a video and explain how those Ryzen and Intel CPUs compare. Ryzen is better than the Intel i7 7700K, and within a year Ryzen will mop the floor with the i7 7xxx series. This video shows how you should be looking at the CPUs. Great point of view and the only one that's actually valid.

BTW.
I wonder when we can expect a game that is fully Ryzen-ready. I wish to see how those CPUs perform then. Also, how will Ryzen+ perform, and when will they release it :)
Posted on Reply
#90
SkOrPn
Legacy-ZA
Dual Channel seems to be one of the problems, if they brought it out with Triple / Quad, these would have performed way better.
They are bringing out quad channel memory for their High-End Ryzen Desktop Platform which will have 12 core and 16 core Ryzen parts for the real PC Enthusiast crowd, assuming that crowd exists. This not yet announced platform will require a new chipset, probably X399 (or something along those lines), and a new much larger LGA socket, which will undoubtedly require more expensive boards. Expect $300-400 boards and $600-800 Ryzen parts using "roughly" the same socket as Naples. I say roughly because AMD will probably have the board designs built in such a way that Naples chips may not work on them at all, or heck maybe they will.

I would love to see Asus release a Rampage+Extreme X399 tier using 12 or 16 core Ryzen parts. That would be a first for AMD. Although not sure we have Enthusiasts of that caliber any longer these days. Ten years ago I would have dropped 5K on a machine just because, but today I don't even want to spend 2K if I can help it.
Posted on Reply
#91
Super XP
Infinity Fabric specs is a 256-bit wide bi-directional crossbar. Would a 512-bit wide bi-directional crossbar be more beneficial?
Posted on Reply
#92
bug
Super XP
Infinity Fabric specs is a 256-bit wide bi-directional crossbar. Would a 512-bit wide bi-directional crossbar be more beneficial?
I'm pretty sure AMD has measured it from all points of view and 256 is the best compromise today. Whether 512 bit wouldn't add any significant improvements or would have done so by (further) sacrificing memory compatibility or blowing out max TDP, there must be a solid reason they went with 256.

And please note IF in its current incarnation is not a problem. It can have an impact on some workloads that are rather rare during typical desktop usage. It's only mentioned because users need to know about it lest they go with Ryzen and later find out (for whatever reason) they need to run one of these workloads for significant periods of time. For 99.99% of users, IF has literally no impact.
Posted on Reply
#93
uuuaaaaaa
SkOrPn
They are bringing out quad channel memory for their High-End Ryzen Desktop Platform which will have 12 core and 16 core Ryzen parts for the real PC Enthusiast crowd, assuming that crowd exists. This not yet announced platform will require a new chipset, probably X399 (or something along those lines), and a new much larger LGA socket, which will undoubtedly require more expensive boards. Expect $300-400 boards and $600-800 Ryzen parts using "roughly" the same socket as Naples. I say roughly because AMD will probably have the board designs built in such a way that Naples chips may not work on them at all, or heck maybe they will.

I would love to see Asus release a Rampage+Extreme X399 tier using 12 or 16 core Ryzen parts. That would be a first for AMD. Although not sure we have Enthusiasts of that caliber any longer these days. Ten years ago I would have dropped 5K on a machine just because, but today I don't even want to spend 2K if I can help it.
Given AMD's history, it would not surprise me to see consumer grade boards compatible with Naples if the socket is the same. Heck, we could see a dual socket Naples board like the old EVGA SR-2, now that would be monstrous!
Posted on Reply
#94
eidairaman1
The Exiled Airman
uuuaaaaaa
Given AMD's history, it would not surprise me to see consumer grade boards compatible with Naples if the socket is the same. Heck, we could see a dual socket Naples board like the old EVGA SR-2, now that would be monstrous!
Uses different socket, akin to 2011-3 or 2066
Posted on Reply
#95
uuuaaaaaa
eidairaman1
Uses different socket, akin to 2011-3 or 2066
I am aware that this new HEDT platform will use a different socket. If they use the same LGA socket that they have on their Naples server platform, we could see some 32C/64T parts on "consumer" computers. In the past AMD has allowed it.
Posted on Reply