Friday, March 17th 2017

AMD Ryzen Infinity Fabric Ticks at Memory Speed

Memory clock speeds will go a long way in improving the performance of an AMD Ryzen processor, according to new information from the company. AMD revealed that Infinity Fabric, the high-bandwidth interconnect that links the two quad-core complexes (CCXs) on 6-core and 8-core Ryzen processors with other uncore components, such as the PCIe root complex and the integrated southbridge, is synced with the memory clock. AMD made this revelation in response to a question posed by Reddit user CataclysmZA.

Infinity Fabric, a successor to HyperTransport, is AMD's latest interconnect technology that connects the various components on the Ryzen "Summit Ridge" processor, and on the upcoming "Vega" GPU family. According to AMD, it is a 256-bit wide bi-directional crossbar. Think of it as a town square for the chip, where tagged data and instructions change hands between the various components. Within the CCX, the L3 cache performs some inter-core connectivity. The speed of the Infinity Fabric crossbar on a "Summit Ridge" Ryzen processor is determined by the memory clock. When paired with DDR4-2133 memory, for example, the crossbar ticks at 1066 MHz (SDR, actual clock). Using faster memory, according to AMD, therefore has a direct impact on the bandwidth of this interconnect.
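
To put rough numbers on that relationship, here is a minimal Python sketch. The 256-bit width and the sync with the actual memory clock come from AMD's statement above. The function names are hypothetical, and the default of one transfer per clock in each direction is an assumption made for illustration, not a figure AMD gave in this response, so treat the results as theoretical peaks only.

```python
# Rough estimate of the Infinity Fabric clock and peak bandwidth per direction.
# From the article: 256-bit wide crossbar, clocked at the actual (SDR) memory clock.
# Assumption (not from the article): one transfer per clock in each direction.

FABRIC_WIDTH_BITS = 256


def memory_clock_mhz(ddr_rating: int) -> float:
    """Actual memory clock in MHz: half the DDR rating (DDR4-3200 -> 1600 MHz)."""
    return ddr_rating / 2


def fabric_bandwidth_gbs(ddr_rating: int, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s for one direction of the crossbar."""
    clock_hz = memory_clock_mhz(ddr_rating) * 1e6
    bytes_per_transfer = FABRIC_WIDTH_BITS // 8  # 256 bits = 32 bytes
    return clock_hz * bytes_per_transfer * transfers_per_clock / 1e9


for rating in (2133, 2667, 3200):
    print(
        f"DDR4-{rating}: fabric at {memory_clock_mhz(rating):.1f} MHz, "
        f"up to {fabric_bandwidth_gbs(rating):.1f} GB/s per direction"
    )
```

Under these assumptions the peak figure scales linearly with the memory clock, which is why moving from DDR4-2133 to DDR4-3200 would raise the interconnect's theoretical bandwidth by about half.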

Source: CataclysmZA on Reddit

95 Comments on AMD Ryzen Infinity Fabric Ticks at Memory Speed

#1
eidairaman1
So in other news it is memory bandwidth intensive. Can it utilize the bandwidth properly and show considerable gains in memory performance unlike Piledriver?
Posted on Reply
#2
ratirt
Hmm. one thought then. So those low latencies were actually architectural not like some people said windows scheduler problem? Wonder if they will release refurbished Ryzen now or how that is going to work.. also it would be great if the "crossbar" connection You mentioned tick not with half the speed of the memory but full speed. That would kick things up a notch I'd say.
Posted on Reply
#3
eidairaman1
ratirt said:
Hmm. one thought then. So those low latencies were actually architectural not like some people said windows scheduler problem?
Wonder if they will release refurbished Ryzen now or how that is going to work.. also it would be great if the "crossbar" connection You mentioned tick not with half the speed of the memory but full speed. That would kick things up a notch I'd say.


There is a Reason the Ryzen Logo is an Incomplete circle, it means the arch is open to improvements big and small.
Posted on Reply
#4
Legacy-ZA
This is the only thing I don't like in what I've read about the Ryzen CPUs.

Is there room for improvement? Yep. Will it cost you a new motherboard and CPU in the near future? Yep.
Posted on Reply
#5
Caring1
Why couldn't they make it 512 bit instead to increase the bandwidth?
Posted on Reply
#6
IceScreamer
Legacy-ZA said:
This is the only thing I don't like in what I've read about the Ryzen CPUs.

Is there room for improvement? Yep. Will it cost you a new motherboard and CPU in the near future? Yep.
New CPU sure, but I don't think you'll need a new board, seeing how AMD is staying on this platform for about 4 years, and Zen 2 is supposedly coming out sooner than that.
Posted on Reply
#7
ratirt
Caring1 said:
Why couldn't they make it 512 bit instead to increase the bandwidth?
That's the point. Maybe they couldn't do it. But since they see where it is now, maybe they want this somewhere else. :) Meaning they will increase the bandwidth :)
Posted on Reply
#8
geon2k2
This was already known, and some applications see a real benefit from improved interconnect/memory speed:

Posted on Reply
#9
Taloken
ratirt said:
Hmm. one thought then. So those low latencies were actually architectural not like some people said windows scheduler problem? Wonder if they will release refurbished Ryzen now or how that is going to work.. also it would be great if the "crossbar" connection You mentioned tick not with half the speed of the memory but full speed. That would kick things up a notch I'd say.
It actually ticks at full speed. The real frequency of a DDR module is always half its DDR rating (e.g. DDR4-3200 -> 1600 MHz).
Posted on Reply
#10
ratirt
Taloken said:
It actually ticks at full speed. The real frequency of a DDR module is always half its DDR rating (e.g. DDR4-3200 -> 1600 MHz).
Yeah right. Forgot it is dual channel DDR.
Thanks for the clarification. Well then, in this case the only thing to do is get memory with a higher frequency, although I'm wondering now if it is worth the additional money. Will this better-performing memory really make a noticeable difference? From a consumer standpoint this difference should be noticeable if you wanna go with good 3200 MHz mem. Otherwise it's pointless.
Posted on Reply
#11
chaosmassive
AMD needs to drop this "CPU block style"; the interface between 'groups' of CPUs tends to be bottlenecked by bandwidth
look at back, Intel C2Q, Pentium D linked via FSB speed, but ultimately dropped it
AMD need to make real 'individual' cores, with shared L3 cache across 8 cores like Intel do

I dont know, maybe AMD try to save R&D cost by making 'blue print' of 4 cores configuration and simply 'copy-paste' cores to silicon
Posted on Reply
#12
Legacy-ZA
IceScreamer said:
New CPU sure, but I don't think you'll need a new board, seeing how AMD is staying on this platform for about 4 years, and Zen 2 is supposedly coming out sooner than that.
Dual Channel seems to be one of the problems, if they brought it out with Triple / Quad, these would have performed way better.
Posted on Reply
#13
erek
Why did they even decide against a Monolithic design? Can't believe we're talking about two separate modules called CCXs (CPU Complex)... just seems like an obsolete design back to the first dual cores that had to reach out to the FSB to communicate between each other. This is unbelievable to me, I know it's better than going out to the FSB, but imagine how crazy Ryzen could have been with a Monolithic design... it'd be crazy fast I imagine...

Tired of anything related to modules with slow interconnects.
Posted on Reply
#14
the54thvoid
Very happy now that I hunted for 3200 GSkill memory for my build. I knew it responded better to frequency but I also knew the compatibility was an issue.
Posted on Reply
#15
uuuaaaaaa
Zen's uarch makes sense from a server perspective, also Naples (32C/64T server zen) runs on eight channel memory. At least there will be a reason to buy high end ultra fast ram now! (let's wait for motherboard support...)
Posted on Reply
#16
NC37
erek said:
Why did they even decide against a Monolithic design? Can't believe we're talking about two separate modules called CCXs (CPU Complex)... just seems like an obsolete design back to the first dual cores that had to reach out to the FSB to communicate between each other. This is unbelievable to me, I know it's better than going out to the FSB, but imagine how crazy Ryzen could have been with a Monolithic design... it'd be crazy fast I imagine...

Tired of anything related to modules with slow interconnects.
Could be limitations in what is and isn't patented. As well as other limitations we don't know about and AMD's engineers might.

Look back at the old G4 chips in classic Macs. In the generation where there was a switch from the 7410 to the 7450, the 10s held a performance advantage due to shorter data paths. The 50s didn't get anywhere till the 55s when they brought in L3 and found ways to negate the longer paths. But Motorola couldn't just go back to the 7410s at the time. They'd only clock up to 600-650 MHz. The pathways being so short caused problems with running faster than that. Apple was in the big push to 1 GHz back then so, they opted to go with the less optimal 50s in order to get the MHz.

Limitations in a design, forced the engineers to adopt a less optimal design. People really didn't know about it till the more hardcore Mac clockers got into the designs and really analyzed it. Which took longer back then than these days.
Posted on Reply
#17
nem..
I would have guessed that the Intel platform has higher RAM support than Ryzen, but it doesn't; Ryzen has higher native support, running 2667 MHz without OC.

AMD X370

Support for DDR4 3600(O.C.) / 3400(O.C.) / 3200(O.C.) / 2933(O.C.) / 2667* / 2400 / 2133 MHz memory modules

GA-Z270-Gaming K3

Support for DDR4 3866(O.C.) / 3800(O.C.) / 3733(O.C.) / 3666(O.C.) / 3600(O.C.) / 3466(O.C.) / 3400(O.C.) / 3333(O.C.) / 3300(O.C.) /3200(O.C.) / 3000(O.C.) / 2800(O.C.) / 2666(O.C.) / 2400 / 2133 MHz memory modules

link. http://www.gigabyte.com/Motherboard/GA-Z270-Gaming-K3-rev-10#sp


link. http://www.gigabyte.com/Motherboard/GA-AX370-Gaming-K7-rev-10#sp
Posted on Reply
#18
medi01
eidairaman1 said:
So in other news it is memory bandwidth intensive.
Huh?
This is a core cluster to core cluster thing, normal memory isn't even involved.

Runs at memory frequency is quite a revelation, actually, what you state doesn't cover it like, at all.


chaosmassive said:
AMD need to make real 'individual' cores, with shared L3 cache across 8 cores like Intel do
They might do that, once people will actually start buying their products and they have more money to spend on R&D.
Posted on Reply
#19
Aenra
@nem.. dude how many more threads are you going to post that in? We got it :)
Posted on Reply
#21
deu
ratirt said:
Hmm. one thought then. So those low latencies were actually architectural not like some people said windows scheduler problem? Wonder if they will release refurbished Ryzen now or how that is going to work.. also it would be great if the "crossbar" connection You mentioned tick not with half the speed of the memory but full speed. That would kick things up a notch I'd say.
The latency is due to architectural differences but can be solved by making the scheduler handle tasks differently. Basically, Ryzen has core complexes, and the latency comes from handing tasks from one complex to another. Ryzen is 4(8)+4(8) or 3+3 or 2+2. This is not necessarily a problem if the scheduler KNOWS to minimize task handling across complexes. So if the scheduler identifies a Ryzen CPU, it could handle all gaming on one complex and everything else on the other complex, and the issues that have caused problems in performance can be somewhat corrected. (Ryzen still has a clock disadvantage, but in everything above 1080p this should be minuscule.)

Feel free to correct me if I'm wrong but do it in a nice way :)
Posted on Reply
#22
deu
chaosmassive said:
AMD needs to drop this "CPU block style"; the interface between 'groups' of CPUs tends to be bottlenecked by bandwidth
look at back, Intel C2Q, Pentium D linked via FSB speed, but ultimately dropped it
AMD need to make real 'individual' cores, with shared L3 cache across 8 cores like Intel do

I dont know, maybe AMD try to save R&D cost by making 'blue print' of 4 cores configuration and simply 'copy-paste' cores to silicon
All this is done to make a cheaper CPU: instead of 1100 dollars, you pay 399 dollars. Taking that into account, there are, as I understand it, REALLY few downsides except inter-complex communication latencies, and unless you have ONE application that needs 16 cores intertwined it should not be a problem. (You want to keep your task on the same core anyway.) I'm not saying that you can't create an application that exposes this "bottleneck", but it would not make sense to code an application that way (if we talk gaming / everyday user applications).

In fact, can anyone name an application where this WILL be a problem (granted that the scheduler understands the CCX architecture)? I'm not trying to be smart or anything, but I can't come up with one where this would actually be a limit (granted that the architecture was taken into account).
Posted on Reply
#23
mastrdrver
Good video showing how talking across the CCXs through the fabric hurts performance. This also shows that the MS Windows 10 scheduler needs some tweaking.

Youtube: JbryPYcnscA
Posted on Reply
#24
BiggieShady
deu said:
Feel free to correct me if I'm wrong but do it in a nice way :)
deu said:
The latency is due to architectural differences but can be solved by making the scheduler handle tasks differently.
I'll be making up numbers to illustrate the point here: the core reason is that you can only tweak the scheduler to improve the performance in one use case (gaming) by 0.5% and at the same time degrade performance in the other use cases (productivity) by 20% ... and you can't have both behaviors in the scheduler because games and other software often run at the same time.
Caring1 said:
Why couldn't they make it 512 bit instead to increase the bandwidth?
It would be great if latency issues could be fixed by increasing the bandwidth, but sadly it ain't so
Posted on Reply
#25
IceScreamer
Legacy-ZA said:
Dual Channel seems to be one of the problems, if they brought it out with Triple / Quad, these would have performed way better.
Yea, never thought about that actually, makes sense now that you mention it.

Also a question, could this Infinity Fabric in theory enable on-die Crossfire/SLI connection between two GPUs, removing (or reducing) the need for software?
Posted on Reply