Thursday, August 10th 2017

AMD Ryzen Threadripper Memory and PCIe Detailed: What an MCM Entails

AMD built its Ryzen Threadripper HEDT (high-end desktop) processor as a multi-chip module (MCM) of two 8-core "Summit Ridge" dies, each with its own dual-channel memory controller and PCI-Express interface. This is unlike the competing Core "Skylake-X" from Intel, which puts 18 cores, a quad-channel DDR4 interface, and 44 PCIe lanes on a single monolithic die. AMD has devised some methods of overcoming the latency issues inherent to an MCM arrangement like the Ryzen Threadripper, by tapping into NUMA (non-uniform memory access).

To the hardware, four 8 GB DDR4 memory modules populating the four memory channels of a Ryzen Threadripper chip are seen as 16 GB controlled by each of the two "Summit Ridge" dies. To the software, they are a seamless block of 32 GB. Blindly interleaving the four 8 GB memory modules for four times the bandwidth of a single module isn't as straightforward as it is on the Core X, and is fraught with latency issues. A thread running on a core on die A, with half of its memory allocation sitting behind the other die's memory controller, pays a latency penalty on those accesses. AMD is overcoming this by treating memory on a Ryzen Threadripper machine like that of a 2-socket machine, in which each socket has its own memory.
Software needs to be optimized to treat Threadripper as offering two memory allocation modes: Distributed Mode and Local Mode. In Distributed Mode, all four memory channels are interleaved, with the priority being to give the app access to the highest bandwidth. In Local Mode, an app first fills memory controlled by a particular die, and only then begins to fill memory controlled by the neighboring die. The priority here is latency. In AMD's internal tests, Distributed Mode yields higher memory bandwidth at the expense of latency (not by much, though), while Local Mode does the opposite (it provides the lowest latency at the expense of bandwidth).
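To make the difference between the two modes concrete, here is a minimal toy model in Python. It is only a sketch: the 256-byte interleave stripe, the function names, and the idea of mapping an allocation offset straight to a die are illustrative assumptions, not AMD's actual address mapping. Only the module size, channel count, and die count come from the article.

```python
GIB = 1024 ** 3

DIMM_BYTES = 8 * GIB        # from the article: four 8 GB modules
CHANNELS_PER_DIE = 2        # from the article: dual-channel controller per die
DIES = 2                    # from the article: two "Summit Ridge" dies
TOTAL_BYTES = DIMM_BYTES * CHANNELS_PER_DIE * DIES

INTERLEAVE_BYTES = 256      # ASSUMPTION: illustrative stripe size, not an AMD figure


def die_distributed(offset: int) -> int:
    """Distributed Mode sketch: stripe offsets round-robin over all four channels."""
    channel = (offset // INTERLEAVE_BYTES) % (CHANNELS_PER_DIE * DIES)
    return channel // CHANNELS_PER_DIE


def die_local(offset: int, home_die: int = 0) -> int:
    """Local Mode sketch: fill the home die's memory first, then the neighbor's."""
    per_die_bytes = DIMM_BYTES * CHANNELS_PER_DIE
    return home_die if offset < per_die_bytes else 1 - home_die


def remote_fraction(mapper, size: int, running_on: int = 0) -> float:
    """Fraction of stripes in an allocation of `size` bytes that live on the other die."""
    stripes = range(0, size, INTERLEAVE_BYTES)
    remote = sum(1 for offset in stripes if mapper(offset) != running_on)
    return remote / len(stripes)


one_mib = 1024 ** 2
print(remote_fraction(die_distributed, one_mib))  # half the stripes are remote
print(remote_fraction(die_local, one_mib))        # nothing is remote yet
```

In this model, a thread on die 0 sees half of a small allocation land on the neighboring die in Distributed Mode (more channels in play, more remote accesses), while in Local Mode nothing is remote until the home die's 16 GB is exhausted.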

AMD extensively marketed the Ryzen Threadripper as featuring 64 PCI-Express gen 3.0 lanes. It wasn't counting the general-purpose lanes from the chipset, because those are gen 2.0. AMD arrived at the number 64 by adding up 32 PCIe gen 3.0 lanes from each of the two "Summit Ridge" dies, including the 4 lanes typically reserved as chipset-bus (the interconnect between the processor and the AMD X399 chipset). On a typical Threadripper-powered machine, 4 out of 64 lanes are permanently allocated as chipset-bus. 32 lanes are wired out as PEG (PCI-Express Graphics) lanes, driving either two graphics cards at full x16 bandwidth, or four cards at x8 bandwidth each. But wait, that still leaves us with 28 lanes. These can either be used to wire out a third set of PEG slots (one x16 or two x8), or up to three M.2 slots with x4 bandwidth, leaving the remaining lanes for other onboard controllers.
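The lane arithmetic above can be checked with a few lines of Python. This is a sketch of the counting only: `lanes_left` is a hypothetical helper, and the model ignores how lanes are split between the two dies and any bifurcation limits of the PCIe controllers.

```python
# Figures from the article
LANES_PER_DIE = 32
DIES = 2
CHIPSET_BUS_LANES = 4
PEG_LANES = 32  # two cards at x16, or four at x8

TOTAL_LANES = LANES_PER_DIE * DIES
REMAINING_LANES = TOTAL_LANES - CHIPSET_BUS_LANES - PEG_LANES


def lanes_left(budget: int, devices: list[tuple[int, int]]) -> int:
    """Return the lanes left after wiring out (link_width, count) devices.

    Hypothetical helper: raises ValueError if the devices need more lanes
    than the budget provides.
    """
    needed = sum(width * count for width, count in devices)
    if needed > budget:
        raise ValueError(f"need {needed} lanes, only {budget} available")
    return budget - needed


third_peg_set = lanes_left(REMAINING_LANES, [(8, 2)])  # one x16 or two x8
three_m2 = lanes_left(REMAINING_LANES, [(4, 3)])       # three M.2 slots at x4

print(TOTAL_LANES, REMAINING_LANES, third_peg_set, three_m2)  # 64 28 12 16
```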
Holding it all together is AMD Infinity Fabric, a high-performance interconnect which connects the two quad-core CCX units within a "Summit Ridge" die, and the two "Summit Ridge" dies themselves on the Threadripper MCM. The interconnect keeps memory latency under 133 ns for a core to address the "farthest" memory (DIMMs controlled by the neighboring die), and is energy-efficient in that it consumes 2 picojoules per bit pushed. Threadripper features an inter-die, bi-directional bandwidth of 102.22 GB/s.
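As a rough sanity check on the efficiency claim, the two figures above can be multiplied out in Python. This is a back-of-the-envelope sketch, assuming 1 GB = 10^9 bytes, that the 2 pJ/bit figure applies to the inter-die link, and that the link is saturated at its full bi-directional bandwidth; none of those assumptions is stated in the article.

```python
# Figures from the article
BANDWIDTH_GB_PER_S = 102.22   # inter-die, bi-directional
ENERGY_PJ_PER_BIT = 2.0


def fabric_power_watts(bandwidth_gb_s: float, energy_pj_per_bit: float) -> float:
    """Power drawn by the link at full load, under the assumptions above."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8          # ASSUMPTION: GB = 10^9 bytes
    return bits_per_second * energy_pj_per_bit * 1e-12  # pJ -> J


power = fabric_power_watts(BANDWIDTH_GB_PER_S, ENERGY_PJ_PER_BIT)
print(f"{power:.2f} W")  # roughly 1.6 W
```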

10 Comments on AMD Ryzen Threadripper Memory and PCIe Detailed: What an MCM Entails

#1
Jism
Reviews are popping up. In general applications, the Threadripper 1950X is 40% faster than Intel's counterpart. The game changes where apps or games rely on single-core performance, where Intel takes the upper hand. All to say, AMD is back.
#2
cdawall
where the hell are my stars
Jism said:
Reviews are popping up. In general applications, the Threadripper 1950X is 40% faster than Intel's counterpart. The game changes where apps or games rely on single-core performance, where Intel takes the upper hand. All to say, AMD is back.
I'm impressed by the wattage game.
#3
Frick
Fishfaced Nincompoop
Jism said:
Reviews are popping up. In general applications, the Threadripper 1950X is 40% faster than Intel's counterpart. The game changes where apps or games rely on single-core performance, where Intel takes the upper hand. All to say, AMD is back.
These are bad times for generalizations. "General applications" can be interpreted however one likes, so prepare to be bashed. :p
#4
Chaitanya
Seems like the high-margin workstation market is also slipping from Intel's hands. Now looking forward to seeing what Epyc can do.
#5
cadaveca
My name is Dave
Epyc is twice the cores, but half the speed. So roughly exactly the same as Threadripper.
#6
ikeke
cadaveca said:
Epyc is twice the cores, but half the speed. So roughly exactly the same as Threadripper.
Actually, not.

Epyc 7601 is twice the cores, with 12 threads maintaining full turbo (3.2 GHz) and the full package holding 2.7 GHz under load.

So if you want to make it mathematical, it's

TR 1950X: 32 threads x 3.4 GHz = 108.8 thread-GHz
https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x
Epyc 7601: 64 threads x 2.7 GHz = 172.8 thread-GHz
https://en.wikichip.org/wiki/amd/epyc/7601

It all boils down to the graph seen here, I think. Add voltage -> win frequency = lose efficiency
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/
#7
FR@NK
btarunr said:
that still leaves us with 28 lanes. These can either be used to wire out a third set of PEG slots (one x16 or two x8)
No bta, you can't get a 3rd x16 slot from the remaining 28 lanes. Remember, these 28 lanes are really 14x2 from the two different dies. The mainboards that you have seen with a 3rd x16 slot are using lanes from the chipset.

Even the highest-end board from ASUS only has two x16 slots:

#8
JMccovery
FR@NK said:
No bta, you can't get a 3rd x16 slot from the remaining 28 lanes. Remember, these 28 lanes are really 14x2 from the two different dies. The mainboards that you have seen with a 3rd x16 slot are using lanes from the chipset.

Even the highest-end board from ASUS only has two x16 slots:


Why would the remaining 28 be configured as 14 per die?

I think the simplest configuration would be:
Die #0: x16 + x16
Die #1: x8 + x8 + 3x M.2 + chipset

The most complex configuration would probably be:
Die #0: x16 + x8 + two M.2
Die #1: x16 + x8 + one M.2 + chipset
#9
FR@NK
JMccovery said:
Why would the remaining 28 be configured as 14 per die?

I think the simplest configuration would be:
Die #0: x16 + x16
Die #1: x8 + x8 + 3x M.2 + chipset

The most complex configuration would probably be:
Die #0: x16 + x8 + two M.2
Die #1: x16 + x8 + one M.2 + chipset
The PCIe controller on each die is only capable of one x16 connection, even though there are 32 lanes available.

Die #0:
x16 and supports bifurcation for x8 + x8
x8 and no bifurcation support. These lanes are not connected on AM4 boards.
x4 and supports bifurcation for x2 + x2 or can be used for SATAexpress ports.
x4 Chipset connection.

Die #1:
x16 and supports bifurcation for x8 + x8
x8 and no bifurcation support.
x4 and supports bifurcation for x2 + x2 or can be used for SATAexpress ports.
x4 and supports bifurcation for x2 + x2 or can be used for SATAexpress ports.
#10
Vlada011
Jism said:
Reviews are popping up. In general applications, the Threadripper 1950X is 40% faster than Intel's counterpart. The game changes where apps or games rely on single-core performance, where Intel takes the upper hand. All to say, AMD is back.
I don't know how many real gamers, the ones who spend a lot of time playing, buy $1000 processors.
It's mostly people who want more than just dominance in games, who want an amazing processor.
I think Intel won't be able to charge $2000 as easily as it got $1000 in previous years, when it dominated.
Even with the strongest i9-7980XE, Intel's lead will be smaller than the lead the i7-4960X or i7-5960X had over AMD.
Now I think a lot of people will build Threadripper systems.
That's what always happens when you are the only option for years and overprice your products,
fooling customers with 5% per-core improvements for every new chipset: they turn against you immediately when competition shows up.
Intel even launches products with absolutely the same performance, where only a difference in fabric frequency decides which is faster. They sold processors that way for years, helped by good ROG motherboards. When you see such beautiful boards, you think about upgrading even if you don't need to.
Intel will lose a good percentage of the market. Coffee Lake will decrease AMD's profit, but not significantly.