
AMD Ryzen Threadripper Memory and PCIe Detailed: What an MCM Entails

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,372 (7.67/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
AMD built its Ryzen Threadripper HEDT (high-end desktop) processor as a multi-chip module (MCM) of two 8-core "Summit Ridge" dies, each with its own dual-channel memory controller and PCI-Express interface. This is unlike the competing Core "Skylake-X" from Intel, which puts 18 cores, a quad-channel DDR4 interface, and 44 PCIe lanes on one monolithic die. AMD has devised some innovative methods of overcoming the latency issues inherent to an MCM arrangement like the Ryzen Threadripper, by tapping into NUMA (non-uniform memory access).

To the hardware, four 8 GB DDR4 memory modules populating the four memory channels of a Ryzen Threadripper chip are seen as 16 GB controlled by each of the two "Summit Ridge" dies. To the software, they are a seamless block of 32 GB. Blindly interleaving the four 8 GB memory modules for four times the bandwidth of a single module isn't as straightforward as it is on the Core X, and is fraught with latency issues. A thread running on a core on die A, with half of its memory allocation on memory controlled by the other die, is hit with latency. AMD is overcoming this by treating memory on a Ryzen Threadripper machine like that of a 2-socket machine, in which each socket has its own memory.





Software needs to be optimized to see Threadripper as featuring two memory allocation modes: Distributed Mode and Local Mode. In Distributed Mode, all four memory channels are interleaved, with the priority of giving the app access to the highest bandwidth. In Local Mode, an app loads memory controlled by a particular die first, and only then begins to load memory controlled by the neighboring die. The priority here is latency. In AMD's internal tests, Distributed Mode yields higher memory bandwidth at the expense of latency (not by much, though), while Local Mode does the opposite (provides the least latency at the expense of bandwidth).
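
As a rough illustration of the difference between the two modes, here is a toy address-to-channel model. The channel numbering, the 256-byte interleave stride, the two-channel interleave inside a die in Local Mode, and all function names are assumptions made for this sketch; AMD's actual mapping is not described in the article.

```python
# Toy model of the two memory modes described above.
# Assumptions (not from AMD): channel numbering, a 256-byte interleave
# stride, and per-die two-channel interleaving in Local Mode.
CHANNELS_PER_DIE = 2
DIES = 2
CHANNEL_SIZE = 8 * 2**30      # one 8 GB module per channel
INTERLEAVE_STRIDE = 256       # hypothetical granularity


def distributed_channel(addr: int) -> int:
    """Distributed Mode: consecutive strides rotate across all four channels."""
    return (addr // INTERLEAVE_STRIDE) % (CHANNELS_PER_DIE * DIES)


def local_channel(addr: int) -> int:
    """Local Mode: fill die 0's two channels first, then die 1's."""
    die_size = CHANNELS_PER_DIE * CHANNEL_SIZE
    die = addr // die_size
    within_die = ((addr % die_size) // INTERLEAVE_STRIDE) % CHANNELS_PER_DIE
    return die * CHANNELS_PER_DIE + within_die


def die_of(channel: int) -> int:
    """Die that owns a given channel (channels 0-1 on die 0, 2-3 on die 1)."""
    return channel // CHANNELS_PER_DIE


def remote_fraction(channel_fn, size: int, home_die: int = 0) -> float:
    """Fraction of strides in the first `size` bytes that live on the other die."""
    strides = range(0, size, INTERLEAVE_STRIDE)
    remote = sum(1 for a in strides if die_of(channel_fn(a)) != home_die)
    return remote / len(strides)


one_mib = 2**20
print(remote_fraction(distributed_channel, one_mib))  # 0.5
print(remote_fraction(local_channel, one_mib))        # 0.0
```

In this model a thread on die 0 touching a 1 MiB buffer crosses to the other die for half of its strides in Distributed Mode and for none of them in Local Mode, which is the bandwidth-versus-latency trade-off the article describes.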

AMD extensively marketed the Ryzen Threadripper as featuring 64 PCI-Express gen 3.0 lanes. They weren't counting the general-purpose lanes from the chipset, because those are gen 2.0. AMD arrived at the number 64 by adding up 32 PCIe gen 3.0 lanes from each of the two "Summit Ridge" dies, including the 4 lanes typically reserved as chipset-bus (the interconnect between the processor and the AMD X399 chipset). On a typical Threadripper-powered machine, 4 out of 64 lanes are permanently allocated as chipset-bus. 32 lanes are wired out as PEG (PCI-Express Graphics) lanes, driving either two graphics cards at full x16 bandwidth, or four cards at x8 bandwidth each. But wait, that still leaves us with 28 lanes. These can either be used to wire out a third set of PEG slots (one x16 or two x8), or up to three M.2 slots with x4 bandwidth, leaving the remaining lanes for other onboard controllers.
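
The lane arithmetic in this paragraph can be written out as a short sketch. It only reproduces the article's totals; the variable names are made up for illustration, and any per-die wiring limits are not modeled.

```python
# Lane budget as described in the article (PCIe gen 3.0 lanes from the CPU).
LANES_PER_DIE = 32
DIES = 2
CHIPSET_BUS = 4     # permanently reserved for the link to the X399 chipset
PEG_LANES = 32      # two x16 slots, or four x8 slots

total = LANES_PER_DIE * DIES
remaining = total - CHIPSET_BUS - PEG_LANES
print(total, remaining)   # 64 28

# One split the totals allow: three x4 M.2 slots, with the rest
# going to a third PEG set or to other onboard controllers.
m2_lanes = 3 * 4
left_after_m2 = remaining - m2_lanes
print(left_after_m2)      # 16
```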



Holding it all together is AMD Infinity Fabric, a high-performance interconnect that connects the two quad-core CCX units within a "Summit Ridge" die, and the two "Summit Ridge" dies themselves on the Threadripper MCM. The interconnect keeps memory latency under 133 ns for a core to address the "farthest" memory (DIMMs controlled by the neighboring die), and is energy-efficient in that it consumes 2 picojoules per bit pushed. Threadripper features an inter-die, bi-directional bandwidth of 102.22 GB/s.
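
To put the two quoted figures together, here is a back-of-the-envelope estimate of the power the inter-die link would draw if it were fully saturated. It assumes decimal gigabytes and that the 2 pJ/bit figure applies to every bit of the 102.22 GB/s; the function name is hypothetical.

```python
PJ_PER_BIT = 2.0          # energy per bit quoted in the article
BANDWIDTH_GB_S = 102.22   # inter-die, bi-directional bandwidth


def fabric_power_watts(gb_per_s: float, pj_per_bit: float) -> float:
    """Power in watts for moving gb_per_s gigabytes per second."""
    bits_per_s = gb_per_s * 1e9 * 8   # assumes 1 GB = 10^9 bytes
    return bits_per_s * pj_per_bit * 1e-12


print(round(fabric_power_watts(BANDWIDTH_GB_S, PJ_PER_BIT), 2))  # 1.64
```

Under those assumptions a saturated inter-die link works out to roughly 1.6 W.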



Joined
Dec 30, 2010
Messages
2,098 (0.43/day)
Reviews are popping up. The Threadripper 1950X is 40% faster than Intel's counterpart in general applications. The game changes where apps or games rely on single-core performance, where Intel takes the upper hand. All to say, AMD is back.
 

cdawall

where the hell are my stars
Joined
Jul 23, 2006
Messages
27,680 (4.27/day)
Location
Houston
System Name All the cores
Processor 2990WX
Motherboard Asrock X399M
Cooling CPU-XSPC RayStorm Neo, 2x240mm+360mm, D5PWM+140mL, GPU-2x360mm, 2xbyski, D4+D5+100mL
Memory 4x16GB G.Skill 3600
Video Card(s) (2) EVGA SC BLACK 1080Ti's
Storage 2x Samsung SM951 512GB, Samsung PM961 512GB
Display(s) Dell UP2414Q 3840X2160@60hz
Case Caselabs Mercury S5+pedestal
Audio Device(s) Fischer HA-02->Fischer FA-002W High edition/FA-003/Jubilate/FA-011 depending on my mood
Power Supply Seasonic Prime 1200w
Mouse Thermaltake Theron, Steam controller
Keyboard Keychron K8
Software W10P
I'm impressed by the wattage game.
 

Frick

Fishfaced Nincompoop
Joined
Feb 27, 2006
Messages
18,930 (2.85/day)
Location
Piteå
System Name Black MC in Tokyo
Processor Ryzen 5 5600
Motherboard Asrock B450M-HDV
Cooling Be Quiet! Pure Rock 2
Memory 2 x 16GB Kingston Fury 3400mhz
Video Card(s) XFX 6950XT Speedster MERC 319
Storage Kingston A400 240GB | WD Black SN750 2TB |WD Blue 1TB x 2 | Toshiba P300 2TB | Seagate Expansion 8TB
Display(s) Samsung U32J590U 4K + BenQ GL2450HT 1080p
Case Fractal Design Define R4
Audio Device(s) Line6 UX1 + some headphones, Nektar SE61 keyboard
Power Supply Corsair RM850x v3
Mouse Logitech G602
Keyboard Cherry MX Board 1.0 TKL Brown
VR HMD Acer Mixed Reality Headset
Software Windows 10 Pro
Benchmark Scores Rimworld 4K ready!
These are bad times for generalizations. "General applications" can be interpreted however one likes, so prepare to be bashed. :p
 
Joined
Mar 7, 2011
Messages
3,931 (0.82/day)
Seems like the high-margin workstation market is also slipping from Intel's hands. Now looking forward to seeing what Epyc can do.
 
Joined
May 6, 2012
Messages
184 (0.04/day)
Location
Estonia
System Name Steamy
Processor Ryzen 7 2700X
Motherboard Asrock AB350M-Pro4
Cooling Wraith Prism
Memory 2x8GB HX429C15PB3AK2/16
Video Card(s) R9 290X WC
Storage 960Evo 500GB nvme
Case Fractal Design Define Mini C
Power Supply Seasonic SS-660XP2
Software Windows 10 Pro
Benchmark Scores http://hwbot.org/user/kinski/ http://valid.x86.fr/qfxqhj https://goo.gl/uWkw7n
Epyc is twice the cores, but half the speed. So roughly exactly the same as Threadripper.

Actually, not.

The Epyc 7601 has twice the cores, with 12 threads holding full turbo (3.2 GHz) and the full package holding 2.7 GHz under load.

So if you want to make it mathematical, it's:

TR 1950X: 32 threads x 3.4 GHz = 108.8 thread-GHz
https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x
Epyc 7601: 64 threads x 2.7 GHz = 172.8 thread-GHz
https://en.wikichip.org/wiki/amd/epyc/7601
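
The same comparison as a quick sketch, using the clock figures from this post. Threads times clock is only a crude throughput proxy (it ignores memory, IPC, and boost behavior), and the function name is made up for illustration.

```python
def thread_ghz(threads: int, all_core_ghz: float) -> float:
    """Crude aggregate-throughput proxy: thread count times sustained clock."""
    return threads * all_core_ghz


tr_1950x = thread_ghz(32, 3.4)    # clock figure as quoted in the post
epyc_7601 = thread_ghz(64, 2.7)   # full-package clock under load, per the post

print(round(tr_1950x, 1), round(epyc_7601, 1))   # 108.8 172.8
print(round(epyc_7601 / tr_1950x, 2))            # 1.59
```

By this measure the Epyc 7601 comes out at about 1.59 times the 1950X, not roughly equal.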

It all boils down to the graph seen here, I think. Add voltage -> gain frequency = lose efficiency
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/
 
Joined
Apr 30, 2006
Messages
1,181 (0.18/day)
Processor 7900
Motherboard Rampage Apex
Cooling H115i
Memory 64GB TridentZ 3200 14-14-14-34-1T
Video Card(s) Fury X
Case Corsair 740
Audio Device(s) 8ch LPCM via HDMI to Yamaha Z7 Receiver
Power Supply Corsair AX860
Mouse G903
Keyboard G810
Software 8.1 x64
that still leaves us with 28 lanes. These can either be used to wire out a third set of PEG slots (one x16 or two x8)

No bta, you can't get a 3rd 16x slot from the remaining 28 lanes. Remember, these 28 lanes are really 14 x 2 from the two different dies. The mainboards that you have seen with a 3rd 16x slot are using lanes from the chipset.

Even the highest-end board from ASUS only has two 16x slots.

 
Joined
Jan 20, 2012
Messages
94 (0.02/day)
Why would the remaining 28 be configured as 14 per die?

I think the simplest configuration would be:
Die #0: x16 + x16
Die #1: x8 + x8 + 3x M.2 + chipset

The most complex configuration would probably be:
Die #0: x16 + x8 + two M.2
Die #1: x16 + x8 + one M.2 + chipset
 
Joined
Apr 30, 2006
Messages
1,181 (0.18/day)
Processor 7900
Motherboard Rampage Apex
Cooling H115i
Memory 64GB TridentZ 3200 14-14-14-34-1T
Video Card(s) Fury X
Case Corsair 740
Audio Device(s) 8ch LPCM via HDMI to Yamaha Z7 Receiver
Power Supply Corsair AX860
Mouse G903
Keyboard G810
Software 8.1 x64
The PCIe controller on each die is only capable of one 16x connection, even though there are 32 lanes available.

Die #0:
x16 and supports bifurcation for x8 + x8
x8 and no bifurcation support. These lanes are not connected on AM4 boards.
x4 and supports bifurcation for x2 + x2 or can be used for SATAexpress ports.
x4 Chipset connection.

Die #1:
x16 and supports bifurcation for x8 + x8
x8 and no bifurcation support.
x4 and supports bifurcation for x2 + x2 or can be used for SATAexpress ports.
x4 and supports bifurcation for x2 + x2 or can be used for SATAexpress ports.
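
Taking the per-die link list in this post at face value, a small check shows which of the configurations proposed earlier in the thread would fit on one die. This is a simplified sketch: it treats the chipset link and each M.2 slot as an x4 request, does not model x2 + x2 splits or a narrow device sitting in a wider link, and the function name is hypothetical.

```python
# Per-die link groups as listed in the post above:
#   one x16 (can split into x8 + x8), one x8 (no split), two x4.
def fits_on_die(slots: list[int]) -> bool:
    """Return True if the requested slot widths fit one die's link groups."""
    if any(width not in (16, 8, 4) for width in slots):
        return False
    x16, x8, x4 = slots.count(16), slots.count(8), slots.count(4)
    if x16 > 1:
        return False                     # only one x16 link per die
    x8_capacity = 1 if x16 else 3        # x8 group, plus two halves of a free x16
    return x8 <= x8_capacity and x4 <= 2


# "Simplest" proposal: die 0 = x16 + x16, die 1 = x8 + x8 + 3x M.2 + chipset
print(fits_on_die([16, 16]))               # False
print(fits_on_die([8, 8, 4, 4, 4, 4]))     # False

# "Most complex" proposal: x16 + x8 + two x4 links on each die
print(fits_on_die([16, 8, 4, 4]))          # True
```

Under these constraints it is the "most complex" layout, not the "simplest" one, that maps onto the dies.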
 
Joined
Sep 22, 2012
Messages
1,010 (0.24/day)
Location
Belgrade, Serbia
System Name Intel® X99 Wellsburg
Processor Intel® Core™ i7-5820K - 4.5GHz
Motherboard ASUS Rampage V E10 (1801)
Cooling EK RGB Monoblock + EK XRES D5 Revo Glass PWM
Memory CMD16GX4M4A2666C15
Video Card(s) ASUS GTX1080Ti Poseidon
Storage Samsung 970 EVO PLUS 1TB /850 EVO 1TB / WD Black 2TB
Display(s) Samsung P2450H
Case Lian Li PC-O11 WXC
Audio Device(s) CREATIVE Sound Blaster ZxR
Power Supply EVGA 1200 P2 Platinum
Mouse Logitech G900 / SS QCK
Keyboard Deck 87 Francium Pro
Software Windows 10 Pro x64
I don't know how many real gamers, the ones who spend a lot of time playing games, buy $1000 processors.
Mostly it's people who want more than domination in games, who want an amazing processor.
I don't think Intel will manage to charge $2000 as easily as it charged $1000 in previous years, when Intel dominated.
Even the dominance of the strongest i9-7980XE will be smaller than the dominance of the i7-4960X or i7-5960X over AMD.
Now I think a lot of people will build Threadripper systems.
That always happens when you are the only option for years and overprice your products,
and fool customers with 5% per-core improvements for every new chipset: they turn against you immediately when competition shows up.
Intel even launches products with absolutely the same performance, where only a difference in factory frequency decides which is faster. They sold processors that way for years, helped by good ROG motherboards. When you see such beautiful boards you think about upgrading even if you don't need to.
Intel will lose a good percentage of the market. Coffee Lake will decrease AMD's profit, but not significantly.
 