AMD Ryzen Threadripper 2920X Review

Introduction


AMD surgically disrupted Intel's entire Core X series of high-end desktop (HEDT) processors with its Ryzen Threadripper family, and extended its competitive lead with its Ryzen Threadripper 2000 series this August. In its first round, the company launched 16-core and 32-core Threadripper parts, and today, it is adding two more options with the 12-core Threadripper 2920X and 24-core Threadripper 2970WX. In the meantime, the company also introduced the Dynamic Local Mode feature for its 24-core and 32-core Threadripper WX family, which helps end users overcome many of the design quirks of the multi-chip module (MCM) in which half the dies don't have local memory access, bringing about significant improvements.



Until now, Intel has had the upper hand in HEDT processor core counts. By tapping into its "Skylake-X" HCC (high core-count) silicon, Intel launched 12-core, 14-core, 16-core, and 18-core LGA2066 processors. The 14-core through 18-core SKUs beat the first-generation Threadrippers in performance owing to higher IPC and the lower latencies of a monolithic die design. AMD priced its 12-core and 16-core first-gen Threadrippers competitively against Intel's 8-core and 10-core SKUs, exceeding them on price-performance. This meant leaving the $1000-$2000 market uncontested, for which Intel had already built a use-case (prosumers who need a lot of multi-threaded performance and don't want to shell out a lot of money on workstations with 2P Xeons), and thus, we have new 24-core and 32-core Threadripper 2000WX parts from AMD. We have yet to get our hands on the new Core X 9000-series, but those are not architecturally new.

In this review, we take a look at the 12-core Ryzen Threadripper 2920X. Much like the Threadripper 1920X from last year that it succeeds, the TR 2920X achieves 12 cores by being a multi-chip module of two 8-core dies configured with 6 cores each, which are 12 nm "Pinnacle Ridge" in this case. Each of the two dies has a 3+3 CCX configuration. You get all of the new "Zen+" micro-architecture features and higher clock speeds. AMD is also launching this chip at an SEP of $649, which is $150 cheaper than what the 1920X launched at.

We are testing the Ryzen Threadripper 2920X at stock, with Precision Boost Overclock enabled and set to max, and at our highest manual overclocking frequency of 4.15 GHz.

AMD Ryzen Threadripper Market Segment Analysis
Model | Price | Cores / Threads | Base Clock | Max. Boost | L3 Cache | TDP | Architecture | Process | Socket
Ryzen 7 1700 | $190 | 8 / 16 | 3.0 GHz | 3.7 GHz | 16 MB | 65 W | Zen | 14 nm | AM4
Core i5-9600K | $280 | 6 / 6 | 3.7 GHz | 4.6 GHz | 9 MB | 95 W | Coffee Lake | 14 nm | LGA 1151
Core i7-8700 | $300 | 6 / 12 | 3.2 GHz | 4.6 GHz | 12 MB | 65 W | Coffee Lake | 14 nm | LGA 1151
Ryzen 7 1700X | $320 | 8 / 16 | 3.4 GHz | 3.8 GHz | 16 MB | 95 W | Zen | 14 nm | AM4
Ryzen 7 2700 | $250 | 8 / 16 | 3.2 GHz | 4.1 GHz | 16 MB | 65 W | Zen | 12 nm | AM4
Core i7-8700K | $390 | 6 / 12 | 3.7 GHz | 4.7 GHz | 12 MB | 95 W | Coffee Lake | 14 nm | LGA 1151
Core i7-9700K | $420 | 8 / 8 | 3.6 GHz | 4.9 GHz | 12 MB | 95 W | Coffee Lake | 14 nm | LGA 1151
Ryzen 7 2700X | $305 | 8 / 16 | 3.7 GHz | 4.3 GHz | 16 MB | 105 W | Zen | 12 nm | AM4
Ryzen 7 1800X | $250 | 8 / 16 | 3.6 GHz | 4.0 GHz | 16 MB | 95 W | Zen | 14 nm | AM4
Core i9-9900K | $580 | 8 / 16 | 3.6 GHz | 5.0 GHz | 16 MB | 95 W | Coffee Lake | 14 nm | LGA 1151
Threadripper 1920X | $750 | 12 / 24 | 3.5 GHz | 4.0 GHz | 32 MB | 180 W | Zen | 14 nm | SP3r2
Threadripper 1950X | $950 | 16 / 32 | 3.4 GHz | 4.0 GHz | 32 MB | 180 W | Zen | 14 nm | SP3r2
Threadripper 2920X | $650 | 12 / 24 | 3.5 GHz | 4.3 GHz | 32 MB | 180 W | Zen | 12 nm | SP3r2
Threadripper 2950X | $900 | 16 / 32 | 3.5 GHz | 4.4 GHz | 32 MB | 180 W | Zen | 12 nm | SP3r2
Threadripper 2970WX | $1300 | 24 / 48 | 3.0 GHz | 4.2 GHz | 64 MB | 250 W | Zen | 12 nm | SP3r2
Threadripper 2990WX | $1750 | 32 / 64 | 3.0 GHz | 4.2 GHz | 64 MB | 250 W | Zen | 12 nm | SP3r2
Core i9-7900X | $1380 | 10 / 20 | 3.3 GHz | 4.4 GHz | 13.75 MB | 140 W | Skylake | 14 nm | LGA 2066
Core i9-7920X | $1200 | 12 / 24 | 2.9 GHz | 4.3 GHz | 16.5 MB | 140 W | Skylake | 14 nm | LGA 2066
Core i9-7940X | $1415 | 14 / 28 | 3.1 GHz | 4.3 GHz | 18.25 MB | 165 W | Skylake | 14 nm | LGA 2066
Core i9-7960X | $1700 | 16 / 32 | 2.8 GHz | 4.2 GHz | 22 MB | 165 W | Skylake | 14 nm | LGA 2066

A Closer Look


Much like the rest of the Ryzen Threadripper 2000-series, the Threadripper 2920X comes in a lunchbox-sized hard case with paperboard frills that show off the huge processor inside. As we observed in our recent Core i9-9900K review, such packaging may look good on a store shelf, but is quite pointless.


There's no cooler included with the processor. You use your own TR4 or SP3r2-compatible cooler that can handle thermal loads of at least 180 W. Two very important accessories are part of the package: a screwdriver for the double-square socket screws that hold the TR4 retention brace in place and an adapter that lets you use Asetek-made, round AIO pump-blocks. Don't lose the screwdriver because unlike Intel LGA sockets, the only way you can open the TR4 socket is by undoing those socket screws. This tool has also been calibrated for the ideal screw tension of the socket, so simply keep turning it until it clicks.


The Ryzen Threadripper 2920X is huge! When viewed from the top, the package is as big as a credit card. Thank goodness AMD decided to make this package an LGA, or good luck trying to find a bent pin in a 4,094-pin PGA.


As you can see, the orange plastic bracket is needed to mount the processor into the TR4 socket. It works to increase the surface area of the indented portion of the IHS, so the metal retention brace can hold the processor in place. It's a critical component and not packaging material, so don't discard it. You'll also notice that only screws hold the brace down; there's no lever-hinge mechanism like on Intel sockets.

The Threadripper Concept

Ryzen Threadripper 2920X is a multi-chip module of two 8-core, 12 nm "Pinnacle Ridge" dies, each of which controls two DDR4 memory channels and 32 PCIe lanes for a combined quad-channel DDR4 memory interface, and 64-lane PCIe. Two cores are disabled per die, and each die is configured in a 3+3 core CCX layout (similar to Ryzen 5 2600 six-core AM4 chips). You hence end up with 12 cores in all. You get the full 16 MB of L3 cache per die, and hence 32 MB of total L3 cache for the entire processor. Perhaps a coarse way of describing the 2920X would be "2P-on-a-stick." Under the IHS, the die closest to the key corner and the die diagonally opposite to it are active. The other two dies are "dummies." These are either just blank slabs of silicon or dies that have completely failed validation. They're just there to support the IHS and distribute mounting pressure.


Infinity Fabric is AMD's new high-bandwidth interconnect introduced alongside "Zen." It connects not just the two CCX units on each "Pinnacle Ridge" die, but also handles inter-die communication. On the 2-die Threadripper 2920X, there's one Infinity Fabric link between the two active dies, with a bi-directional bandwidth of 50 GB/s when running at 1600 MHz (the actual DRAM frequency). So if you run faster or slower memory, Infinity Fabric's bandwidth will scale accordingly. It takes around 105 nanoseconds (ns) for a CPU core to access memory controlled by the neighboring die, and less than 65 ns to access memory controlled by its own die.
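To put that scaling into numbers, here is a rough sketch in Python. It assumes the inter-die bandwidth scales linearly with the memory clock from the 50 GB/s at 1600 MHz reference point quoted above. The linear model and the function name are our own illustration, not an AMD specification.

```python
# Reference point quoted in the text: 50 GB/s at a 1600 MHz memory clock
# (DDR4-3200). Linear scaling is an assumption of this sketch.
REF_CLOCK_MHZ = 1600.0
REF_BANDWIDTH_GBPS = 50.0


def fabric_bandwidth_gbps(memory_clock_mhz: float) -> float:
    """Estimate bi-directional inter-die bandwidth for a given memory clock."""
    if memory_clock_mhz <= 0:
        raise ValueError("memory clock must be positive")
    return REF_BANDWIDTH_GBPS * memory_clock_mhz / REF_CLOCK_MHZ


if __name__ == "__main__":
    for ddr_rating in (2133, 2666, 2933, 3200):
        clock_mhz = ddr_rating / 2  # DDR transfers data twice per clock
        print(f"DDR4-{ddr_rating}: ~{fabric_bandwidth_gbps(clock_mhz):.1f} GB/s")
```

Under this assumption, dropping from DDR4-3200 to DDR4-2133 would cost roughly a third of the inter-die bandwidth.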


Unlike Core X processors that are built with four memory channels wired to a single die, Threadrippers have two dual-channel interfaces making up the quad channel. It is possible for an application to spread its memory across all four channels for higher bandwidth memory access, but at higher latency. Less parallelized applications, such as PC games (which still haven't managed to need >16 GB of memory), can benefit from lower latency. AMD's answer is to give users and their operating systems control over how memory is allocated by letting them choose between UMA and NUMA configurations.


To that end, there are several user modes selectable through Ryzen Master, which reconfigure the processor (a reboot is required). Memory access mode can be toggled between "Distributed Mode" (default) and "Local Mode". Distributed Mode maximizes memory bandwidth to applications and tries to keep latencies constant (but higher), no matter which core the software is running on. Local Mode, on the other hand, splits the system into two NUMA nodes (think "processor groups"), which lets Windows know which cores have the memory interface attached to them so that it can put loads on those cores first and run them with lower memory latency. The second processor group has higher memory latency, which results in applications on those cores running slower. This mode can be useful for low-threaded applications and games. Our performance results have an additional data set for "Local Mode" enabled.
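The trade-off between the two modes can be approximated with a little arithmetic. The sketch below uses the latency figures quoted earlier (under 65 ns for same-die access, around 105 ns for the neighboring die) and assumes Distributed Mode spreads accesses evenly across both dies. That even split is our simplification, and measured latencies will differ.

```python
LOCAL_NS = 65.0    # upper bound quoted for same-die memory access
REMOTE_NS = 105.0  # approximate figure quoted for neighboring-die access


def expected_latency_ns(local_fraction: float) -> float:
    """Average latency when `local_fraction` of accesses hit same-die memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS


if __name__ == "__main__":
    # Assumption: Distributed Mode interleaves evenly across both dies.
    print(f"Distributed Mode (50% local): ~{expected_latency_ns(0.5):.0f} ns")
    # Assumption: a Local Mode workload that fits entirely on the first node.
    print(f"Local Mode (100% local):      ~{expected_latency_ns(1.0):.0f} ns")
```

This is only a first-order model. It ignores bandwidth, which is where Distributed Mode has the advantage.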

A third configuration option is "Legacy Compatibility Mode", which lets you adjust the exposed processor count. Some older games have difficulty running on systems with more than 16 cores and will crash right at the start. Using that option, you can reduce the core count of Threadripper.

A few weeks ago, just as Intel refreshed its HEDT lineup with the Core X 9000-series, AMD introduced Dynamic Local Mode, a software feature in Ryzen Master that significantly improves performance of 24-core and 32-core Threadripper WX-series models. It works by running a background process that automatically allocates workloads to dies with local memory access first, and only when those cores are completely saturated does it invoke the cores without local memory access. Since all dies on the 12-core and 16-core models have local memory access, Dynamic Local Mode isn't applicable.
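The "local memory first" idea can be illustrated with a small placement sketch. This is not AMD's implementation, whose internals aren't described here. The Core class, the core numbering, and the helper names are hypothetical, and the sketch only shows the ordering policy of filling cores with local memory access before touching the rest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    core_id: int
    has_local_memory: bool
    busy: bool = False


def pick_core(cores: List[Core]) -> Optional[Core]:
    """Return an idle core, preferring cores with local memory access."""
    idle = [c for c in cores if not c.busy]
    if not idle:
        return None
    # False sorts before True, so cores with local memory come first.
    idle.sort(key=lambda c: (not c.has_local_memory, c.core_id))
    return idle[0]


def place_threads(cores: List[Core], n_threads: int) -> List[int]:
    """Assign up to n_threads to cores and return the chosen core IDs."""
    placed = []
    for _ in range(n_threads):
        core = pick_core(cores)
        if core is None:
            break  # every core is already busy
        core.busy = True
        placed.append(core.core_id)
    return placed


if __name__ == "__main__":
    # Hypothetical 24-core layout: cores 0-11 sit on dies with memory attached.
    cores = [Core(i, has_local_memory=(i < 12)) for i in range(24)]
    print(place_threads(cores, 14))
```

With 14 threads on this hypothetical layout, the first 12 land on cores with local memory and only the last two spill over to the remaining dies.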

Unlike the socket AM4 Ryzen chips, Threadrippers have an unchanged memory controller configuration from AMD's EPYC enterprise processors. The Ryzen Threadripper 2920X supports up to 2 TB of quad-channel memory with ECC support (something like that is restricted to the Xeon brand on the Intel platform). Then again, we doubt HEDT users are going to need more than the 128 GB of memory Core X processors support.

The PCI-Express configuration is interesting. The MCM puts out a total of 64 PCIe gen 3.0 lanes. On a typical motherboard, these lanes are wired out as two PCI-Express 3.0 x16 slots that run at x16 bandwidth all the time, two additional x16 slots that run at x8 bandwidth all the time (without eating into the bandwidth of another slot), three M.2-NVMe slots with x4 bandwidth, each, and the remaining 4 lanes serving as chipset bus.
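As a quick sanity check, the lane budget of that typical layout can be tallied in a few lines of Python. The slot labels are our own. The lane counts come straight from the paragraph above.

```python
TOTAL_LANES = 64  # PCIe gen 3.0 lanes provided by the MCM

# Typical motherboard wiring as described in the text.
allocation = {
    "x16 slot A": 16,
    "x16 slot B": 16,
    "x8 slot A": 8,
    "x8 slot B": 8,
    "M.2 NVMe 1": 4,
    "M.2 NVMe 2": 4,
    "M.2 NVMe 3": 4,
    "chipset bus": 4,
}

used = sum(allocation.values())
print(f"{used} of {TOTAL_LANES} lanes allocated")
assert used == TOTAL_LANES, "lane budget does not add up"
```

The layout accounts for every lane, which is why none of the slots has to share bandwidth with another.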

The Zen+ Architecture


Each of the two dies in the Threadripper 2920X MCM is made out of the new 12 nm "Pinnacle Ridge" silicon by AMD. This chip is based on the new "Zen+" micro-architecture in which the "+" denotes refinement rather than a major architectural change.


AMD summarizes the "+" in "Zen+" as the coming together of the new 12 nm process that enables higher clock speeds, an updated SenseMI feature set, the updated Precision Boost algorithm that sustains boost clocks better under stress, and physical improvements to the cache and memory sub-systems, which add up to an IPC uplift of 3 percent (clock-for-clock) over the first-generation "Zen."

The biggest change of "Pinnacle Ridge" remains its process node. The switch to 12 nm resulted in a 50 mV reduction in Vcore voltage at any given clock speed, enabling AMD to increase clocks by around 0.25 GHz across the board. The switch also enables all-core overclocks well above the 4 GHz mark, to around 4.20 GHz. Last but not least, this increase in power efficiency enabled AMD to release the 32-core Threadripper 2990WX, which wasn't feasible before.

AMD also deployed faster cache SRAM and refined the memory controllers to bring down latencies significantly. L3 cache latency is 16 percent lower, L2 cache latency is a staggering 34 percent lower, L1 latencies are reduced by 13 percent, and DRAM (memory) latencies by 11 percent. This is where almost all of the IPC uplift comes from. AMD also increased the maximum memory clocks. The processor now supports up to DDR4-2933 (JEDEC).


Updates to the chip's on-die SenseMI logic include Precision Boost 2 and Extended Frequency Range (XFR) 2. Precision Boost 2 now switches from arbitrary 2-core and all-core boost targets to a perpetually all-core boosting algorithm that elevates the most stressed cores to the highest boost states in a linear fashion (i.e., boost frequency increases with load). Every core is running above the nominal clock when the processor isn't idling, which contributes to a multi-core performance uplift. Besides load, the algorithm takes into account temperature, current, and Vcore. Granularity is 0.25x the 100 MHz base clock, or 25 MHz.
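To make the 25 MHz granularity concrete, the sketch below snaps an arbitrary target frequency to the nearest boost step at or below it. The 100 MHz reference clock follows from the 0.25x and 25 MHz figures above. Rounding down rather than to the nearest step is our assumption, and the function name is purely illustrative.

```python
import math

BASE_CLOCK_MHZ = 100.0   # reference clock implied by 0.25x = 25 MHz
MULTIPLIER_STEP = 0.25   # multiplier granularity quoted in the text


def snap_to_boost_step(target_mhz: float) -> float:
    """Round a target frequency down to a multiple of the 25 MHz step."""
    step_mhz = BASE_CLOCK_MHZ * MULTIPLIER_STEP
    return math.floor(target_mhz / step_mhz) * step_mhz


if __name__ == "__main__":
    for target in (4140.0, 4310.0, 4325.0):
        print(f"{target:.0f} MHz -> {snap_to_boost_step(target):.0f} MHz")
```

A target of 4310 MHz, for instance, would settle at 4300 MHz under this assumption.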


Extended Frequency Range 2 (XFR 2) builds on the success of XFR with a new all-core uplift beyond the maximum boost clock. If your cooling is good enough (60°C), XFR will now elevate all cores beyond the boost state as opposed to just the best few cores. AMD claims that with the most ideal cooling, XFR 2.0 will give you a staggering 7 percent performance uplift without any manual overclocking on your part.
