AMD Ryzen Threadripper 2970WX Review

Introduction

AMD took baby steps into the HEDT (high-end desktop) segment that eluded it for a decade, with the Ryzen Threadripper series and a modest line-up of just three SKUs. Those three were enough to plunge Intel's bloated "Skylake-X" Core X processor family into complete disarray. Intel's initial Core X family was built on the LCC (low core count) variant of Skylake-X, which has up to 10 cores. It responded to AMD's 12-core and 16-core Threadrippers with SKUs based on the HCC (high core count) die and up to 18 cores. Threadrippers continued to offer value under the $1000-mark, while Intel's HCC-based SKUs remained uncontested. AMD changed that with the 2nd generation Threadripper family, which introduces the WX (workstation-enthusiast) sub-variants consisting of 24-core and 32-core models to blunt HCC.

AMD's Ryzen Threadripper processors are multi-chip modules of 8-core dies. The 16-core and 12-core models feature 2 dies, while the 24-core and 32-core ones have 4. These are both similar and dissimilar to the company's EPYC high core-count processors; similar in being 4-die MCMs, but dissimilar in the way they're wired out. Logically, an MCM with four "Pinnacle Ridge" dies must have an 8-channel DDR4 memory interface and 128 PCIe lanes. The SP3r2 package has wiring for all those. However, AMD didn't want to force users to upgrade motherboards just to have these 24-core or 32-core parts, and it probably didn't want them to cannibalize its high-margin EPYC products. So it decided to make the 4-die MCMs work on X399 by wiring memory and PCIe to just two of the four dies, while having the other two dies rely on the Infinity Fabric interconnect for memory and I/O access.



Cores on the dies with memory and PCIe wiring are called "I/O cores," while those without direct access are called "compute cores." AMD's highly customized scheduler extensions for Windows, from the Ryzen Master software, ensure that I/O cores are saturated with processing workloads first, before the compute cores. In theory, this should benefit applications that can scale beyond 16-core/32-thread, but if that workload is memory-intensive, it could drag down performance of all cores. The Ryzen Threadripper 2990WX is the company's flagship part, with 32 cores and 64 threads.

The Ryzen Threadripper 2970WX, which we're reviewing today, is 24-core/48-thread. AMD achieves this core count by disabling two cores per die. Within each die, one core per Zen Compute Complex (CCX) is disabled. The resulting CCX configuration is [3+3] + [3+3] + [3+3] + [3+3]. We wonder why AMD didn't go with full [4+4] + [4+4] for the I/O dies and [2+2] + [2+2] for the compute dies, ending up with two-thirds of its cores having direct I/O. One possible explanation is it probably didn't want I/O dies to saturate the memory bus too much, leaving the compute dies as dead weights in memory-intensive scenarios. The 2970WX offers the same clock speeds as the flagship 2990WX with 3.00 GHz stock and 4.20 GHz maximum Precision Boost frequency. This supposedly-workstation chip also has XFR (extended frequency range), which rewards good cooling with automatic overclocks beyond even the maximum boost frequency. You also get the full 64 MB of L3 cache, full quad-channel DDR4 memory interface, and full 64-lane PCIe interface.
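To make the core-count comparison above concrete, here is a small Python sketch. The tuple layout and the names `shipping`, `alternative`, and `summarize` are ours, purely for illustration; the alternating order of I/O and compute dies in the lists is arbitrary and does not reflect physical placement.

```python
from fractions import Fraction

# Each die is modeled as (cores_in_ccx0, cores_in_ccx1, has_direct_io).
# "shipping" is the [3+3] x 4 layout AMD uses on the 2970WX;
# "alternative" is the hypothetical [4+4] / [2+2] split discussed above.
shipping = [(3, 3, True), (3, 3, False), (3, 3, True), (3, 3, False)]
alternative = [(4, 4, True), (2, 2, False), (4, 4, True), (2, 2, False)]

def summarize(layout):
    """Return (total cores, fraction of cores on dies with direct I/O)."""
    total = sum(a + b for a, b, _ in layout)
    io = sum(a + b for a, b, has_io in layout if has_io)
    return total, Fraction(io, total)

print(summarize(shipping))     # (24, Fraction(1, 2))
print(summarize(alternative))  # (24, Fraction(2, 3))
```

Both layouts land on 24 cores; the difference is only in how many of them sit next to a memory controller.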

AMD Ryzen Threadripper Market Segment Analysis

| Processor | Price | Cores / Threads | Base Clock | Max. Boost | L3 Cache | TDP | Architecture | Process | Socket |
|---|---|---|---|---|---|---|---|---|---|
| Ryzen 7 1700 | $190 | 8 / 16 | 3.0 GHz | 3.7 GHz | 16 MB | 65 W | Zen | 14 nm | AM4 |
| Core i5-9600K | $280 | 6 / 6 | 3.7 GHz | 4.6 GHz | 9 MB | 95 W | Coffee Lake | 14 nm | LGA 1151 |
| Core i7-8700 | $300 | 6 / 12 | 3.2 GHz | 4.6 GHz | 12 MB | 65 W | Coffee Lake | 14 nm | LGA 1151 |
| Ryzen 7 1700X | $320 | 8 / 16 | 3.4 GHz | 3.8 GHz | 16 MB | 95 W | Zen | 14 nm | AM4 |
| Ryzen 7 2700 | $250 | 8 / 16 | 3.2 GHz | 4.1 GHz | 16 MB | 65 W | Zen | 12 nm | AM4 |
| Core i7-8700K | $390 | 6 / 12 | 3.7 GHz | 4.7 GHz | 12 MB | 95 W | Coffee Lake | 14 nm | LGA 1151 |
| Core i7-9700K | $420 | 8 / 8 | 3.6 GHz | 4.9 GHz | 12 MB | 95 W | Coffee Lake | 14 nm | LGA 1151 |
| Ryzen 7 2700X | $305 | 8 / 16 | 3.7 GHz | 4.3 GHz | 16 MB | 105 W | Zen | 12 nm | AM4 |
| Ryzen 7 1800X | $250 | 8 / 16 | 3.6 GHz | 4.0 GHz | 16 MB | 95 W | Zen | 14 nm | AM4 |
| Core i9-9900K | $580 | 8 / 16 | 3.6 GHz | 5.0 GHz | 16 MB | 95 W | Coffee Lake | 14 nm | LGA 1151 |
| Threadripper 1920X | $750 | 12 / 24 | 3.5 GHz | 4.0 GHz | 32 MB | 180 W | Zen | 14 nm | SP3r2 |
| Threadripper 1950X | $950 | 16 / 32 | 3.4 GHz | 4.0 GHz | 32 MB | 180 W | Zen | 14 nm | SP3r2 |
| Threadripper 2920X | $650 | 12 / 24 | 3.5 GHz | 4.3 GHz | 32 MB | 180 W | Zen | 12 nm | SP3r2 |
| Threadripper 2950X | $900 | 16 / 32 | 3.5 GHz | 4.4 GHz | 32 MB | 180 W | Zen | 12 nm | SP3r2 |
| Threadripper 2970WX | $1300 | 24 / 48 | 3.0 GHz | 4.2 GHz | 64 MB | 250 W | Zen | 12 nm | SP3r2 |
| Threadripper 2990WX | $1750 | 32 / 64 | 3.0 GHz | 4.2 GHz | 64 MB | 250 W | Zen | 12 nm | SP3r2 |
| Core i9-7900X | $1380 | 10 / 20 | 3.3 GHz | 4.4 GHz | 13.75 MB | 140 W | Skylake | 14 nm | LGA 2066 |
| Core i9-7920X | $1200 | 12 / 24 | 2.9 GHz | 4.3 GHz | 16.5 MB | 140 W | Skylake | 14 nm | LGA 2066 |
| Core i9-7940X | $1415 | 14 / 28 | 3.1 GHz | 4.3 GHz | 18.25 MB | 165 W | Skylake | 14 nm | LGA 2066 |
| Core i9-7960X | $1700 | 16 / 32 | 2.8 GHz | 4.2 GHz | 22 MB | 165 W | Skylake | 14 nm | LGA 2066 |

A Closer Look


Much like the rest of the Ryzen Threadripper 2000-series, the Threadripper 2970WX comes in a lunchbox-sized hard case with paperboard frills that show off the huge processor inside. As we observed in our recent Core i9-9900K review, such packaging may look good on a store shelf, but is quite pointless.


There's no cooler included with the processor. You use your own TR4 or SP3r2-compatible cooler that can handle thermal loads of at least 250 W, which usually means "watercooling." Two very important accessories are part of the package: a screwdriver for the double-square socket screws that hold the TR4 retention brace in place and an adapter that lets you use Asetek-made, round AIO pump-blocks. Don't lose the screwdriver because unlike Intel LGA sockets, the only way you can open the TR4 socket is by undoing those socket screws. This tool has also been calibrated for the ideal screw tension of the socket, so simply keep turning it until it clicks.


The Ryzen Threadripper 2970WX is huge! When viewed from the top, the package is as big as a credit card. Thank goodness AMD decided to make this package an LGA, or good luck trying to find a bent pin in a 4,094-pin PGA.


As you can see, the orange plastic bracket is needed to mount the processor into the TR4 socket. It works to increase the surface area of the indented portion of the IHS, so the metal retention brace can hold the processor in place. It's a critical component and not packaging material, so don't discard it. You'll also notice that only screws hold the brace down; there's no lever-hinge mechanism like on Intel sockets.

The Threadripper Concept

Ryzen Threadripper 2970WX is a multi-chip module of four 8-core, 12 nm "Pinnacle Ridge" dies. Each of the four dies has two cores disabled, which leaves us with 24 cores in all. Think of this as 4P Ryzen 5 six-core-on-a-stick. As we explained earlier, only two out of four dies have their memory controllers wired out to memory slots on the motherboard. Cores in these dies are called "I/O cores" by AMD. The other two dies have no direct access and rely on the Infinity Fabric interconnect to access memory controlled by a neighboring die. Cores from these dies are called "compute cores." The same scheme applies to other I/O, such as PCIe, SATA, USB, audio, etc. In the pre-IMC days, memory controllers were located on a separate chip on the motherboard, called a northbridge. In a way, those compute cores are configured like processors from that era.

Each of the I/O dies controls two DDR4 memory channels and 32 PCIe lanes for a combined quad-channel DDR4 memory interface and 64-lane PCIe. Despite disabled cores, you get the full 16 MB of L3 cache per die and hence, 64 MB of total L3 cache for the entire processor. Under the IHS, the die closest to the key corner and the die diagonally opposite to it are the I/O dies. The other two dies are compute dies. AMD decided not to wire out these dies on the platform to ensure backwards compatibility with X399 motherboards.


Infinity Fabric is AMD's new high-bandwidth interconnect introduced alongside "Zen." It connects not just the two quad-core CCX chiplets on the "Pinnacle Ridge" die, but also handles inter-die communication. On the 4-die Threadripper 2970WX, there's one Infinity Fabric link between the two active dies with a bi-directional bandwidth of 25 GB/s when running at 1600 MHz (the actual DRAM frequency). So if you run faster or slower memory, Infinity Fabric's bandwidth will scale accordingly. It takes around 105 nanoseconds (ns) for a CPU core to access memory controlled by the neighboring die, and less than 65 ns to access memory controlled by its own die.
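As a rough illustration of how the link bandwidth tracks memory speed, the sketch below scales the 25 GB/s figure quoted above. It assumes the scaling is strictly linear with the DRAM clock, which the text implies but does not spell out; the function and constant names are ours.

```python
REF_MEMCLK_MHZ = 1600      # actual DRAM clock of DDR4-3200, as quoted above
REF_BANDWIDTH_GBS = 25.0   # bi-directional link bandwidth quoted at that clock

def fabric_bandwidth_gbs(ddr_rating):
    """Estimate link bandwidth for a DDR4 speed grade.

    Assumption: bandwidth scales linearly with the memory clock.
    """
    memclk_mhz = ddr_rating / 2   # DDR moves data twice per clock cycle
    return REF_BANDWIDTH_GBS * memclk_mhz / REF_MEMCLK_MHZ

for rating in (2400, 2933, 3200):
    print(f"DDR4-{rating}: {fabric_bandwidth_gbs(rating):.1f} GB/s")
```

Under that assumption, dropping from DDR4-3200 to DDR4-2400 would cost a quarter of the inter-die bandwidth.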


Unlike Core X processors that are built with four memory channels wired to a single die, Threadrippers have two dual-channel interfaces making up quad-channel. It is possible for an application to spread its memory across all four channels for higher bandwidth memory access, but at higher latency. Less parallelized applications, such as PC games (which still haven't managed to need >16 GB of memory), can benefit from lower latency. AMD figured out a way to give users and their operating systems control over how memory is allocated, by letting them choose between UMA and NUMA operation.


To that end, there are several selectable user modes through Ryzen Master, which reconfigure the processor (a reboot is required). Memory access mode can be toggled between "Distributed Mode" (default) and "Local Mode". Distributed Mode maximizes memory bandwidth to applications and tries to keep latencies constant (but higher), no matter which core the software is running on. Local Mode, on the other hand, splits the system into two NUMA nodes (think "processor groups"), which tells Windows which cores have the memory interface attached to them, so it can put loads on those cores first and run them with lower memory latency. The second processor group has higher memory latency, which results in applications on those cores running slower. This mode can be useful for low-threaded applications and games. Our performance results have an additional data set for "Local Mode" enabled.
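To put numbers on the latency trade-off between the two modes, here is a deliberately simple Python model. It takes the roughly 65 ns local and 105 ns remote figures quoted earlier and weights them by the share of accesses that stay on the local die. Real latencies depend on far more than this (interleaving, caches, fabric load), so treat it as a back-of-the-envelope sketch; the names are ours.

```python
LOCAL_NS = 65    # upper bound quoted earlier for same-die memory access
REMOTE_NS = 105  # approximate figure quoted earlier for neighboring-die access

def average_latency_ns(local_fraction):
    """Weighted average latency for a given share of local accesses.

    Simplification: every access costs either LOCAL_NS or REMOTE_NS.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS

# Memory spread evenly across both I/O dies (roughly what Distributed Mode does)
print(average_latency_ns(0.5))
# Everything served by the local die (the best case Local Mode aims for)
print(average_latency_ns(1.0))
```

In this toy model an even spread averages 85 ns, which is why latency-sensitive, low-threaded software can gain from Local Mode.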

A third configuration option is "Legacy Compatibility Mode", which lets you adjust the exposed processor count. Some older games have difficulty running on systems with more than 16 cores and will crash right at the start. Using that option, you can reduce the core count of Threadripper.

A few weeks ago, just as Intel refreshed its HEDT lineup with the Core X 9000-series, AMD introduced Dynamic Local Mode, a software feature that is part of Ryzen Master and significantly improves performance of 24-core and 32-core Threadripper WX-series models. It works by running a background process that automatically allocates workloads to dies with local memory access first, and only when those cores are completely saturated does it invoke the cores without local memory access. Since all dies on the 12-core and 16-core models have local memory access, Dynamic Local Mode isn't applicable.
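The fill order described above can be sketched in a few lines of Python. This is a toy model of the policy, not AMD's implementation: the real feature is a background service that moves live threads around, and `assign_threads` along with its default slot counts (12 I/O cores and 12 compute cores with SMT on the 2970WX) are our own illustrative choices.

```python
def assign_threads(n_threads, io_slots=24, compute_slots=24):
    """Return (threads on I/O dies, threads on compute dies).

    Toy policy: fill hardware threads on dies with local memory first,
    and spill to compute dies only once those are saturated.
    """
    if n_threads < 0 or n_threads > io_slots + compute_slots:
        raise ValueError("n_threads must fit the available hardware threads")
    on_io = min(n_threads, io_slots)
    return on_io, n_threads - on_io

for load in (8, 24, 32, 48):
    print(load, assign_threads(load))
```

With up to 24 busy threads nothing touches the compute dies; a 32-thread load would put 8 threads there.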

Unlike the socket AM4 Ryzen chips, Threadrippers have an unchanged memory controller configuration from AMD's EPYC enterprise processors. The Ryzen Threadripper 2970WX supports up to 2 TB of quad-channel memory with ECC support (something like that is restricted to the Xeon brand on the Intel platform). Then again, we doubt HEDT users are going to need more than the 128 GB of memory Core X processors support.

The PCI-Express configuration is interesting. The MCM puts out a total of 64 PCIe gen 3.0 lanes. On a typical motherboard, these lanes are wired out as two PCI-Express 3.0 x16 slots that run at x16 bandwidth all the time, two additional x16 slots that run at x8 bandwidth all the time (without eating into the bandwidth of another slot), three M.2-NVMe slots with x4 bandwidth, each, and the remaining 4 lanes serving as chipset bus.
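The lane budget above is easy to check with a quick tally. The dictionary keys below are just our labels for the slots named in the paragraph.

```python
# (count, lanes each) for every consumer in the typical layout described above
LANE_ALLOCATION = {
    "x16 slots running at x16": (2, 16),
    "x16 slots running at x8": (2, 8),
    "M.2 NVMe slots": (3, 4),
    "chipset link": (1, 4),
}
CPU_LANES = 64  # total PCIe gen 3.0 lanes from the MCM

def lanes_used(allocation):
    """Sum the lanes consumed by all entries in the allocation table."""
    return sum(count * width for count, width in allocation.values())

used = lanes_used(LANE_ALLOCATION)
print(f"{used} of {CPU_LANES} lanes used, {CPU_LANES - used} spare")
# prints "64 of 64 lanes used, 0 spare"
```

The layout accounts for every lane, which is why none of the slots has to share bandwidth with another.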

The Zen+ Architecture


Each of the four dies in the Threadripper 2970WX MCM is made out of the new 12 nm "Pinnacle Ridge" silicon by AMD. This chip is based on the new "Zen+" micro-architecture in which the "+" denotes refinement rather than a major architectural change.


AMD summarizes the "+" in "Zen+" as the coming together of the new 12 nm process that enables higher clock speeds, an updated SenseMI feature set, the updated Precision Boost algorithm that sustains boost clocks better under stress, and physical improvements to the cache and memory sub-systems, which add up to an IPC uplift of 3 percent (clock-for-clock) over the first-generation "Zen."

The biggest change of "Pinnacle Ridge" remains its process node. The switch to 12 nm resulted in a 50 mV reduction in Vcore voltage at any given clock speed, enabling AMD to increase clocks by around 0.25 GHz across the board. The switch also enables all-core overclocks well above the 4 GHz mark, to around 4.20 GHz. Last but not least, this increase in power efficiency enabled AMD to release the 32-core Threadripper 2990WX, which wasn't feasible before.

AMD also deployed faster cache SRAM and refined the memory controllers to bring down latencies significantly. L3 cache latency is 16 percent lower, L2 cache latency is a staggering 34 percent lower, L1 latencies are reduced by 13 percent, and DRAM (memory) latencies by 11 percent. This is where almost all of the IPC uplift comes from. AMD also increased the maximum memory clocks. The processor now supports up to DDR4-2933 (JEDEC).


Updates to the chip's on-die SenseMI logic include Precision Boost 2 and Extended Frequency Range (XFR) 2. Precision Boost 2 now switches from arbitrary 2-core and all-core boost targets to a perpetual all-core boosting algorithm that elevates the most stressed cores to the highest boost states in a linear fashion (i.e. boost frequency increases with load). Every core is running above the nominal clock when the processor isn't idling, which contributes to a multi-core performance uplift. Besides load, the algorithm takes into account temperature, current, and Vcore. Granularity is 0.25X base clock (25 MHz).
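To get a feel for what 25 MHz granularity means on this chip, the sketch below enumerates the frequency steps between the 2970WX's 3.00 GHz base clock and its 4.20 GHz maximum boost. It assumes a 100 MHz reference clock (which is what makes 0.25x equal 25 MHz) and says nothing about which states the algorithm actually picks; the names are ours.

```python
BCLK_MHZ = 100          # assumed reference clock
MULTIPLIER_STEP = 0.25  # granularity quoted above

def boost_states_mhz(base_mhz=3000, max_boost_mhz=4200):
    """List every 25 MHz step from base clock to maximum boost, inclusive."""
    step = int(BCLK_MHZ * MULTIPLIER_STEP)  # 25 MHz
    return list(range(base_mhz, max_boost_mhz + 1, step))

states = boost_states_mhz()
print(len(states), states[0], states[-1])  # prints "49 3000 4200"
```

That gives Precision Boost 2 49 distinct frequency points to work with, rather than a couple of fixed boost bins.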


Extended Frequency Range 2 (XFR 2) builds on the success of XFR with a new all-core uplift beyond the maximum boost clock. If your cooling is good enough (60°C), XFR will now elevate all cores beyond the boost state as opposed to just the best few cores. AMD claims that with the most ideal cooling, XFR 2.0 will give you a staggering 7 percent performance uplift without any manual overclocking on your part.
