AMD today launched its 2nd generation Ryzen Threadripper 2000 family of high-end desktop (HEDT) processors. When AMD realized in 2017 that it had a truly competitive CPU architecture on its hands with "Zen," it scrambled to create client-segment variants of its enterprise EPYC multi-chip modules: the Ryzen Threadripper product family. A possible motivation was that even if AMD established product leadership in the mainstream-desktop segment, Intel could still claim the overall fastest client-segment processors thanks to its Core X HEDT family, which could influence sales of cheaper processors.
Some in the tech industry expected the first Threadripper series (with just 3 SKUs) to be a one-off, just as many had written off "Zen" itself as a one-trick pony. Our expectations from the 12 nm "Pinnacle Ridge" silicon were limited to better thermals being traded in for higher clocks. We were pleasantly surprised when the refined "Zen+" architecture not only met those expectations, but exceeded them with 3–5 percent IPC increments and tangible improvements made to the multi-core boosting algorithms, which translated to a restoration of competitiveness to Intel's then newly launched "Coffee Lake." It was only a matter of time until AMD would use this silicon to build newer Threadrippers.
Until now, Intel has had the upper hand in HEDT processor core counts. By tapping into its "Skylake-X" HCC (high core-count) silicon, Intel launched 12-core, 14-core, 16-core, and 18-core LGA2066 processors. The 14- to 18-core SKUs beat the first-generation Threadrippers in performance owing to higher IPC and the lower latencies of a monolithic die design. AMD priced its 12-core and 16-core first-gen Threadrippers against Intel's 8-core and 10-core SKUs, beating them on price-performance. This meant leaving the $1000-$2000 market uncontested, for which Intel had already built a use-case (prosumers who need a lot of multi-threaded performance and don't want to shell out a lot of money on workstations with 2P Xeons). And thus, we have new 24-core and 32-core Threadripper 2000WX parts from AMD.
In this review, we take a look at the 16-core Ryzen Threadripper 2950X. Much like the flagship Threadripper 1950X from last year that it succeeds, the TR 2950X achieves 16 cores by being a multi-chip module of two 8-core dies, which are 12 nm "Pinnacle Ridge" in this case. You get all of the new "Zen+" micro-architecture features and higher clock speeds. AMD is also launching this chip at an SEP of $899, which is $100 cheaper than what the 1950X launched at. Since the TR 1950X was already trading blows with Intel's Core i9-7900X and i9-7920X, the introduction of this chip at its price will only mount pressure on Intel to lower its prices.
We present four performance data sets for the Ryzen Threadripper 2950X in this review: stock, manual overclock to 4.15 GHz, Precision Boost Overdrive set to max, and PBO at max with Local Memory Access Mode enabled.
|Processor||Price||Cores / Threads||Base Clock||Max. Boost||L3 Cache||TDP||Architecture||Process||Socket|
|Core i5-8600K||$259||6 / 6||3.6 GHz||4.3 GHz||9 MB||95 W||Coffee Lake||14 nm||LGA 1151|
|Ryzen 5 2600X||$229||6 / 12||3.6 GHz||4.2 GHz||16 MB||95 W||Zen+||12 nm||AM4|
|Ryzen 7 2700X||$329||8 / 16||3.7 GHz||4.3 GHz||16 MB||105 W||Zen+||12 nm||AM4|
|Core i7-8700K||$359||6 / 12||3.7 GHz||4.7 GHz||12 MB||95 W||Coffee Lake||14 nm||LGA 1151|
|Core i9-7900X||$900||10 / 20||3.3 GHz||4.3 GHz||13.75 MB||140 W||Skylake-X||14 nm||LGA 2066|
|Ryzen Threadripper 2950X||$900||16 / 32||3.5 GHz||4.4 GHz||32 MB||180 W||Zen+||12 nm||TR4|
A Closer Look
AMD tends to go overboard with packaging of its premium processors, and the Ryzen Threadripper 2950X is no exception. This chip comes in a lunchbox-sized hard case with paperboard frills that show off the huge processor inside. You get great views of the front and back of the processor.
There's no cooler included with the processor. You use your own TR4 or SP3r2-compatible cooler that can handle thermal loads of at least 180 W. Two very important accessories are part of the package: a screwdriver for the double-square socket screws that hold the TR4 retention brace in place, and an adapter that lets you use Asetek-made round AIO pump-blocks. Don't lose the screwdriver because unlike Intel LGA sockets, the only way you can open the TR4 socket is by undoing those socket screws. This tool has also been calibrated for the ideal screw tension of the socket, so simply keep turning it until it clicks.
The Ryzen Threadripper 2950X is huge! When viewed from the top, the package is as big as a credit-card. Thank goodness AMD decided to make this package an LGA, or good luck trying to find a bent pin in a 4,094-pin PGA.
As you can see, the orange plastic bracket is needed to mount the processor into the TR4 socket. It works to increase the surface area of the indented portion of the IHS, so the metal retention brace can hold the processor in place. It's a critical component and not packaging material, so don't discard it. You'll also notice that only screws hold the brace down; there's no lever-hinge mechanism like on Intel sockets.
The Threadripper Concept
Ryzen Threadripper 2950X is a multi-chip module of two 8-core, 12 nm "Pinnacle Ridge" dies, each of which controls two DDR4 memory channels and 32 PCIe lanes, for a combined quad-channel DDR4 memory interface and 64 PCIe lanes. Perhaps a coarse way of describing the 2950X would be "2P-on-a-stick." Under the IHS, the die closest to the key corner and the die diagonally opposite to it are active. The other two dies are "dummies." These are either just blank slabs of silicon, or dies that have completely failed validation. They're just there to support the IHS and distribute mounting pressure.
Infinity Fabric is AMD's new high-bandwidth interconnect introduced alongside "Zen." It connects not just the two quad-core CCXs (core complexes) on the "Pinnacle Ridge" die, but also handles inter-die communication. On the 2-die Threadripper 2950X, there's one Infinity Fabric link between the two active dies, with bi-directional bandwidth of 50 GB/s when running at 1600 MHz (the actual DRAM frequency). So if you run faster or slower memory, Infinity Fabric's bandwidth will scale accordingly. It takes around 105 nanoseconds (ns) for a CPU core to access memory controlled by the neighboring die, and less than 65 ns to access memory controlled by its own die.
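To get a feel for how the inter-die bandwidth moves with memory speed, here is a rough sketch. It assumes the scaling is strictly linear with the memory clock, anchored at the 50 GB/s at 1600 MHz figure above; the function name and the linear model are our own illustration, not something AMD has published.

```python
# Reference point quoted in the text: 50 GB/s at a 1600 MHz memory clock.
REF_BANDWIDTH_GBPS = 50.0
REF_MEMCLK_MHZ = 1600.0


def fabric_bandwidth_gbps(memclk_mhz: float) -> float:
    """Estimate inter-die bandwidth, ASSUMING linear scaling with memory clock."""
    if memclk_mhz <= 0:
        raise ValueError("memory clock must be positive")
    return REF_BANDWIDTH_GBPS * memclk_mhz / REF_MEMCLK_MHZ


if __name__ == "__main__":
    for ddr_rating in (2400, 2933, 3200):
        # DDR transfers data twice per clock, so DDR4-3200 runs at 1600 MHz.
        memclk = ddr_rating / 2
        print(f"DDR4-{ddr_rating}: ~{fabric_bandwidth_gbps(memclk):.1f} GB/s")
```

Under this assumption, dropping from DDR4-3200 to DDR4-2400 would cost a quarter of the inter-die bandwidth, which is one reason memory speed matters more on Threadripper than on a monolithic design.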
Unlike Core X processors that are built with four memory channels wired to a single die, Threadrippers have two dual-channel interfaces making up quad-channel. It is possible for an application to spread its memory across all four channels for higher bandwidth memory access, but at higher latency. Less parallelized applications such as PC games (which still haven't managed to need >16 GB of memory), can benefit from lower latency. AMD figured out a way to give users and their operating-systems control over how to allocate memory, thanks to UMA and NUMA.
To that end, there are several user-selectable modes in Ryzen Master that reconfigure the processor (a reboot is required). Memory access mode can be toggled between "Distributed Mode" (default) and "Local Mode". Distributed mode maximizes memory bandwidth to applications and tries to keep latencies constant (but higher), no matter which core the software is running on. Local mode, on the other hand, splits the system into two NUMA nodes (think "processor groups"), which lets Windows know which cores have a memory interface attached to them, so it can place loads on those cores first and run them with lower memory latency. The second processor group has higher memory latency, which results in applications on those cores running slower. This mode can be useful for low-threaded applications and games. Our performance results have an additional data set for "Local Mode" enabled.
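The trade-off between the two modes can be put into rough numbers with the latency figures quoted earlier (about 65 ns for own-die memory, about 105 ns for the neighboring die). The sketch below is a deliberately simple model: it assumes Distributed Mode spreads accesses evenly across both dies, and it treats 65 ns as the local figure even though the text gives it as an upper bound. The names are ours, and real workloads will not behave this neatly.

```python
LOCAL_NS = 65.0    # own-die memory access (upper bound quoted in the text)
REMOTE_NS = 105.0  # neighboring-die memory access quoted in the text


def expected_latency_ns(local_fraction: float) -> float:
    """Weighted average latency for a given share of own-die accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS


# ASSUMPTION: Distributed Mode interleaves memory evenly over both dies.
distributed_ns = expected_latency_ns(0.5)
# Local Mode, thread and its memory on the same die (best case).
local_best_ns = expected_latency_ns(1.0)
# Local Mode, thread on one die with its memory on the other (worst case).
local_worst_ns = expected_latency_ns(0.0)

if __name__ == "__main__":
    print(f"Distributed: ~{distributed_ns:.0f} ns on any core")
    print(f"Local: ~{local_best_ns:.0f} ns best case, ~{local_worst_ns:.0f} ns worst case")
```

In this model Distributed Mode lands in the middle on every core, while Local Mode is faster only as long as the scheduler keeps a thread next to its memory.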
A third configuration option is "Legacy Compatibility Mode", which lets you adjust the exposed processor count. Some older games have difficulty running on systems with more than 16 cores and will crash right at the start. Using that option, you can reduce the core count of Threadripper.
Unlike the socket AM4 Ryzen chips, Threadrippers have an unchanged memory controller configuration from AMD's EPYC enterprise processors. The Ryzen Threadripper 2950X supports up to 2 TB of quad-channel memory with ECC support (on Intel's platform, ECC is restricted to the Xeon brand). Then again, we doubt HEDT users are going to need more than the 128 GB of memory Core X processors support.
The PCI-Express configuration is interesting. The MCM puts out a total of 64 PCIe gen 3.0 lanes. On a typical motherboard, these lanes are wired out as two PCI-Express 3.0 x16 slots that run at x16 bandwidth all the time, two additional x16 slots that run at x8 bandwidth all the time (without eating into the bandwidth of another slot), three M.2-NVMe slots with x4 bandwidth, each, and the remaining 4 lanes serving as chipset bus.
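As a quick sanity check, the lane allocation described above does add up to the full 64 lanes. The tally below simply restates the typical motherboard layout from the paragraph; the labels are ours, and actual boards may wire things differently.

```python
# (description, number of devices, lanes per device), as listed in the text
lane_budget = [
    ("x16 slots at full x16 bandwidth", 2, 16),
    ("x16 slots wired at x8 bandwidth", 2, 8),
    ("M.2 NVMe slots", 3, 4),
    ("chipset bus", 1, 4),
]


def total_lanes(budget) -> int:
    """Sum the PCIe lanes consumed by every entry in the budget."""
    return sum(count * width for _, count, width in budget)


if __name__ == "__main__":
    for name, count, width in lane_budget:
        print(f"{count} x {name}: {count * width} lanes")
    print(f"Total: {total_lanes(lane_budget)} lanes")
```

The total comes to 32 + 16 + 12 + 4 = 64 lanes, so nothing in this layout has to share bandwidth with anything else.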
The Zen+ Architecture
Each of the two dies in the Threadripper 2950X MCM is made out of the new 12 nm "Pinnacle Ridge" silicon by AMD. This chip is based on the new "Zen+" micro-architecture in which the "+" denotes refinement rather than a major architectural change.
AMD summarizes the "+" in "Zen+" as the coming together of the new 12 nm process that enables higher clock speeds, an updated SenseMI feature set, the updated Precision Boost algorithm that sustains boost clocks better under stress, and physical improvements to the cache and memory sub-systems, which add up to an IPC uplift of 3 percent (clock-for-clock) over the first-generation "Zen."
The biggest change of "Pinnacle Ridge" remains its process node. The switch to 12 nm resulted in a 50 mV reduction in Vcore voltage at any given clock speed, enabling AMD to increase clocks by around 0.25 GHz across the board. The switch also enables all-core overclocks well above the 4 GHz mark, to around 4.20 GHz. Last but not least, this increase in power efficiency enabled AMD to release the 32-core Threadripper 2990WX, which wasn't feasible before.
AMD also deployed faster cache SRAM and refined the memory controllers to bring down latencies significantly. L3 cache latency is 16 percent lower, L2 cache latency is a staggering 34 percent lower, L1 latencies are reduced by 13 percent, and DRAM (memory) latencies by 11 percent. This is where almost all of the IPC uplift comes from. AMD also increased the maximum memory clocks. The processor now supports up to DDR4-2933 (JEDEC).
Updates to the chip's on-die SenseMI logic include Precision Boost 2 and Extended Frequency Range (XFR) 2. Precision Boost 2 switches from arbitrary 2-core and all-core boost targets to a perpetually all-core boosting algorithm that elevates the most stressed cores to the highest boost states in a linear fashion (i.e. boost frequency increases with load). Every core runs above the nominal clock when the processor isn't idling, which contributes to a multi-core performance uplift. Besides load, the algorithm takes into account temperature, current, and Vcore. Granularity is 0.25x the 100 MHz reference clock (25 MHz).
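To illustrate what 25 MHz granularity means in practice, the sketch below enumerates the clock steps between the 2950X's 3.5 GHz base and 4.4 GHz maximum boost from the spec table. It assumes a 100 MHz reference clock and only lists the available steps; how Precision Boost 2 actually picks among them (load, temperature, current, Vcore) is not modeled, and the function name is our own.

```python
REF_CLOCK_MHZ = 100  # ASSUMPTION: standard 100 MHz reference clock
STEP_MHZ = 25        # 0.25x the reference clock, per the text


def boost_states(base_mhz: int, max_boost_mhz: int, step_mhz: int = STEP_MHZ):
    """List every clock step from base to max boost, inclusive."""
    return list(range(base_mhz, max_boost_mhz + 1, step_mhz))


# Base and max boost clocks of the Threadripper 2950X from the spec table.
states = boost_states(3500, 4400)

if __name__ == "__main__":
    print(f"{len(states)} states from {states[0]} to {states[-1]} MHz")
    for mhz in states[:4]:
        # The multiplier is simply the clock divided by the reference clock.
        print(f"{mhz} MHz = {mhz / REF_CLOCK_MHZ:.2f}x")
```

That fine a step size is what lets the boost curve follow load smoothly instead of jumping between a handful of fixed targets.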
Extended Frequency Range 2 (XFR 2) builds on the success of XFR with a new all-core uplift beyond the maximum boost clock. If your cooling is good enough (60°C), XFR 2 will now elevate all cores beyond the boost state as opposed to just the best few cores. AMD claims that with ideal cooling, XFR 2 will give you a staggering 7 percent performance uplift without any manual overclocking on your part.