NVIDIA announced the GeForce GTX 1070 Ti performance-segment graphics card last week; today, the reviews are going live. The GTX 1070 Ti is NVIDIA's latest (and probably final) implementation of "Pascal". It's been close to 18 months since the NVIDIA "Pascal" GPU architecture made its debut with the GeForce GTX 1080, back in May 2016. For most of that time, it enjoyed virtually zero competition from AMD, which took another 14 months to come up with products that could compete with the GTX 1080 and GTX 1070: the RX Vega 64 and RX Vega 56, respectively. The enthusiast-segment GTX 1080 Ti and TITAN Xp remain unchallenged. NVIDIA may, however, have erred in how it differentiated the GTX 1070 from its bigger sibling.
NVIDIA's "second-best" implementations of performance-segment GPUs, such as the popular GTX 970, had often been configured too well for their price. A little effort in overclocking them would bring them up to the performance levels of their larger, better-endowed siblings. NVIDIA spaced the $399 (launch) GTX 1070 further apart from its sibling, the GTX 1080, than it spaced the GTX 970 apart from the GTX 980. The GTX 1070 lacks a quarter of the CUDA cores, TMUs, and other related components physically present on the "GP104" silicon, while the GTX 1080 maxes these out. The resulting performance gap between the two, which is around 15%-20%, may not look large (compared to the 24%-27% gap between the GTX 1080 and GTX 1080 Ti), but it was big enough for AMD to wedge into with the Radeon RX Vega 56.
The RX Vega 56 sits right in the middle of the GTX 1070 and GTX 1080 in terms of performance. This gives buyers an attractive alternative to the GTX 1070 despite its higher price. Besides better gaming performance, its lead in crypto-currency mining performance and exotic tech, such as HBM2 memory, combine to make the RX Vega 56 look particularly appealing compared to the 18-month-old GTX 1070. NVIDIA had to do something ahead of the crucial Holiday shopping season. Enter the GTX 1070 Ti.
The GeForce GTX 1070 Ti is designed to fill the performance gap between the GTX 1070 and GTX 1080, but it sits closer to the GTX 1080 than the halfway point. This is probably needed for it to outperform the RX Vega 56. While the GTX 1070 lacks a quarter of the 20 "Pascal" streaming multiprocessors (each worth 128 CUDA cores), the GTX 1070 Ti lacks just one. This takes its CUDA core count all the way up to 2,432, which is just 128 fewer than the 2,560 of the GTX 1080 and a staggering 512 more than the 1,920 of the GTX 1070.
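The core counts quoted above follow directly from the number of enabled streaming multiprocessors. A minimal sketch of that arithmetic is shown here; the constant, dictionary, and function names are chosen purely for illustration and are not taken from any NVIDIA tool.

```python
# Each "Pascal" SM on the GP104 carries 128 CUDA cores (figure from the text).
CORES_PER_SM = 128
TOTAL_SMS = 20  # SMs physically present on the GP104 silicon

# Enabled SM counts as described in the text: the GTX 1070 loses a quarter
# of the SMs, the GTX 1070 Ti loses one, and the GTX 1080 keeps all 20.
ENABLED_SMS = {
    "GTX 1070": TOTAL_SMS - TOTAL_SMS // 4,
    "GTX 1070 Ti": TOTAL_SMS - 1,
    "GTX 1080": TOTAL_SMS,
}


def cuda_cores(enabled_sms: int) -> int:
    """Return the CUDA core count for a given number of enabled SMs."""
    return enabled_sms * CORES_PER_SM


for card, sms in ENABLED_SMS.items():
    print(f"{card}: {sms} SMs -> {cuda_cores(sms)} CUDA cores")
```

The 128-core step between the GTX 1070 Ti and GTX 1080 is exactly one SM, while the 512-core step up from the GTX 1070 corresponds to four SMs.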
So as not to make the GTX 1070 Ti "too good," NVIDIA carried over the memory setup of the GTX 1070. You get 8 GB of older GDDR5 memory ticking at 8.00 GHz, which churns out 256 GB/s of memory bandwidth, in contrast to the newer 10 GHz GDDR5X memory on the GTX 1080 (320 GB/s) and the faster 11 GHz memory on the GTX 1080 refresh (352 GB/s). The clock speeds are another interesting mix. The GTX 1070 Ti has the base clock of the GTX 1080, but the boost clock of the GTX 1070, so the GPU Boost multipliers are rather restrained. These, coupled with the inherently better energy efficiency of the "Pascal" architecture compared to AMD "Vega," make for an interesting answer by NVIDIA to AMD's latest challenge.
In this review, we are testing the reference-design GeForce GTX 1070 Ti Founders Edition card by NVIDIA, which sticks to reference-clock speeds of 1607 MHz core, 1683 MHz GPU Boost, and 8.00 GHz (GDDR5-effective) memory.
GeForce GTX 1070 Ti Market Segment Analysis
Card: Memory (size, type, bus width)
GTX 980 Ti: 6 GB, GDDR5, 384-bit
R9 Fury X: 4 GB, HBM, 4096-bit
GTX 1070: 8 GB, GDDR5, 256-bit
RX Vega 56: 8 GB, HBM2, 2048-bit
GTX 1070 Ti: 8 GB, GDDR5, 256-bit
GTX 1080: 8 GB, GDDR5X, 256-bit
RX Vega 64: 8 GB, HBM2, 2048-bit
GTX 1080 Ti: 11 GB, GDDR5X, 352-bit
The GeForce GTX 1070 Ti is based on NVIDIA's GP104, using the "Pascal" architecture. The biggest "Pascal" GPU is the GP100, which drives the Tesla P100 HPC processor. The GP104 succeeds the GM204 (GTX 980, GTX 970), and despite having a smaller die at 314 mm² compared to the 398 mm² of the GM204, it features a significantly higher transistor count at 7.2 billion compared to the 5.2 billion of the GM204. This is due to NVIDIA's big move to the 16 nm FinFET process.
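To put the die-size and transistor figures above into perspective, they can be reduced to a transistor density. The sketch below only rearranges the numbers quoted in the text; the function name and the unit (millions of transistors per mm²) are choices made for this illustration.

```python
def density_mtr_per_mm2(transistors_billion: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / die_area_mm2


# Figures quoted in the text above.
gp104 = density_mtr_per_mm2(7.2, 314.0)  # "Pascal" GP104
gm204 = density_mtr_per_mm2(5.2, 398.0)  # "Maxwell" GM204

print(f"GP104: {gp104:.1f} MTr/mm^2")
print(f"GM204: {gm204:.1f} MTr/mm^2")
print(f"Density ratio: {gp104 / gm204:.2f}x")
```

By this rough measure, the GP104 packs roughly three quarters more transistors into each square millimeter than the GM204.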
With each successive architecture since "Fermi," NVIDIA has been enriching the streaming multiprocessor (SM) by adding more dedicated resources and reducing shared resources within the graphics processing cluster (GPC), which leads to big performance gains. The story continues with "Pascal." Like the GM204 before it, the GP104 features four GPCs, super-specialized subunits of the GPU that share the PCI-Express 3.0 x16 host interface and the 256-bit GDDR5X memory interface through eight controllers. These controllers support both GDDR5X and GDDR5 memory.
Workload across the four GPCs is shared by the GigaThread Engine cushioned by 2 MB of cache. Each GPC holds five streaming multiprocessors (SMs), which is an increase from the four SMs each GPC held on the GM204. On the GTX 1070 Ti, one of these 20 SMs is disabled. The GPC shares a raster engine between these five SMs. The "Pascal" streaming multiprocessor features a 4th generation PolyMorph Engine, a component for key render setup operations. With "Pascal," the PolyMorph Engine includes specialized hardware for the new Simultaneous MultiProjection feature. Each SM also holds a block of eight TMUs.
Each SM continues to feature 128 CUDA cores. With 19 of its 20 SMs enabled, the GTX 1070 Ti hence features a total of 2,432 CUDA cores. Other vital specifications include 152 TMUs and 64 ROPs. NVIDIA claims to have worked on a new GPU internal circuit design and board channel paths to facilitate significantly higher clock speeds than what the GM204 is capable of. The GeForce GTX 1070 Ti ships with a staggering 1607 MHz GPU clock speed and a maximum GPU Boost frequency of 1683 MHz.
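The unit counts and clocks above can be combined into rough theoretical peak figures. The sketch below rests on two conventional assumptions that are not stated in this review: each CUDA core retires one fused multiply-add (two FP32 operations) per clock, and each TMU samples one texel per clock. Treat the results as paper ceilings, not measured performance; all names are illustrative.

```python
SMS_ENABLED = 19     # 19 of 20 SMs active on the GTX 1070 Ti
CORES_PER_SM = 128   # CUDA cores per "Pascal" SM
TMUS_PER_SM = 8      # TMUs per SM
BASE_MHZ = 1607      # reference base clock
BOOST_MHZ = 1683     # reference GPU Boost clock

cores = SMS_ENABLED * CORES_PER_SM  # 2,432
tmus = SMS_ENABLED * TMUS_PER_SM    # 152


def peak_fp32_tflops(core_count: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, assuming 2 operations (one FMA) per core per clock."""
    return 2 * core_count * clock_mhz / 1e6


def peak_texel_rate_gtexels(tmu_count: int, clock_mhz: float) -> float:
    """Theoretical texel rate, assuming one texel per TMU per clock."""
    return tmu_count * clock_mhz / 1e3


print(f"CUDA cores: {cores}, TMUs: {tmus}")
print(f"Peak FP32 at base:  {peak_fp32_tflops(cores, BASE_MHZ):.2f} TFLOPS")
print(f"Peak FP32 at boost: {peak_fp32_tflops(cores, BOOST_MHZ):.2f} TFLOPS")
print(f"Peak texel rate at boost: {peak_texel_rate_gtexels(tmus, BOOST_MHZ):.1f} GTexel/s")
```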
As we mentioned earlier, the GeForce GTX 1070 Ti carries over the memory subsystem of the GTX 1070 untouched. It features 8 GB of GDDR5 memory across the chip's 256-bit wide memory interface, clocked at 8.00 GHz (GDDR5-effective), which works out to a memory bandwidth of 256 GB/s. This is significantly lower than the 320 GB/s of the GTX 1080 and the 352 GB/s of the GTX 1080 refresh featuring 11 Gbps GDDR5X memory.
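The bandwidth figures above follow from the effective per-pin data rate and the bus width. A minimal sketch of the calculation, assuming the quoted "GHz" figures are effective data rates in Gbps per pin and that all three cards use a 256-bit bus (the function name is illustrative):

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Return peak memory bandwidth in GB/s.

    data_rate_gbps: effective data rate per pin, in Gbps
    bus_width_bits: width of the memory interface, in bits
    """
    return data_rate_gbps * bus_width_bits / 8  # 8 bits per byte


BUS_WIDTH_BITS = 256  # GP104 memory interface width

for label, rate in [
    ("GTX 1070 Ti (GDDR5, 8 Gbps)", 8.0),
    ("GTX 1080 (GDDR5X, 10 Gbps)", 10.0),
    ("GTX 1080 refresh (GDDR5X, 11 Gbps)", 11.0),
]:
    print(f"{label}: {memory_bandwidth_gbs(rate, BUS_WIDTH_BITS):.0f} GB/s")
```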
The "Pascal" architecture supports Asynchronous Compute as standardized by Microsoft. It adds to that with its own variation of the concept with "Dynamic Load Balancing."
The New Age of Multi-GPU
With Microsoft DirectX 12 introducing a standardized mixed multi-GPU mode in which a DirectX 12 (and above) 3D app can take advantage of any number and types of GPUs as long as they support the API features needed by the app, multi-GPU has changed forever. Instead of steering its GPU lineup toward that future, NVIDIA has spent some R&D on its proprietary SLI technology. With increasing resolutions and refresh rates straining the bandwidth of display connectors and inter-GPU communication in multi-GPU modes, NVIDIA decided that SLI needs added bandwidth. Its way of providing it was to use both SLI contact points on the graphics card in a 2-way configuration. Enter the SLI HB (high-bandwidth) bridge, a rigid SLI bridge that comes in 1U, 2U, and 3U slot spacings for a link between two GeForce "Pascal" graphics cards along both their SLI "fingers" (contact points). This allows an SLI duo to more reliably render at such resolutions as 4K at 60 Hz or 120 Hz, and 5K, or HDR-enabled resolutions. SLI could still work with a classic 2-way bridge at any resolution, but that could adversely affect performance scaling, and the output won't be as smooth as with an SLI HB bridge. This also appears to be why NVIDIA discontinued official support for 3-way and 4-way SLI.
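To get a feel for why these display modes strain inter-GPU links, the sketch below estimates the raw, uncompressed data rate of a stream of finished frames. It is a back-of-the-envelope illustration only: the 4 bytes per pixel (8-bit RGBA), the 60 Hz refresh rate for 5K, and all names are assumptions made here, and it says nothing about the actual capacity of any SLI bridge.

```python
BYTES_PER_PIXEL = 4  # assumption: 8-bit RGBA; HDR formats may use more

# (width, height, refresh rate in Hz); the 5K refresh rate is an assumption.
MODES = {
    "4K @ 60 Hz": (3840, 2160, 60),
    "4K @ 120 Hz": (3840, 2160, 120),
    "5K @ 60 Hz": (5120, 2880, 60),
}


def frame_stream_gbs(width: int, height: int, refresh_hz: int,
                     bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Return the uncompressed frame data rate in GB/s (1 GB = 1e9 bytes)."""
    return width * height * bytes_per_pixel * refresh_hz / 1e9


for name, (w, h, hz) in MODES.items():
    print(f"{name}: {frame_stream_gbs(w, h, hz):.2f} GB/s")
```

If the two cards alternate frames, only a share of this stream would have to cross the bridge, but the trend is the same: doubling the refresh rate or moving from 4K to 5K raises the required bandwidth proportionally.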
The GTX 1070 Ti still supports 3-way and 4-way SLI over the classic bridges that come with motherboards, but only in a few benchmarks. NVIDIA's regular driver updates only optimize for 2-way SLI. NVIDIA "Pascal" GPUs do support Microsoft DirectX 12's multi-display adapter (MDA) mode, but NVIDIA will not provide game-specific optimizations through its driver updates for MDA. That is the game developer's responsibility. The same applies to "explicit" LDA (linked display adapter).
New Display Connectors
The "Pascal" architecture features DisplayPort 1.4 even though it's only certified for up to DisplayPort 1.2. You can enjoy all the features of DisplayPort 1.3 and 1.4 just fine, such as HDR metadata transport. The GPU also supports HDMI 2.0b, the latest HDMI standard with support for HDR video. In the entire course of its presentation, NVIDIA did not mention whether "Pascal" supports VESA AdaptiveSync, which AMD is co-branding as FreeSync. On the hardware side, all it takes is a GPU that supports HDMI 2.0a or DisplayPort 1.2a (both of which are satisfied by NVIDIA supporting HDMI 2.0b and DisplayPort 1.4), so the only thing left is support on the driver's side. The GeForce GTX 1070 Ti features one HDMI 2.0b, one dual-link DVI-D, and three DisplayPort 1.4 connectors. The DVI connector lacks analog wiring, and, thus, the GTX 1070 Ti lacks support for D-Sub monitors through dongles.
With each new architecture over the past three generations, NVIDIA has toyed with display sync. With "Kepler," it introduced Adaptive V-Sync; by the time "Maxwell" came along, you had G-SYNC; and with "Pascal," the company is introducing a new feature called Fast Sync. NVIDIA describes Fast Sync as a low-latency alternative to V-Sync that eliminates frame-tearing (normally caused by the GPU's output frame-rate exceeding the display's refresh-rate) while letting the GPU render unrestrained by V-Sync, which reduces input latency. It works by decoupling the display pipeline from the render output, which makes it possible to temporarily store excess rendered frames in the frame buffer. The result is an experience with low input-lag (as with V-Sync "off") and no frame-tearing (as with V-Sync "on"). You will be able to enable Fast Sync for a 3D app by editing its profile in NVIDIA Control Panel; simply force Vertical Sync mode to "Fast."
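The decoupling described above can be illustrated with a toy model: the GPU finishes frames at its own pace, and at every display refresh the newest completed frame is scanned out whole, while older unshown frames are discarded. This is a simplified sketch under those assumptions, not NVIDIA's implementation; the function name and parameters are hypothetical.

```python
def simulate_fast_sync(render_fps: int, refresh_hz: int, refreshes: int):
    """Toy "latest completed frame wins" model of Fast Sync.

    Frame n (1-based) is assumed to finish at time n / render_fps, and
    refresh k occurs at time k / refresh_hz. Returns a tuple of
    (frame numbers shown at each refresh, count of frames never shown).
    """
    shown = []
    for k in range(1, refreshes + 1):
        # Number of frames fully rendered by the time of refresh k.
        completed = (k * render_fps) // refresh_hz
        if completed == 0:
            continue  # nothing has finished rendering yet
        shown.append(completed)  # scan out the newest finished frame
    total_rendered = (refreshes * render_fps) // refresh_hz
    dropped = total_rendered - len(set(shown))
    return shown, dropped


# GPU renders at 200 FPS, display refreshes at 60 Hz, six refreshes observed.
shown, dropped = simulate_fast_sync(render_fps=200, refresh_hz=60, refreshes=6)
print("Frames shown:", shown)
print("Frames rendered but never displayed:", dropped)
```

In this model the renderer never waits for the display, which is where the low input-lag comes from, and the display only ever receives complete frames, which is why no tearing occurs. The price is that most of the frames rendered above the refresh rate are never shown.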