Architecture

The GeForce GTX 1070 Ti is based on NVIDIA's GP104 silicon, which uses the Pascal architecture. The largest Pascal GPU is the GP100, which drives the Tesla P100 HPC processor. The GP104 succeeds the GM204 (GTX 980 and GTX 970). Despite a smaller die, 314 mm² against the GM204's 398 mm², it packs a significantly higher transistor count: 7.2 billion against the GM204's 5.2 billion. This is made possible by NVIDIA's move from the 28 nm process to 16 nm FinFET.
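To put the die-size and transistor figures into perspective, the short sketch below works out the transistor density of both chips from the numbers quoted above. The dictionary layout and the function name are illustrative assumptions, not anything NVIDIA publishes.

```python
# Die area (mm^2) and transistor counts as quoted in the text above.
DIES = {
    "GM204": {"area_mm2": 398, "transistors": 5.2e9},
    "GP104": {"area_mm2": 314, "transistors": 7.2e9},
}


def density_mtr_per_mm2(die):
    """Return transistor density in millions of transistors per mm^2."""
    return die["transistors"] / die["area_mm2"] / 1e6


for name, die in DIES.items():
    print(f"{name}: {density_mtr_per_mm2(die):.1f} MTr/mm^2")

# How much denser the GP104 is than the GM204.
ratio = density_mtr_per_mm2(DIES["GP104"]) / density_mtr_per_mm2(DIES["GM204"])
print(f"GP104 / GM204 density ratio: {ratio:.2f}x")
```

By this rough measure, the GP104 fits roughly 23 million transistors into each square millimeter, against roughly 13 million for the GM204.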
With each successive architecture since "Fermi," NVIDIA has been enriching the streaming multiprocessor (SM) by adding more dedicated resources and reducing the resources shared within the graphics processing cluster (GPC), which has brought big performance gains. The story continues with "Pascal." Like the GM204 before it, the GP104 features four GPCs, highly specialized subunits of the GPU that share the PCI-Express 3.0 x16 host interface and the 256-bit memory interface, which is made up of eight 32-bit controllers. These controllers support both GDDR5X and GDDR5 memory.
Workload is distributed across the four GPCs by the GigaThread Engine, which is backed by 2 MB of L2 cache. Each GPC holds five streaming multiprocessors (SMs), up from the four SMs per GPC on the GM204. On the GTX 1070 Ti, one of these 20 SMs is disabled. Each GPC shares one raster engine among its five SMs. The "Pascal" streaming multiprocessor features a 4th generation PolyMorph Engine, a component that handles key render setup operations. With "Pascal," the PolyMorph Engine includes specialized hardware for the new Simultaneous Multi-Projection feature. Each SM also holds a block of eight TMUs.
Each SM continues to feature 128 CUDA cores. With 19 of its 20 SMs enabled, the GTX 1070 Ti has a total of 2,432 CUDA cores. Other vital specifications include 152 TMUs and 64 ROPs. NVIDIA claims to have worked on a new GPU internal circuit design and board channel paths to facilitate significantly higher clock speeds than what the GM204 is capable of. The GeForce GTX 1070 Ti ships with a staggering 1607 MHz base clock and a rated GPU Boost frequency of 1683 MHz.
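The unit counts above follow directly from the GPC and SM layout. The sketch below reproduces that arithmetic. The function name `shader_config` and its parameters are hypothetical, chosen only for illustration.

```python
# Per-SM unit counts as given in the text.
CUDA_CORES_PER_SM = 128
TMUS_PER_SM = 8


def shader_config(gpcs, sms_per_gpc, disabled_sms):
    """Derive active SM, CUDA core, and TMU totals for a GPC-based layout."""
    active_sms = gpcs * sms_per_gpc - disabled_sms
    return {
        "active_sms": active_sms,
        "cuda_cores": active_sms * CUDA_CORES_PER_SM,
        "tmus": active_sms * TMUS_PER_SM,
    }


# GP104 as configured on the GTX 1070 Ti: 4 GPCs x 5 SMs, one SM disabled.
gtx_1070_ti = shader_config(gpcs=4, sms_per_gpc=5, disabled_sms=1)
print(gtx_1070_ti)
# {'active_sms': 19, 'cuda_cores': 2432, 'tmus': 152}
```

Calling the same function with `disabled_sms=0` gives the figures for a fully enabled GP104: 20 SMs, 2,560 CUDA cores, and 160 TMUs.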
As we mentioned earlier, the GeForce GTX 1070 Ti carries over the memory subsystem of the GTX 1070 untouched. It features 8 GB of GDDR5 memory across the chip's 256-bit wide memory interface, clocked at 8.00 GHz (GDDR5-effective), which works out to a memory bandwidth of 256 GB/s. This is significantly lower than the 320 GB/s of the GTX 1080 and the 352 GB/s of the GTX 1080 refresh featuring 11 Gbps GDDR5X memory.
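The bandwidth figures above come from multiplying the bus width in bytes by the effective per-pin data rate. The sketch below shows that calculation. The function name is hypothetical, and the 10 Gbps data rate for the original GTX 1080 is not stated in this section; it is the rate its quoted 320 GB/s implies on a 256-bit bus.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


# (card, bus width in bits, effective data rate in Gbps per pin)
CARDS = [
    ("GTX 1070 Ti (GDDR5)", 256, 8.0),
    ("GTX 1080 (GDDR5X)", 256, 10.0),          # data rate assumed from 320 GB/s
    ("GTX 1080 refresh (GDDR5X)", 256, 11.0),
]

for name, width, rate in CARDS:
    print(f"{name}: {memory_bandwidth_gbs(width, rate):.0f} GB/s")
```

All three cards share the same 256-bit bus, so the bandwidth gap comes entirely from the memory data rate.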
The "Pascal" architecture supports Asynchronous Compute as standardized by Microsoft. NVIDIA builds on that with its own variation of the concept, called "Dynamic Load Balancing."