Cayman, named after the lovely Cayman islands in the Caribbean, is AMD's new high-end GPU. It succeeds Cypress, on which were based Radeon HD 5800 series and the dual-GPU HD 5970. Cayman is built on existing 40 nm process at TSMC. Apart from the processor most of the components inside are the same as the ones found in the previous generation GPUs, except that the hierarchy of components is changed to add a degree of parallelism that goes a step ahead of even Barts. The SIMD cores are completely restructured, too.
With Cypress, there was only one graphics engine (that which computes preliminary data and instructions, and passes them on for low-level processing to the SIMD cores), and one dispatch processor that funneled data and instructions down to the two SIMD engine blocks. Barts introduced a degree of parallelism by giving each SIMD engine block its own dispatch processor, instruction and constant caches. Cayman is taking that a step further, by splitting even the graphics engines between the two SIMD engine blocks. This gives dedicated rasterizers, geometry assemblers to each block, but more importantly, doubles the number of tessellation units, with each graphics engine having one.
As mentioned earlier, AMD brought about a radical change in the stream processor design. Compared to the older VLIW5 design in which an SIMD core consisted of four simple and one complex stream processors with some common resources, the new design, dubbed VLIW4, combines four equally-capable complex stream processors, with two of the four getting special functions. Overall, with a stream processor count of 1536, the Radeon HD 6970 clocked at 880 MHz, is able to churn out a single-precision floating point (IEEE754-SP) performance of 2.7 TFLOPs, and double-precision performance (IEEE754-DP) of 675 GFLOPs. The VLIW4 architecture, hence is aimed to increase performance per mm² of die-area. The render back-ends, have also been redesigned to facilitate 2 times faster 16-bit integer and 32-bit floating-point operations.
In a nutshell, the Cayman die measures 389 mm², holding 2.64 billion transistors. It is built on the 40 nm TSMC process. It has 24 SIMD engines spread across two SIMD engine blocks. There are 1536 stream processors in all. There are 96 texture memory units (TMUs), and 32 raster operation processors (ROPs). New, faster memory controllers allow use of new 5.5 Gbps memory chips. The memory bus width is 256-bit, with which the GPU connects to eight 2 Gbit memory chips to archive 2 GB of total memory.