Next Generation Compute Unit
Call it GCN 5.0, GCN 1.6, or even Next Gen GCN: it is clear that Vega builds upon the existing GCN microarchitecture with some improvements added. AMD distinguishes this by referring to its compute units as "Next Generation Compute Units," or NGCUs. This is where the bulk of the magic... err... the engineering has happened. AMD cannot just turn its back on GCN because the architecture is used in millions of consoles, which helps developers port their tech to PC in a more time- and cost-efficient way.
AMD has added support for 8-bit operations with NGCU, has retained the 16-bit floating point operations from Polaris, and continues to support FP32 and FP64 operations as well. One new feature here is Rapid Packed Math, wherein multiple 16-bit operations can be packed together and executed simultaneously in the slot a single 32-bit operation would otherwise occupy. If a task has some complex 32-bit operations where precision is key, nothing changes. However, if your application is not demanding on precision - for example, a lighting effect or a transition from one state to another - you can use Rapid Packed Math to perform said operation as a 16-bit one, which takes up fewer resources and increases throughput. AMD estimates that a Vega NGCU can handle 4-5x the number of operations per clock cycle relative to the previous CUs in Polaris. The company demonstrated a use case for Rapid Packed Math with 3DMark Serra - a custom demo created by Futuremark for AMD to show off this technology - wherein 16-bit integer and floating point operations result in as much as a 25% benefit in operation count.
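To make the packing idea concrete, here is a minimal software sketch of two FP16 values sharing one 32-bit word. It only illustrates the data layout. It is not AMD's hardware implementation or any real driver API, and the function names (pack_half2, unpack_half2, packed_add) are made up for this example.

```python
import struct

def pack_half2(a, b):
    """Round two floats to FP16 and store them in one 32-bit word (a in the low half)."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]

def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 values, returned as Python floats."""
    return struct.unpack("<2e", struct.pack("<I", word))

def packed_add(x, y):
    """Add both FP16 halves of x and y in one call.

    Hardware with packed math would do both additions in a single
    instruction; this emulation only shows the register layout.
    """
    a0, a1 = unpack_half2(x)
    b0, b1 = unpack_half2(y)
    return pack_half2(a0 + b0, a1 + b1)

x = pack_half2(1.0, 2.0)
y = pack_half2(0.5, 0.25)
result = unpack_half2(packed_add(x, y))  # (1.5, 2.25)
```

The point is simply that one 32-bit register slot carries two independent 16-bit results, which is where the throughput gain for precision-tolerant work comes from.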
AMD encourages developers to take a good look at their shaders and think about where they need full 32-bit precision and where they can opt for 16-bit without losing any visual fidelity while gaining significant performance improvements. For example, a noise-generating shader doesn't need 32-bit precision; 16-bit would be perfectly fine and still provides a value range large and differentiated enough for a decent noise effect.
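A quick way to sanity-check the noise example is to measure how much is lost when values in the 0 to 1 range are stored as FP16. The sketch below assumes the noise ends up in an 8-bit-per-channel output. That assumption is ours, not AMD's, and the helper names are hypothetical.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest FP16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def max_fp16_error(samples=4096):
    """Largest rounding error when evenly spaced values in [0, 1) are stored as FP16."""
    return max(abs(i / samples - to_fp16(i / samples)) for i in range(samples))

worst = max_fp16_error()

# Assumption: the result is written to an 8-bit channel, so one visible
# step is 1/255. FP16 rounding error in [0, 1) is bounded by 2**-12.
display_step = 1 / 255
fits = worst < display_step
```

Under that assumption, the FP16 rounding error is more than an order of magnitude smaller than one output step, which fits the claim that such a shader loses nothing visible at 16-bit.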
Aiding computation on the Vega NGCU is support for over 40 new ISA instructions, which also take advantage of the increased IPC over Polaris. Here's the thing - some of these are very relevant to GPU mining. Need I say more on where this goes? AMD estimates that a single NGCU can handle as many as 512 simultaneous 8-bit operations.
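The 512 figure can be reproduced with some back-of-the-envelope arithmetic. The sketch below rests on two assumptions that are not stated above: that an NGCU keeps the 64 32-bit ALU lanes of earlier GCN compute units, and that a fused multiply-add is counted as two operations. Treat it as a plausibility check rather than AMD's own accounting.

```python
LANES_PER_CU = 64  # assumption: 64 32-bit ALU lanes per compute unit, as in earlier GCN
OPS_PER_FMA = 2    # assumption: one fused multiply-add is counted as two operations

def ops_per_clock(bits):
    """Peak operations per clock for one compute unit at the given operand width."""
    if bits not in (8, 16, 32):
        raise ValueError("only 8-, 16- and 32-bit operands are modeled here")
    packed_per_lane = 32 // bits  # how many operands fit in one 32-bit lane
    return LANES_PER_CU * OPS_PER_FMA * packed_per_lane

peak = {bits: ops_per_clock(bits) for bits in (32, 16, 8)}
# peak == {32: 128, 16: 256, 8: 512}
```

With those assumptions, halving the operand width doubles the peak operation count, and the 8-bit case lands on the 512 operations quoted above.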