Monday, February 12th 2024

AMD Zen 5 Details Emerge with GCC "Znver5" Patch: New AVX Instructions, Larger Pipelines

AMD's upcoming Ryzen 9000 series processors for the AM5 platform will be built on a new microarchitecture: Zen 5. This latest revision of AMD's x86-64 design will bring several improvements over the Zen 4 it replaces, targeting the rumored 10-15% IPC improvement. A newly posted patch set for the GNU Compiler Collection (GCC) proposes "znver5" enablement and reveals some of those changes. One of the most interesting is the expanded instruction set. Compared with Zen 4, Zen 5 adds AVX-VNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI. Of these, AVX-VNNI and AVX512VP2INTERSECT are vector (AVX and AVX-512) extensions, MOVDIRI and MOVDIR64B are direct-store instructions, and PREFETCHI is an instruction prefetch hint.

AVX-VNNI is a 256-bit vector version of the AVX-512 VNNI instruction set, which accelerates neural network inferencing workloads. It brings the same VNNI operations to CPUs and code paths that work with 256-bit vectors rather than full 512-bit AVX-512, so the AI acceleration is available without requiring AVX-512. While narrower in scope than AVX-512 VNNI (it lacks opmasking and access to the additional vector registers), AVX-VNNI is crucial in spreading VNNI inferencing speedups to real-world CPUs and applications. As noted above, the AVX-512 VP2INTERSECT instruction is also making it into Zen 5. Until now it has shipped only in Intel's Tiger Lake processor generation, and it is now considered deprecated on Intel SKUs. We don't know the rationale behind its inclusion, but AMD presumably has a use case for it.
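
To make the VNNI idea concrete, here is a minimal scalar sketch in Python of what one 256-bit VPDPBUSD operation (the core byte dot-product instruction in VNNI) computes. It is an illustrative model, not code from the GCC patch: the function names `wrap_int32` and `vpdpbusd_256` are our own, and real code would use compiler intrinsics or auto-vectorization instead.

```python
def wrap_int32(x):
    """Wrap a Python int to a signed 32-bit value (VPDPBUSD does not saturate)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def vpdpbusd_256(acc, a_u8, b_s8):
    """Scalar model of a 256-bit VPDPBUSD (hypothetical helper name).

    acc  : 8 signed 32-bit accumulators (one per 32-bit lane)
    a_u8 : 32 unsigned bytes, each in 0..255
    b_s8 : 32 signed bytes, each in -128..127

    Each lane adds four (unsigned byte * signed byte) products
    to its accumulator, wrapping at 32 bits.
    """
    assert len(acc) == 8 and len(a_u8) == 32 and len(b_s8) == 32
    out = []
    for lane in range(8):
        total = acc[lane]
        for k in range(4):
            i = 4 * lane + k
            total += a_u8[i] * b_s8[i]
        out.append(wrap_int32(total))
    return out
```

Calling `vpdpbusd_256([10] * 8, [1, 2, 3, 4] * 8, [1, -1, 1, -1] * 8)` adds 1 - 2 + 3 - 4 = -2 to every lane. On hardware, that whole inner loop across all eight lanes is a single instruction, which is where the inferencing speedup comes from.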

Next, we have a larger pipeline design. The Zen 5 integer unit has six ALUs, compared to the four found in Zen 4. The Address Generation Unit (AGU) count is also higher, going from three to four. The number of floating point store pipelines has doubled, and each is 256 bits wide, so a 512-bit floating point store can be handled in a single cycle. Some other instructions, like cmov/setcc and floating point shuffles, can now be handled by all ALUs in Zen 5, whereas in Zen 4 they were handled by only two. Apparently, the Zen 5 uArch now handles most AVX-512 operations as a single operation occupying one pipeline slot, rather than relying on the old double pumping, which split each AVX-512 instruction into two 256-bit halves for processing on the 256-bit wide ALUs. Lastly, the patch notes that, once again, there will be no difference between Zen 5 and Zen 5c cores ISA-wise, just as with Zen 4 and Zen 4c, where the latter differed only in its smaller caches.
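
The difference between double pumping and full-width execution can be sketched with a toy Python model. This is a conceptual illustration only, not a cycle-accurate description of either core: the helper `vector_add` and its "passes" counter are our own assumptions, used simply to show that a 512-bit operation takes two trips through a 256-bit datapath but one trip through a 512-bit one.

```python
LANE_BITS = 32  # model vectors as lists of 32-bit lanes


def vector_add(a, b, datapath_bits):
    """Add two equal-length lane lists on a datapath of the given width.

    Returns (result, passes), where passes counts how many times the
    datapath had to be used to cover the whole vector. Hypothetical
    helper for illustration; lane overflow is ignored for simplicity.
    """
    assert len(a) == len(b)
    lanes_per_pass = datapath_bits // LANE_BITS
    result = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        chunk_a = a[start:start + lanes_per_pass]
        chunk_b = b[start:start + lanes_per_pass]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return result, passes


# A 512-bit vector holds 16 lanes of 32 bits each.
a = list(range(16))
b = [100] * 16

double_pumped, passes_256 = vector_add(a, b, 256)  # Zen 4 style: two halves
full_width, passes_512 = vector_add(a, b, 512)     # Zen 5 style: one pass
```

Both calls produce the same 16 results. Only the pass count differs, two for the 256-bit datapath and one for the 512-bit datapath. The same arithmetic explains the store change described above: two 256-bit store pipelines together cover one 512-bit store.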