Monday, February 12th 2024
AMD Zen 5 Details Emerge with GCC "Znver5" Patch: New AVX Instructions, Larger Pipelines
AMD's upcoming Ryzen 9000 series of processors on the AM5 platform will carry a new silicon SKU under the hood: Zen 5. The latest revision of AMD's x86-64 microarchitecture brings a few interesting improvements over the Zen 4 it replaces, targeting the rumored 10-15% IPC improvement. Thanks to the latest set of patches for the GNU Compiler Collection (GCC), we now have the patch set that proposes the changes arriving with "znver5" enablement. One of the most interesting additions in Zen 5 over the previous Zen 4 is the expansion of the AVX instruction set, mainly new AVX and AVX-512 instructions: AVX-VNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.
AVX-VNNI is a 256-bit vector version of the AVX-512 VNNI instruction set that accelerates neural network inferencing workloads. It delivers the same VNNI instructions to CPUs that support 256-bit vectors but lack full 512-bit AVX-512 capability, effectively extending the AI-acceleration benefits of VNNI down to 256-bit vectors. While narrower in scope (no opmasking and fewer accessible vector registers compared to AVX-512 VNNI), AVX-VNNI is crucial in spreading VNNI inferencing speedups to real-world CPUs and applications. The AVX-512 VP2INTERSECT instruction is also making it into Zen 5, as noted above; it has so far shipped only in Intel's Tiger Lake generation and is now considered deprecated on Intel SKUs. We don't know the rationale behind this inclusion, but AMD surely had a use case for it.

Next, we have a larger pipeline design. The Zen 5 integer unit has six ALUs compared to the four found in Zen 4. The Address Generation Unit (AGU) count is also higher, going from three to four. The floating-point store pipelines are doubled, and each is 256 bits wide, allowing a 512-bit floating-point store to complete in a single cycle. Some other instructions, such as cmov/setcc and floating-point shuffles, can now be handled by all ALUs in Zen 5, whereas in Zen 4 they were handled by only two ALUs. The Zen 5 microarchitecture also appears to execute most AVX-512 operations as a single full-width pipeline operation, rather than the old double pumping, which split each 512-bit instruction into two 256-bit halves for processing on the 256-bit-wide ALUs. Lastly, the patch notes that, once again, there will be no ISA-level difference between Zen 5 and Zen 5c cores, just as with Zen 4 and Zen 4c, where the latter differs only in having smaller caches.
Sources:
Phoronix, AnandTech Forums
29 Comments on AMD Zen 5 Details Emerge with GCC "Znver5" Patch: New AVX Instructions, Larger Pipelines
According to official Intel documentation, it is supported only on Tiger Lake - Intel® Architecture Instruction Set Extensions and Future Features Programming Reference, September 2023, page 18:
If what I have heard is correct, then with somewhat of a breakdown of IPC increases, does that 10-15% total IPC increase include that massive AVX-512 uplift? If so, what proportion of the total even uses AVX-512? If it is, for example, half of the entire test, then the rest of the expected IPC increase is only 5-7.5%. This is unrealistic and I am just using it as an example.
A large part of AMD's massive AVX-512 push is to keep ahead of Intel in the server and workstation markets, and to bring this to all of its users before Intel eventually fights back with its upcoming AVX stuff, which FYI I have heard is already a total mess, as it will be rolled out over a few years and not all products will get everything at the end of it. Yet more of Intel's delightful artificial product segmentation! I have watched several MLID videos talking about Zen 5; he has repeatedly said 20% at the absolute most, but they are targeting 50% in AVX-512.