Monday, July 17th 2023

AMD Ryzen 7040 Series Phoenix APUs Surprisingly Performant with AVX-512 Workloads

Intel dropped the AVX-512 instruction set from its laptop/mobile platforms after it emerged that AVX-512 would not work in conjunction with the company's E-core designs. Alder Lake was the last generation to (semi) support it, thanks to the P-cores agreeing to play nice, albeit only with the E-cores disabled (via BIOS settings). Intel chose to fuse off AVX-512 support in production circa early 2022, and AMD picked up the slack soon after by integrating AVX-512 into its Zen 4 CPU architecture. The Ryzen 7040 series is the only current-generation mobile platform that offers AVX-512 support. Phoronix decided to benchmark a Ryzen 7 7840U against the older Intel i7-1165G7 (Tiger Lake) and i7-1065G7 (Ice Lake) SoCs in AVX-512-based workloads.

Team Red's debut foray into AVX-512 was surprisingly performant according to Phoronix's test results—the Ryzen 7 7840U did very well for itself. It outperformed the 1165G7 by 46%, and the older 1065G7 by an impressive 63%. The Ryzen 7 APU was found to attain the highest performance gain with AVX-512 enabled—a 54% performance margin over operating with AVX-512 disabled. In comparison Phoronix found that: "the i7-1165G7 Tiger Lake impact came in at 34% with these AVX-512-heavy benchmarks or 35% with the i7-1065G7 Ice Lake SoC for that generation where AVX-512 on Intel laptops became common."
Phoronix concluded: "Overall the AVX-512 usage across the AMD Zen 4 product spectrum has been great. The efficient AVX-512 usage on the mobile/laptop processors is great for those developers wanting to work and test code from their device, if wanting to use any AI / deep learning software for edge computing or related use-cases, or just enjoying other AVX-512 optimized software from CPU-based renderers to other creator software packages. Those wishing to go through this round of data I collected for Phoenix / Tiger Lake / Ice Lake can find it here with all the individual per-test metrics."
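For developers who want to confirm that a Linux laptop actually exposes AVX-512 before testing such code on it, here is a minimal Python sketch. It assumes the kernel lists CPU features on a `flags` line in `/proc/cpuinfo` and that AVX-512 subsets appear there with an `avx512` prefix (for example `avx512f`). The helper name `avx512_flags` is made up for illustration and is not taken from Phoronix or AMD.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> set[str]:
    """Return the AVX-512 feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... avx512f avx512dq ..."
            _, _, flags = line.partition(":")
            return {f for f in flags.split() if f.startswith("avx512")}
    return set()


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux; other OSes need another method
    text = path.read_text() if path.exists() else ""
    found = avx512_flags(text)
    print("AVX-512 subsets:", sorted(found) if found else "none reported")
```

An empty result only means the kernel does not report the feature. On early Alder Lake systems, for instance, that could depend on BIOS settings as described above.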
Sources: Tom's Hardware, Phoronix AMD Ryzen 7040 Series Review

10 Comments on AMD Ryzen 7040 Series Phoenix APUs Surprisingly Performant with AVX-512 Workloads

#1
Count von Schwalbe
Do the Intel processors use dual-pumped 256-bit AVX-512? I know the desktop Zen 4 uses split-consecutive 256-bit, I assume that Phoenix does as well. Or is it mostly clock speed advantage giving the benefit?
#2
AnotherReader
Count von Schwalbe: Do the Intel processors use dual-pumped 256-bit AVX-512? I know the desktop Zen 4 uses split-consecutive 256-bit, I assume that Phoenix does as well. Or is it mostly clock speed advantage giving the benefit?
Zen 4 and Tiger Lake have similar throughput for AVX-512. It's the greater number of cores coupled with much better energy efficiency that leads to the win. The numbers look even better when you notice that the 7840U has nearly half the power draw of the 1165 G7: 15.88 W vs 28.73 W.

Type of Operation                    Zen 4 IPC   Tiger Lake IPC   Cascade Lake IPC
256-bit FMA                          1.90        1.99             1.94
512-bit FMA                          1.00        0.94             1.82
512-bit Vector Integer Add           1.78        1.89             1.94
1:1 Mixed 256-bit and 512-bit FMA    1.34        0.94             1.82
2:1 Mixed 256-bit and 512-bit FMA    1.50        0.94             1.82

Execution throughput for various operations (Chips and Cheese)
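To make the "similar throughput" point concrete, here is a rough Python sketch that turns the FMA rows of the table into peak flops per cycle and checks the "nearly half the power draw" figure. It assumes 32-bit floating-point elements and counts one FMA as two operations per lane. The function names are illustrative and not taken from Chips and Cheese or Phoronix.

```python
# FMA instructions per cycle, as quoted in the table (Chips and Cheese).
FMA_IPC = {
    "Zen 4": {256: 1.90, 512: 1.00},
    "Tiger Lake": {256: 1.99, 512: 0.94},
    "Cascade Lake": {256: 1.94, 512: 1.82},
}


def fp32_flops_per_cycle(ipc: float, vector_bits: int) -> float:
    """Peak FP32 flops per cycle per core for FMAs of the given width."""
    lanes = vector_bits // 32   # assumption: 32-bit floating-point elements
    return ipc * lanes * 2      # one FMA = one multiply + one add per lane


def power_ratio(watts_a: float, watts_b: float) -> float:
    """Power draw of A as a fraction of B."""
    return watts_a / watts_b


if __name__ == "__main__":
    for cpu, by_width in FMA_IPC.items():
        for bits, ipc in by_width.items():
            rate = fp32_flops_per_cycle(ipc, bits)
            print(f"{cpu:13s} {bits}-bit FMA: {rate:.2f} flops/cycle")
    print(f"7840U / 1165G7 power: {power_ratio(15.88, 28.73):.2f}")
```

On these numbers Zen 4 and Tiger Lake both land at roughly 30 to 32 FP32 flops per cycle at either width, Cascade Lake roughly doubles that with 512-bit FMA, and the 7840U's power draw works out to about 0.55 of the 1165G7's.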
#3
Cheeseball
Not a Potato
Count von Schwalbe: Do the Intel processors use dual-pumped 256-bit AVX-512? I know the desktop Zen 4 uses split-consecutive 256-bit, I assume that Phoenix does as well. Or is it mostly clock speed advantage giving the benefit?
The 13th gen (and 12th gen, if I'm not mistaken) does not, which Intel took away:



This is why if one is into certain emulation this time around, it would be best to go with AMD CPUs.
#4
Count von Schwalbe
Cheeseball: The 13th gen (and 12th gen, if I'm not mistaken) does not, which Intel took away:



This is why if one is into certain emulation this time around, it would be best to go with AMD CPUs.
Yeah, Raptor Lake never had it and only early Alder Lake samples could enable it.

I was mostly curious about the implementation as AMD was trying to save space wherever possible, and I was curious about the performance delta. Most Intel desktop processors with AVX-512 had a full-width implementation.
#5
kapone32
When are we going to acknowledge that the 7000 series APUs are all that and a stick of gum too? I am just watching the ETA Prime video on the newest handheld, and it comes with a 120 Hz 1080p screen. Gaming is super sweet, and it has USB4 ports as well. I can't wait for these to come to desktop, if they can spare any.
#6
Denver
AnotherReader: Zen 4 and Tiger Lake have similar throughput for AVX-512. It's the greater number of cores coupled with much better energy efficiency that leads to the win. The numbers look even better when you notice that the 7840U has nearly half the power draw of the 1165 G7: 15.88 W vs 28.73 W.

Type of Operation                    Zen 4 IPC   Tiger Lake IPC   Cascade Lake IPC
256-bit FMA                          1.90        1.99             1.94
512-bit FMA                          1.00        0.94             1.82
512-bit Vector Integer Add           1.78        1.89             1.94
1:1 Mixed 256-bit and 512-bit FMA    1.34        0.94             1.82
2:1 Mixed 256-bit and 512-bit FMA    1.50        0.94             1.82

Execution throughput for various operations (Chips and Cheese)
Zen 4 doesn't need to lower TDP/clocks to run avx512, that's the main advantage.
#7
AnotherReader
Denver: Zen 4 doesn't need to lower TDP/clocks to run avx512, that's the main advantage.
Tiger Lake also solved that for Intel, but it was their 4th attempt.
#8
AlB80
Count von Schwalbe: Do the Intel processors use dual-pumped 256-bit AVX-512? I know the desktop Zen 4 uses split-consecutive 256-bit, I assume that Phoenix does as well. Or is it mostly clock speed advantage giving the benefit?
There is one reliable explanation for why Zen 4 gets a solid boost from AVX-512: the x86 architecture. x86 is fossil trash.
x86 decoders cannot saturate the execution units with operations to execute, so using SIMD with longer vector registers gives a boost because it requires fewer instructions to decode.
#9
AnotherReader
AlB80: There is one reliable explanation for why Zen 4 gets a solid boost from AVX-512: the x86 architecture. x86 is fossil trash.
x86 decoders cannot saturate the execution units with operations to execute, so using SIMD with longer vector registers gives a boost because it requires fewer instructions to decode.
Even though that's a very jaundiced take on x86, you're right about one thing. The idea that longer vectors are more energy efficient is correct, because the energy associated with instruction fetch and decode is halved for the same amount of work.
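The "halved" figure can be illustrated with a back-of-the-envelope count. This is a simplified Python sketch that assumes 32-bit elements and counts only the vector arithmetic instructions, ignoring loads, stores, loop overhead and masking. The function name is made up for illustration.

```python
import math


def vector_instructions(n_elements: int, vector_bits: int, element_bits: int = 32) -> int:
    """Number of vector instructions needed to touch n_elements once."""
    lanes = vector_bits // element_bits      # elements processed per instruction
    return math.ceil(n_elements / lanes)     # a partial final vector still costs one


if __name__ == "__main__":
    n = 1024
    for bits in (256, 512):
        print(f"{bits}-bit: {vector_instructions(n, bits)} instructions for {n} elements")
```

For 1024 elements this gives 128 instructions at 256-bit and 64 at 512-bit, so the front end fetches and decodes half as many instructions for the same work.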
#10
Jun
Well glued together it seems.