
AMD Ryzen 7040 Series Phoenix APUs Surprisingly Performant with AVX-512 Workloads

T0@st

News Editor
Staff member
Joined
Mar 7, 2023
Messages
2,077 (4.60/day)
Location
South East, UK
Intel decided to drop the relatively new AVX-512 instruction set from its laptop/mobile platforms when it was discovered that it would not work in conjunction with the company's E-core designs. Alder Lake was the last generation to (semi) support the instruction set: the P-cores could run AVX-512, but only with the E-cores disabled via BIOS settings. Intel chose to fuse off AVX-512 support in production circa early 2022, with AMD picking up the slack soon after by integrating AVX-512 into its Zen 4 CPU architecture. The Ryzen 7040 series is the only current-generation mobile platform that offers AVX-512 support. Phoronix decided to benchmark a Ryzen 7 7840U against the older Intel i7-1165G7 (Tiger Lake) and i7-1065G7 (Ice Lake) SoCs in AVX-512-based workloads.

Team Red's debut foray into AVX-512 was surprisingly performant according to Phoronix's test results, and the Ryzen 7 7840U did very well for itself. It outperformed the 1165G7 by 46%, and the older 1065G7 by an impressive 63%. The Ryzen 7 APU was also found to gain the most from AVX-512, with a 54% performance margin over running with AVX-512 disabled. In comparison, Phoronix found that: "the i7-1165G7 Tiger Lake impact came in at 34% with these AVX-512-heavy benchmarks or 35% with the i7-1065G7 Ice Lake SoC for that generation where AVX-512 on Intel laptops became common."




Phoronix concluded: "Overall the AVX-512 usage across the AMD Zen 4 product spectrum has been great. The efficient AVX-512 usage on the mobile/laptop processors is great for those developers wanting to work and test code from their device, if wanting to use any AI / deep learning software for edge computing or related use-cases, or just enjoying other AVX-512 optimized software from CPU-based renderers to other creator software packages. Those wishing to go through this round of data I collected for Phoenix / Tiger Lake / Ice Lake can find it here with all the individual per-test metrics."

 
Joined
Nov 15, 2021
Messages
2,751 (2.96/day)
Location
Knoxville, TN, USA
System Name Work Computer | Unfinished Computer
Processor Core i7-6700 | Ryzen 5 5600X
Motherboard Dell Q170 | Gigabyte Aorus Elite Wi-Fi
Cooling A fan? | Truly Custom Loop
Memory 4x4GB Crucial 2133 C17 | 4x8GB Corsair Vengeance RGB 3600 C26
Video Card(s) Dell Radeon R7 450 | RTX 2080 Ti FE
Storage Crucial BX500 2TB | TBD
Display(s) 3x LG QHD 32" GSM5B96 | TBD
Case Dell | Heavily Modified Phanteks P400
Power Supply Dell TFX Non-standard | EVGA BQ 650W
Mouse Monster No-Name $7 Gaming Mouse| TBD
Do the Intel processors use dual-pumped 256-bit AVX-512? I know the desktop Zen 4 uses split-consecutive 256-bit, I assume that Phoenix does as well. Or is it mostly clock speed advantage giving the benefit?
 
Joined
Nov 26, 2021
Messages
1,372 (1.49/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Zen 4 and Tiger Lake have similar throughput for AVX-512. It's the greater number of cores coupled with much better energy efficiency that leads to the win. The numbers look even better when you notice that the 7840U has nearly half the power draw of the i7-1165G7: 15.88 W vs 28.73 W.
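
A rough back-of-the-envelope sketch of what those figures imply for efficiency. It assumes that the 46% performance lead from the article and the two average power figures describe the same set of runs, which is an assumption on my part rather than something Phoronix states in this form. The function name `perf_per_watt_ratio` is made up for illustration.

```python
def perf_per_watt_ratio(speedup, power_a_w, power_b_w):
    """Return how many times more work per watt chip A does than chip B.

    speedup   -- performance of A divided by performance of B
    power_a_w -- average power draw of A in watts
    power_b_w -- average power draw of B in watts
    """
    return speedup * (power_b_w / power_a_w)


# Figures quoted in this thread: the 7840U is 46% faster than the i7-1165G7
# and draws 15.88 W against 28.73 W. Pairing these numbers is an assumption.
ratio = perf_per_watt_ratio(1.46, 15.88, 28.73)
print(f"7840U power draw relative to 1165G7: {15.88 / 28.73:.2f}x")
print(f"7840U perf per watt relative to 1165G7: {ratio:.2f}x")
```

Under that assumption the 7840U would come out at roughly 2.6 times the performance per watt, which is why the power figure matters more than the raw lead.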

Type of Operation                       Zen 4 IPC   Tiger Lake IPC   Cascade Lake IPC
256-bit FMA                             1.90        1.99             1.94
512-bit FMA                             1.00        0.94             1.82
512-bit Vector Integer Add              1.78        1.89             1.94
1:1 Mixed 256-bit and 512-bit FMA       1.34        0.94             1.82
2:1 Mixed 256-bit and 512-bit FMA       1.50        0.94             1.82
Execution throughput for various operations (Chips and Cheese)
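
To see why the table supports the "similar throughput" point, it can help to turn instructions per cycle into vector bits of FMA work per cycle. This is a minimal sketch using the Chips and Cheese FMA rows above; the dictionary layout and the helper name `fma_bits_per_cycle` are just for illustration.

```python
# FMA IPC figures copied from the table above (Chips and Cheese).
FMA_IPC = {
    "Zen 4":        {256: 1.90, 512: 1.00},
    "Tiger Lake":   {256: 1.99, 512: 0.94},
    "Cascade Lake": {256: 1.94, 512: 1.82},
}


def fma_bits_per_cycle(ipc_by_width):
    """Vector bits of FMA work completed per cycle for each vector width."""
    return {width: width * ipc for width, ipc in ipc_by_width.items()}


for chip, ipc in FMA_IPC.items():
    bits = fma_bits_per_cycle(ipc)
    print(f"{chip:13s} 256-bit: {bits[256]:6.1f}  512-bit: {bits[512]:6.1f}")
```

On these numbers Zen 4 and Tiger Lake both land near 512 bits of FMA per cycle at either width, while Cascade Lake gets close to double that with 512-bit FMA.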
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,871 (0.33/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASUS ROG Strix X670E-I Gaming WiFi
Cooling ID-COOLING SE-207-XT Slim Snow
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA)
Storage 2TB Samsung 990 Pro NVMe
Display(s) AOpen Fire Legend 24" (25XV2Q), Dough Spectrum One 27" (Glossy), LG C4 42" (OLED42C4PUA)
Case ASUS Prime AP201 33L White
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight (White), G303 Shroud Edition
Keyboard Wooting 60HE / NuPhy Air75 v2
VR HMD Oculus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.3447
The 13th gen (and 12th gen, if I'm not mistaken) does not support AVX-512 at all, since Intel took it away:

[Attached image: 1689621507790.png]


This is why if one is into certain emulation this time around, it would be best to go with AMD CPUs.
 
Joined
Nov 15, 2021
Messages
2,751 (2.96/day)
Location
Knoxville, TN, USA
Yeah, Raptor Lake never had it and only early Alder Lake samples could enable it.

I was mostly curious about the implementation as AMD was trying to save space wherever possible, and I was curious about the performance delta. Most Intel desktop processors with AVX-512 had a full-width implementation.
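
For anyone unsure whether their own chip exposes AVX-512, here is a small sketch for Linux that looks for the `avx512f` flag (the AVX-512 Foundation subset) in `/proc/cpuinfo`. The helper names are made up, and it only reports what the kernel advertises, so treat it as a quick check rather than a full CPUID query.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx512(cpuinfo_text):
    """True if the AVX-512 Foundation flag (avx512f) is advertised."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("AVX-512F advertised:", has_avx512(text))
```

On a Raptor Lake machine this would be expected to report False, in line with the posts above, while a Ryzen 7040 series laptop should report True.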
 
Joined
Jun 2, 2017
Messages
8,122 (3.18/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitech Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
When are we going to acknowledge that the 7000 series APUs are all that and a stick of gum too? I am just watching the ETA Prime video on the newest handheld, and it comes with a 120Hz 1080p screen. Gaming is super sweet, and it has USB4 ports as well. I can't wait for these to come to desktop, if they can spare any.
 
Joined
Oct 6, 2021
Messages
1,527 (1.58/day)
Zen 4 doesn't need to lower TDP/clocks to run AVX-512, that's the main advantage.
 
Joined
Nov 26, 2021
Messages
1,372 (1.49/day)
Location
Mississauga, Canada
Tiger Lake also solved that for Intel, but it was their 4th attempt.
 
Joined
Feb 25, 2012
Messages
58 (0.01/day)
There is one reliable explanation for why Zen 4 gets a solid boost from AVX-512: the x86 architecture. x86 is fossil trash.
x86 decoders cannot saturate the execution units with operations to execute, so using SIMD with longer vector registers gives a boost because fewer instructions need to be decoded.
 
Joined
Nov 26, 2021
Messages
1,372 (1.49/day)
Location
Mississauga, Canada
Even though that's a very jaundiced take on x86, you're right about one thing. The idea that longer vectors are more energy efficient is correct, because the energy associated with instruction fetch and decode is halved for the same amount of work.
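
A toy illustration of that halving, counting only how many vector instructions it takes to cover an array once at each register width. It deliberately ignores loads, loop overhead, and how a core splits wide operations internally, so treat it as a sketch of the fetch/decode argument rather than a performance model. The function name and the array size are arbitrary.

```python
import math


def vector_instruction_count(num_floats, vector_bits, float_bits=32):
    """Number of vector ops needed to touch num_floats elements once."""
    lanes = vector_bits // float_bits  # elements handled per instruction
    return math.ceil(num_floats / lanes)


n = 1_000_000  # arbitrary example size
for bits in (128, 256, 512):
    print(f"{bits}-bit: {vector_instruction_count(n, bits):>7} instructions")
```

Going from 256-bit to 512-bit registers halves the instruction count for the same number of elements, which is where the fetch and decode saving comes from.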
 

Jun

Joined
May 6, 2022
Messages
47 (0.06/day)
System Name Alpha
Processor AMD Ryzen 7 5800X3D [PBO2 tuner -30 all cores]
Motherboard GIGABYTE B550I AORUS PRO AX (rev. 1.0)
Cooling ekwb EK-AIO 240 D-RGB
Memory Trident Z Neo DDR4-3600 CL16 32GB GTZN [15-15-15-35 3800MHz@1.45V]
Video Card(s) INNO3D GEFORCE RTX 3080 TI X3 OC [2010MHz@993mV]
Storage Kingston FURY Renegade 2TB
Display(s) Samsung Odyssey G7 32” // ASUS ROG Strix XG16AHP
Case Lian Li A4-H2O
Audio Device(s) CREATIVE Sound BlasterX G6 // polk MagniFi Mini //SHURE SE846 //steelseries Arctis Nova Pro Wireless
Power Supply SilverStone SX750 Platinum V1.1
Mouse Logitech G303 SHROUD EDITION
Keyboard Logitech G915 TKL Linear
Software Microsoft Windows 11 Pro
Well glued together it seems.
 