
AMD Readying Feature-enriched ROCm 6.1

T0@st

News Editor
Staff member
Joined
Mar 7, 2023
Messages
2,077 (4.74/day)
Location
South East, UK
The latest version of AMD's open-source GPU compute stack, ROCm, is due for launch soon, according to a Phoronix article; its author, Michael Larabel, has been poring over Team Red's public GitHub repositories over the past couple of days. AMD ROCm version 6.0 was released last December, bringing official support for the AMD Instinct MI300A/MI300X alongside PyTorch improvements, expanded AI libraries, and many other upgrades and optimizations. The v6.0 milestone placed Team Red in a more competitive position next to NVIDIA's very mature CUDA software layer. A mid-February 2024 update added support for the Radeon PRO W7800 and RX 7900 GRE GPUs, as well as ONNX Runtime.

Larabel believes that "ROCm 6.1" is in for an imminent release, given his tracking of increased activity on publicly visible developer platforms: "For MIOpen 3.1 with ROCm 6.1 there's been many additions including new solvers, an AI-based parameter prediction model for the conv_hip_igemm_group_fwd_xdlops solver, numerous fixes, and other updates. AMD MIGraphX will see an important update with ROCm 6.1. For the next ROCm release, MIGraphX 2.9 brings FP8 support, support for more operators, documentation examples for Whisper / Llama-2 / Stable Diffusion 2.1, new ONNX examples, BLAS auto-tuning for GEMMs, and initial code for MIGraphX running on Microsoft Windows." The changelogs and documentation updates also point to several HIPIFY improvements for ROCm 6.1, including the addition of CUDA 12.3.2 support.
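For context on the FP8 support mentioned above: FP8 usually refers to 8-bit floating-point formats such as OCP E4M3 (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits). A minimal pure-Python sketch of decoding an E4M3 byte, as an illustration of the format rather than MIGraphX's actual implementation:

```python
def decode_e4m3(byte: int) -> float:
    """Decode one OCP E4M3 FP8 byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:   # E4M3 reserves this pattern for NaN
        return float("nan")
    if exp == 0:                     # subnormal: no implicit leading 1
        return sign * (mant / 8.0) * 2.0 ** -6
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

print(decode_e4m3(0b0_0111_000))  # 1.0
print(decode_e4m3(0b0_1111_110))  # 448.0, the largest finite E4M3 value
```

The narrow range (max 448) and coarse 3-bit mantissa are why FP8 is paired with per-tensor scaling in inference runtimes.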



 
Joined
Jan 24, 2023
Messages
65 (0.14/day)
As someone with an AMD GPU (7800 XT) who has recently dabbled in Stable Diffusion: if you care about AI, do yourself a favor and get an NVIDIA card. Most of the software is developed and tested with NVIDIA cards, and if you go the AMD route you're going to spend hours upon hours troubleshooting and researching arguments to add just to get basic functionality running, let alone running fast. For example, say you want to run stable-diffusion-webui: you can't use AUTOMATIC1111, you have to use a fork that uses DirectML. You want to use ROCm? Well, not all AMD cards are supported. You want to use ONNX to optimize the models and have them run faster? Good luck debugging the string of errors you'll get from it. AMD is also not great at offering guides for how to run things properly, and when they do, they're quickly outdated, so you end up relying on no-name YouTubers to guide you through it.
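For anyone trying it anyway, a rough sketch of the kind of launch incantations involved. Flag names and environment variables vary between forks, drivers, and versions, so treat these as illustrative, not authoritative:

```shell
# DirectML fork of stable-diffusion-webui on Windows (the fork itself is the
# assumption here; flag availability depends on the version you pull):
./webui.bat --use-directml --medvram

# Stock AUTOMATIC1111 on Linux with ROCm commonly needs an HSA override on
# cards outside the official support list, plus half-precision workarounds:
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./webui.sh --precision full --no-half
```

The `HSA_OVERRIDE_GFX_VERSION` trick makes the ROCm runtime treat an unlisted RDNA3 GPU as a supported one, which is exactly the sort of undocumented workaround the post above is complaining about.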
 
Joined
Feb 11, 2009
Messages
5,411 (0.97/day)
System Name Cyberline
Processor Intel Core i7 2600k -> 12600k
Motherboard Asus P8P67 LE Rev 3.0 -> Gigabyte Z690 Auros Elite DDR4
Cooling Tuniq Tower 120 -> Custom Watercoolingloop
Memory Corsair (4x2) 8gb 1600mhz -> Crucial (8x2) 16gb 3600mhz
Video Card(s) AMD RX480 -> RX7800XT
Storage Samsung 750 Evo 250gb SSD + WD 1tb x 2 + WD 2tb -> 2tb MVMe SSD
Display(s) Philips 32inch LPF5605H (television) -> Dell S3220DGF
Case antec 600 -> Thermaltake Tenor HTCP case
Audio Device(s) Focusrite 2i4 (USB)
Power Supply Seasonic 620watt 80+ Platinum
Mouse Elecom EX-G
Keyboard Rapoo V700
Software Windows 10 Pro 64bit

OK, and on NVIDIA's side it "just works"? Or do you have no experience with that yet?

I know from Wendell of Level1Techs that AMD is pretty solid for Stable Diffusion; according to him it was even a tad more accurate (whatever that means).
 
At least for Stable Diffusion webui, it does. I tested with a 3060 12GB and got 10 it/s, while with the 7800 XT I get around 4 it/s. Substantially less fiddling with args to get things working (the first thing you get with AMD is "your GPU doesn't have CUDA, use this flag to only use the CPU"). You can compare benchmark data here: https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
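To put those it/s figures in perspective, sampler throughput translates directly into seconds per image once you fix a step count (20 steps is assumed here as a typical default):

```python
def seconds_per_image(it_per_s: float, steps: int = 20) -> float:
    """Time to generate one image at a given sampler throughput."""
    return steps / it_per_s

print(seconds_per_image(10.0))  # 3060 12GB at 10 it/s -> 2.0 s per image
print(seconds_per_image(4.0))   # 7800 XT at 4 it/s   -> 5.0 s per image
```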
 

Thanks, but sadly I can't make heads or tails of that benchmark comparison; the data per row seems all over the place.
 
Joined
Nov 3, 2014
Messages
220 (0.06/day)
I have 5700 XT, 3060, and 2070 systems. I can confirm both SD and LLMs work super easily and fast in Windows on NVIDIA, but AMD is terrible. Every few months there'll be updates that brick AMD setups, and you need to wait weeks for one of the ~4 devs with AMD hardware to release a fix, and then do full re-installs. Even with all the manual optimizations for AMD, it's about three orders of magnitude slower than NVIDIA.
 
Joined
May 3, 2018
Messages
2,355 (1.07/day)
Alas, PC users are plebs to AMD; they don't really care. I own all AMD stuff, but I'm not doing any mission-critical work anymore. If I were still working, I'd be using a Quadro in my workstation.
 