
AMD Readying Feature-enriched ROCm 6.1

T0@st

News Editor
Staff member
Joined
Mar 7, 2023
Messages
2,077 (4.74/day)
Location
South East, UK
The latest version of AMD's open-source GPU compute stack, ROCm, is due for launch soon, according to a Phoronix article; its author, Michael Larabel, has been poring over Team Red's public GitHub repositories over the past couple of days. AMD ROCm version 6.0 was released last December, bringing official support for the AMD Instinct MI300A/MI300X alongside PyTorch improvements, expanded AI libraries, and many other upgrades and optimizations. The v6.0 milestone placed Team Red in a more competitive position next to NVIDIA's very mature CUDA software layer. A mid-February 2024 update added support for the Radeon PRO W7800 and RX 7900 GRE GPUs, as well as ONNX Runtime.

Larabel believes that "ROCm 6.1" is in for an imminent release, given his tracking of increased activity on publicly visible developer platforms: "For MIOpen 3.1 with ROCm 6.1 there's been many additions including new solvers, an AI-based parameter prediction model for the conv_hip_igemm_group_fwd_xdlops solver, numerous fixes, and other updates. AMD MIGraphX will see an important update with ROCm 6.1. For the next ROCm release, MIGraphX 2.9 brings FP8 support, support for more operators, documentation examples for Whisper / Llama-2 / Stable Diffusion 2.1, new ONNX examples, BLAS auto-tuning for GEMMs, and initial code for MIGraphX running on Microsoft Windows." The changelogs and documentation updates also point to several HIPIFY improvements for ROCm 6.1, including the addition of CUDA 12.3.2 support.
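For context on the FP8 support mentioned above: FP8 usually refers to 8-bit floating-point formats such as OCP E4M3 (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits). A minimal pure-Python sketch of decoding an E4M3 byte, as an illustration of the format rather than MIGraphX's actual implementation:

```python
def decode_e4m3(byte: int) -> float:
    """Decode one OCP E4M3 FP8 byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:   # E4M3 reserves this pattern for NaN
        return float("nan")
    if exp == 0:                     # subnormal: no implicit leading 1
        return sign * (mant / 8.0) * 2.0 ** -6
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

print(decode_e4m3(0b0_0111_000))  # 1.0
print(decode_e4m3(0b0_1111_110))  # 448.0, the largest finite E4M3 value
```

The narrow range (max 448) and coarse 3-bit mantissa are why FP8 is paired with per-tensor scaling in inference runtimes.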



 
Joined
Jan 24, 2023
Messages
65 (0.14/day)
As someone with an AMD GPU (7800 XT) who has recently dabbled in Stable Diffusion: if you care about AI, do yourself a favor and get an NVIDIA card. Most of the software is developed and tested with NVIDIA cards, and if you go the AMD route you're going to spend hours upon hours troubleshooting and researching arguments to add just to get basic functionality running, let alone running fast. For example, say you want to run stable-diffusion-webui: you can't use AUTOMATIC1111, you have to use a fork that uses DirectML. You want to use ROCm? Well, not all AMD cards are supported. You want to use ONNX to optimize the models and have them run faster? Good luck debugging the string of errors you'll get from it. AMD is also not great at offering guides for how to run things properly, and when they do, they're quickly outdated, so you end up relying on no-name YouTubers to guide you through it.
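For anyone trying it anyway, a rough sketch of the kind of launch incantations involved. Flag names and environment variables vary between forks, drivers, and versions, so treat these as illustrative, not authoritative:

```shell
# DirectML fork of stable-diffusion-webui on Windows (the fork itself is the
# assumption here; flag availability depends on the version you pull):
./webui.bat --use-directml --medvram

# Stock AUTOMATIC1111 on Linux with ROCm commonly needs an HSA override on
# cards outside the official support list, plus half-precision workarounds:
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./webui.sh --precision full --no-half
```

The `HSA_OVERRIDE_GFX_VERSION` trick makes the ROCm runtime treat an unlisted RDNA3 GPU as a supported one, which is exactly the sort of undocumented workaround the post above is complaining about.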
 
Joined
Feb 11, 2009
Messages
5,411 (0.97/day)
System Name Cyberline
Processor Intel Core i7 2600k -> 12600k
Motherboard Asus P8P67 LE Rev 3.0 -> Gigabyte Z690 Auros Elite DDR4
Cooling Tuniq Tower 120 -> Custom Watercoolingloop
Memory Corsair (4x2) 8gb 1600mhz -> Crucial (8x2) 16gb 3600mhz
Video Card(s) AMD RX480 -> RX7800XT
Storage Samsung 750 Evo 250gb SSD + WD 1tb x 2 + WD 2tb -> 2tb MVMe SSD
Display(s) Philips 32inch LPF5605H (television) -> Dell S3220DGF
Case antec 600 -> Thermaltake Tenor HTCP case
Audio Device(s) Focusrite 2i4 (USB)
Power Supply Seasonic 620watt 80+ Platinum
Mouse Elecom EX-G
Keyboard Rapoo V700
Software Windows 10 Pro 64bit

OK, and on NVIDIA's side it "just works"? Or do you have no experience with that yet?

I know from Wendell of Level1Techs that AMD is pretty solid for Stable Diffusion; according to him it was even a tad more accurate (whatever that means).
 
At least for Stable Diffusion webui, it does. I tested with a 3060 12GB and got 10 it/s, while with the 7800 XT I get around 4 it/s. Substantially less fiddling with args to get things working (the first thing you get with AMD is "your GPU doesn't have CUDA, use this flag to only use the CPU"). You can compare benchmark data here: https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
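To put those it/s figures in perspective, sampler throughput translates directly into seconds per image once you fix a step count (20 steps is assumed here as a typical default):

```python
def seconds_per_image(it_per_s: float, steps: int = 20) -> float:
    """Time to generate one image at a given sampler throughput."""
    return steps / it_per_s

print(seconds_per_image(10.0))  # 3060 12GB at 10 it/s -> 2.0 s per image
print(seconds_per_image(4.0))   # 7800 XT at 4 it/s   -> 5.0 s per image
```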
 

Thanks, but sadly I can't make heads or tails of that benchmark comparison; the data per row seems all over the place.
 
Joined
Nov 3, 2014
Messages
220 (0.06/day)
I have 5700 XT, 3060, and 2070 systems. I can confirm both SD and LLMs work super easily and fast in Windows on NVIDIA, but AMD is terrible. Every few months there'll be updates that brick AMD setups, and you need to wait weeks for one of the ~4 devs with AMD hardware to release a fix, and then do full re-installs. Even with all the manual optimizations for AMD, it's about three orders of magnitude slower than NVIDIA.
 
Joined
May 3, 2018
Messages
2,355 (1.07/day)
Alas, PC users are plebs to AMD; they don't really care. I own all AMD stuff, but I'm not doing any mission-critical work anymore. If I were still working, I'd be using a Quadro in my workstation.
 