Tuesday, November 26th 2024

AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More

AMD has released ROCm 6.3, which introduces several new features and optimizations, including SGLang integration for accelerated AI inferencing, a re-engineered FlashAttention-2 for optimized AI training and inference, multi-node Fast Fourier Transform (FFT) support, a new Fortran compiler, and enhanced computer vision libraries such as rocDecode, rocJPEG, and rocAL.

According to AMD, SGLang, a runtime now supported by ROCm 6.3, is purpose-built for optimizing inference on models such as LLMs and VLMs on AMD Instinct GPUs, and promises 6x higher throughput and much easier usage thanks to Python integration and pre-configured ROCm Docker containers. ROCm 6.3 also brings further transformer optimizations with FlashAttention-2, which should deliver significant improvements in the forward and backward passes compared to FlashAttention-1; a whole new AMD Fortran compiler with direct GPU offloading, backward compatibility, and integration with HIP kernels and ROCm libraries; new multi-node FFT support in rocFFT, which simplifies multi-node scaling and improves scalability; and enhanced computer vision libraries, rocDecode, rocJPEG, and rocAL, adding AV1 codec support, GPU-accelerated JPEG decoding, and better audio augmentation.
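For developers who want to try the new SGLang path, the usual workflow is to launch the SGLang server, for example from inside one of AMD's pre-configured ROCm containers, and then talk to it through an OpenAI-compatible endpoint. Below is a minimal sketch of the client side only; the model name, port, and launch command are illustrative assumptions drawn from SGLang's public documentation, not values taken from the ROCm 6.3 announcement.

# Minimal sketch: querying a locally running SGLang server through its
# OpenAI-compatible endpoint. Assumes the server was started separately
# (e.g. inside an AMD ROCm container) with something along the lines of:
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
# Model name and port are placeholders, not values from the announcement.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the model the server loaded
    messages=[{"role": "user", "content": "Summarize what ROCm 6.3 adds."}],
    max_tokens=128,
)
print(response.choices[0].message.content)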
AMD was keen to note that ROCm 6.3 continues to "deliver cutting-edge tools to simplify development while driving better performance and scalability for AI and HPC workloads", and that it keeps embracing the open-source ethos and evolving to meet developer needs. You can check out more details over at the ROCm Documentation Hub or the AMD ROCm Blogs.
Source: AMD

18 Comments on AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More

#1
Space Lynx
Astronaut
I saw the small m next to the 6.3 and just immediately thought headphone cable. :slap:
#3
Onasi
@GoldenX
I like the nice artificial limitation of it supporting W6800, but not consumer Navi 21 cards. Seems bizarrely random.
#4
GoldenX
Onasi said:
@GoldenX I like the nice artificial limitation of it supporting W6800, but not consumer Navi 21 cards. Seems bizarrely random.
Knowing how stingy AMD is now with their "AI first" focus, they most likely won't add support for any other arch until UDNA is out.
Maaaybe the top end RDNA4 with some luck.

Great CUDA competitor, eh.
#5
igormp
GoldenX said:
Knowing how stingy AMD is now with their "AI first" focus, they most likely won't add support for any other arch until UDNA is out.
Maaaybe the top end RDNA4 with some luck.

Great CUDA competitor, eh.
Even Intel managed to get their PyTorch extensions merged upstream, and getting it to work is even easier than CUDA.
ROCm, on the other hand, is still a pain even worse than CUDA with all its shenanigans.
#6
dont whant to set it"'
Space Lynx said:
I saw the small m next to the 6.3 and just immediately thought headphone cable. :slap:
For a split second I had about the same thought.

For me it's the keyboard one, "EPO" (the drug/medicine); it takes me back to the heyday of televised World Tour pro cycling, when the pelotons were full of EPO-carrying mules and gregarios. Allegedly, some were caught, many got caught.
So confusing.
#7
GoldenX
igormp said:
Even Intel managed to get their PyTorch extensions merged upstream, and getting it to work is even easier than CUDA.
ROCm, on the other hand, is still a pain even worse than CUDA with all its shenanigans.
Heh yeah, Intel coming out of nowhere with a real alternative.
Excuse me while I go get an 8400GS from 2007 to run CUDA.
#8
Neo_Morpheus
As usual, the normal negative posts every time ROCm is mentioned.

Anyways, about their compatibility, AMD needs to do a better job of clarifying this mess.

The link above shows that only the 3 top-tier RDNA 3 GPUs are supported, yet on these links they show way more, including many RDNA2 cards.

rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html

rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html

So which ones are really supported, AMD?

That said, I've read about other people being able to use GPUs besides the 3 RDNA3 cards mentioned before.
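(One quick way to see what the stack actually picks up, regardless of what the matrices say, is to ask the ROCm build of PyTorch directly. A minimal sketch, assuming PyTorch's ROCm wheels are installed; output will obviously vary per card:)

# Quick check of what the ROCm stack exposes to PyTorch. On ROCm builds the
# HIP backend is reported through the familiar torch.cuda API.
import torch

print("HIP runtime:", torch.version.hip)        # None on CUDA-only builds
print("GPU visible:", torch.cuda.is_available())

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))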
#9
Patriot
Neo_Morpheus said:
As usual, the normal negative posts every time ROCm is mentioned.

Anyways, about their compatibility, AMD needs to do a better job of clarifying this mess.

The link above shows that only the 3 top-tier RDNA 3 GPUs are supported, yet on these links they show way more, including many RDNA2 cards.

rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html

rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html

So which ones are really supported, AMD?

That said, I've read about other people being able to use GPUs besides the 3 RDNA3 cards mentioned before.
Support is a strong word. RDNA2/3 work, but which cards do they actually test on? The top 3 RDNA3 cards, the MI250X, and the MI300X.
The MI100's support is already waning.

But yes, you can use ROCm on your 6700 XT and other 6000-series cards just fine. Broader support is coming, slowly but surely.
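(The usual community workaround for the not-officially-supported RDNA2 cards is the HSA_OVERRIDE_GFX_VERSION environment variable, which makes the runtime treat the card as a gfx1030 part. A rough sketch, assuming a ROCm build of PyTorch; the override is unofficial and not something AMD guarantees:)

# Rough sketch of the common workaround for RDNA2 cards such as the 6700 XT:
# override the reported GFX target so ROCm treats the card as a gfx1030
# ("Navi 21") part. The variable must be set before the HIP runtime starts,
# i.e. before importing torch. Unofficial and unsupported.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch  # imported after the override so the HIP runtime picks it up

x = torch.ones(1024, device="cuda")  # "cuda" maps to the HIP device on ROCm builds
print(x.sum().item(), torch.cuda.get_device_name(0))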
#10
GoldenX
Patriot said:
Support is a strong word. RDNA2/3 work, but which cards do they actually test on? The top 3 RDNA3 cards, the MI250X, and the MI300X.
The MI100's support is already waning.

But yes, you can use ROCm on your 6700 XT and other 6000-series cards just fine. Broader support is coming, slowly but surely.
One could argue broad support MUST be the first thing you do, else the bar gets higher with every new feature added.

That, on top of stability issues on consumer hardware when running ROCm on Linux and the worse Windows support, makes for serious drawbacks that should be addressed immediately.
It's been years like this by now; "will be better soon" is meaningless when the entire ecosystem is 17 years behind.
#13
Makaveli
GoldenX said:
Last time I tested LM Studio on my 6600, it was far slower than equivalent Ampere cards. Has that improved?

ROCm on Windows is not full support, only sporadic, like with LM Studio.
Real full support is only available on Linux and WSL.
You would have to use the Vulkan runtime in LM Studio; I don't think the 6600 is supported by ROCm. I would install the newest version of LM Studio and test it again.

With my current GPU, a 7900 XTX, when I was testing models with some guys in the LM Studio Discord who were running 4090s, I saw similar performance for most models.
#14
GoldenX
The AI extensions RDNA3 added to its compute units do their work, then.
#15
Makaveli
GoldenX said:
The AI extensions RDNA3 added to its compute units do their work, then.
Yes, the WMMA instructions work well on RDNA 3.
#16
igormp
Makaveli said:
You would have to use the Vulkan runtime in LM Studio; I don't think the 6600 is supported by ROCm. I would install the newest version of LM Studio and test it again.

With my current GPU, a 7900 XTX, when I was testing models with some guys in the LM Studio Discord who were running 4090s, I saw similar performance for most models.
Fwiw, overall performance on Windows is usually way slower compared to Linux.
When comparing my performance with 2x 3090 on Linux across different models against some folks who had 4090s and 4080s on Windows, their performance was way slower than mine.

Not sure if that's still the case, but it used to be 30-70% slower on Windows.
#17
Makaveli
igormp said:
Fwiw, overall performance on Windows is usually way slower compared to Linux.
When comparing my performance with 2x 3090 on Linux across different models against some folks who had 4090s and 4080s on Windows, their performance was way slower than mine.

Not sure if that's still the case, but it used to be 30-70% slower on Windows.
I'm not a Linux user, so I will take your word for it.

All of the comparisons I've done have been on Windows.
#18
igormp
Makaveli said:
I'm not a Linux user, so I will take your word for it.

All of the comparisons I've done have been on Windows.
No worries. You don't really need to take my word for it though; a quick Google search shows people doing such comparisons (especially on Reddit).
(To make it clear, this is not aimed directly at you, but at anyone who may wonder about this claim.)

If a lot of your work has to do with running LLMs locally, then it might be worth switching for the extra performance (and easier tooling overall, IMO).
But if you just use it sporadically or as a minor assistant thingie, then there's no point in changing your entire workflow.