Tuesday, November 26th 2024
AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More
AMD has released ROCm 6.3, which introduces several new features and optimizations: SGLang integration for accelerated AI inferencing, a re-engineered FlashAttention-2 for optimized AI training and inference, multi-node Fast Fourier Transform (FFT) support, a new Fortran compiler, and enhanced computer vision libraries such as rocDecode, rocJPEG, and rocAL.
According to AMD, SGLang, a runtime now supported by ROCm 6.3, is purpose-built for optimizing inference on models such as LLMs and VLMs on AMD Instinct GPUs. It promises up to 6x higher throughput and much easier usage thanks to Python integration and pre-configured ROCm Docker containers.
ROCm 6.3 also brings further transformer optimizations with FlashAttention-2, which should deliver significant forward- and backward-pass improvements over FlashAttention-1; a new AMD Fortran compiler with direct GPU offloading, backward compatibility, and integration with HIP kernels and ROCm libraries; new multi-node FFT support in rocFFT, which simplifies multi-node scaling and improves scalability; and enhanced computer vision libraries (rocDecode, rocJPEG, and rocAL) adding AV1 codec support, GPU-accelerated JPEG decoding, and better audio augmentation.
AMD was keen to note that ROCm 6.3 continues to "deliver cutting-edge tools to simplify development while driving better performance and scalability for AI and HPC workloads", as well as to keep embracing the open-source ethos and evolving to meet developer needs. You can check out more details over at the ROCm Documentation Hub or the AMD ROCm Blogs.
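As a rough illustration of the kind of Python-integrated workflow AMD is describing, here is a minimal SGLang sketch. The model path and generation parameters are placeholders, and the exact API surface can differ between SGLang releases:

```python
# Minimal SGLang sketch (placeholder model path; API may vary by release).
# Inside AMD's pre-configured ROCm Docker containers, SGLang should pick
# up the ROCm backend on Instinct GPUs without CUDA-specific changes.
import sglang as sgl

@sgl.function
def answer(s, question):
    # Build the prompt incrementally, then capture a generation as "reply".
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("reply", max_tokens=64)

# Launch a local runtime; replace the model path with one you actually have.
runtime = sgl.Runtime(model_path="meta-llama/Meta-Llama-3-8B-Instruct")
sgl.set_default_backend(runtime)

state = answer.run(question="What is ROCm?")
print(state["reply"])
runtime.shutdown()
```

The point of the frontend is that backend selection happens in the runtime, so the same script should run unchanged whether the container targets ROCm or CUDA.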
Source:
AMD
18 Comments on AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More
And still works on only 3 consumer cards.
I like the nice artificial limitation of it supporting W6800, but not consumer Navi 21 cards. Seems bizarrely random.
Maaaybe the top end RDNA4 with some luck.
Great CUDA competitor, eh.
ROCm, on the other hand, is still a pain even worse than CUDA with all its shenanigans.
For me it's the keyboard one, "EPO"; it takes me back to the heyday of televised World Tour pro cycling, when the pelotons were full of EPO-carrying mules and gregarios (EPO being the drug/medicine). LE: Allegedly some were caught, many got caught.
So confusing.
Excuse me while I go get an 8400GS from 2007 to run CUDA.
Anyway, about compatibility: AMD needs to do a better job of clarifying this mess.
The link above shows that only the three top-tier RDNA 3 GPUs are supported, yet these links show way more, including many RDNA 2 cards:
rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html
rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
So which ones are really supported, AMD?
That said, I've read about other people being able to use GPUs besides the three RDNA 3 ones mentioned before.
The MI100's support is already waning.
But yes, you can use ROCm on your 6700 XT and other 6000-series cards just fine. Broader support is coming, slowly but surely.
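For anyone curious whether their card is actually picked up, a quick sanity check from a ROCm build of PyTorch looks something like the sketch below. The HSA_OVERRIDE_GFX_VERSION environment variable is the widely used, unofficial community workaround that makes unsupported RDNA 2 parts (e.g. the 6700 XT, gfx1031) report the supported gfx1030 target; treat it as a hack, not an AMD guarantee:

```python
# Sanity-check that a ROCm build of PyTorch can see the GPU.
# Unofficial workaround for some unsupported RDNA 2 cards (community hack):
#   export HSA_OVERRIDE_GFX_VERSION=10.3.0   # set before launching Python
import torch

print("HIP runtime:", torch.version.hip)          # None on CUDA/CPU builds
print("GPU visible:", torch.cuda.is_available())  # ROCm reuses the cuda API

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Quick smoke test: a small matmul on the GPU.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).sum().item())
```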
That, on top of the stability issues on consumer hardware when running ROCm on Linux, and the worse Windows support, are serious drawbacks that should be addressed immediately.
It's been like this for years now; "it will be better soon" is meaningless when the entire ecosystem is 17 years behind.
rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html#supported-gpus-win
I use ROCm with LM Studio for LLMs.
ROCm on Windows is not full support, only sporadic support in apps like LM Studio.
Real full support is only available on Linux and WSL.
With my current GPU, a 7900 XTX, when I was testing models against some guys in the LM Studio Discord running 4090s, I saw similar performance for most models.
When comparing my performance with 2x 3090s on Linux across different models against some folks who had 4090s and 4080s on Windows, their performance was way slower than mine.
Not sure if that's still the case, but it used to be 30~70% slower on Windows.
All of the comparisons I've done have been on Windows.
(To make it clear, this is not aimed directly at you, but at anyone who may wonder about this claim.)
If lots of your work involves running LLMs locally, then it might be worth switching for the extra performance (and easier tooling overall, IMO).
But if you just use it sporadically, or as a minor assistant thingie, then there's no point in changing your entire workflow.
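For anyone wanting to put numbers on these cross-platform comparisons, a rough tokens-per-second measurement against a local OpenAI-compatible endpoint might look like the sketch below. The URL, model name, and prompt are assumptions (LM Studio's local server typically listens on localhost:1234):

```python
# Rough tokens/sec benchmark against a local OpenAI-compatible server.
# Endpoint URL and model name are assumptions; adjust for your setup.
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"
payload = {
    "model": "local-model",  # placeholder; use whatever the server reports
    "messages": [{"role": "user", "content": "Write 200 words about GPUs."}],
    "max_tokens": 256,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start

completion_tokens = resp["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```

Using the same prompt, quantization, and max_tokens on both machines is what makes such a comparison meaningful.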