
AMD Readies Radeon PRO W9000 Series Powered by RDNA 4

Nothing? That is kinda the point lol.
You make the comparison that card X will be better for a lot of tasks; my point is that this does not matter, since card X is unavailable and this one will thus sell like hotcakes as a result... which you support by insinuating this will be hard to get as well.

It was a price point argument.

At $2,999, there is no justification to get a Navi 48 card, even if buffed with 32 GB, when the RTX 5090 exists in the $1,999-$2,999 range, unless you're bound by a support contract; that was my point.

If this card turns out to be $1,499, I see many use cases where its performance figures are satisfactory, and it would in fact be a passable AMD alternative for high-end gaming at the 32 GB level. For its intended business use, if exclusively targeting that niche, a $1,999 price would be acceptable if ECC memory is absolutely required; otherwise, the 5090 has better creator chops than the Navi 48 chip itself, regardless of what driver stack is supporting it.
 
artificial restrictions
Does AMD even do that with their cards? I've never seen solid evidence either way. I know Nvidia does (NVENC, among other things).
 
Not yet.

There are, however, apps on Windows that support ROCm, like LM Studio.


I asked for this too; I'd also like to see them add LM Studio and pick a standard set of LLMs:

8B to 14B models for lower-VRAM cards, so 8 GB to 16 GB
27B+ for cards with 24 GB of VRAM and above.

Then have a generic query.
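Those VRAM cutoffs could be sketched roughly like this (a hypothetical helper; the tier names and thresholds just follow the suggestion above, and model picks would be up to whoever builds the test):

```python
# Hypothetical sketch: pick a standard LLM test tier from a card's VRAM (GiB),
# following the rough cutoffs suggested above.
def model_tier(vram_gib: float) -> str:
    if vram_gib >= 24:
        return "27B+"       # 27B-class models for 24 GB+ cards
    if vram_gib >= 8:
        return "8B-14B"     # fits 8-16 GB cards at common quantizations
    return "sub-8B"         # smaller models or CPU offload

print(model_tier(16))  # 8B-14B
print(model_tier(32))  # 27B+
```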

I was compiling something like this from info I found on the internet a while ago but haven't gone back to it yet.

View attachment 396140
There's definitely a lack of content out there testing local LLMs. Your table is pretty good.
 
At $2,999, there is no justification to get a Navi 48 card, even if buffed with 32 GB, when the RTX 5090 exists in the $1,999-$2,999 range, unless you're bound by a support contract; that was my point.

If this card turns out to be $1,499, I see many use cases where its performance figures are satisfactory, and it would in fact be a passable AMD alternative for high-end gaming at the 32 GB level. For its intended business use, if exclusively targeting that niche, a $1,999 price would be acceptable if ECC memory is absolutely required; otherwise, the 5090 has better creator chops than the Navi 48 chip itself, regardless of what driver stack is supporting it.
$2,499-$2,999 is expected, based on the W7800's and W7900's price points.

If AMD's consistent in that 9070/Navi 48 is not a 'big' SKU, but rather a 'mid' SKU, the W7700, a $999 Pro SKU, *might* be a reasonable comparison.

$999 MSRP
$1500 where available
sounds plausible, for a 'Pro' card that AMD knows is highly-desired in the non-Pro market.
 
$2,499-$2,999 is expected, based on the W7800's and W7900's price points.

If AMD's consistent in that 9070/Navi 48 is not a 'big' SKU, but rather a 'mid' SKU, the W7700, a $999 Pro SKU, *might* be a reasonable comparison.

$999 MSRP
$1500 where available
sounds plausible, for a 'Pro' card that AMD knows is highly-desired in the non-Pro market.

The W7900 is better than this card, has more memory, and existed in a world where not even the RTX 4090 was a threat to it (since it had half the memory and roughly the same raw compute); that's a benefit this GPU won't enjoy. The 5090 has far more compute than it, the same amount of memory at nearly quadruple the bandwidth, far better encoding engines (multiple of them), etc., and that's before we count the RTX Pro Blackwell cards. The price needs to be reduced significantly this time around.

Does AMD even do that with their cards? I've never seen solid evidence either way. I know nvidia does (nvenc, and other things).

Yes. Many features are disabled on the gaming variants, and the Vega FE additionally has some features disabled despite explicitly supporting the Pro software stack.
 
Yes. Many features are disabled on the gaming variants, and the Vega FE additionally has some features disabled despite explicitly supporting the Pro software stack.
Such as? Do you have any links to support this claim? What about consumer Radeons that can run the Pro drivers (7900XTX is one such card)?

The only things I'm aware of that are gated are EDID substitution (which actually still works, but with no GUI) and ECC VRAM, which is a hardware thing. Is there honestly anything else?
 
Such as? Do you have any links to support this claim? What about consumer Radeons that can run the Pro drivers (7900XTX is one such card)?

The only things I'm aware of that are gated are EDID substitution (which actually still works, but with no GUI) and ECC VRAM, which is a hardware thing. Is there honestly anything else?
I thought there was a floating-point processing limitation on the VII and MI50 that the MI60 did not have? -only other 'cut' that I can think of off the top of my head, tho.
 
I thought there was a floating-point processing limitation on the VII and MI50 that the MI60 did not have? -only other 'cut' that I can think of off the top of my head, tho.
Thank you. Maybe I can googlefu some info from there.

Not trying to be difficult here guys, just hunting for info heh.

EDIT: Oh, yeah there's the whole DP2.0 shenanigans on 7xxx as well.
 
Thank you. Maybe I can googlefu some info from there.

Not trying to be difficult here guys, just hunting for info heh.

EDIT: Oh, yeah there's the whole DP2.0 shenanigans on 7xxx as well.

The 9070 XT also has UHBR20 support disabled; it will max out at UHBR13.5. I don't understand why they do this - it's only going to hurt their gaming cards as high-end monitors start requiring the bandwidth. Then there's stuff such as stereoscopic 3D with multi-buffer swapchains, 30-bit color in SDR, etc. - things you could call niche, and that GeForce also disables - but yeah, AMD's done more market segmentation than ever as of late.
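For context, the raw link rates behind those UHBR tiers work out like this (a quick sketch, assuming the standard 4-lane DP 2.x link with 128b/132b encoding):

```python
# Rough DisplayPort 2.x bandwidth math: per-lane rate x 4 lanes,
# scaled by the 128b/132b encoding overhead to get effective payload rate.
def dp_payload_gbps(lane_rate_gbps: float, lanes: int = 4) -> float:
    return lane_rate_gbps * lanes * 128 / 132

print(round(dp_payload_gbps(13.5), 1))  # UHBR13.5 -> ~52.4 Gbps
print(round(dp_payload_gbps(20.0), 1))  # UHBR20   -> ~77.6 Gbps
```

So the UHBR20 cut costs roughly a third of the link's payload bandwidth, which is what starts to pinch on very high refresh 4K/8K monitors without DSC.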
 
Thank you. Maybe I can googlefu some info from there.

Not trying to be difficult here guys, just hunting for info heh.

EDIT: Oh, yeah there's the whole DP2.0 shenanigans on 7xxx as well.
No worries :cool:

FP64:
FP64 Performance (double-precision floating-point):
  • Radeon VII: Initially reported at a 1/8 FP32 rate, but shipped capped at 1/4 FP32 rate (approximately 3.46 TFLOPS FP64), still cut down from Vega 20's native 1/2 rate to differentiate it from professional cards. This makes it suitable for some scientific workloads but less competitive than professional cards.

  • Radeon Pro VII: Offers unrestricted FP64 performance at 1/2 FP32 rate (approximately 6.7 TFLOPS FP64), making it significantly better for compute-heavy tasks like simulations and HPC development.

  • MI50: Delivers 6.7 TFLOPS FP64 (1/2 FP32 rate), optimized for HPC and scientific computing.

  • MI60: Provides 7.4 TFLOPS FP64 (1/2 FP32 rate), the highest among these cards due to its full 64 compute units (CUs).
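The FP64 figures above can be sanity-checked from CU count and clock (a sketch; the boost clocks used here are approximate and assumed, not from the post):

```python
# Peak FLOPS = 2 ops (FMA) x stream processors x clock x FP64 rate.
def fp64_tflops(cus: int, clock_mhz: int, fp64_rate: float) -> float:
    shaders = cus * 64  # 64 stream processors per Vega CU
    return 2 * shaders * clock_mhz * 1e6 * fp64_rate / 1e12

print(round(fp64_tflops(60, 1746, 1/2), 1))  # MI50: ~6.7 TFLOPS
print(round(fp64_tflops(64, 1800, 1/2), 1))  # MI60: ~7.4 TFLOPS
print(round(fp64_tflops(60, 1750, 1/4), 2))  # Radeon VII: ~3.36 TFLOPS
```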

PCIe Support:
  • Radeon VII: Limited to PCIe 3.0 x16, despite Vega 20’s capability for PCIe 4.0. This restricts bandwidth for future-proof systems (e.g., Zen 2 CPUs).

  • Radeon Pro VII: Supports PCIe 4.0 x16, offering double the bandwidth of PCIe 3.0 (up to 32 GB/s), which benefits high-bandwidth workstation tasks.

  • MI50/MI60: Both support PCIe 4.0 x16, optimized for data center environments where high I/O bandwidth is critical.

IF links:
  • Radeon VII: Infinity Fabric links are disabled, limiting it to single-GPU configurations without multi-GPU support.

  • Radeon Pro VII: Supports external Infinity Fabric Links, enabling high-speed GPU-to-GPU communication for multi-GPU workstation setups.

  • MI50/MI60: Both support Infinity Fabric Links, designed for multi-GPU configurations in data centers, enhancing scalability for large compute clusters.

Virtualization:
  • Radeon VII: Lacks hardware virtualization and full-chip ECC (only HBM2 ECC), limiting its suitability for server or critical compute tasks.

  • Radeon Pro VII: Supports hardware virtualization and is optimized for 8K media processing (e.g., 26% higher performance in DaVinci Resolve).

  • MI50/MI60: Support hardware virtualization and HSA (Heterogeneous System Architecture) for CPU-GPU integration, ideal for large-scale simulations. They also offer higher INT8 and FP16 performance (MI60: 59 TOPS INT8, 29.5 TFLOPS FP16; MI50: 53.6 TOPS INT8, 26.8 TFLOPS FP16) for ML and AI workloads.

Note: I didn't verify this info, but it is consistent with my (flawed) memory of the time.
I'd pined after Vega 20 for quite a while; I still want an MI60 someday.
 
Thanks guys, consider me educated!

The 9070 XT also has UHBR20 support disabled; it will max out at UHBR13.5. I don't understand why they do this - it's only going to hurt their gaming cards as high-end monitors start requiring the bandwidth. Then there's stuff such as stereoscopic 3D with multi-buffer swapchains, 30-bit color in SDR, etc. - things you could call niche, and that GeForce also disables - but yeah, AMD's done more market segmentation than ever as of late.
This one honestly irks me too.
 
It's a Radeon Pro, so we already know it'll be priced at a very noticeable premium. It's an RX 9070 XT with somewhat better driver support and 32 GB, but it'll probably be priced significantly higher despite costing maybe $100 more to produce - the same tired situation from the GPU makers. Hopefully Intel will actually be on the ball with something that's good value for the dollar and has plenty of VRAM, when they finally launch something that replaces the Arc A770 more credibly than the B580, which was a somewhat more efficient side-grade and nowhere to be found in stock unless you count heavily scalped price jack-ups.
I don't think Arc is on Intel's priority list with all the restructuring they're going through to avoid being bought out.
 
It was a price point argument.

At $2,999, there is no justification to get a Navi 48 card, even if buffed with 32 GB, when the RTX 5090 exists in the $1,999-$2,999 range, unless you're bound by a support contract; that was my point.

If this card turns out to be $1,499, I see many use cases where its performance figures are satisfactory, and it would in fact be a passable AMD alternative for high-end gaming at the 32 GB level. For its intended business use, if exclusively targeting that niche, a $1,999 price would be acceptable if ECC memory is absolutely required; otherwise, the 5090 has better creator chops than the Navi 48 chip itself, regardless of what driver stack is supporting it.

The point is, card X can be leagues better and leagues cheaper... but if you can't get it - if, in that regard, it does not exist - then it does not matter; you are bound to the other product, and they can ask whatever price they want.
 
No, it does not. To the best of my knowledge, ROCm on Windows is, as of today, at a conceptual stage and not supported by any consumer-grade or pro-viz graphics card. Of course, there's a way to run it under WSL, but that's just the Linux version of ROCm, which works on the few models they support (7900 XTX, etc.).
Nice try; ollama has been running AMD GPUs on top of ROCm on Windows for quite some time now. Same goes for LM Studio.
source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1101 driver=6.2 name="AMD Radeon RX 7800 XT" total="16.0 GiB"
 
Nice try; ollama has been running AMD GPUs on top of ROCm on Windows for quite some time now. Same goes for LM Studio.
source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1101 driver=6.2 name="AMD Radeon RX 7800 XT" total="16.0 GiB"

Who's gonna tell him what CUDA on Windows can do :roll:
 
Who's gonna tell him what CUDA on Windows can do :roll:
Topic was ROCm on Windows, so stop trolling and go read https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html
Personally, I'm very pleased with the AMD experience running LLMs on AMD GPUs using ollama and LM Studio. ollama is especially impressive: running a single OllamaSetup.exe sets everything up for the user, detects the best hardware available, handles offload to system memory when GPU VRAM gets full, etc. There are many GUIs to choose from to slap on top, if desired.
If there is anything to complain about, it's that official 9070/9070 XT ROCm support is missing, though ROCm 6.3.1 and newer has been reported to work.
Regarding 32 GB RDNA4 Radeon PRO W9000 pricing, it has to be well below the RTX 5090 to make any sense, at least to LLM enthusiasts. With 2.8x the VRAM bandwidth, the 5090 will offer much faster LLM inferencing speed. Yes, the AMD offering has ECC VRAM support, but my use case is LLMs, which make mistakes with any known RAM type anyway. AMD does not win in the software stack or compute power departments in this comparison either.
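That bandwidth gap maps almost directly onto token-generation speed, since LLM decode is roughly memory-bandwidth-bound; a back-of-envelope sketch (the specs are assumptions: ~1792 GB/s for the 5090, ~640 GB/s for a 9070 XT-class card, and a ~16 GB quantized model):

```python
# Decode is roughly memory-bound: every generated token reads (approximately)
# the whole model from VRAM, so tokens/s ~= bandwidth / model size.
def tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

print(tokens_per_s(1792, 16))  # RTX 5090-class:   ~112 tok/s ceiling
print(tokens_per_s(640, 16))   # 9070 XT-class:    ~40 tok/s ceiling
```

Real throughput lands below these ceilings, but the ~2.8x ratio between the two cards holds regardless of model size.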
 
Topic was ROCm on Windows, so stop trolling and go read https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html
Personally, I'm very pleased with the AMD experience running LLMs on AMD GPUs using ollama and LM Studio. ollama is especially impressive: running a single OllamaSetup.exe sets everything up for the user, detects the best hardware available, handles offload to system memory when GPU VRAM gets full, etc. There are many GUIs to choose from to slap on top, if desired.
If there is anything to complain about, it's that official 9070/9070 XT ROCm support is missing, though ROCm 6.3.1 and newer has been reported to work.
Regarding 32 GB RDNA4 Radeon PRO W9000 pricing, it has to be well below the RTX 5090 to make any sense, at least to LLM enthusiasts. With 2.8x the VRAM bandwidth, the 5090 will offer much faster LLM inferencing speed. Yes, the AMD offering has ECC VRAM support, but my use case is LLMs, which make mistakes with any known RAM type anyway. AMD does not win in the software stack or compute power departments in this comparison either.

Make it make sense: first you swear on your pinky that the experience is "extremely smooth", and then you proceed to tell us that not even AMD themselves support it. I think my point couldn't be more evident.

Happy it works for you, but to call that a viable competitor to CUDA... that is trolling.
 
Happy it works for you, but to call that a viable competitor to CUDA... that is trolling.
I never said CUDA is inferior to ROCm. I was simply expressing my surprise that ROCm works out of the box at all.
 
I never said CUDA is inferior to ROCm. I was simply expressing my surprise that ROCm works out of the box at all.
AMD seem to be dragging their heels with ROCm support for RDNA4. I have been using 6.3.1 on Ollama with my 9070 XT; it seems mostly OK, but I need to find some instructions for improving the handling of LLMs that are larger than the available VRAM. My 4090 works fairly well in this scenario, but the performance of the 9070 XT tanks. The same applies to testing with the Arc B580, which really performs well for the price.
 
AMD seem to be dragging their heels with ROCm support for RDNA4. I have been using 6.3.1 on Ollama with my 9070 XT; it seems mostly OK, but I need to find some instructions for improving the handling of LLMs that are larger than the available VRAM. My 4090 works fairly well in this scenario, but the performance of the 9070 XT tanks. The same applies to testing with the Arc B580, which really performs well for the price.
Have you tried the 9070 XT in LM Studio with the Vulkan runtime?
 
Have you tried the 9070 XT in LM Studio with the Vulkan runtime?
I haven't used LM Studio yet; I've only used Ollama so far, and I'm planning to use llama.cpp directly on a largish EPYC-based rig. Still hoping to see prices drop for the 5090, but not holding my breath. AMD's Pro GPUs seem even more expensive.
 