
AMD Radeon Pro v540 Research Thread

Recently a wave of these GPUs hit the market, and there are active efforts to get them working. If you have one of these cards and you've found this thread, congratulations: you're in the correct spot.
A more detailed set of information about the card can be found here. The short version: it's 2x Navi 12 with 2x 8 GB of HBM2, primarily used by AWS for virtualized GPU use cases. It is similar to an RX 5600M (or so the Linux drivers say).

AWS does have support pages for installing the drivers, but you must be an AWS customer and set up the proper permissions. Downloading should be covered by the free trial, but be warned that this is very likely not allowed by their terms of service, and I am not a lawyer, so please behave yourself. A word of advice: you do NOT need to use their AMI instances to get these; a normal Windows PowerShell session will work. For Linux I assume the same holds true, but I have not tried. I do not recommend installing these drivers on your main machine. Alternatively, the links for each should work fine and skip the entire signup process.
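For the command-line route, the driver packages live in the S3 buckets referenced by AWS's own EC2 driver documentation. A hedged sketch using the AWS CLI (bucket names taken from those docs, region is an assumption; you still need configured AWS credentials, and the terms-of-service caveat above applies):

Code:
# List, then fetch, the Windows driver packages from AWS's driver bucket
# (bucket name per AWS's EC2 docs for AMD drivers; region is an assumption)
aws s3 ls s3://ec2-amd-windows-drivers/latest/ --region us-east-1
aws s3 cp s3://ec2-amd-windows-drivers/latest/ . --recursive --region us-east-1
# The Linux packages are assumed to live in the matching bucket
aws s3 ls s3://ec2-amd-linux-drivers/latest/ --region us-east-1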
Known driver packages
Some extra stuff that may be worthwhile, even if some of it may conflict (a quick lspci cross-check follows the list):
  • GPU: PCI\VEN_1002&DEV_7360&REV_C1
  • Driver: PCI\VEN_1002&DEV_7362&REV_C3
  • PCI Memory Controller: PCI\VEN_11F8&DEV_4052&REV_00
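On Linux, a quick way to confirm you're looking at the same silicon is matching those vendor/device IDs with lspci (IDs transcribed from the Windows hardware IDs above):

Code:
# Search the PCI bus for the IDs listed above
lspci -nn -d 1002:7360   # listed above as the GPU
lspci -nn -d 1002:7362   # listed above as the driver's target ID
lspci -nn -d 11f8:4052   # listed above as the PCI memory controller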
VBIOS Stuff
Current state of the card

29-1-2025
  • Added Wayback Machine URLs I made over a year ago, since the original AWS link seems dead now

11-2-2024

2-2-2024
Older updates
A unique card, and certainly a challenge to work with.

 
I don't have one of these cards to look at in detail, but here's what I've been able to figure out:

- The VRM is VERY overbuilt: 12+1 power stages per core. I don't know which specific power stages or controller were used; I might be able to figure it out with better photos.

- The card uses an EPS12V cable instead of the standard PCIe power connector, giving it a max power draw of around 400 W (325 W from the cable, 75 W from the goldfinger).

- The PCIe switch is a Microchip PM40052B1-F3E. I don't know how much power it draws, but it's probably between 10 and 20 W. It has a number of interfaces, most notably UART and SMBus two-wire.

For now, the AMDGPU-PRO driver seems to be our best lead. Considering it has VCN and a DisplayPort 1.4a connector hidden behind the IO plate, I'm confident this thing is at least physically capable of display output.
 
The moment someone figures out how to unlock these for any kind of graphical workload, the price will at least triple. For now it's still tempting, as I'm sure most PC junkies would love to have a twin 5700 XT with HBM, even just as a collectable. You have my full support of course. :D
 
I wouldn't expect too much from this card for any non-compute tasks. This card looks like it was made to be exclusively used with ROCm on Linux and it's clearly a server card that depends on the airflow from a rackmount chassis to be cooled. The display output is probably just tied to one of the GPUs. It's not like you're going to get multi-GPU graphics performance out of this. It clearly wasn't designed for that.
 
I wouldn't expect too much from this card for any non-compute tasks. This card looks like it was made to be exclusively used with ROCm on Linux and it's clearly a server card that depends on the airflow from a rackmount chassis to be cooled. The display output is probably just tied to one of the GPUs. It's not like you're going to get multi-GPU graphics performance out of this. It clearly wasn't designed for that.
You're probably right, it's going to be quite the stretch. Been thinking about this since I got the card and there's just too much weird stuff going on.
Not quite related, but it seems weird that nobody makes a case for this kind of GPU at the consumer level. An open bench is fine and all, but a case designed to run a server-grade GPU (even if a bit loud) would still be really helpful right now versus dangling a fan out the back of the chassis at full speed. Of course, for a server rack this is to be expected, but it would be a nice thing to have in a different form factor chassis. A niche but untapped market, perhaps?

The moment someone figures out how to unlock these for any kind of graphical workload, the price will at least triple. For now it's still tempting, as I'm sure most PC junkies would love to have a twin 5700 XT with HBM, even just as a collectable. You have my full support of course. :D
If any real graphical capability comes into the picture, I would actually use it for that rather than compute. If it ends up going nowhere, at least it'll be a fun piece to talk about. Honestly, the blue AMD theme is really nice and I wish it were more common.
 
nobody makes a case for this kind of GPU at the consumer level.
HBM2 is expensive. I have a Radeon Pro 5600M in my laptop and it's a lot of GPU for the amount of power it consumes, which is (in my opinion) the main advantage of HBM. That matters less on dedicated GPUs in a tower chassis than on something that needs to be power efficient. So the places where this kind of product would shine are the mobile space and the server space. Take the Vega 64 I have, for example: you can overclock the memory by a whopping 25% without any adjustments to voltages, and it nets you zero performance gain. The additional cost of HBM2 only gets you a little more thermal headroom in the grand scheme of things when you're talking >200 W power consumption. However, when your power envelope is small or you're space constrained, it can make sense. Outside of that, it's just additional cost.

To me, the untapped market for HBM2 is mobile GPUs or CPUs. There are real power savings to be had over modern GDDR memory, but again, it comes at a cost, as implementing it is not cheap.
 
This patchset gives us some important info.
- microcode, SMU, PSP, and VCN are implemented in the regular amdgpu module
- SDMA firmware might be used by the V540, but it wasn't present when the patches were introduced

There are a few binary blobs to go dig up in the amdgpu kernel module too (a quick presence check follows the list).
navi12_gpu_info.bin
navi12_ce.bin
navi12_pfp.bin
navi12_mec2.bin
navi12_rlc.bin
navi12_sdma.bin
navi12_sdma1.bin
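
A minimal sketch for checking whether your distro already ships these (standard linux-firmware path assumed; on newer distros the files may be .xz/.zst compressed):

Code:
# Look for the Navi 12 blobs in the usual firmware location
ls -l /lib/firmware/amdgpu/navi12_*
# And list which navi12 firmware files the amdgpu module declares
modinfo amdgpu | grep -i navi12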

This comment is interesting:
"From: Xiaojie Yuan <xiaojie.yuan at amd.com>

don't enable any cg/pg features yet.
v2: calculate external revision id from revision id so that we can differentiate navi12 A0 from A1 directly."

Navi 12 is apparently also implemented in the AMDVLK driver.
 
Continuing from https://www.techpowerup.com/forums/threads/rare-gpus-unreleased-gpus.176929/page-66, some small steps ahead.

Code:
ls -lh /dev/dri/
total 0
drwxr-xr-x  2 root root        100 maj 21 20:51 by-path
crw-rw----+ 1 root video  226,   0 maj 21 20:51 card0
crw-rw----+ 1 root video  226,   1 maj 21 20:52 card1
crw-rw----+ 1 root render 226, 128 maj 21 20:51 renderD128

Code:
ls -l /sys/class/drm/renderD128/device/driver
lrwxrwxrwx 1 root root 0 maj 21 20:51 /sys/class/drm/renderD128/device/driver -> ../../../../bus/pci/drivers/amdgpu

Code:
xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x47 cap: 0x0 crtcs: 4 outputs: 4 associated providers: 0 name:qxl
Provider 1: id: 0x91 cap: 0x6, Sink Output, Source Offload crtcs: 6 outputs: 0 associated providers: 0 name:7360:C1 @ pci:0000:01:00.0

However, if we try to use it to render to a virtual display:

Code:
xrandr --setprovideroffloadsink 0x91 0x47
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  140 (RANDR)
  Minor opcode of failed request:  34 (RRSetProviderOffloadSink)
  Value in failed request:  0x47
  Serial number of failed request:  16
  Current serial number in output stream:  17

And DRI_PRIME only shows
Code:
DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: llvmpipe (LLVM 15.0.3, 256 bits)
Same as DRI_PRIME=0

Maybe qxl can't be used as the target display?
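
One more thing that might be worth a shot: newer Mesa also accepts a PCI tag for DRI_PRIME instead of an index, which sidesteps any provider-ordering weirdness. A sketch, assuming a Mesa build recent enough for the pci- syntax, using the bus address from the xrandr output above:

Code:
# Select the offload GPU by PCI tag rather than index (newer Mesa only)
DRI_PRIME=pci-0000_01_00_0 glxinfo | grep "OpenGL renderer"

If it works, the renderer string should name the Navi 12 instead of llvmpipe.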
 
I wouldn't expect too much from this card for any non-compute tasks. This card looks like it was made to be exclusively used with ROCm on Linux and it's clearly a server card that depends on the airflow from a rackmount chassis to be cooled. The display output is probably just tied to one of the GPUs. It's not like you're going to get multi-GPU graphics performance out of this. It clearly wasn't designed for that.
ROCm wouldn't make sense on a Vxx card, as they are made for VDI and other remote tasks; for example, we use the V620 at work for remote CAD machines.

If/when I can get mine working, it would be used for 2x gaming VMs to replace the Radeon Pro Duo that's in my VDI machine now.
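
For that passthrough plan, the usual first check is whether the two GPUs (and whatever hangs off that PCIe switch) land in separate IOMMU groups. The standard sysfs walk, nothing V540-specific about it:

Code:
# Print every PCI device grouped by IOMMU group; the two GPU functions
# need separate groups (or an ACS workaround) for two independent VMs
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU group %s: ' "$n"
    lspci -nns "${d##*/}"
done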
 
If/when I can get mine working, it would be used for 2x gaming VMs to replace the Radeon Pro Duo that's in my VDI machine now.
AMD should just make a consumer version of this card and call it a day. It would make things much simpler for us lol
 
AMD should just make a consumer version of this card and call it a day. It would make things much simpler for us lol
lol. Feels...

I firmly believe that we 'were supposed to' get an HBM-Navi or at least a 'halo tier RX 5800/5900', but they were not cost effective to introduce to the consumer market.

Such a shame it's so difficult turning $1k+/ea 'eWaste' back into something useful
 
lol. Feels...

I firmly believe that we 'were supposed to' get an HBM-Navi or at least a 'halo tier RX 5800/5900', but they were not cost effective to introduce to the consumer market.

Such a shame it's so difficult turning $1k+/ea 'eWaste' back into something useful
It fits perfectly alongside the other Navi cards:

Navi 10
RX 5600 - 2048 cores, 192-bit
RX 5600 XT - 2304 cores, 192-bit
RX 5700 - 2304 cores, 256-bit
RX 5700 XT - 2560 cores, 256-bit

Navi 12
RX 5800 - 2304 cores, HBM2 - high-quality blower or 2-3 fans
RX 5800 XT - 2560 cores, HBM2 - high-end 3-fan air or AIO
WX 5800 - V520 - BC-160 - high-quality blower
WX 5800 Duo - V540 - AIO water like the Pro Duo


Sadly, the only officially released Navi 12 products are the V520, the BC-160, and the mobile Radeon Pro 5600M.
 
I don't think much can be done with currently available tools.

I recreated the VM setup on bare metal, with these conclusions:

- Mesa drivers don't work at all.
- The amdgpu drivers look like they should be working: everything checks out at boot, and both power draw and temperature fall way down. Sensors can read data from one GPU.
- No OpenGL renderers are detected by the system.
- I could not manage to get a headless server with a dummy display running. X11 forwarding/xrdp/spice don't use 3D acceleration (a display-server-free probe is sketched below).
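
For anyone who wants to poke at the same thing, a sketch of the display-server-free probe mentioned above (eglinfo ships with mesa-utils, and its device-platform section only exists in fairly recent versions, so treat this as an assumption):

Code:
# Probe for a hardware-backed EGL device without X or Wayland running
eglinfo | grep -i -A4 "device platform"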
 
The single-GPU version is working on Windows. I got one piece of proof from China.
https://community.amd.com/t5/drivers-software/windows-driver-for-radeon-pro-v520/td-p/531112

So it's possible the V540 works with that driver. Sadly, the user has disappeared.


I have one of the GPUs "working", aka I got the driver installed and fixed the PCIe bridge error, but I haven't done any real testing due to my lack of cooling.

Like that user, it seems I can only get the driver loaded on my Windows 11 Enterprise box and not on Windows Server 2016/19. If I can figure out cooling, or deal with the loud-ass alarm on the GPU, I'll check whether it actually works and whether a program can use the card.
 
If I can figure out cooling, or deal with the loud-ass alarm on the GPU, I'll check whether it actually works and whether a program can use the card
If you have a ruler (preferably a right-angle one marked in mm, but it's gravy either way), I would really appreciate some photos of the PCB. I have some ideas for a full-cover waterblock, and I'd like to start mocking up something more solid while I can't get the card.
 
If you have a ruler (preferably a right-angle one marked in mm, but it's gravy either way), I would really appreciate some photos of the PCB. I have some ideas for a full-cover waterblock, and I'd like to start mocking up something more solid while I can't get the card.
No ruler, but I've got an iPhone; it should be pretty accurate IIRC.
 
Want to just add that the drivers are in fact NOT watermarked like I originally believed.

Windows Server 2019
22.10.01.12-220930a-384126e-WHQLCert_.zip
SHA-1: 55D4B265BA2EEF6FC56FDEA58A82C1663DF2F23C

Windows Server 2016
Windows_2019_2016-20.10.25.01-201109a-361679C-WHQL.zip
SHA-1: 910c4b4e68e84ed8beef454b2f4d2d8424558ef1

Archive branch
20.10.25.04-210414a-365562C-V520-SN1000-WHQLCert.zip
SHA-1: 30b2d121f5b91be0c1302280cc2be67213839675
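
If you grab any of these, verifying the download against the hashes above is a one-liner (bash shown; certutil -hashfile does the same job on Windows):

Code:
# Compute the SHA-1 and compare it by eye against the value listed above
sha1sum 22.10.01.12-220930a-384126e-WHQLCert_.zip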
 
No ruler, but I've got an iPhone; it should be pretty accurate IIRC.
I just need something accurate and simple for scale. I have a basic layout planned already.

I'm not expecting an active backplate to be necessary; HBM2 doesn't tend to get hot (nor benefit from extra voltage, AFAIK).

I'd like to find out which power stages are being used; at first glance they don't seem likely to get hot. Twelve phases for a ~100 W core, even with 50 A MOSFETs, wouldn't put out much heat, and unless someone goes to the trouble of ePowering it, the cores are going to cap out at around 180 W, which still isn't that much.
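Rough back-of-envelope to put numbers on that, assuming a ~0.9 V core rail (my assumption, not a measured value):

$$I_{\text{phase}} = \frac{P_{\text{core}}}{V_{\text{core}} \cdot N_{\text{phases}}} \approx \frac{100\,\text{W}}{0.9\,\text{V} \times 12} \approx 9.3\,\text{A}$$

Under 10 A per phase on 50 A-class stages leaves a huge margin, which is why I don't expect the VRM to be the hot spot.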

From what we know, the cores do get very hot though, and that's where I expect to find the main problem.
 

I have one of the GPUs "working", aka I got the driver installed and fixed the PCIe bridge error, but I haven't done any real testing due to my lack of cooling.

Like that user, it seems I can only get the driver loaded on my Windows 11 Enterprise box and not on Windows Server 2016/19. If I can figure out cooling, or deal with the loud-ass alarm on the GPU, I'll check whether it actually works and whether a program can use the card.
The alarm is a pretty serious overheat alarm. These cards were intended to be used in *very* forced-airflow rackmount cases, in an actively climate-controlled server room. They need a LOT of air, and the higher the pressure the better (dense fin stack on the GPU heatsinks).
When my MI25 'went off' (booting Windows 7 in a dual s940 board, no fan on the MI25), I almost burnt myself pulling the card. Literally the entire card was near 'freshly melted wax' temp, and that was from just one or two POSTs and boot attempts.

If the V540's cooler is like the MI25's, the shroud should come off w/ several screws around the bottom of the shroud (on the 'sides' of the card). From there, you can zip-tie, tape, or otherwise 'attach' fan(s) to the card.

It took me all of a few minutes to remove the shroud, and 'mount' a 12x3.8cm high-pressure fan.
Pics of my 'modified' MI25 in the following posts:

Alternatively: there are 3D prints out there for mounting fan(s) to the unmodified card.
Though IMO none of those 3D prints are 'viable'; they all make the card unmountably long.
 
- I could not manage to get a headless server with a dummy display running. X11 forwarding/xrdp/spice don't use 3D acceleration.
When I get one of my own, I intend to try AMDVLK + Wayland + the Pro drivers. If anyone else feels like it, they can try too; it *seems* like it should work, but not being able to run X is pretty nasty.
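
If someone beats me to it, here's a sketch of pinning the Vulkan loader to AMDVLK specifically (the manifest path is the default AMDVLK install location, an assumption; adjust if your package put amd_icd64.json elsewhere, e.g. /etc/vulkan/icd.d; vulkaninfo comes from vulkan-tools):

Code:
# Point the loader at AMDVLK's ICD manifest only, then see if the card enumerates
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_icd64.json vulkaninfo --summary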
 
Test rig ready!
No overheating.
I still need Windows drivers.
I know, not the best idea to put a battery next to a hot video card.
 
Card is not cooperating at all.

This is on Ubuntu 18.04 with the 5.4 kernel, amdgpu-pro drivers from AWS, and an OpenCL headless install; as close as you can get to an EC2 instance with a V520.
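
For anyone reproducing this: the headless OpenCL-only setup goes through the installer script's flags. A sketch, assuming the AWS bundle uses the stock amdgpu-pro installer of that era:

Code:
# From the unpacked driver bundle: PAL OpenCL only, no display stack
./amdgpu-pro-install -y --opencl=pal --headless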

snip from dmesg
Code:
[  +0,000217] [drm] Initialized amdgpu 3.36.0 20150101 for 0000:02:00.0 on minor 1

Great.

snip clinfo
Code:
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3110.6)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx1011
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 AMD-APP (3110.6)
  Driver Version                                  3110.6 (PAL,LC)
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon Pro V520
  Device Topology (AMD)                           PCI-E, 02:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes

Ok, let's try some workload - OpenCL mixbench:

Code:
May 24 19:38:58 m-ub18 kernel: [  155.182148] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 10secs aborting
May 24 19:38:58 m-ub18 kernel: [  155.182179] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 4FCA (len 110, WS 12, PS 8) @ 0x502F
May 24 19:38:58 m-ub18 kernel: [  155.182208] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 4F58 (len 37, WS 0, PS 8) @ 0x4F71
May 24 19:38:58 m-ub18 kernel: [  155.182237] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu asic init failed
May 24 19:38:58 m-ub18 kernel: [  155.201813] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
May 24 19:38:58 m-ub18 kernel: [  155.202098] [drm] PSP is resuming...
May 24 19:38:58 m-ub18 kernel: [  155.202125] amdgpu 0000:02:00.0: sos fw version = 0xffffffff.
May 24 19:38:58 m-ub18 kernel: [  155.202127] amdgpu 0000:02:00.0: sos fw version = 0xffffffff.
May 24 19:38:58 m-ub18 kernel: [  155.396470] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
May 24 19:38:58 m-ub18 kernel: [  155.396504] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
May 24 19:38:58 m-ub18 kernel: [  155.396533] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
May 24 19:38:58 m-ub18 kernel: [  155.396561] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
May 24 19:39:11 m-ub18 kernel: [  168.698350] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=84, emitted seq=86
May 24 19:39:11 m-ub18 kernel: [  168.698403] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=2, emitted seq=4
May 24 19:39:11 m-ub18 kernel: [  168.698447] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
May 24 19:39:11 m-ub18 kernel: [  168.698490] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
May 24 19:39:11 m-ub18 kernel: [  168.698491] amdgpu 0000:02:00.0: GPU reset begin!
May 24 19:39:11 m-ub18 kernel: [  168.698497] amdgpu 0000:02:00.0: GPU reset begin!
May 24 19:39:11 m-ub18 kernel: [  168.698499] [drm] Bailing on TDR for s_job:3f, as another already in progress
May 24 19:39:11 m-ub18 kernel: [  168.698502] BUG: kernel NULL pointer dereference, address: 0000000000000038
May 24 19:39:11 m-ub18 kernel: [  168.698504] #PF: supervisor read access in kernel mode
May 24 19:39:11 m-ub18 kernel: [  168.698506] #PF: error_code(0x0000) - not-present page

That did not go well.

===================

I take it back: I managed to get OpenCL running, in xmrig of all things. However, this is on bare metal, with the same software as before.

Code:
 * OPENCL       #0 AMD Accelerated Parallel Processing/OpenCL 2.1 AMD-APP (3110.6)
 * OPENCL GPU   #0 0c:00.0 AMD Radeon Pro V520 (gfx1011) 1450 MHz cu:18 mem:6949/8176 MB
 * OPENCL GPU   #1 09:00.0 AMD Radeon Pro V520 (gfx1011) 1450 MHz cu:18 mem:6949/8176 MB
 * CUDA         disabled
[2023-05-24 20:16:24.840]  net      use pool randomxmonero.eu-north.nicehash.com:3380  34.149.22.228
[2023-05-24 20:16:24.840]  net      new job from randomxmonero.eu-north.nicehash.com:3380 diff 221675 algo rx/0 (4 tx)
[2023-05-24 20:16:24.840]  cpu      use argon2 implementation AVX2
[2023-05-24 20:16:24.842]  msr      register values for "ryzen_17h" preset have been set successfully (2 ms)
[2023-05-24 20:16:24.842]  randomx  init dataset algo rx/0 (12 threads) seed 742ae9bc7c8ac350...
[2023-05-24 20:16:25.032]  randomx  allocated 2336 MB (2080+256) huge pages 100% 1168/1168 +JIT (190 ms)
[2023-05-24 20:16:27.761]  randomx  dataset ready (2729 ms)
[2023-05-24 20:16:27.761]  opencl   use profile  rx  (4 threads) scratchpad 2048 KB
|  # | GPU |  BUS ID | INTENSITY | WSIZE | MEMORY | NAME
|  0 |   0 | 0c:00.0 |      1152 |     8 |   2304 | AMD Radeon Pro V520 (gfx1011)
|  1 |   0 | 0c:00.0 |      1152 |     8 |   2304 | AMD Radeon Pro V520 (gfx1011)
|  2 |   1 | 09:00.0 |      1152 |     8 |   2304 | AMD Radeon Pro V520 (gfx1011)
|  3 |   1 | 09:00.0 |      1152 |     8 |   2304 | AMD Radeon Pro V520 (gfx1011)
[2023-05-24 20:16:28.245]  opencl   READY threads 4/4 (484 ms)
[2023-05-24 20:16:49.579]  net      new job from randomxmonero.eu-north.nicehash.com:3380 diff 221675 algo rx/0 (1 tx)
[2023-05-24 20:17:21.467]  opencl   accepted (1/0) diff 221675 (122 ms)
[2023-05-24 20:17:24.687]  opencl   accepted (2/0) diff 221675 (39 ms)
[2023-05-24 20:17:27.849]  opencl   #0 0c:00.0  55W 53C    0RPM 1445/1000MHz
[2023-05-24 20:17:27.849]  opencl   #1 09:00.0  56W 51C    0RPM 1450/1000MHz
[2023-05-24 20:17:27.849]  miner    speed 10s/60s/15m 1956.5 n/a n/a H/s max 2198.7 H/s
[2023-05-24 20:18:27.944]  opencl   #0 0c:00.0  56W 59C    0RPM 1445/1000MHz
[2023-05-24 20:18:27.944]  opencl   #1 09:00.0  57W 54C    0RPM 1450/1000MHz
[2023-05-24 20:18:27.944]  miner    speed 10s/60s/15m 2076.8 2011.1 n/a H/s max 2198.7 H/s
[2023-05-24 20:18:35.811]  signal   Ctrl+C received, exiting
[2023-05-24 20:18:37.497]  opencl   stopped (1686 ms)

Hopefully it will now be possible to squeeze some graphics out of it.

================

Heh, it is.

 


As it turns out, you can get the Windows drivers without an account at all. I've gone ahead and thrown the URLs at archive.org as well. I don't know who originally figured this out; the info just trickled down to me for one of them, and I guessed the two other URLs.
 