
The secret of Doom (2016) performance on AMD GPUs

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.94/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
Funny how on Windows you need a Skylake CPU or newer to use Vulkan, yet Ivy Bridge and newer support DX12, and Vulkan on Linux. We need Zen now...
You mean like how AMDGPU-Pro supports Vulkan on Linux but is limited to 3rd-gen GCN GPUs at the moment?
 
Joined
Oct 2, 2015
Messages
2,991 (0.96/day)
Location
Argentina
System Name Ciel
Processor AMD Ryzen R5 5600X
Motherboard Asus Tuf Gaming B550 Plus
Cooling ID-Cooling 224-XT Basic
Memory 2x 16GB Kingston Fury 3600MHz@3933MHz
Video Card(s) Gainward Ghost 3060 Ti 8GB + Sapphire Pulse RX 6600 8GB
Storage NVMe Kingston KC3000 2TB + NVMe Toshiba KBG40ZNT256G + HDD WD 4TB
Display(s) AOC Q27G3XMN + Samsung S22F350
Case Cougar MX410 Mesh-G
Audio Device(s) Kingston HyperX Cloud Stinger Core 7.1 Wireless PC
Power Supply Aerocool KCAS-500W
Mouse EVGA X15
Keyboard VSG Alnilam
Software Windows 11
You mean like how AMDGPU-Pro supports Vulkan on Linux but is limited to 3rd-gen GCN GPUs at the moment?

https://cgit.freedesktop.org/~agd5f/linux/?h=drm-next-4.8-wip-si
https://cgit.freedesktop.org/~agd5f/linux/?h=drm-next-4.9-si

You can test their progress on GCN 1.0, and to use the driver on GCN 1.1 hardware you only have to enable a kernel flag; both are experimental.
tl;dr: they are working on it.

And the community has its own implementation: https://www.phoronix.com/scan.php?page=news_item&px=RADV-Radeon-Vulkan-Driver
 
Joined
Jun 13, 2012
Messages
1,327 (0.31/day)
Processor i7-13700k
Motherboard Asus Tuf Gaming z790-plus
Cooling Coolermaster Hyper 212 RGB
Memory Corsair Vengeance RGB 32GB DDR5 7000mhz
Video Card(s) Asus Dual Geforce RTX 4070 Super ( 2800mhz @ 1.0volt, ~60mhz overlock -.1volts. 180-190watt draw)
Storage 1x Samsung 980 Pro PCIe4 NVme, 2x Samsung 1tb 850evo SSD, 3x WD drives, 2 seagate
Display(s) Acer Predator XB273u 27inch IPS G-Sync 165hz
Power Supply Corsair RMx Series RM850x (OCZ Z series PSU retired after 13 years of service)
Mouse Logitech G502 hero
Keyboard Logitech G710+
What's interesting is AMD made the API open for NV to use but NV refused to use it, whereas NV refuses to share their APIs. Honestly, open APIs let everyone win, no matter Red or Green.
Um, AMD claimed they would make it open source, but the date they set to release the source came and went; six months went by with no source code. AMD refused to release the source to start with, then they canned the project and turned it over to Khronos. AMD dropped the ball in that matter, not NV. Don't say NV refused to use a closed API they never had access to, because it was never made open source under AMD.
 
Joined
Oct 2, 2015
Messages
2,991 (0.96/day)
Location
Argentina
System Name Ciel
Processor AMD Ryzen R5 5600X
Motherboard Asus Tuf Gaming B550 Plus
Cooling ID-Cooling 224-XT Basic
Memory 2x 16GB Kingston Fury 3600MHz@3933MHz
Video Card(s) Gainward Ghost 3060 Ti 8GB + Sapphire Pulse RX 6600 8GB
Storage NVMe Kingston KC3000 2TB + NVMe Toshiba KBG40ZNT256G + HDD WD 4TB
Display(s) AOC Q27G3XMN + Samsung S22F350
Case Cougar MX410 Mesh-G
Audio Device(s) Kingston HyperX Cloud Stinger Core 7.1 Wireless PC
Power Supply Aerocool KCAS-500W
Mouse EVGA X15
Keyboard VSG Alnilam
Software Windows 11
Well, to be fair, Nvidia took part in the development of Vulkan; that's better than having to just take Mantle and support it.
 
Joined
Nov 5, 2004
Messages
385 (0.05/day)
Location
Belgium, Leuven
Processor I7-6700
Motherboard ASRock Z170 Pro4S
Cooling 2*120mm
Memory G.Skill D416GB 3200-14 Trident Z K2 GSK
Video Card(s) Rx480 Sapphire
Storage SSD Samsung 256GB 850 pro + bunch of TB
Case Antec
Audio Device(s) Creative Sound Blaster Z
Power Supply be quiet! 900W
Mouse Logitech G5
Keyboard Logitech G11
I am failing to find consensus here. What is the middle-ground in this discussion that people agree on? Or is it all interpretation and speculation?

I read that Mantle was dropped because DX12 was doing the same; thus I can conclude that AMD and MS were working together on a level that NV wasn't.

On the other hand, I also read that DX11 was secretly supporting some extras (async) for NV, pointing to an agreement between NV and MS.

So is MS playing both sides, or just opportunistic?


I am not trying to make any tinfoil-hat theories; I am just trying to make sense of what is being said here in one simple-to-understand, big-picture post (and how this relates to the "secret" that the OP mentions).
 
Joined
Jul 9, 2015
Messages
3,413 (1.06/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores a good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
Well, if it wasn't closed, when was it ever open sourced before being turned over to the Khronos Group?
You mean the 435-page programming guide isn't enough?
Yeah, 'cause there are much better ways to expose APIs, right?

Jeez. Name a single nV proprietary thing that then became a standard... pretty much anything, will ya? You know, just for some perspective.

On the other hand I also read that DX11 was secretly supporting some extras(async) for NV, pointing out an agreement between NV and MS.
Me wonders what this speculation is based on.

And how that squares with "nvidia = no async".
 
Joined
Oct 2, 2015
Messages
2,991 (0.96/day)
Location
Argentina
System Name Ciel
Processor AMD Ryzen R5 5600X
Motherboard Asus Tuf Gaming B550 Plus
Cooling ID-Cooling 224-XT Basic
Memory 2x 16GB Kingston Fury 3600MHz@3933MHz
Video Card(s) Gainward Ghost 3060 Ti 8GB + Sapphire Pulse RX 6600 8GB
Storage NVMe Kingston KC3000 2TB + NVMe Toshiba KBG40ZNT256G + HDD WD 4TB
Display(s) AOC Q27G3XMN + Samsung S22F350
Case Cougar MX410 Mesh-G
Audio Device(s) Kingston HyperX Cloud Stinger Core 7.1 Wireless PC
Power Supply Aerocool KCAS-500W
Mouse EVGA X15
Keyboard VSG Alnilam
Software Windows 11
It was said Nvidia spent serious money on reducing the CPU overhead of their DirectX 11 and OpenGL drivers. Maybe they managed to make better multi-threaded use of them, but that doesn't mean their Vulkan and DX12 implementation of async compute is the same.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB and Toshiba N300 NAS 10TB HDD
Display(s) 2X LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
vk_nv_glsl_shader is for using existing GLSL shaders(2), the ones used in OpenGL (compiled at runtime), on Vulkan, so you don't have to port them to SPIR-V (the new universal format, mostly precompiled). It's purely for easing the porting work; it even makes things run slower.
Good point on the Far Cry 2 example, but you have to remember Nvidia refused to implement DX10.1; they had to do an implementation or they would have looked slower/older than the competition.

OpenGL had vendor specific extensions for decades, not just with the recent console generation(1).
1. So what? AMD's OpenGL didn't have shader intrinsics and related features. The difference shows between AMD's OpenGL and Vulkan frame rates.

2. For NVIDIA GPUs, read https://developer.nvidia.com/reading-between-threads-shader-intrinsics This is applicable to NVidia's Vulkan, OpenGL, DX11, DX12 and NVAPI.

https://www.opengl.org/discussion_b...0-Nvidia-s-OpenGL-extensions-rival-AMD-Mantle
According to Carmack himself, Nvidia's OpenGL extensions can give similar improvements, regarding draw calls, to AMD's Mantle.
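For reference, here is roughly what the vk_nv_glsl_shader path quoted at the top of this post looks like in practice. This is just a minimal sketch, assuming a VkDevice created with the VK_NV_glsl_shader extension enabled; normally pCode holds SPIR-V words, but the extension lets you hand over the GLSL text directly:

```cpp
// Hedged sketch of the VK_NV_glsl_shader path (not a complete Vulkan app):
// feed GLSL source text to vkCreateShaderModule instead of SPIR-V words.
// Assumes `device` was created with the VK_NV_glsl_shader extension enabled.
#include <vulkan/vulkan.h>
#include <cstring>

VkShaderModule createGlslModule(VkDevice device, const char* glslSource)
{
    VkShaderModuleCreateInfo info{};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = std::strlen(glslSource);                        // byte length of the GLSL text
    info.pCode    = reinterpret_cast<const uint32_t*>(glslSource);  // SPIR-V would normally go here

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);          // valid only with the NV extension
    return module;
}
```

The point being: the driver ends up compiling the GLSL at module-creation time, which is exactly the runtime cost SPIR-V is meant to avoid.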

 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB and Toshiba N300 NAS 10TB HDD
Display(s) 2X LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
Um, AMD claimed they would make it open source, but the date they set to release the source came and went; six months went by with no source code. AMD refused to release the source to start with, then they canned the project(1) and turned it over to Khronos(2). AMD dropped the ball in that matter, not NV. Don't say NV refused to use a closed API they never had access to, because it was never made open source under AMD.
1. The Mantle API is still listed as a working API in my latest Radeon driver. The Mantle API wasn't completed, and AMD wants to avoid slow/stonewall/filibuster API politics. NVidia has their own OpenGL vendor extensions and NVAPI competing against AMD's Mantle. NVAPI has existed for a long time, i.e. since before Mantle and DX11.

2. What's important is the end result.
 
Joined
Oct 2, 2015
Messages
2,991 (0.96/day)
Location
Argentina
System Name Ciel
Processor AMD Ryzen R5 5600X
Motherboard Asus Tuf Gaming B550 Plus
Cooling ID-Cooling 224-XT Basic
Memory 2x 16GB Kingston Fury 3600MHz@3933MHz
Video Card(s) Gainward Ghost 3060 Ti 8GB + Sapphire Pulse RX 6600 8GB
Storage NVMe Kingston KC3000 2TB + NVMe Toshiba KBG40ZNT256G + HDD WD 4TB
Display(s) AOC Q27G3XMN + Samsung S22F350
Case Cougar MX410 Mesh-G
Audio Device(s) Kingston HyperX Cloud Stinger Core 7.1 Wireless PC
Power Supply Aerocool KCAS-500W
Mouse EVGA X15
Keyboard VSG Alnilam
Software Windows 11
What Carmack refers to is Nvidia's own NV_command_list OpenGL extension; it brings OpenGL overhead down to almost Vulkan levels: https://www.opengl.org/registry/specs/NV/command_list.txt
It works as intended on Kepler and newer, and is poorly implemented on Fermi.

Just for a quick example, AMD has GCN_shader in OpenGL, among others like pinned memory. I'm currently on Fedora, so I can give you only what the open driver offers, but it has enough examples:

GL_AMD_conservative_depth, GL_AMD_draw_buffers_blend,
GL_AMD_performance_monitor, GL_AMD_pinned_memory,
GL_AMD_seamless_cubemap_per_texture, GL_AMD_shader_stencil_export,
GL_AMD_shader_trinary_minmax, GL_AMD_vertex_shader_layer,
GL_AMD_vertex_shader_viewport_index

It even supports other vendor's extensions:

GL_NVX_gpu_memory_info,
GL_NV_conditional_render, GL_NV_depth_clamp, GL_NV_packed_depth_stencil,
GL_NV_texture_barrier, GL_NV_vdpau_interop

In Windows this list is more extensive.

Nvidia isn't the only one with their own optimizations, and not all of them are useful in gaming scenarios.
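If anyone wants to see what their own driver exposes at runtime rather than trusting a copy-pasted list, here's a small C++ sketch; it assumes an already-current OpenGL 3.0+ context and a loader such as GLEW, which are not shown:

```cpp
// Minimal sketch: enumerate the driver's extension strings and check whether a
// given vendor extension (e.g. GL_AMD_pinned_memory or GL_NV_command_list) is exposed.
// Assumes an OpenGL 3.0+ context is current and functions are loaded (e.g. via GLEW).
#include <GL/glew.h>
#include <cstring>

bool hasExtension(const char* name)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &count);
    for (GLint i = 0; i < count; ++i) {
        const char* ext = reinterpret_cast<const char*>(glGetStringi(GL_EXTENSIONS, i));
        if (ext && std::strcmp(ext, name) == 0)
            return true;
    }
    return false;
}

// Usage, once a context is current:
//   hasExtension("GL_AMD_pinned_memory");
//   hasExtension("GL_NV_command_list");
```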
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.94/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
According to Carmack himself, Nvidia's OpenGL extensions can give similar improvements, regarding draw calls, to AMD's Mantle.
That's probably because nVidia does something similar at the driver level. I speculated before that nVidia's implementation might keep track of the OpenGL calls and might use a queue to buffer things like draw calls, which are processed independently and joined back up when an OpenGL call comes through that requires the draw calls to all be completed.

In fact, for system integration at work I recently implemented a library (unpolished) that takes in a stream of data with associated data dependencies. It blocks subsequent data from proceeding if something is using a resource that must be handled serially, but allows the rest to continue to be processed in order to improve parallel throughput. I refer to this as "queue re-ordering based on data dependencies," and I wouldn't be surprised if nVidia did something similar at the driver level in order to give the appearance that certain calls are executed quickly, when in reality they very well might be happening asynchronously in another thread or process after being put on a queue in memory.

Simply put, you don't need async compute to use a queue to accelerate certain kinds of driver workloads (and in all seriousness, many other kinds of workloads as well). The nice thing about queues is that they decouple the what from the when: instead of waiting for a draw call to complete, it returns immediately with the understanding that the draw was queued up and will eventually be executed before another OpenGL call is made that requires it to be complete. It's possible that certain OpenGL calls might tell nVidia's driver "look, you need to finish processing everything in the queue (or maybe even just the stuff that call cares about) before continuing."
What Carmack refers to is Nvidia's own NV_command_list OpenGL extension; it brings OpenGL overhead down to almost Vulkan levels: https://www.opengl.org/registry/specs/NV/command_list.txt
It works as intended on Kepler and newer, and is poorly implemented on Fermi.

Just for a quick example, AMD has GCN_shader in OpenGL, among others like pinned memory. I'm currently on Fedora, so I can give you only what the open driver offers, but it has enough examples:

GL_AMD_conservative_depth, GL_AMD_draw_buffers_blend,
GL_AMD_performance_monitor, GL_AMD_pinned_memory,
GL_AMD_seamless_cubemap_per_texture, GL_AMD_shader_stencil_export,
GL_AMD_shader_trinary_minmax, GL_AMD_vertex_shader_layer,
GL_AMD_vertex_shader_viewport_index

It even supports other vendor's extensions:

GL_NVX_gpu_memory_info,
GL_NV_conditional_render, GL_NV_depth_clamp, GL_NV_packed_depth_stencil,
GL_NV_texture_barrier, GL_NV_vdpau_interop

In Windows this list is more extensive.

Nvidia isn't the only one with their own optimizations, and not all of them are useful in gaming scenarios.
...and that doesn't even touch on the implementation of the calls that are in the OpenGL (insert version you care about here) spec. Just because you need to implement a spec that has functions x, y, and z doesn't mean that the implementations themselves are the same or even remotely similar.

tl;dr: I wouldn't be surprised if some of the things Vulkan is spec'ed out to do are merely done implicitly by nVidia's OpenGL drivers already, under the hood. Sometimes it's faster to put something on a queue and do it later than to do it on the spot, so long as the cost of managing the queue doesn't outstrip the benefit.
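To make that concrete, here's a toy sketch of the kind of queue I'm describing. It is not nVidia's driver code, just an illustration of decoupling the "what" from the "when": submit() returns immediately, a worker thread drains the queue, and flush() models a call that can't return until everything queued before it has finished.

```cpp
// Toy sketch of a command queue that decouples submission from execution.
// Not driver code; just illustrates the "queue now, execute later" idea.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class CommandQueue {
public:
    CommandQueue() : worker_([this] { run(); }) {}
    ~CommandQueue() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_all();
        worker_.join();
    }

    // e.g. a buffered draw call: returns immediately.
    void submit(std::function<void()> cmd) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(cmd)); }
        cv_.notify_all();
    }

    // e.g. a call that needs all prior draws finished: blocks until the queue drains.
    void flush() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return q_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            if (done_ && q_.empty()) return;
            auto cmd = std::move(q_.front());
            q_.pop();
            busy_ = true;
            lock.unlock();
            cmd();              // executed asynchronously, off the caller's thread
            lock.lock();
            busy_ = false;
            if (q_.empty()) idle_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_, idle_;
    std::queue<std::function<void()>> q_;
    bool done_ = false;
    bool busy_ = false;
    std::thread worker_;  // declared last so the other members are ready before it starts
};
```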
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB and Toshiba N300 NAS 10TB HDD
Display(s) 2X LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
That's probably because nVidia does something similar at the driver level. I speculated before that nVidia's implementation might keep track of the OpenGL calls and might use a queue to buffer things like draw calls, which are processed independently and joined back up when an OpenGL call comes through that requires the draw calls to all be completed.

In fact, for system integration at work I recently implemented a library (unpolished) that takes in a stream of data with associated data dependencies. It blocks subsequent data from proceeding if something is using a resource that must be handled serially, but allows the rest to continue to be processed in order to improve parallel throughput. I refer to this as "queue re-ordering based on data dependencies," and I wouldn't be surprised if nVidia did something similar at the driver level in order to give the appearance that certain calls are executed quickly, when in reality they very well might be happening asynchronously in another thread or process after being put on a queue in memory.

Simply put, you don't need async compute to use a queue to accelerate certain kinds of driver workloads (and in all seriousness, many other kinds of workloads as well). The nice thing about queues is that they decouple the what from the when: instead of waiting for a draw call to complete, it returns immediately with the understanding that the draw was queued up and will eventually be executed before another OpenGL call is made that requires it to be complete. It's possible that certain OpenGL calls might tell nVidia's driver "look, you need to finish processing everything in the queue (or maybe even just the stuff that call cares about) before continuing."
From https://developer.nvidia.com/dx12-dos-and-donts

"On DX11, the driver does farm off asynchronous tasks to driver worker threads where possible".



What Carmack refers to is Nvidia's own NV_command_list OpenGL extension; it brings OpenGL overhead down to almost Vulkan levels: https://www.opengl.org/registry/specs/NV/command_list.txt
It works as intended on Kepler and newer, and is poorly implemented on Fermi.

Just for a quick example, AMD has GCN_shader in OpenGL, among others like pinned memory. I'm currently on Fedora, so I can give you only what the open driver offers, but it has enough examples:

GL_AMD_conservative_depth, GL_AMD_draw_buffers_blend,
GL_AMD_performance_monitor, GL_AMD_pinned_memory,
GL_AMD_seamless_cubemap_per_texture, GL_AMD_shader_stencil_export,
GL_AMD_shader_trinary_minmax, GL_AMD_vertex_shader_layer,
GL_AMD_vertex_shader_viewport_index

It even supports other vendor's extensions:

GL_NVX_gpu_memory_info,
GL_NV_conditional_render, GL_NV_depth_clamp, GL_NV_packed_depth_stencil,
GL_NV_texture_barrier, GL_NV_vdpau_interop

In Windows this list is more extensive.

Nvidia isn't the only one with their own optimizations, and not all of them are useful in gaming scenarios.
Nearly useless post, since it doesn't specifically address the performance difference between AMD's Vulkan and OpenGL frame-rate results. Furthermore, the majority of PC games are written with Direct3D, not OpenGL.


AMD has recently enabled GCN's Shader Intrinsic Functions with Vulkan, DirectX 11 and DirectX 12, while NVIDIA has had Shader Intrinsic Functions with Direct3D via NVAPI for a long time.

https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl

"None of the intrinsics are possible in standard DirectX or OpenGL. But they have been supported and well-documented in CUDA for years. A mechanism to support them in DirectX has been available for a while but not widely documented. I happen to have an old NVAPI version 343 on my system from October 2014 and the intrinsics are supported in DirectX by that version and probably earlier versions. This blog explains the mechanism for using them in DirectX.

Unlike OpenGL or Vulkan, DirectX unfortunately doesn't have a native mechanism for vendor-specific extensions. But there is still a way to make all this functionality available in DirectX 11 or 12 through custom intrinsics. That mechanism is implemented in our graphics driver and accessible through the NVAPI library."
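As I read it, the mechanism that blog post describes boils down to telling the driver, through NVAPI, which UAV slot the HLSL side will use as the extension hook. A rough host-side sketch, assuming the NVAPI SDK is available (the HLSL side, slot choice and error handling are all omitted):

```cpp
// Rough sketch of enabling NVIDIA's HLSL extension mechanism on D3D11 via NVAPI.
// Assumes the NVAPI SDK (nvapi.h) and an already-created ID3D11Device.
#include <d3d11.h>
#include <nvapi.h>

bool enableNvIntrinsics(ID3D11Device* device, NvU32 fakeUavSlot)
{
    if (NvAPI_Initialize() != NVAPI_OK)
        return false;
    // Tell the driver which UAV slot the shaders will treat as the extension hook;
    // the HLSL side must reference the same slot.
    return NvAPI_D3D11_SetNvShaderExtnSlot(device, fakeUavSlot) == NVAPI_OK;
}
```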

 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.94/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
From https://developer.nvidia.com/dx12-dos-and-donts

"On DX11, the driver does farm off asynchronous tasks to driver worker threads where possible".

What's more interesting from that link you provided:
Don’ts
  • Don’t rely on the driver to parallelize any Direct3D12 works in driver threads
    • On DX11 the driver does farm off asynchronous tasks to driver worker threads where possible – this doesn’t happen anymore under DX12
    • While the total cost of work submission in DX12 has been reduced, the amount of work measured on the application’s thread may be larger due to the loss of driver threading. The more efficiently one can use parallel hardware cores of the CPU to submit work in parallel, the more benefit in terms of draw call submission performance can be expected.

After reading that, it makes me think that under DX12 it very well might be too difficult for the driver to accelerate, in the same way, what would normally behave like a serial workload but gets delegated to the driver as I surmised (even for DX11, which isn't entirely surprising). I see this as a move to put more power in the hands of game developers and less in the hands of driver developers should extra performance be demanded. It takes driver developers off the hook for making up for poor engine implementations, which nVidia has done exceptionally well in my opinion. In general, I would call this a good thing, but it very well might mean we're not going to see the same kind of driver advantage nVidia has had over AMD going forward. I think that will depend highly on how game devs implement and utilize their engines, and it will be less about what kind of optimizations driver devs can make.
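For anyone wondering what "using parallel hardware cores of the CPU to submit work" looks like from the application side, here's a rough DX12 sketch (device and queue assumed to exist, actual draw recording and error handling omitted): each thread records into its own command list from its own allocator, and submission stays on one thread.

```cpp
// Rough sketch: record D3D12 command lists on several threads, submit once.
// Assumes an existing ID3D12Device* and ID3D12CommandQueue*; draws omitted.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void recordAndSubmit(ID3D12Device* device, ID3D12CommandQueue* queue, unsigned workers)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocs(workers);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workers);
    std::vector<std::thread> threads;

    for (unsigned i = 0; i < workers; ++i) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&allocs[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocs[i].Get(), nullptr, IID_PPV_ARGS(&lists[i]));
        threads.emplace_back([&lists, i] {
            // ... record this thread's share of draw calls into lists[i] ...
            lists[i]->Close();   // recording is now per-thread application work
        });
    }
    for (auto& t : threads) t.join();

    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());  // single submission point
}
```

The driver no longer does this fan-out for you under DX12, which is exactly the "loss of driver threading" the Don'ts list above is warning about.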
 