Editorial Closer to the Metal: Shader Intrinsic Functions

Raevenlord · Oct 19, 2016

Shader intrinsic functions are a partial solution for giving developers more control over existing computational resources and how they are leveraged. This capability (much touted by AMD as a performance-enhancing feature on its GCN-based products) essentially exposes features and capabilities that exist on the hardware developers are programming for, but which they wouldn't generally be able to access. This happens either because those features are abstracted away by a high-level API (Application Programming Interface, like DX11), or because the API simply has no way to access them. To understand why high-level APIs such as DX11 don't usually expose a piece of hardware's full feature list, or its full processing capabilities, we must first look at the basic architecture of a given computer system.

As you can see, there are usually multiple layers a given task must go through in order to be processed at the hardware level. You might be wondering why we even need so many layers in the first place, and why this wasn't enabled before. There are many technical reasons for this, but one of the strongest is simply the breadth of different hardware available for your buying and assembling pleasure. Unlike the console ecosystem, where hardware is fixed and, as a result, predictable in its performance metrics and command execution, the PC ecosystem is fractured into countless hardware combinations. You may have an AMD FX-8350 with CMT (Clustered Multi-Threading), an Intel i7-6700K with SMT (Simultaneous Multi-Threading), or anything in between, paired with a GCN-based RX 480 or a Pascal-based GTX 1070… And each piece of hardware has its own particularities in how it processes the same task, and in the type of commands you need to issue to get a given result. So DX11, DX12 and Vulkan serve as what we call an abstraction layer.

Abstraction layers essentially simplify the programmer's work: they "hide" and automate a given command's underlying processes, particular implementation and hardware-specific code paths, so that the programmer only has to worry about which commands he wants to use, and voilà. The high-level API converts a given command (let's imagine, for simplicity's sake, "draw frame") into its equivalent, non-abstracted hardware code, and runs it with good-enough optimization on most hardware to deliver those awesome (insert your favorite game here) frames. To elaborate a little: imagine you have a command called "Stack". On a high-level API like DX11, this command will be interpreted and the values for its inner workings will be set automatically, based on general hardware compatibility: how many levels to stack, when to stack them, and when to stop the operation. But since these values aren't optimized for your specific hardware, it will use something of a brute-force approach. With a low-level API, developers can instead set the exact values for the "Stack" command's inner workings, optimized for your hardware, so it never goes over budget and none of those sexy stream processors are left idle.
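To make the "Stack" analogy a little more concrete, here is a toy sketch in Python. It assumes the command's only inner parameter is how much work is issued per group. A hypothetical high-level layer always picks one generic default, while a low-level caller picks a value matched to the GPU. The only real-world figures are the RX 480's 2,304 stream processors and GCN's 64-wide wavefronts. The function names, the default of 1000 and the "whole groups per pass" rule are illustrative assumptions, not how any real driver schedules work.

```python
# Toy model of the "Stack" analogy. All names and numbers are hypothetical,
# except the RX 480's 2,304 stream processors and GCN's 64-wide wavefronts.

GENERIC_GROUP_SIZE = 1000  # hypothetical one-size-fits-all default


def lanes_used(stream_processors: int, group_size: int) -> int:
    """Stream processors kept busy per pass when work is issued in whole groups."""
    if not 0 < group_size <= stream_processors:
        raise ValueError("group size must fit on the GPU")
    groups_per_pass = stream_processors // group_size
    return groups_per_pass * group_size


def utilization(stream_processors: int, group_size: int) -> float:
    """Fraction of stream processors doing work in each pass."""
    return lanes_used(stream_processors, group_size) / stream_processors


def stack_high_level(stream_processors: int) -> float:
    # High-level API: the abstraction layer applies its generic default.
    return utilization(stream_processors, GENERIC_GROUP_SIZE)


def stack_low_level(stream_processors: int, group_size: int) -> float:
    # Low-level API: the developer passes a value tuned for the target GPU.
    return utilization(stream_processors, group_size)


if __name__ == "__main__":
    rx_480 = 2304
    print(f"high-level: {stack_high_level(rx_480):.1%}")    # 2 groups of 1000
    print(f"low-level:  {stack_low_level(rx_480, 64):.1%}")  # 36 groups of 64
```

Run as a script, this prints 86.8% for the generic default and 100.0% for the tuned value. In the toy model, the generic value leaves 304 stream processors idle on every pass.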

The problem with the former, high-level approach, of course, is that generalizations and simplifications aren't as efficient as an optimized, hardware-specific code path, and the high-level API may sometimes even deny access to hardware features it doesn't support. The thing with DX12 and Vulkan's low-level capabilities is that, in specific scenarios, developers can largely bypass abstraction layers (some compiler checks are still used to make sure the code is within expected parameters). This allows them to write code that takes advantage of hardware-specific features, sometimes accelerating workloads by up to 2x compared to the high-level approach. This is the basic principle of low-level APIs, and it is enabled, at least partially, by shader intrinsic functions.

Going back to the different layers in a system, imagine, for argument's sake, that it takes 5 ms for a task to go through each of the layers until it is executed by the hardware. In the example image given above, that would mean 5 × 5 ms = 25 ms. Now imagine you could effectively skip most of those figurative hoops, going straight from the app's processing requirements to the hardware. If only two of the five layers remained, you would have reduced your 25 ms computation to a mere 10 ms, freeing up computation time for other tasks. This is what shader intrinsic functions really are: pieces of code that, when recognized by the low-level API, are allowed to go directly to the hardware, bypassing other, time-consuming layers.
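The arithmetic of that hypothetical can be written out as a tiny Python sketch. The 5 ms per layer and the five layers come from the example above. The two remaining layers are an assumption inferred from the 10 ms figure (10 ms / 5 ms). This only models the example's serial "toll per layer". It is not a model of how a real driver pipeline behaves, since real rendering queues commands asynchronously.

```python
PER_LAYER_MS = 5    # the example's "for argument's sake" cost per layer
TOTAL_LAYERS = 5    # layers in the example figure
DIRECT_LAYERS = 2   # assumption: layers still traversed after the bypass


def path_cost_ms(layers: int, per_layer_ms: int = PER_LAYER_MS) -> int:
    """Cost of a task that pays a fixed, serial toll at every layer."""
    return layers * per_layer_ms


full = path_cost_ms(TOTAL_LAYERS)
direct = path_cost_ms(DIRECT_LAYERS)
print(f"{full} ms -> {direct} ms, {full - direct} ms freed")
```

This prints `25 ms -> 10 ms, 15 ms freed`.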

The problem with this approach must seem obvious to you by now: while abstraction layers do add overhead to any given computing task, they also simplify the coding process, sometimes by orders of magnitude. Close-to-the-metal programming's greatest strength is also its greatest flaw: the ability to directly leverage hardware resources requires specific, time-consuming programming for functions that were largely automatic before. This not only means more developer resources, but also a system that is more prone to errors, and debugging five lines of code is very different from debugging fifty. One must also keep in mind that close-to-the-metal programming, because it targets only a subset of existing hardware, ends up leaving behind users of older, unsupported hardware.
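The cost described above is easier to see in code. Once a hardware-specific path exists, the engine still has to keep a portable path for everyone else and choose between them at runtime. Here is a minimal Python sketch of that dispatch. The capability name `HYPOTHETICAL_WAVE_REDUCE` and all function names are invented for illustration. Real drivers advertise optional features through mechanisms such as Vulkan extensions, and the "fast path" here is only a stand-in, not actual intrinsic code.

```python
from typing import Callable, Sequence, Set

# Hypothetical capability name; real drivers advertise optional features
# through mechanisms such as Vulkan extensions.
FAST_PATH_FEATURE = "HYPOTHETICAL_WAVE_REDUCE"

Reducer = Callable[[Sequence[int]], int]


def sum_generic(values: Sequence[int]) -> int:
    """Portable path: written against features every supported GPU has."""
    total = 0
    for v in values:
        total += v
    return total


def sum_fast_path(values: Sequence[int]) -> int:
    """Stand-in for a hardware-specific path (e.g. one built on an intrinsic).

    In a real engine this would be separate, hand-tuned shader code; here it
    simply uses Python's built-in sum so the two paths can be compared.
    """
    return sum(values)


def pick_sum(supported: Set[str]) -> Reducer:
    # Older or other-vendor GPUs must still get a working path, which is
    # the extra code (and extra testing) the text warns about.
    if FAST_PATH_FEATURE in supported:
        return sum_fast_path
    return sum_generic


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    for caps in ({FAST_PATH_FEATURE}, set()):
        reducer = pick_sum(caps)
        print(reducer.__name__, reducer(data))
```

Both paths must return the same result, so every hardware-specific path doubles what has to be written, tested and debugged.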

AMD's specific application of shader intrinsic functions in low-level graphics APIs such as Vulkan and DX12 stems from AMD's grasp on the console market (its GPUs power all three current-generation game consoles, and its CPUs power two of them). It also stems from its previous work on Mantle, which went on to serve as a foundation for today's Vulkan API, and arguably gave Microsoft the push it needed to include low-level access in DX12. This means that programmers are already leveraging optimized, feature-specific code paths in their console game implementations. That in turn leads AMD to want to give them access to those same features on the PC hardware that supports them, reaping the benefits of hardware-specific optimizations for its GCN architecture. That said, this doesn't mean NVIDIA doesn't have its own shader intrinsic functions that developers can take advantage of. Through its GameWorks initiative, NVIDIA allows programmers to use extensions not natively supported by DX's HLSL (High Level Shading Language), while also allowing shader intrinsic functions to be leveraged as part of its CUDA ecosystem. An important distinction between the two companies' approaches is that NVIDIA requires developers to use its specific GameWorks libraries (which are proprietary, and not accessible on AMD's cards). AMD's approach is more open, being made available through its GPUOpen initiative and through Vulkan's extension mechanism.

Shader intrinsics are just one part of what a low-level API needs to be, and aren't particularly game-changing in and of themselves. Moreover, shader intrinsics will never be at their best on PC hardware, simply because the ecosystem is fractured by the sheer number of possible systems, up to date or otherwise. The best part of PC gaming is also, in this case and at this point in time, its greatest obstacle to obtaining perfect performance from any given system. But shader intrinsics are indeed a step towards giving developers more control over the features they implement and how they are run. They stand side by side with other technologies which will, in time, steer us towards ever more performant systems.


Ferrum Master · Oct 19, 2016

Where can I give this man a beer?

the54thvoid · Oct 19, 2016

Ferrum Master said:
Where I can give this man a beer?

Assumptions...... could be a woman.

Though yes, the term 'lord' is in the name.

Steevo · Oct 19, 2016

Well shave my legs and call me Harry, that was a well written piece.

It's worth pointing out the difference between the open architecture AMD is offering and Nvidia's walled garden. One has to wonder how long that garden will stay closed. The move towards a common architecture between both companies, together with DX12, will reveal code paths to developers and many others as DX12 becomes an increasingly thin shim between the OS and hardware.

Folterknecht · Oct 19, 2016

Nice!

If you guys continue with these kinds of articles, it would be good imo to create a separate section for them, with a direct link somewhere in the top bar between Home, Reviews and Forum. It would be a shame for all the effort that goes into such a piece to just get forgotten among news articles.

Raevenlord · Oct 19, 2016

Ferrum Master said:
Where I can give this man a beer?

A digital one will suffice :toast:

TPU's emojis to the rescue! Thanks though.

the54thvoid said:
Assumptions...... could be a woman.

Though yes, the term 'lord' is in the name.

The devil, as they say, is in the details. Good catch, kind sir.

Raevenlord · Oct 19, 2016

Steevo said:
Well shave my legs and call me Harry, that was a well written piece.

It worth pointing out the difference in open architecture that AMD is offering VS the walled garden of Nvidia. One has to wonder how long that garden will be closed as the move towards a common architecture between both companies and DX12 will reveal code path to developers and many others as it becomes an increasingly thin shim between the OS and hardware.

Thanks, man :peace:

That is indeed a relevant distinction. I'll try to sprinkle it into the piece :toast:

R-T-B · Oct 19, 2016

I've tried to detail this to users of TPU in the past, but never had the gumption to do a full writeup. Props.

Hood · Oct 19, 2016

This is great stuff. Thank you!

Rockarola · Oct 19, 2016

Thank you, this was like a drink of cold water in the desert! I really hope that we will see more of these articles in the future...informative, factual and well written.

TheGuruStud · Oct 20, 2016

Steevo said:
Well shave my legs and call me Harry, that was a well written piece.

It worth pointing out the difference in open architecture that AMD is offering VS the walled garden of Nvidia. One has to wonder how long that garden will be closed as the move towards a common architecture between both companies and DX12 will reveal code path to developers and many others as it becomes an increasingly thin shim between the OS and hardware.

I think nvidia saw the writing on the wall with Vulkan. Why else would they be so quick to get onboard?

Totally · Oct 20, 2016

TheGuruStud said:
I think nvidia saw the writing on the wall with Vulkan. Why else would they be so quick to get onboard?

Nope. Something is being offered at no detriment, so common sense says take advantage of it, and that's what they are doing. It also lets them plug whatever proprietary 'useful if it went mainstream, useless for the vast majority otherwise' tech as a bullet point on their list of features.

TheGuruStud · Oct 20, 2016

Totally said:
Nope, something is being offered at no detriment, common sense says take advantage of it and that they are doing. It also let's the plug whatever proprietary 'useful if it went mainstream, useless for the vast majority otherwise" tech as a bullet point on their list of features.

But that would mean using something that AMD wants to use! *hiss*

Normally, they would ignore it and say it's useless and no one will use it. After a few years, they would be dragged kicking and screaming into compliance. I think they know it will be the API, given its universal design and the big names behind it.

Prima.Vera · Oct 20, 2016

Raevenlord said:
Closer to the metal programming has in its greatest strength what also amounts to its greatest flaw: the ability to directly leverage hardware resources needs specific, time-consuming programming for the functions that were largely automatic before. This not only means more developer resources, but also a system that is more prone to errors: and debugging five lines of code is very different from debugging fifty lines of it. One must also keep in mind that closer to the metal programming, on behalf of it targeting more specifically only a subset of existing hardware, ends up leaving behind users of older, unsupported hardware.

Actually, I wanted to ask you about this while reading the article, but you summarize it nicely. So basically, the closer you get to metal programming, the more you are segregating different GPU architectures and techniques. To be honest, as much as I love getting closer to directly programming the GPU by skipping layers, I'm not sure it's actually such a good idea. Driver teams have to waste more time programming drivers for each GPU. Game developers also need to waste A LOT more time developing different programming techniques for different GPUs, which increases development time and cost.
Just my 50 cents...

DeathtoGnomes · Oct 20, 2016

Nvidia fought this, they lost. Open libraries are going to make it easier to adapt and create even better, more efficient libraries. Nvidia will lag behind because of its desire for proprietary software. They just don't want to share the spotlight, which could lead to their downfall unless they jump in with both feet.

I don't see why we can't have our cake and eat it too. Nvidia, IMO, is holding us back now.

efikkan · Oct 20, 2016

While it's nice to see some editorial content instead of simply recirculating others' content, it would be nice if the author demonstrated a good understanding of the subject.

Raevenlord said:
Shader intrinsic functions stand as a partial solution for granting developers more control over existing computational resources and how they are leveraged. This capability (much touted by AMD as a performance-enhancing feature on their GCN-based products) essentially exposes features and capabilities that exist on the hardware developers are programming for, but wouldn't generally be able to access.

This is just AMD's PR department inventing a new name for an old feature, like most of the other stuff in their "GPUOpen" initiative. Writing hardware-specific shader programs in assembly has been possible for a long time. In fact, I remember this from before HLSL and GLSL were even a thing, like 15 years ago. Using GPU shader features in assembly code has been possible with each new hardware generation.

Raevenlord said:
This can happen either because they're being abstracted by a high-level API (Application Programming Interface, like DX11), or because the API isn't functionally able to access them. To understand why high-level APIs such as DX11 don't usually offer support for a piece of hardware's full feature list, or full processing capabilities, we must first look at the basic architecture of a given computer system.

You are mixing up the API calls with the GPU shader programs. While APIs usually bundle API calls and a shader language, these constitute separate parts of the rendering: one executes on the CPU and the other on the GPU. Of course, both may provide some degree of abstraction and vendor-specific features.
- APIs such as Direct3D, OpenGL, Vulkan, and vendor-specific ones like Mantle and Cg, all provide a set of API calls which serve as an interface between the game and the driver.
- Shader programs are pieces of code executing on the GPU cores. Shader programs are usually written in a high-level language like HLSL or GLSL, converted to an IR for distribution and compiled to machine code by the driver. Shader intrinsic functions are all about writing hardware-specific shader programs directly in assembly, potentially creating more optimal code. They have nothing to do with potential abstractions in the API calls on the CPU side.

Raevenlord said:
As you can see, there are usually multiple layers a given task must go through in order for it to be processed at a hardware level. You might be wondering why do we even need so many layers in the first place, and why wasn't this enabled before. There are many technical reasons for this, but one of the strongest is simply the breadth of different hardware available for your buying and assembling pleasure.

This illustration does not match rendering at all, this is 100% wrong.

Rendering is done by using a number of API calls to build and control what we call a pipeline. Traditionally we had what we call a "fixed pipeline", allowing the programmer to only enable and slightly adjust some hardware implemented features. Back then there were no shader programs, everything was done through a huge number of API calls, and yet it was not very flexible.

Shader programs allow the programmer to implement parts of the rendering pipeline themselves. The pipeline is still controlled by API calls, but stages of the pipeline can be customized to a large extent. This lets developers implement vertex manipulations, lighting effects, fog, transparency, texture blending, and post-processing effects like blur themselves.

The term "shader program" is actually quite confusing, but in modern rendering it refers to pieces of code executing on the GPU, which can do geometry, compute and more. Initially it was primarily used for creating shading effects, so the name has stuck. (Actually, the term is also used for non-GPU shading code used in various 3D modelling programs dating all the way back to the late 80s.)

Getting back to your claims, the GPU shader code executes directly on the GPU. The "OS and applications", "kernel", etc. have nothing to do with this. The only abstraction involved is the transition from a high-level shading language to assembly.
As a little side note; a customized shader might of course require adjustments in the API calls used.

Despite everyone describing Direct3D 12 and Vulkan as "low level APIs", it's important to understand what is meant by "low level features". These APIs provide greater control over the internals of how the driver manages the queue, allocations, etc. However, they do not provide greater ability to control things on the GPU side, like the pipeline flow, GPU threads/internal scheduling, etc. So in terms of GPU features exposed to shader programs, the new APIs currently bring nothing new. I'm hoping the next iteration of APIs will do this, and move more flexibility to the shaders.

Raevenlord said:
Abstraction layers essentially simplify the programmer's work - they "hide" and automate a given command's underlying processes, particular implementation and hardware-specific code paths, so that the programmer only has to worry about what commands he wants to use...

We all know there can at times be great benefits from optimizing the pipeline and/or shaders for specific hardware, but usually it's a matter of resources. Most game developers don't even prioritize writing a decent pipeline in the first place, so doing these tweaks should not be the primary concern.

Talking of abstractions, most games have a much larger source of overhead: the engine itself. Let's take a much-hyped game like AotS, using something like 100,000 API calls to render a pretty basic scene. Any graphics programmer would know they could use well-known techniques like instancing and batching to improve the performance by a factor of 10. Rendering with a high number of API calls is certainly the most inefficient way to utilize a GPU. Customizing the pipeline or the shaders while leaving such major "defects" in place is basically "putting lipstick on a pig".

Raevenlord said:
Going back to the different layers on a system, imagine, for argument's sake, that it takes 5 ms for a task to compute and go through each of the layers until it is executed by the hardware - in the example image given above, that would mean 5x 5ms = 25 ms. And now imagine you can effectively avoid going through all those figurative hoops, going straight from the app's hardware processing requirements to the hardware. You now have reduced your 25 ms computation to a mere 10 ms, which frees up computation time for other tasks. This is what shader intrinsic functions really are: pieces of code that when recognized by the low-level API, are allowed to move directly towards the hardware, bypassing other, time-consuming layers.

As mentioned, that description has nothing to do with how rendering works: tasks do not propagate through the levels as you described. Regardless of which API a game uses, the API is the interface to the driver, which in turn sends native commands to the GPU. Commands don't propagate through the levels one by one while the program waits for each result. Ever since their conception, both Direct3D and OpenGL have been designed as async APIs*. The game builds what we call a queue (which builds up the pipeline) and dispatches it to the driver. The game then continues to build the queue for the next frame while the driver is feeding the GPU.
*) Not to be confused with the unrelated feature "async compute".

So your calculation of 5 × 5 ms = 25 ms has no relation to reality. And even if a game used 25 ms for compute, the whole frame would probably take more than 100 ms, resulting in less than 10 FPS, so this overhead clearly does not exist as you described.

Raevenlord said:
AMD's specific application of shader intrinsic functions in low-level graphics APIs such as Vulkan and DX12 stem from AMD's grasp on the console market (with their CPUs and GPUs powering all three current-generation games consoles), as well as their previous work on Mantle, which went on to become embedded in today's Vulkan library, and arguably gave Microsoft the push it needed to include low-level access to their DX12.

That's quite a few mistakes in a single sentence.
1) Hardware specific shaders and API features have existed for many years, that's not new.
2) Direct3D 12 was in the works since 2010/2011, Mantle originated from early Direct3D 12 work, not the other way around.
3) Vulkan is built on SPIR-V. It got some inspiration from Mantle in terms of the front-end, but the underlying architecture is derived from SPIR.

Raevenlord said:
An important distinction between the two companies' approach is that while NVIDIA requires developers to use their specific GamesWorks libraries (which are proprietary, and not accessible on AMD's cards), AMD's approach is more open, being accessible in open standards such as GPUOpen and Vulkan's libraries

That's not true at all. Both vendors have open and proprietary parts. GameWorks is mostly open, though some of it requires an NDA. Almost all of it runs on AMD GPUs, so claiming otherwise is untrue. They also provide the most extensive collection of examples and best practices for modern graphics development.
And GPUOpen is no "open standard" at all. It's mostly a collection of renamed tools and libraries which have existed for years. And do I need to remind you which vendor was the last to provide Vulkan support? And it still, to this date, fails to provide stable OpenGL support.
No vendor is even close to perfect, but there is no doubt that no one has done more to promote open standards than Nvidia. So please show some professionalism, and stop painting the picture as one being the champion of openness while the other being the evil proprietary one.

lorraine walsh · Oct 20, 2016

Will it allow for that same better cinematic FPS of 30?

That's what I've always wanted on my PC..........the "better" lower FPS.

Xuper · Oct 20, 2016

efikkan said:
That's not true at all. Both vendors have open and proprietary parts. Gameworks is mostly open, while some requires an NDA. Almost all of it runs on AMD GPUs, claiming otherwise is untrue. They also provide the most extensive collection of examples and best practices for modern graphics development.
And GPUOpen is no "open standard" at all, it's mostly a collection of renamed tools and libraries which has existed for years.

Care to explain what an open standard is? GPUOpen is based on the MIT License while GameWorks is not. A developer can't even change the source code if he/she wants to optimise it to run better on rival hardware. Let me remind you that GameWorks runs shitty on AMD cards compared to Nvidia cards! If you believe open means looking at the source without being able to change it, then don't bother replying to me. I don't believe that.

efikkan said:
do I need to remind you which vendor who were the last to provide Vulkan support? And still to this date fail to provide stable OpenGL support.

Also, Nvidia has issues with Vulkan and DX12 support. Look at AotS: it took them a while to beat the Fury X with the GTX 980 Ti, and the same goes for Doom. Where is the day-one driver for BF1 DX12?
I really hope AMD drops OpenGL support. OpenGL belongs to the past and needs to die.

R-T-B · Oct 20, 2016

behrouz said:
also, Nvidia has issue with Vulkan and DX12 Support

They really only have an issue with async compute.

If you read the article above, you'd realize you can't really have an issue with something that's "low level," only implementations within it.

I agree about open standards though, except for the part about dropping OpenGL support. I like my old games to run, yo. That and almost ALL Linux ports rely on it. It's hardly obsolete at this point, just being phased out. You're lining the accused up for the firing squad before you've even read them their rights.

Ferrum Master · Oct 20, 2016

Here we go again.

Raevenlord · Oct 20, 2016

efikkan said:
While it's nice to see some editorial content instead of simply recirculation of other's content, it would be nice if the author demonstrated a good understanding of the subject.

First, I would like to thank you for the length and minutiae of your post. This clearly means something to you, and your contribution and the time dedicated to it deserve praise. Most of your criticism is constructive, and will allow readers to see in deeper detail what I tried to convey in the piece.

That said, you should have kept in mind that this isn't supposed to be either a deep dive or a white paper. This is simply trying to explain in some more detail what exactly is meant by these shader intrinsic functions. So you should look at this piece as an abstraction layer unto itself, not as a be-all-end-all exploration. Some inaccuracies are inevitable.

efikkan said:
This is just AMD's PR department inventing a new name to an old feature, like most of the other stuff in their "GPUOpen" initiative. Writing hardware specific shader programs in assembly has been possible for a long time, in fact I remember this before HLSL and GLSL was even a thing, like 15 years ago. Using GPU shader features in assembly code has been possible which each new hardware generation.

You are right, of course, and I did mention HLSL, though I'm not familiar with previous implementations of the subject. And while this is, obviously, PR spin, I think AMD is entitled to it. The feature is much more relevant now than it was before, simply because current console and PC architectures are so close. You just have to look at the Xbox 360's architecture and compare it to the Xbox One's or PS4's to see that today, GPUs in consoles are much closer to their PC counterparts than ever before. That is why I agree that this subject has more relevance now, and why I accept that AMD spins it that way.

efikkan said:
You are mixing the API calls with the GPU shader programs. While APIs usually bundles API calls and a shader language, they constitutes separate parts of the rendering; one executing in the CPU and one in the GPU. Of course both may provide some degree of abstractions and vendor specific features.
- APIs such as Direct3D, OpenGL, Vulkan, and vendor specific ones like Mantle and Cg, all provide a set of API calls which serves as an interface between the game and the driver.
- Shader programs are pieces of code executing in the GPU cores. Shader programs are usually written in a high level language like HLSL or GLSL, converted to an IR for distribution and compiled to machine code by the driver. Shader intrinsic functions is all about writing hardware specific shader programs directly in assembly, potentially creating more optimal code. It has nothing to do with potential abstractions in the API calls on the CPU side.

Like I said above, thank you for this. If anyone wants to, they can read this to better understand the underlying systems.

efikkan said:
This illustration does not match rendering at all, this is 100% wrong.

So your calculation of 5 × 5 ms = 25 ms have no relation to reality. And even if a game used 25 ms for compute, the whole frame would probably take more than 100 ms, resulting in less than 10 FPS, so this overhead clearly does not exist as you described.

It isn't 100% wrong, since it isn't meant to match rendering. This is simply so that readers can understand what is meant by layers, and how a "given computer system" operates. I never claimed it to be graphics-related. It just serves to show that there are usually underlying processes between the OS and the hardware executing code. I am fully aware that 25 ms is impossibly huge. For VR, for example, a single frame must be rendered in around 11.1 ms to achieve the 90 fps threshold. Like I said, "imagine, for argument's sake". It's an abstraction. Thank you again for the rest of your write-up. It goes into more detail than I wanted to in this piece, but it is still very much relevant to the subject at hand.

efikkan said:
No vendor is even close to perfect, but there is no doubt that no one has done more to promote open standards than Nvidia. So please show some professionalism, and stop painting the picture as one being the champion of openness while the other being the evil proprietary one.

I won't even dignify that with an answer. Just re-read what I wrote and you'll see how that was completely blown out of proportion and uncalled for.

Steevo · Oct 20, 2016

Nvidia has done everything to keep their stuff locked down.

efikkan said:
That's not true at all. Both vendors have open and proprietary parts. Gameworks is mostly open, while some requires an NDA. Almost all of it runs on AMD GPUs, claiming otherwise is untrue. They also provide the most extensive collection of examples and best practices for modern graphics development.
And GPUOpen is no "open standard" at all, it's mostly a collection of renamed tools and libraries which has existed for years. And do I need to remind you which vendor who were the last to provide Vulkan support? And still to this date fail to provide stable OpenGL support.
No vendor is even close to perfect, but there is no doubt that no one has done more to promote open standards than Nvidia. So please show some professionalism, and stop painting the picture as one being the champion of openness while the other being the evil proprietary one.

http://gpuopen.com/compute-product/amd-openvx/

https://github.com/GPUOpen-Effects/TressFX/releases/tag/v3.1.1

Holy shit, source code!!!! Free to modify.

VS

http://docs.nvidia.com/gameworks/content/artisttools/hairworks/HairWorks_sdkSamples.html

path = "NvHairWorksDx11.win64.D.dll"

yayyy, they give us .dll's..... cause having the dll is the same as source code, right? Like how MS releases their source code with every OS, and every program doesn't include .dll's? Right?

Cause when you download and agree to

""NVIDIA GameWorks SDK" means the set of instructions for computers, in executable form only and in any media (which may include diskette, CD-ROM, downloadable internet, hardware, or firmware) comprising NVIDIA's proprietary Software Development Kit and related media and printed materials, including reference guides, documentation, and other manuals, installation routines and support files, libraries, sample art files and assets, tools, support utilities and any subsequent updates or adaptations provided by NVIDIA, whether with this installation or as separately downloaded (unless containing their own separate license terms and conditions)."

"In addition, you may not and shall not permit others to:

I. modify, reproduce, de-compile, reverse engineer or translate the NVIDIA GameWorks SDK; or
II. distribute or transfer the NVIDIA GameWorks SDK other than as part of the NVIDIA GameWorks Application."

"3. Redistribution; NVIIDA GameWorks Applications. Any redistribution of the NVIDIA GameWorks SDK (in accordance with Section 2 above) or portions thereof must be subject to an end user license agreement including language that

a) prohibits the end user from modifying, reproducing, de-compiling, reverse engineering or translating the NVIDIA GameWorks SDK;
b) prohibits the end user from distributing or transferring the NVIDIA GameWorks SDK other than as part of the NVIDIA GameWorks Application;"

qubit · Oct 20, 2016

Well, I'm surprised that there's a difference in feature set between the hardware and the API, as I thought the two were developed together.

Take a GTX 580, for example: it supports DX11.0, so I'd expect its GF110 GPU to support exactly that, with no features "left over" and, conversely, none of DX11.0's features left unsupported.

I get that the hardware could support some unofficial, undocumented features not exposed in the API, allowing for "trick shot" special effects, but these shouldn't be significant.

Great article, Raevenlord.

Ferrum Master · Oct 20, 2016

qubit said:
all of DX11.0's features.

It's just a naming mess... DirectX has always been like that. Remember DX8, 8a, 8b, 8.1, DX9a/b/c, etc.? And now everyone is confused... well, just as always...

Raevenlord · Oct 20, 2016

qubit said:
Great article, Raevenlord.

Thanks, qubit
