Monday, August 31st 2015

Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

It turns out that NVIDIA's "Maxwell" architecture has an Achilles' heel after all, one that tilts the scales in favor of AMD's competing Graphics CoreNext (GCN) architecture when it comes to readiness for DirectX 12. "Maxwell" lacks support for async compute, one of the three highlight features of Direct3D 12, even though the GeForce driver "exposes" the feature to apps. This came to light when game developer Oxide Games alleged that NVIDIA's marketing department pressured it to remove certain features from its "Ashes of the Singularity" DirectX 12 benchmark.

Async compute is a standardized API-level feature added to Direct3D by Microsoft, which allows an app to better exploit the number-crunching resources of a GPU by breaking down its graphics rendering tasks. Since the NVIDIA driver tells apps that "Maxwell" GPUs support it, Oxide Games simply created its benchmark with async compute support, but when it attempted to use the feature on Maxwell, the result was an "unmitigated disaster." During the course of its developer correspondence with NVIDIA to try and fix this issue, Oxide learned that "Maxwell" doesn't really support async compute at the bare-metal level, and that the NVIDIA driver bluffs its support to apps. NVIDIA instead started pressuring Oxide to remove the parts of its code that use async compute altogether, Oxide alleges.

"Personally, I think one could just as easily make the claim that we were biased toward NVIDIA as the only "vendor" specific-code is for NVIDIA where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that NVIDIA does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports," writes Oxide, in a statement disputing NVIDIA's "misinformation" about the "Ashes of the Singularity" benchmark in its press communications (presumably to VGA reviewers).

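The distinction Oxide draws between responding to driver-reported capabilities and taking a vendor-specific path can be sketched roughly as follows. This is a hypothetical Python illustration, not Oxide's code: `DriverCaps`, `use_async_compute`, and the deny list are invented names, and only the PCI vendor IDs (0x10DE for NVIDIA, 0x1002 for AMD) are real values.

```python
from dataclasses import dataclass

# PCI vendor IDs (real values); everything else here is hypothetical.
VENDOR_NVIDIA = 0x10DE
VENDOR_AMD = 0x1002

# Hypothetical override list: vendors whose reported async compute
# support the application chooses not to rely on.
ASYNC_COMPUTE_DENYLIST = {VENDOR_NVIDIA}


@dataclass(frozen=True)
class DriverCaps:
    """What a driver reports to the application (illustrative only)."""
    vendor_id: int
    reports_async_compute: bool


def use_async_compute(caps: DriverCaps) -> bool:
    # Capability-driven decision: respond to what the driver reports.
    if not caps.reports_async_compute:
        return False
    # Vendor-specific path: look at the vendor ID and override the report.
    return caps.vendor_id not in ASYNC_COMPUTE_DENYLIST


maxwell = DriverCaps(VENDOR_NVIDIA, reports_async_compute=True)
gcn = DriverCaps(VENDOR_AMD, reports_async_compute=True)
print(use_async_compute(maxwell), use_async_compute(gcn))
```

In this sketch the first check is the capability-driven part, while the deny-list lookup is the only place the vendor ID influences the rendering path, which is the kind of override Oxide says it needed for NVIDIA.
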
Given its growing market share, NVIDIA could use similar tactics to keep game developers away from industry-standard API features that it doesn't support and that rival AMD does. NVIDIA drivers tell Windows that its GPUs support DirectX 12 feature-level 12_1. We wonder how much of that support is faked at the driver level, like async compute. The company is already drawing flak for borderline anti-competitive practices with GameWorks, which effectively creates a walled garden of visual effects that only users of NVIDIA hardware can experience for the same $59 everyone spends on a particular game.

Sources: DSOGaming, WCCFTech

196 Comments on Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

#151
Sony Xperia S
Aquinus
He said nVidia is the Apple of. He wasn't talking about Apple. He went on to say (talking about nVidia)
Really, I never knew and actually don't wanna know that this fruit the apple is so divine. :laugh:

Seriously, how would I have known that ? When this is the first time I hear someone speaking like that ?
#152
Aquinus
Resident Wat-man
Sony Xperia S
Really, I never knew and actually don't wanna know that this fruit the apple is so divine. :laugh:

Seriously, how would I have known that ? When this is the first time I hear someone speaking like that ?
Then maybe you should learn to read before making assumptions about what people are saying. Considering this is not a thread about Apple, you should have been able to put one and one together to make two.

Remember how I said:
Aquinus
You're only digging yourself a deeper hole...
Aquinus
This is the nice way of me telling you to shut up and stop posting bullshit but, it appears that I needed to spell that out for you.
Well, that all still stands and is only even more relevant now.
#153
Sony Xperia S
Remember how I said:



I am not in a hole, and I don't understand what exactly you are speaking about and how in hell you know what that person meant?

Are you threatening me or what?

My reading skills are ok. I am reading.
What I want to kindly ask you is to leave me alone without all the time analysing in a very negative way my posts and instead trying to respect my opinion.
#154
Aquinus
Resident Wat-man
Sony Xperia S
I am not in a hole, and I don't understand what exactly you are speaking about and how in hell you know what that person meant?
I can read. It doesn't take a rocket scientist to figure out what he was saying.
Sony Xperia S
My reading skills are ok. I am reading.
Then there is nothing further to discuss because I know English and I understood him just fine.
Sony Xperia S
Are you threatening me or what?
No, just pointing out that you've been pulling the thread off topic because you didn't understand what someone else posted.
Sony Xperia S
What I want to kindly ask you is to leave me alone without all the time analysing in a very negative way my posts and instead trying to respect my opinion.
Then maybe you should stay on topic like I said in the first place. If you want to be left alone, a public forum is not the place to be. Calling you out on BS is not persecution, it's called accountability.
#155
Captain_Tom
RejZoR
I think NVIDIA just couldn't be bothered with driver implementation till now because frankly, async compute units weren't really needed till now (or shall I say till DX12 games are here). Maybe drivers "bluff" the support just to prevent crashing if someone happens to try and use it now, but they'll implement it at later time properly. Until NVIDIA confirms that GTX 900 series have no async units, I call it BS.
Yeah and how long did it take them to admit the 970 has 3.5GB of VRAM? Heck they still haven't fully fessed up to it.
#156
cadaveca
My name is Dave
Uh, FYI guys, on reddit there is a thread with an apparent AMD guy saying that NO GPU ON THE MARKET TODAY is fully DX12 compliant. So...

What AMD does, NVidia doesn't. Also, vice versa.

Now, about that rumour that you could use NVidia and AMD GPUs together in the same system... would that somehow overcome these "issues"?
#157
EarthDog
It has 4GB of VRAM though... it's just that the last 0.5GB is much slower. ;)
#158
EarthDog
Sony Xperia S
Anyways, you guys are so mean. I can't comprehend how it's even possible that such people exist.



Yes, and Image quality CHECK. ;)
Mean? How are we mean when we (I) shower you with facts? I like how you cherry pick the two good things (it was one actually) I mentioned, but completely disregard the rest yet still think it's better.

Image quality? You need to prove that Sony...

You have your head shoved so far up AMD's ass you are crapping AMD BS human caterpillar style (THAT was the first mean thing I have said) and you don't even know it. Since TPU doesn't seem to want to perma ban this clown, I'm just going to put him on ignore. Have fun with this guy people. I can't take the nonsense anymore and risk getting in trouble myself.
#161
RejZoR
Well, from what I can see so far, NVIDIA is capable of doing async compute, just more limited by the queue scheduler. Still need to read further...
#162
FordGT90Concept
"I go fast!1!11!1!"
Sony Xperia S
That's called brainwashing. I have never seen any technological competitive advantages in Apple's products compared to the competition. Actually, the opposite - they break like shit.
For the record, I just installed my PowerColor PCS+ 290X yesterday and first impressions are excellent. FurMark (100% load) only took it to 64C.
#163
rvalencia
RejZoR
Well, from what I can see so far, NVIDIA is capable of doing async compute, just more limited by the queue scheduler. Still need to read further...
Maxwell v2 is not capable of concurrent async + rendering without incurring context penalties, and it's under this context that Oxide made its remarks.



cadaveca
Uh, FYI guys, on reddit there is a thread with an apparent AMD guy saying that NO GPU ON THE MARKET TODAY is fully DX12 compliant. So...
cadaveca
What AMD does, NVidia doesn't. Also, vice versa.

Now, about that rumour that you could use NVidia and AMD GPUs together in the same system... would that somehow overcome these "issues"?



Intel Xeon 18 CPU core per socket running DirectX12 reference driver is the full DirectX12 renderer. ;)
#164
FordGT90Concept
"I go fast!1!11!1!"
Direct3D has feature levels. 12.0 is basic DirectX 12 support which AMD GCN, Intel's iGPU, and NVIDIA all support. Maxwell has 12.1 support officially meaning the cards won't freak out if they see 12.1 instructions but all NVIDIA cards that support 12.0 will take a performance penalty when the software uses async compute. It supports it but it does a really bad job at supporting it.

I'm curious if Intel's iGPU takes a performance penalty when using async compute too.
#165
Sony Xperia S
FordGT90Concept
For the record, I just installed my PowerColor PCS+ 290X yesterday and first impressions are excellent. FurMark (100% load) only took it to 64C.
Excellent news ! It is great to hear you have a new card.

Why do you stress her under FurMark ?
#166
FordGT90Concept
"I go fast!1!11!1!"
Make sure it is stable and the temperatures are reasonable. I'm only keeping it installed for about a week then I'm going back to 5870 until I can get my hands on a 6700K. I need to make sure I don't have to RMA it.
#167
the54thvoid
Aquinus
I can read. It doesn't take a rocket scientist to figure out what he was saying.

Then there is nothing further to discuss because I know English and I understood him just fine.

No, just pointing out that you've been pulling the thread off topic because you didn't understand what someone else posted.

Then maybe you should stay on topic like I said in the first place. If you want to be left alone, a public forum is not the place to be. Calling you out on BS is not persecution, it's called accountability.
lol, I thought "why the hell is @Aquinus triple posting and wtf are these people talking about" then realised - ah it's him. I can't see their posts - still blocked to me - thankfully it seems. I agree with @EarthDog though - should be banned - simple as that.
#168
BiggieShady
rvalencia
Maxwell v2 is not capable of concurrent async + rendering without incurring context penalties, and it's under this context that Oxide made its remarks.
That is a claim presented at the beginning of the article. Toward the end, if you read it, a benchmark shows that it is not true (number of queues on the horizontal axis, time spent computing on the vertical axis; lower is better).

Maxwell is faster than GCN up to 32 queues, and it evens out with GCN by 128 queues, while GCN keeps the same speed all the way up to 128 queues.
It's also shown that with async shaders it's extremely important how they are compiled for each architecture.
Good find @RejZoR
#169
FordGT90Concept
"I go fast!1!11!1!"
Fermi and newer apparently can handle 31 async commands (jumps up at 32, 64, 96, 128) before the scheduler freaks out. GCN can handle 64 at which point it starts straining. GCN can handle far more async commands than Fermi and newer.

The question is how does this translate to the real world? How many async commands is your average game going to use? 31 or less? 1000s?
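
As a rough way to picture the stepping described above, here is a toy Python model. It assumes, purely for illustration, that commands are processed in fixed-size batches, so the cost rises by one step each time the command count reaches a multiple of the batch size (32 for the Fermi-and-newer figure quoted above). `batch_steps` is a hypothetical helper, and the output is arithmetic from that assumption, not a measurement.

```python
def batch_steps(num_commands: int, batch_size: int = 32) -> int:
    """Number of cost steps under the toy assumption that the cost
    jumps whenever num_commands reaches a multiple of batch_size."""
    if num_commands < 0 or batch_size <= 0:
        raise ValueError("num_commands must be >= 0 and batch_size > 0")
    return num_commands // batch_size + 1


for n in (1, 31, 32, 64, 96, 128):
    print(n, batch_steps(n))
```

Under this assumption a game that stays at 31 commands or fewer never leaves the first step, which is why the real-world command count matters more than the ceiling itself.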
#170
Ikaruga
cadaveca
Uh, FYI guys, on reddit there is a thread with an apparent AMD guy saying that NO GPU ON THE MARKET TODAY is fully DX12 compliant. So...

What AMD does, NVidia doesn't. Also, vice versa.
This is what I said in this thread a day earlier than that reddit post, but some people are in write-only mode and don't actually read what others are saying.
Ikaruga
There is no misinformation at all, most of the dx12 features will be supported by software on most of the cards, there is no GPU on the market with 100% top tier dx12 support (and I'm not sure if the next generation will be one, but maybe). This is nothing but a very well directed market campaign to level the fields, but I expected more insight into this from some of the TPU vets tbh (I don't mind it btw, AMD needs all the help it can get anyways).
#171
BiggieShady
FordGT90Concept
The question is how does this translate to the real world? How many async commands is your average game going to use? 31 or less? 1000s?
The answer to that question is the same as the answer to this question: how many different kinds of parallel tasks besides graphics can you imagine in a game? Let's say that you don't want to animate leaves in the forest using only geometry shaders but you want real global wind simulation, and you use a compute shader for that. Next you want wind on the water geometry: do you go with a new async compute shader or append to the existing one? As you can see, the real-world number of simultaneous async compute shaders is how many different kinds of simulations we are going to use: hair, fluids, rigid bodies, custom GPU-accelerated AI ... all of that would benefit from each being in a different async shader, rather than having a huge shader with a bunch of branching (no branch prediction in GPU cores, even worse - GPU cores almost always execute both if-else paths)
All in all I'd say 32 is more than enough for gaming ... there might be benefit of more in pure compute
Ikaruga
most of the dx12 features will be supported by software on most of the cards, there is no GPU on the market with 100% top tier dx12 support (and I'm not sure if the next generation will be one, but maybe)
Point is good and all but let's not forget how we are here (mostly) very well used to the difference between marketing badges on colorful boxes and spotty support for a new API. Major game engine developers will find a well supported feature subset on both architectures and use them ... hopefully every major engine will have optimized code paths for each architecture and automatic fallback to DX11. Let's try keeping our fingers crossed for a couple of years.

In August adoption of win10 with dx12 gpu owners went from 0% to 16.32% ... hmm ... using full blown dx12 features - maybe in a year
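
The branching point made earlier in this post (one huge shader versus a separate shader per simulation) can be pictured with a toy Python model. It assumes lanes run in lockstep groups, so a group pays for every distinct branch that any of its lanes takes. The function name, the branch names, and the cycle costs are all hypothetical; real hardware and compilers are considerably more nuanced.

```python
def lockstep_group_cycles(lane_branches, branch_cost):
    """Toy cost of one lockstep group: each distinct branch taken by any
    lane is executed once by the whole group, so the costs add up."""
    return sum(branch_cost[b] for b in set(lane_branches))


# Hypothetical per-branch costs in arbitrary cycles.
branch_cost = {"wind": 3, "water": 5}

mixed = ["wind", "water", "wind", "water"]  # one big shader, lanes diverge
uniform = ["wind", "wind", "wind", "wind"]  # dedicated shader, no divergence

print(lockstep_group_cycles(mixed, branch_cost))    # both paths run
print(lockstep_group_cycles(uniform, branch_cost))  # only one path runs
```

Note that in this model a group whose lanes all agree pays for only one path, so the penalty applies to divergent groups rather than to every `if`.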
#172
truth teller
the info on that reddit thread is not really the truth.

the main goal of having async compute units is not the major parallelization of workload, but having the gpu compute said workload while still performing rendering tasks, which nvidia hardware can't do (all the news floating around seems to indicate so, also the company hasn't addressed the issue in any way so that's pretty much admitting fault).

leave that reddit guy with its c source file with cuda preprocessor tags alone, its going nowhere
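
To put the distinction above (overlapping compute with rendering versus merely running compute work in parallel) into numbers, here is an idealized Python sketch. It assumes perfect overlap when graphics and compute run concurrently, and a fixed context-switch penalty when they are serialized. `frame_time_ms` and all the millisecond figures are hypothetical and not taken from any benchmark.

```python
def frame_time_ms(graphics_ms: float, compute_ms: float,
                  concurrent: bool, switch_penalty_ms: float = 0.0) -> float:
    """Idealized frame time for one graphics job plus one compute job."""
    if concurrent:
        # Perfect overlap: the frame takes as long as the slower job.
        return max(graphics_ms, compute_ms)
    # Serialized: run one after the other and pay for switching context.
    return graphics_ms + compute_ms + switch_penalty_ms


print(frame_time_ms(10.0, 4.0, concurrent=True))
print(frame_time_ms(10.0, 4.0, concurrent=False, switch_penalty_ms=1.0))
```

Real hardware sits somewhere between these two extremes, since overlapping jobs still compete for the same execution units and memory bandwidth.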
#173
Ikaruga
BiggieShady
Point is good and all but let's not forget how we are here (mostly) very well used to the difference between marketing badges on colorful boxes and spotty support for a new API. Major game engine developers will find a well supported feature subset on both architectures and use them ... hopefully every major engine will have optimized code paths for each architecture and automatic fallback to DX11. Let's try keeping our fingers crossed for a couple of years.

In August adoption of win10 with dx12 gpu owners went from 0% to 16.32% ... hmm ... using full blown dx12 features - maybe in a year
I agree and did not forget at all, but my conclusion was a little different, as I wrote somewhere here earlier. Anyways, most of the multiplatform games will most likely pick the features which are fast and available on the major consoles, but this is not the end of the world, you will still be able to play those games, perhaps you will need to set 1-2 sliders to high instead of ultra in the options to get optimal performance. Other titles might use GameWorks or even a more direct approach exclusive to the PC, and those will run better on Nvidia or on AMD depending on the path they take.
truth teller
the info on that reddit thread is not really the truth.

the main goal of having async compute units is not the major parallelization of workload, but having the gpu compute said workload while still performing rendering tasks, which nvidia hardware can't do (all the news floating around seems to indicate so, also the company hasn't addressed the issue in any way so that's pretty much admitting fault).
I don't think that's correct. Nvidia has a disadvantage with async compute on the hardware level indeed, but we don't know the performance impact if that properly gets corrected with the help of the driver/CPU (and properly means well optimized here), and there are other features that the Nvidia architecture does faster, and engines using those might easily gain back what they lost with the async compute part.

We just don't know yet.