
Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

We kind of already gathered that, no? Async on AMD cards is executed asynchronously while async on NVIDIA cards is executed synchronously.

Interesting that on both architectures, 100 threads appears to be the sweet spot.
 
Those NVIDIA cons managed to build a GPU architecture that is good for DX12 while being good for DX11 at the same time, in the middle of a transition from DX11 to DX12... damn them.
Those bastards may even engineer Pascal entirely with DX12 in mind.

But seriously, the Maxwell architecture seems to handle concurrency between async tasks just fine (latencies are in accordance with a 32 queue depth)... the problem is the graphics workload being synchronous against the async compute workload. If there is no architectural reason for it to be that way, this could be solved through a driver update. The troubling thing is, if NVIDIA knew they could fix it in a driver update, they'd have been faster with their response. Maybe Jen-Hsun Huang is writing a heartwarming letter.
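The difference described here, graphics running serially against async compute versus overlapping with it, can be pictured with a toy timing model (a sketch with made-up millisecond figures, not measurements from any GPU or driver):

```python
# Toy model of per-frame cost for a graphics pass plus a compute pass.
# Serial execution (what the thread claims current Maxwell drivers do):
#   total = graphics + compute
# Concurrent execution (what async compute on GCN allows):
#   total = max(graphics, compute), assuming the two passes overlap fully.

def frame_time_serial(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute run back to back."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute overlap; the longer pass hides the shorter one."""
    return max(graphics_ms, compute_ms)

# Hypothetical 10 ms graphics pass and 3 ms compute pass:
print(frame_time_serial(10.0, 3.0))  # 13.0
print(frame_time_async(10.0, 3.0))   # 10.0
```

With these made-up numbers, overlapping the work hides the compute pass entirely, which is why a driver that serializes the two queues leaves performance on the table.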
 
Why am I not surprised by this at all?

NVIDIA is as dirty as pigs in the mud.

I keep telling you that this company is no good, but how many listen to me?
The world will become a better place when we get rid of NVIDIA.

A monopoly of AMD (with good hearts) would be better than a monopoly of NVIDIA (who only look for ways to screw technological progress).
Say whut?! :eek: :laugh: Especially the bold bit. With idiotic statements like that, no wonder you're getting criticized by everyone.
 
What a huge pile of dog turd over something which seems to have been a driver issue, completely expected with an alpha-level implementation of the first-ever DX12 title. The NVIDIA DX12 driver does not yet seem to fully support async shaders, although the Oxide dev thought it did.

Yeah, AMD technical marketing might not be your best source for info about competitor products. Combine that with a meltdown from a game dev... Then we have some good old fashioned NVIDIA bashing.

http://wccftech.com/nvidia-async-compute-directx-12-oxide-games/

"We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute."
 
That comedy team has been running these gags for almost 10 years. There have been a few people in other threads who actually think he works for NVIDIA, so I figured I would throw this up here:
 
Really, I never knew, and actually don't want to know, that this fruit the apple is so divine. :laugh:

Seriously, how would I have known that? This is the first time I've heard someone speak like that.

Because you are supposed to be smart enough to comprehend it. (No offense) But that's how life works.
 
No support, no problem, just pay them to gimp it on AMD cards: welcome to the wonderful world of the nVidia console, sorry, PC gaming, the way it's meant to be paid.

More like the way WE'RE meant to be played.
 
I don't see a single DX12 game, only some calculations for possible scenarios.
By the time 10 DX12 games show up on the market, Pascal will be more than a year old.
Because of that I don't see a reason for panic; a card supporting DX12 is not the same as a card capable of offering playable fps.
I remember when the 5870 with 2GB showed up, I bought it immediately as the first card with DX11 support.
The card was excellent, but for DX9 and a few DX10 environments; the first card with playable fps, and much better tessellation and DX11, was the GTX 580.
I went through the ATI 5870 and ATI 6970, but only with the GTX 580, and later with Tahiti, did the situation really get better. In the period between the ATI 5870 and the AMD 7970, AMD didn't improve anything in the DX11 field, and people who waited and upgraded to the GTX 580 played much better, until the HD 7950/HD 7970.
Because of that there is no reason to panic; NVIDIA will be ready when the time comes...
The only other bad thing is NVIDIA's tendency to write drivers only for the latest architecture.
If they continue to do that, people will turn their backs on them. At least the middle segment.
That's a much bigger reason for worry than Maxwell and DX12. We will not play nice DX12 games for at least 2 years.
Maybe some very rich people with multi-GPU will. But I'm talking about people who play games, as in the beginning, with a single powerful graphics card.
 
The only other bad thing is NVIDIA's tendency to write drivers only for the latest architecture.
If they continue to do that, people will turn their backs on them. At least the middle segment.
That's a much bigger reason for worry than Maxwell and DX12. We will not play nice DX12 games for at least 2 years.
Maybe some very rich people with multi-GPU will. But I'm talking about people who play games, as in the beginning, with a single powerful graphics card.
Nvidia has very good drivers for older generations; even Fermi cards run recent games quite nicely, one just needs to reduce some settings, but that's always the case as time goes by: you get a new card or reduce some settings. I recently helped somebody who has a 560 Ti + a 2500K (he bought those from me), and most games still look and run quite nicely with "medium" settings, and some of them even run fine with "high". I can't imagine my Maxwell 2 would need a replacement any time soon because of performance problems; if I replace it, that will only happen because I won't be able to resist the upgrade itch again.
 
That is a claim presented at the beginning of the article. If you read to the end, the benchmark shows that it is not true (number of queues on the horizontal axis, time spent computing on the vertical; lower is better).
View attachment 67772
Maxwell is faster than GCN up to 32 queues and evens out with GCN at 128 queues; GCN maintains the same speed all the way up to 128 queues.
It's also shown that with async shaders it's extremely important how they are compiled for each architecture.
Good find @RejZoR

From https://forum.beyond3d.com/posts/1870374/



For pure compute, AMD's compute latency (green color areas) rivals NVIDIA's compute latency (refer to the attached file).

http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1710#post_24368195

Here's what I think they did at Beyond3D:
  1. They set the number of threads per kernel to 32 (they're CUDA programmers, after all).
  2. They bumped the kernel count up to 512 (16,384 threads total).
  3. They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2.

Here's why that's not how you code for GCN:


Why?:
  1. Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
  2. That's 2,560 Threads total PER CU.
  3. An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.

If you load up GCN with kernels made up of 32 threads, you're wasting resources, and if you're not pushing GCN you're wasting compute potential. Slide number 4 stipulates that latency is hidden by executing overlapping wavefronts. This is why GCN appears to have a high degree of latency, yet you can execute a ton of work on GCN without affecting that latency. With Maxwell/2, latency rises like a staircase as you throw more work at it. I'm not sure if the folks at Beyond3D are aware of this or not.
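The arithmetic in the two lists above can be checked directly. This is a sketch using only the figures quoted in the post (32-thread kernels, 512 kernels, 40 wavefronts of 64 threads per CU, 44 CUs on an R9 290X):

```python
# Threads the Beyond3D-style test submits: 512 kernels x 32 threads each.
kernels = 512
threads_per_kernel = 32
submitted = kernels * threads_per_kernel            # 16,384 threads

# What GCN can keep in flight, per the figures in the post:
wavefronts_per_cu = 40
threads_per_wavefront = 64                          # one full wavefront
cus = 44                                            # R9 290X
capacity = wavefronts_per_cu * threads_per_wavefront * cus  # 112,640 threads

print(submitted)                       # 16384
print(capacity)                        # 112640
print(round(submitted / capacity, 3))  # 0.145 -> the test fills ~15% of the GPU

# Note also: a 32-thread kernel occupies a 64-thread wavefront,
# leaving half of every wavefront's lanes idle on GCN.
```

So under the post's own numbers, the test leaves roughly 85% of an R9 290X's thread capacity unused, which is the "wasting compute potential" argument in concrete terms.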


Conclusion:

I think they geared this test towards nVIDIA's CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single latency results don't matter. This test is only good if you're checking on Async functionality.


GCN was built for parallelism, not serial workloads like nVIDIA's architectures. This is why you don't see GCN taking a hit with 512 kernels.

What did Oxide do? They built two paths: one with shaders optimized for CUDA and the other with shaders optimized for GCN. On top of that, GCN has async working. Therefore it is not hard to determine why GCN performs so well in Oxide's engine. It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIA's architectures will be superior.

This means the burden is on developers to ensure they're optimizing for both. In the past, this hasn't been the case. Going forward... I hope they do. As for GameWorks titles, don't count on them being optimized for GCN. That's a given. Oxide played fair; others... might not.
 

Attachments: 64.png
This test is only good if you're checking on Async functionality.
That's exactly what the test is for ... checking on how much latency Async functionality introduces on both architectures.
GCN has a constant latency, good enough for compute loads made of a small number of async tasks and great for a huge number of async tasks. Additionally, GCN mixes the async compute load and the graphics load in near-perfect parallelism.
Maxwell shows varying latency that is extremely low for a small number of async tasks but surpasses GCN beyond 128 async tasks. What's really bad is that in the current drivers the async compute load and the graphics load are executed serially.
Mind you, every single async compute task is parallel in itself and can occupy 100% of the GPU if the job is suitable (parallelizable), so in most cases the penalty boils down to how many times and how much context switching is done. Maxwell has a nice cache hierarchy to help with that.
GCN should destroy Maxwell in special cases where a huge number of async tasks depend on results calculated by a huge number of other async tasks that vary greatly in computational complexity ;)
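The two latency curves described above can be sketched with a purely illustrative model: GCN as a flat line, Maxwell as a staircase that steps with every batch of 32 queues. The constants below are made up for shape only; they are not benchmark numbers:

```python
import math

# Illustrative constants only -- not measured values.
GCN_LATENCY_MS = 50.0    # roughly flat regardless of queue count
MAXWELL_STEP_MS = 10.0   # latency added per batch of 32 queues

def gcn_latency(num_queues: int) -> float:
    """GCN: near-constant latency; overlapping wavefronts hide the cost."""
    return GCN_LATENCY_MS

def maxwell_latency(num_queues: int) -> float:
    """Maxwell: staircase -- each additional batch of 32 queues adds a step."""
    return MAXWELL_STEP_MS * math.ceil(num_queues / 32)

for n in (16, 32, 128, 512):
    print(n, gcn_latency(n), maxwell_latency(n))
```

With these made-up constants the staircase sits below the flat line up to 128 queues and crosses above it somewhere past that point, which matches the shape of the benchmark discussed earlier in the thread: Maxwell ahead at low queue counts, GCN ahead at high ones.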
 
Gears of War Ultimate Will Have Unlocked Frame Rate; Devs Explain How They’re Using DX12 & Async Compute

To begin with, Cam McRae (Technical Director for the Windows 10 PC version) explained how they’re going to use DirectX 12 and even Async Compute in Gears of War Ultimate.

We are still hard at work optimising the game. DirectX 12 allows us much better control over the CPU load with heavily reduced driver overhead. Some of the overhead has been moved to the game where we can have control over it. Our main effort is in parallelising the rendering system to take advantage of multiple CPU cores. Command list creation and D3D resource creation are the big focus here. We’re also pulling in optimisations from UE4 where possible, such as pipeline state object caching. On the GPU side, we’ve converted SSAO to make use of async compute and are exploring the same for other features, like MSAA.
 
I'm sure they did. They did that to the Windows version of Minecraft already. Of course there's no technical reason for doing so.
 