
Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

We kind of already gathered that, no? Async on AMD cards is executed asynchronously while async on NVIDIA cards is executed synchronously.

Interesting that on both architectures, 100 threads appears to be the sweet spot.
 
Those NVIDIA cons managed to build a GPU architecture that is good for DX12 while being good for DX11 at the same time, in the middle of a transition from DX11 to DX12... damn them.
Those bastards may even engineer Pascal entirely with DX12 in mind.

But seriously, the Maxwell architecture seems to handle concurrency between async tasks just fine (latencies are in accordance with a 32 queue depth)... the problem is the graphics workload being synchronous against the async compute workload. If there is no architectural reason for it to be that way, this could be solved through a driver update. The troubling thing is, if NVIDIA knew they could fix it in a driver update, they'd have been faster with their response. Maybe Jen-Hsun Huang is writing a heartwarming letter.
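The difference described here, graphics running serially against async compute versus overlapping with it, can be pictured with a toy timing model (a sketch with made-up millisecond figures, not measurements from any GPU or driver):

```python
# Toy model of per-frame cost for a graphics pass plus a compute pass.
# Serial execution (what the thread claims current Maxwell drivers do):
#   total = graphics + compute
# Concurrent execution (what async compute on GCN allows):
#   total = max(graphics, compute), assuming the two passes overlap fully.

def frame_time_serial(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute run back to back."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute overlap; the longer pass hides the shorter one."""
    return max(graphics_ms, compute_ms)

# Hypothetical 10 ms graphics pass and 3 ms compute pass:
print(frame_time_serial(10.0, 3.0))  # 13.0
print(frame_time_async(10.0, 3.0))   # 10.0
```

With these made-up numbers, overlapping the work hides the compute pass entirely, which is why a driver that serializes the two queues leaves performance on the table.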
 
Why am I not surprised by this at all?

NVIDIA is as dirty as pigs in the mud.

I keep telling you that this company is no good, but how many listen to me?
The world will become a better place when we get rid of NVIDIA.

A monopoly of AMD (with good hearts) would be better than a monopoly of NVIDIA (who only look for ways to screw technological progress).
Say whut?! :eek: :laugh: Especially the bold bit. With idiotic statements like that, no wonder you're getting criticized by everyone.
 
What a huge pile of dog turd over something which seems to have been a driver issue, completely expected with an alpha-level implementation of the first-ever DX12 title. The NVIDIA DX12 driver does not yet seem to fully support async shaders, although the Oxide dev thought it did.

Yeah, AMD technical marketing might not be your best source for info about competitor products. Combine that with a meltdown from a game dev... Then we have some good old fashioned NVIDIA bashing.

http://wccftech.com/nvidia-async-compute-directx-12-oxide-games/

"We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute."
 
That comedy team has been running these gags for almost 10 years. There have been a few people in other threads who actually think he works for NVIDIA, so I figured I would throw this up here:
 
Really, I never knew, and actually don't want to know, that this fruit the apple is so divine. :laugh:

Seriously, how would I have known that? This is the first time I've heard someone speak like that.

Because you are supposed to be smart enough to comprehend it. (No offense) But that's how life works.
 
No support, no problem, just pay them to gimp it on AMD cards: welcome to the wonderful world of the nVidia console, sorry, PC gaming, the way it's meant to be paid.

More like the way WE'RE meant to be played.
 
I don't see a single DX12 game, only some calculations for possible scenarios.
By the time 10 DX12 games show up on the market, Pascal will be more than a year old.
Because of that I don't see a reason for panic; a card supporting DX12 is not the same as a card capable of offering playable fps.
I remember when the 5870 with 2GB showed up, I bought it immediately as the first card with DX11 support.
The card was excellent, but for DX9 and a few DX10 environments; the first card with playable fps, and much better tessellation and DX11, was the GTX 580.
I went through the ATI 5870 and ATI 6970, but only with the GTX 580, and later with Tahiti, did the situation really get better. In the period between the ATI 5870 and the AMD 7970, AMD didn't improve anything in the DX11 field, and people who waited and upgraded to the GTX 580 played much better, until the HD 7950/HD 7970.
Because of that there is no reason to panic; NVIDIA will be ready when the time comes...
The only other bad thing is NVIDIA's tendency to write drivers only for the latest architecture.
If they continue to do that, people will turn their backs on them. At least the middle segment.
That's a much bigger reason for worry than Maxwell and DX12. We will not play nice DX12 games for at least 2 years.
Maybe some very rich people with multi-GPU will. But I'm talking about people who play games, as in the beginning, with a single powerful graphics card.
 
The only other bad thing is NVIDIA's tendency to write drivers only for the latest architecture.
If they continue to do that, people will turn their backs on them. At least the middle segment.
That's a much bigger reason for worry than Maxwell and DX12. We will not play nice DX12 games for at least 2 years.
Maybe some very rich people with multi-GPU will. But I'm talking about people who play games, as in the beginning, with a single powerful graphics card.
Nvidia has very good drivers for older generations; even Fermi cards run recent games quite nicely, one just needs to reduce some settings, but that's always the case as time goes by: you get a new card or reduce some settings. I recently helped somebody who has a 560 Ti + a 2500K (he bought those from me), and most games still look and run quite nicely with "medium" settings, and some of them even run fine with "high". I can't imagine my Maxwell 2 would need a replacement any time soon because of performance problems; if I replace it, that will only happen because I won't be able to resist the upgrade itch again.
 
That is a claim presented at the beginning of the article. If you read to the end, the benchmark shows that it is not true (number of queues on the horizontal axis, time spent computing on the vertical; lower is better).
View attachment 67772
Maxwell is faster than GCN up to 32 queues and evens out with GCN at 128 queues; GCN maintains the same speed all the way up to 128 queues.
It's also shown that with async shaders it's extremely important how they are compiled for each architecture.
Good find @RejZoR

From https://forum.beyond3d.com/posts/1870374/



For pure compute, AMD's compute latency (green color areas) rivals NVIDIA's compute latency (refer to the attached file).

http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1710#post_24368195

Here's what I think they did at Beyond3D:
  1. They set the number of threads per kernel to 32 (they're CUDA programmers, after all).
  2. They bumped the kernel count up to 512 (16,384 threads total).
  3. They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2.

Here's why that's not how you code for GCN:


Why?:
  1. Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
  2. That's 2,560 Threads total PER CU.
  3. An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.

If you load up GCN with kernels made up of 32 threads, you're wasting resources, and if you're not pushing GCN you're wasting compute potential. Slide number 4 stipulates that latency is hidden by executing overlapping wavefronts. This is why GCN appears to have a high degree of latency, yet you can execute a ton of work on GCN without affecting that latency. With Maxwell/2, latency rises like a staircase as you throw more work at it. I'm not sure if the folks at Beyond3D are aware of this or not.
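The arithmetic in the two lists above can be checked directly. This is a sketch using only the figures quoted in the post (32-thread kernels, 512 kernels, 40 wavefronts of 64 threads per CU, 44 CUs on an R9 290X):

```python
# Threads the Beyond3D-style test submits: 512 kernels x 32 threads each.
kernels = 512
threads_per_kernel = 32
submitted = kernels * threads_per_kernel            # 16,384 threads

# What GCN can keep in flight, per the figures in the post:
wavefronts_per_cu = 40
threads_per_wavefront = 64                          # one full wavefront
cus = 44                                            # R9 290X
capacity = wavefronts_per_cu * threads_per_wavefront * cus  # 112,640 threads

print(submitted)                       # 16384
print(capacity)                        # 112640
print(round(submitted / capacity, 3))  # 0.145 -> the test fills ~15% of the GPU

# Note also: a 32-thread kernel occupies a 64-thread wavefront,
# leaving half of every wavefront's lanes idle on GCN.
```

So under the post's own numbers, the test leaves roughly 85% of an R9 290X's thread capacity unused, which is the "wasting compute potential" argument in concrete terms.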


Conclusion:

I think they geared this test towards nVIDIA's CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single latency results don't matter. This test is only good if you're checking on Async functionality.


GCN was built for parallelism, not serial workloads like nVIDIA's architectures. This is why you don't see GCN taking a hit with 512 kernels.

What did Oxide do? They built two paths: one with shaders optimized for CUDA and the other with shaders optimized for GCN. On top of that, GCN has async working. Therefore it is not hard to determine why GCN performs so well in Oxide's engine. It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIA's architectures will be superior.

This means the burden is on developers to ensure they're optimizing for both. In the past, this hasn't been the case. Going forward... I hope they do. As for GameWorks titles, don't count on them being optimized for GCN. That's a given. Oxide played fair; others... might not.
 

Attachments: 64.png
This test is only good if you're checking on Async functionality.
That's exactly what the test is for ... checking on how much latency Async functionality introduces on both architectures.
GCN has a constant latency, good enough for compute loads made of a small number of async tasks and great for a huge number of async tasks. Additionally, GCN mixes the async compute load and the graphics load in near-perfect parallelism.
Maxwell shows varying latency that is extremely low for a small number of async tasks but surpasses GCN beyond 128 async tasks. What's really bad is that in the current drivers the async compute load and the graphics load are executed serially.
Mind you, every single async compute task is parallel in itself and can occupy 100% of the GPU if the job is suitable (parallelizable), so in most cases the penalty boils down to how many times and how much context switching is done. Maxwell has a nice cache hierarchy to help with that.
GCN should destroy Maxwell in special cases where a huge number of async tasks depend on results calculated by a huge number of other async tasks that vary greatly in computational complexity ;)
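The two latency curves described above can be sketched with a purely illustrative model: GCN as a flat line, Maxwell as a staircase that steps with every batch of 32 queues. The constants below are made up for shape only; they are not benchmark numbers:

```python
import math

# Illustrative constants only -- not measured values.
GCN_LATENCY_MS = 50.0    # roughly flat regardless of queue count
MAXWELL_STEP_MS = 10.0   # latency added per batch of 32 queues

def gcn_latency(num_queues: int) -> float:
    """GCN: near-constant latency; overlapping wavefronts hide the cost."""
    return GCN_LATENCY_MS

def maxwell_latency(num_queues: int) -> float:
    """Maxwell: staircase -- each additional batch of 32 queues adds a step."""
    return MAXWELL_STEP_MS * math.ceil(num_queues / 32)

for n in (16, 32, 128, 512):
    print(n, gcn_latency(n), maxwell_latency(n))
```

With these made-up constants the staircase sits below the flat line up to 128 queues and crosses above it somewhere past that point, which matches the shape of the benchmark discussed earlier in the thread: Maxwell ahead at low queue counts, GCN ahead at high ones.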
 
Gears of War Ultimate Will Have Unlocked Frame Rate; Devs Explain How They’re Using DX12 & Async Compute

To begin with, Cam McRae (Technical Director for the Windows 10 PC version) explained how they’re going to use DirectX 12 and even Async Compute in Gears of War Ultimate.

We are still hard at work optimising the game. DirectX 12 allows us much better control over the CPU load with heavily reduced driver overhead. Some of the overhead has been moved to the game where we can have control over it. Our main effort is in parallelising the rendering system to take advantage of multiple CPU cores. Command list creation and D3D resource creation are the big focus here. We’re also pulling in optimisations from UE4 where possible, such as pipeline state object caching. On the GPU side, we’ve converted SSAO to make use of async compute and are exploring the same for other features, like MSAA.
 
I'm sure they did. They did that to the Windows version of Minecraft already. Of course there's no technical reason for doing so.
 