Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

Joined
Feb 8, 2012
Messages
2,946 (0.97/day)
Location
Zagreb, Croatia
System Name Windows 7 64-bit Core i5 3570K
Processor Intel Core i5 3570K @ 4.2 GHz, 1.26 V
Motherboard Gigabyte GA-Z77MX-D3H
Cooling Scythe Katana 4
Memory 4 x 4 GB G-Skill Sniper DDR3 @ 1600 MHz
Video Card(s) Gainward NVIDIA GeForce GTX 970 Phantom
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case CoolerMaster Silencio 550
Audio Device(s) VIA HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 7 Enterprise 64-bit SP1

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,058 (6.13/day)
Location
IA, USA
System Name BY-2015
Processor Intel Core i7-6700K (4 x 4.00 GHz) w/ HT and Turbo on
Motherboard MSI Z170A GAMING M7
Cooling Scythe Kotetsu
Memory 2 x Kingston HyperX DDR4-2133 8 GiB
Video Card(s) Sapphire Radeon RX 5500 XT Pulse 8 GiB
Storage Crucial MX300 275 GB, Seagate Exos X12 TB 7200 RPM
Display(s) Samsung SyncMaster T240 24" LCD (1920x1200 HDMI) + Samsung SyncMaster 906BW 19" LCD (1440x900 VGA)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse SteelSeries Sensei RAW
Keyboard Tesoro Excalibur
Software Windows 10 Pro 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
We kind of already gathered that, no? Async on AMD cards is executed asynchronously while async on NVIDIA cards is executed synchronously.

Interesting that on both architectures, 100 threads appears to be the sweet spot.
 
Joined
Feb 8, 2012
Messages
2,946 (0.97/day)
Location
Zagreb, Croatia
Those nvidia cons managed to have a gpu architecture that is good for dx12 while it is good for dx11 at the same time, in the middle of a transition from dx11 to dx12 ... damn them.
Those bastards may even engineer Pascal completely with dx12 in mind.

But seriously, Maxwell architecture seems to handle async task concurrency between themselves just fine (latencies are in accordance with 32 queue depth)... problem is graphics workload being synchronous against async compute workload - if there is no architectural reason to be that way, this could be solved through a driver update. Troubling thing is, if nvidia knew they could fix it in driver update, they'd be faster with their response. Maybe Jen-Hsun Huang is writing a heartwarming letter.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
15,925 (3.49/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K at stock (hits 5 gees+ easily)
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (4 x 4GB Corsair Vengeance DDR3 PC3-12800 C9 1600MHz)
Video Card(s) Zotac GTX 1080 AMP! Extreme Edition
Storage Samsung 850 Pro 256GB | WD Green 4TB
Display(s) BenQ XL2720Z | Asus VG278HE (both 27", 144Hz, 3D Vision 2, 1080p)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair HX 850W v1
Software Windows 10 Pro 64-bit
Why am I not surprised at all by this too ?

Nvidia is so dirty like pigs in the mud.

I keep telling you that this company is not good but how many listen to me ?
The world will become a better place when we get rid of nvidia.

Monopoly of AMD (with good hearts) will be better than monopoly of nvidia (who only look how to screw technological progress).
Say whut?! :eek: :laugh: Especially the bold bit. With idiotic statements like that, no wonder you're getting criticized by everyone.
 
Joined
Oct 9, 2009
Messages
706 (0.18/day)
Location
Finland
System Name :P~
Processor Intel Core i7-5930K (ES)
Motherboard Asus Rampage V Extreme/3.1
Cooling Phanteks PH-TC14PE
Memory 32GB Corsair Vengeance LPX 2400 MHz
Video Card(s) Asus GTX 1080 Strix
Storage 400GB Intel 750 PCI-E SSD, 512GB Crucial MX100 SSD, 3TB WD RED HDD
Display(s) QNIX QX2710LED OC @ 96 Hz 27"
Case Corsair Obsidian 750D
Audio Device(s) Audioquest Dragon Red + Sennheiser HD 650
Power Supply Corsair HX1000i + Cablemod sleeved cables kit
Mouse Logitech G500s
Keyboard Logitech Ultra X Flat Premium
Software Windows 10 64-bit
What a huge pile of dog turd over something which seems to have been a driver issue, completely expectable with Alpha level implementation of first ever DX12 title. NVIDIA DX12 driver does not seem to yet fully support Async Shaders, although Oxide dev thought it does.

Yeah, AMD technical marketing might not be your best source for info about competitor products. Combine that with a meltdown from a game dev... Then we have some good old fashioned NVIDIA bashing.

http://wccftech.com/nvidia-async-compute-directx-12-oxide-games/

"We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute."
 

rtwjunkie

PC Gaming Enthusiast
Supporter
Joined
Jul 25, 2008
Messages
13,020 (3.00/day)
Location
Louisiana -Laissez les bons temps rouler!
System Name Bayou Phantom
Processor Core i7-8700k 4.4Ghz @ 1.18v
Motherboard ASRock Z390 Phantom Gaming 6
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax T40F Black CPU cooler
Memory 2x 16GB Mushkin Redline DDR-4 3200
Video Card(s) MSI GTX 1080Ti Gaming X
Storage 1x 500 MX500 SSD; 1x 2TB WD Black; 2x 4TB WD Black; 1x400GB VelRptr; 1x 3TB WD Blue storage (eSATA)
Display(s) HP 27q 27" IPS @ 2560 x 1440
Case Fractal Design Define R4 Black w/Titanium front -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic X-850
Mouse Coolermaster Sentinel III (large palm grip!)
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)
That comedy team has been running these gags almost 10 years. There's been a few people in other threads that actually think he works for Nvidia, so I figured I would throw this up here:
 
Joined
Sep 21, 2013
Messages
61 (0.02/day)
System Name Biostar
Processor Intel i5 3470
Motherboard BIOSTAR TZ77A
Cooling Stock Intel Cooler
Memory 8GB G.Skill Ripjaws 1600 DDR3
Video Card(s) EVGA GTX 970 SC ACX 2.0 3.5GB + 512MB of very slow.
Storage 1x Seagate Barracuda 7200.12 500GB / 2x Segate Barracuda 7200.10 250 GB
Display(s) HP 2009m
Case Thermaltake Versa H22
Audio Device(s) Realtek ALC892
Power Supply Thermaltake SP-650P 650Watt.
Mouse Logitech M185
Keyboard Logitech K270
Software Windows 10 Enterprise LTSB 64
Benchmark Scores Why waste my hardware and wear it out for a stupid score similar to everyone else's?
Really, I never knew and actually don't wanna know that this fruit the apple is so divine. :laugh:

Seriously, how would I have known that ? When this is the first time I hear someone speaking like that ?
Because you are supposed to be smart enough to comprehend it. (No offense) But that's how life works.
 
Joined
Sep 29, 2011
Messages
211 (0.07/day)
Location
Ottawa, Canada
System Name Current Rig
Processor AMD Ryzen 7 1700@3.95GHz
Motherboard Asus X370 Crosshair VI
Cooling Arctic Cooling 240mm
Memory 2x8GB DDR4-3200 G.Skill Trident Z RGB
Video Card(s) Gigabyte Windforce R9 290 (bios flashed to 1050MHz core
Storage 1TB SSD
Display(s) 3x22" LG Flatron (eyefinity)
Case Cooler Master Storm Striker
Power Supply Antec True Power 750w
No support, no problem, just pay them to gimp it on AMD cards: welcome to the wonderful world of the nVidia console, sorry, pc gaming, the way it's meant to be paid.
More like the way WE'RE meant to be played.
 
Joined
Sep 22, 2012
Messages
1,008 (0.36/day)
Location
Belgrade, Serbia
System Name Intel® X99 Wellsburg
Processor Intel® Core™ i7-5820K - 4.5GHz
Motherboard ASUS Rampage V E10 (1801)
Cooling EK RGB Monoblock + EK XRES D5 Revo Glass PWM
Memory CMD16GX4M4A2666C15
Video Card(s) ASUS GTX1080Ti Poseidon
Storage Samsung 970 EVO PLUS 1TB /850 EVO 1TB / WD Black 2TB
Display(s) Samsung P2450H
Case Lian Li PC-O11 WXC
Audio Device(s) CREATIVE Sound Blaster ZxR
Power Supply EVGA 1200 P2 Platinum
Mouse Logitech G900 / SS QCK
Keyboard Deck 87 Francium Pro
Software Windows 10 Pro x64
I don't see a single DX12 game, only some calculations for a possible scenario.
By the time 10 DX12 games are on the market, Pascal will be more than a year old.
Because of that I don't see a reason to panic; a card supporting DX12 is not the same as a card capable of playable fps.
I remember when the 5870 with 2GB showed up. I bought it immediately as the first card with DX11 support.
The card was excellent, but only for DX9 and the few DX10 titles; the first card with playable fps and much better tessellation in DX11 was the GTX 580.
I went through both the ATI 5870 and the ATI 6970, but the situation only really improved with the GTX 580, and later with Tahiti. Between the ATI 5870 and the AMD 7970, AMD didn't improve anything in the DX11 field, and people who waited and upgraded to the GTX 580 played much better until the HD 7950/HD 7970.
So there's no reason to panic, NVIDIA will be ready when the time comes...
Only one other thing is bad, and that's NVIDIA's tendency to write drivers only for the latest architecture.
If they continue to do that, people will turn their backs on them. At least in the middle segment.
That's a much bigger reason to worry than Maxwell and DX12. We won't be playing nice DX12 games for at least 2 years.
Maybe some very rich people with multi-GPU will. But I'm talking about people who play games the way it was in the beginning, with a single powerful graphics card.
 
Joined
Feb 18, 2011
Messages
1,250 (0.37/day)
Only one other thing is bad, and that's NVIDIA's tendency to write drivers only for the latest architecture.
If they continue to do that, people will turn their backs on them. At least in the middle segment.
That's a much bigger reason to worry than Maxwell and DX12. We won't be playing nice DX12 games for at least 2 years.
Maybe some very rich people with multi-GPU will. But I'm talking about people who play games the way it was in the beginning, with a single powerful graphics card.
Nvidia has very good drivers for older generations, even Fermi cards run recent games quite nicely, one just needs to reduce some settings, but that's always the case as time goes by, you get a new card or reduce some settings. I recently helped somebody who has a 560ti + a 2500k (he bought those from me), and most of the games still look and run quite nicely with "medium" settings, and some of them even runs fine with "high". I can't imagine my Maxwell2 would need a replacement any time soon because of performance problems, if I will replace it, that will only happen because I won't be able to resist the upgrade itch again.
 
Joined
Nov 3, 2011
Messages
368 (0.12/day)
System Name Fractal Define R5 | Fractal Define R6
Processor AMD Ryzen 9 3900X | Intel Core i7-9900K @ 5 Ghz all cores
Motherboard ASUS ROG Strix X570 Gaming | MSI Z390 Gaming Pro Carbon AC
Cooling CORSAIR Hydro H115i, RGB | CORSAIR Hydro H150i RGB
Memory G.Skill Trident 32GB 3200 Mhz RGB| HyperX 32GB 3600 Mhz RGB
Video Card(s) Gigabyte RTX 2080 Windforce 8G OC| MSI RTX 2080 Ti Gaming X TRIO
Display(s) 3X Samsung 23 in LED | LG 32UL950-W 32in 4K HDR FreeSync
Case Fractal R5 tempered glass | Fractal R6 tempered glass
Audio Device(s) Creative Sound Blaster Z | Creative Sound Blaster AE-7
Power Supply Seasonic 750 watts| Seasonic 1000 watts
Mouse Bloody P95s
Keyboard Logitech G810s
Software MS Windows 10 Pro version 2004
Benchmark Scores Intel Core i7-7820X 4.5 Ghz and ASUS ROG Strix X299-E Gaming parts in storage.
That is a claim presented at the beginning of the article. Toward the end, if you read it, the benchmark shows that it is not true (number of queues horizontally and time spent computing vertically, lower is better).
View attachment 67772
Maxwell is faster than GCN up to 32 queues and evens out with GCN by 128 queues, while GCN holds the same speed all the way up to 128 queues.
It's also shown that with async shaders it's extremely important how they are compiled for each architecture.
Good find @RejZoR
From https://forum.beyond3d.com/posts/1870374/



For pure compute, AMD's compute latency (green color areas) rivals NVIDIA's compute latency (refer to the attached file).

http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1710#post_24368195

Here's what I think they did at Beyond3D:
  1. They set the number of threads per kernel to 32 (they're CUDA programmers after all).
  2. They've bumped the Kernel count to up to 512 (16,384 Threads total).
  3. They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2

Here's why that's not how you code for GCN


Why?:
  1. Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
  2. That's 2,560 Threads total PER CU.
  3. An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.
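
The arithmetic in the list above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in this post (64 threads per wavefront, 40 wavefronts per CU, 44 CUs, and the 512 kernels of 32 threads from the Beyond3D test); the constant and function names are made up for illustration and do not come from any real API.

```python
# Figures quoted in this post; all names here are illustrative only.
WAVEFRONT_SIZE = 64       # threads per GCN wavefront
WAVEFRONTS_PER_CU = 40    # wavefronts in flight per compute unit
CU_COUNT_R9_290X = 44     # compute units on an R9 290X


def gcn_thread_capacity(cu_count):
    """Total threads in flight across all CUs, per the numbers above."""
    return cu_count * WAVEFRONTS_PER_CU * WAVEFRONT_SIZE


def lane_utilization(threads_per_kernel):
    """Fraction of wavefront lanes doing work, assuming each kernel
    is padded up to a whole number of 64-wide wavefronts."""
    wavefronts = -(-threads_per_kernel // WAVEFRONT_SIZE)  # ceiling division
    return threads_per_kernel / (wavefronts * WAVEFRONT_SIZE)


capacity = gcn_thread_capacity(CU_COUNT_R9_290X)
test_threads = 512 * 32  # 512 kernels of 32 threads each

print(capacity)                           # 112640
print(test_threads)                       # 16384
print(lane_utilization(32))               # 0.5
print(round(test_threads / capacity, 3))  # 0.145
```

Under these assumptions a 32-thread kernel fills only half of a wavefront, and even the largest test case touches roughly 15% of the card's stated thread capacity.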

If you load up GCN with Kernels made up of 32 Threads you're wasting resources. If you're not pushing GCN you're wasting compute potential. In slide number 4, it stipulates that latency is hidden by executing overlapping wavefronts. This is why GCN appears to have a high degree of latency, but you can execute a ton of work on GCN without affecting the latency. With Maxwell/2, latency rises like a staircase the more work you throw at it. I'm not sure if the folks at Beyond3D are aware of this or not.


Conclusion:

I think they geared this test towards nVIDIAs CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single Latency results don't matter. This test is only good if you're checking on Async functionality.


GCN was built for Parallelism, not serial workloads like nVIDIAs architectures. This is why you don't see GCN taking a hit with 512 Kernels.

What did Oxide do? They built two paths. One with Shaders Optimized for CUDA and the other with Shaders Optimized for GCN. On top of that GCN has Async working. Therefore it is not hard to determine why GCN performs so well in Oxide's engine. It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIAs architectures will be superior.

This means that the burden is on developers to ensure they're optimizing for both. In the past, this hasn't been the case. Going forward... I hope they do. As for GameWorks titles, don't count on them being optimized for GCN. That's a given. Oxide played fair, others... might not.
 

Joined
Feb 8, 2012
Messages
2,946 (0.97/day)
Location
Zagreb, Croatia
This test is only good if you're checking on Async functionality.
That's exactly what the test is for ... checking on how much latency Async functionality introduces on both architectures.
GCN has a constant latency, good enough for compute loads made of a small number of async tasks and great for a huge number of async tasks. Additionally, GCN mixes async compute load and graphics load in near perfect parallelism.
Maxwell shows varying latency that is extremely low for a small number of async tasks and surpasses GCN's above 128 async tasks. What's really bad is that in current drivers async compute load and graphics load are done serially.
Mind you, every single async compute task is parallel in itself and can occupy 100% of the GPU if the job is suitable (parallelizable), so in most cases the penalty boils down to how many times and how much context switching is done. Maxwell has a nice cache hierarchy to help with that.
GCN should destroy Maxwell in special cases where huge number of async tasks depend on results calculated by huge number of other async tasks that are greatly varying in computational complexity ;)
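
To make the shape of those two curves concrete, here is a toy model in Python. It is a sketch, not measured data: the queue depth of 32 and the crossover just above 128 tasks come from this thread, while the two latency constants are invented numbers in arbitrary units, picked only so the crossover lands where the benchmark suggests.

```python
import math

QUEUE_DEPTH = 32     # Maxwell queue depth mentioned earlier in the thread
MAXWELL_STEP = 1.0   # made-up latency per batch of 32 tasks (arbitrary units)
GCN_LATENCY = 4.5    # made-up constant GCN latency (same arbitrary units)


def maxwell_latency(tasks):
    """Staircase: latency grows by one step for every 32 queued tasks."""
    return MAXWELL_STEP * math.ceil(tasks / QUEUE_DEPTH)


def gcn_latency(tasks):
    """Flat: latency does not depend on the number of async tasks."""
    return GCN_LATENCY


for tasks in (1, 32, 33, 128, 129, 512):
    print(tasks, maxwell_latency(tasks), gcn_latency(tasks))
```

With these assumed constants the staircase stays below the flat line up to 128 tasks and crosses it at 129, which is the qualitative behaviour described above and nothing more.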
 
Joined
Apr 30, 2012
Messages
3,621 (1.22/day)
Gears of War Ultimate Will Have Unlocked Frame Rate; Devs Explain How They’re Using DX12 & Async Compute

To begin with, Cam McRae (Technical Director for the Windows 10 PC version) explained how they’re going to use DirectX 12 and even Async Compute in Gears of War Ultimate.

We are still hard at work optimising the game. DirectX 12 allows us much better control over the CPU load with heavily reduced driver overhead. Some of the overhead has been moved to the game where we can have control over it. Our main effort is in parallelising the rendering system to take advantage of multiple CPU cores. Command list creation and D3D resource creation are the big focus here. We’re also pulling in optimisations from UE4 where possible, such as pipeline state object caching. On the GPU side, we’ve converted SSAO to make use of async compute and are exploring the same for other features, like MSAA.
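
For anyone wondering what "parallelising the rendering system" around command list creation looks like structurally, here is a rough Python sketch. It is not Direct3D code and not how the developers did it: `record_command_list` and `build_frame` are hypothetical stand-ins, and Python threads only illustrate the pattern (record on several workers, submit in a fixed order) rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(chunk_id, draw_calls):
    """Hypothetical stand-in for recording one command list on a worker."""
    return [f"chunk{chunk_id}:draw({d})" for d in draw_calls]


def build_frame(draw_calls, workers=4):
    """Split the draw calls across workers, record in parallel,
    then concatenate the lists in a fixed order for submission."""
    chunks = [draw_calls[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the submission
        # order is deterministic regardless of which worker finishes first.
        lists = list(pool.map(record_command_list, range(workers), chunks))
    return [cmd for command_list in lists for cmd in command_list]


frame = build_frame(list(range(8)), workers=2)
print(frame)
```

The point of the pattern is that recording is the expensive, parallel part, while submission stays a cheap, ordered step on one thread.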
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,058 (6.13/day)
Location
IA, USA
I'm sure they did. They did that to the Windows version of Minecraft already. Of course there's no technical reason for doing so.
 