
Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

Joined
Sep 7, 2011
Messages
2,785 (0.87/day)
Location
New Zealand
System Name MoneySink
Processor 2600K @ 4.8
Motherboard P8Z77-V
Cooling AC NexXxos XT45 360, RayStorm, D5T+XSPC tank, Tygon R-3603, Bitspower
Memory 16GB Crucial Ballistix DDR3-1600C8
Video Card(s) GTX 780 SLI (EVGA SC ACX + Giga GHz Ed.)
Storage Kingston HyperX SSD (128) OS, WD RE4 (1TB), RE2 (1TB), Cav. Black (2 x 500GB), Red (4TB)
Display(s) Achieva Shimian QH270-IPSMS (2560x1440) S-IPS
Case NZXT Switch 810
Audio Device(s) onboard Realtek yawn edition
Power Supply Seasonic X-1050
Software Win8.1 Pro
Benchmark Scores 3.5 litres of Pale Ale in 18 minutes.
Wow.......what's up with the hostility man. Is it really you? Doesn't seem like it.
I wonder why Humansmoke hasn't commented on this news....
1. Because the whole thing is based upon a demo of an unreleased game which may, or may not, have any significant impact on PC gaming.
2. Because, as others have said, the time to start sweating is when DX12 games actually arrive.
3. As I've said in earlier posts, there are going to be instances where game engines favour one vendor or the other - it has always been the case, and it will very likely continue to be so. Nitrous is built for GCN. No real surprise, since Oxide's Star Swarm was the original Mantle demo poster child. AMD gets its licks in early. Smart marketing move. It will be interesting to see how they react when they are at a disadvantage, and what mix of the hardware and software features available to DX12 different games draw on.
4. With the previous point in mind, Epic launched UE 4.9 yesterday. The engine supports a number of features that AMD has had problems with (GameWorks), or has architectural/driver issues with. 4.9, I believe, has VXGI support and ray tracing. My guess is that the same people screaming "Nvidia SUCK IT!!!!!" will be the same people crying foul when a game emerges that leverages any of these graphical effects.....of course, Unreal Engine 4 might be inconsequential WRT AAA titles, but I very much doubt it.

PC gaming benchmarks and performance: vendors win some, lose some. Wash. Rinse. Repeat. I just hope the knee-jerk comments keep on coming - I just love bookmarking (and screencapping, for those who retroactively rewrite history) for future reference.
The UE4.9 notes are pretty extensive, so here's an editor shot showing the VXGI support.
 

the54thvoid

Moderator
Staff member
Joined
Dec 14, 2009
Messages
7,854 (2.06/day)
Location
Glasgow - home of formal profanity
System Name Newer Ho'Ryzen
Processor Ryzen 3700X
Motherboard Asus Crosshair VI Hero
Cooling TR Le Grand Macho
Memory 16Gb G.Skill 3200 RGB
Video Card(s) RTX 2080ti MSI Duke @2Ghz ish
Storage Samsumg 960 Pro m2. 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Lian Li PC-V33WX
Audio Device(s) On Board
Power Supply Seasonic Prime TItanium 850
Software W10
Benchmark Scores Look, it's a Ryzen on air........ What's the point?
Wow.......what's up with the hostility man. Is it really you? Doesn't seem like it.
Because I tire, to my old bones, of idiots spouting ill-thought-out shite. This bit:

haha, jokes on you Nvidia fanboy
is where my ire is focused, because my post isn't in any way Nvidia-centric. I believe I say 'kudos' to AMD for bringing the hardware-level support to the fore. This forum is all too often littered with idiotic, childish playground remarks that would otherwise be met with a smack on the chops. I'm a pleasant enough chap, but I'm no pacifist, and the veil of internet anonymity is just one of the places where cowards love to hide.

So while you decry my hostility - which was in fact a simple retort about intellectual deficit (aimed at myself as well, having the IQ of a slug) - why are you not attacking the tone of the post from the fud who laughs in my face and calls me a fanboy? I'm not turning the other cheek if someone intentionally offends me.

EDIT: where I come from, my post wasn't even a tickle near hostility.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
15,925 (3.50/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K at stock (hits 5 gees+ easily)
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (4 x 4GB Corsair Vengeance DDR3 PC3-12800 C9 1600MHz)
Video Card(s) Zotac GTX 1080 AMP! Extreme Edition
Storage Samsung 850 Pro 256GB | WD Green 4TB
Display(s) BenQ XL2720Z | Asus VG278HE (both 27", 144Hz, 3D Vision 2, 1080p)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair HX 850W v1
Software Windows 10 Pro 64-bit
A while back, when it was being said that Kepler and Maxwell were DX12 compliant, I said no way, partial at most, and that we should wait for Pascal for full compliance, since these GPUs predate the DX12 standard and hence cannot possibly fully support it. Nice to see this article prove me right on this one.

It's inevitably the case that the most significant feature of a new graphics API will require new hardware to go with it, and that's what we have here.

It also doesn't surprise me that NVIDIA would pressure a dev to remove problematic code from a DX12 benchmark in order not to be shown up. :shadedshu:

What should really happen is that the benchmark points out what isn't supported when run on pre-Pascal GPUs (and pre-Fury ones for AMD), but that's not happening, is it? It should then run that part of the benchmark on AMD Fury hardware, since it does support it. However, that part of the benchmark is simply not there at all, and that's the scandal.
 

Frick

Fishfaced Nincompoop
Joined
Feb 27, 2006
Messages
16,194 (3.11/day)
Location
Piteå
System Name Black MC in Tokyo
Processor Ryzen 5 2600x
Motherboard Asrock B450M-HDV
Cooling AMD Wraith Spire I think
Memory 2 x 8GB G-skill Aegis 3000 or somesuch
Video Card(s) Asus GTX 760 DCU2OC 2GB
Storage Kingston A400 240GB | WD Blue 1TB x 2
Display(s) BenQ GL2450HT
Case Some old Antec
Audio Device(s) Line6 UX1 + slightly modded Sony DR-ZX302
Power Supply Fractal Design Effekt 400W
Mouse Logitech G602
Keyboard Cherry MX-Board 3.0
Software Windows 10 Pro
Benchmark Scores I once had +100 dorfs in DF, so yeah pretty great

the54thvoid

Moderator
Staff member
Incest-spawned shitface.
Love you too! But I was born of an ill-conceived ICI and brewery conglomerate. I'm a turpentine-spawned expletive.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
11,680 (3.84/day)
Location
Concord, NH
System Name Kratos
Processor Intel Core i7 3930k @ 4.5Ghz
Motherboard ASUS P9X79 Deluxe
Cooling Corsair H100i V2
Memory G.Skill DDR3-2133, 16gb (4x4gb) @ 9-11-10-28-108-1T 1.65v
Video Card(s) Sapphire AMD Radeon RX Vega 64
Storage 2x120Gb SATA3 SSD Raid-0, 4x1Tb RAID-5, 1x500GB, 1x512GB Samsung 960 Pro NVMe
Display(s) 1x LG 27UD69P (4k), 2x Dell S2340M (1080p)
Case Antec 1200
Audio Device(s) Onboard Realtek® ALC898, FIIO Alpen 2 Headphone DAC + Amp
Power Supply Seasonic 1000-watt 80 PLUS Platinum
Mouse Logitech G602
Keyboard Rosewill RK-9100, Cherry MX Blues with O-rings
Software Ubuntu 18.04 (5.6.11 Mainline Kernel)
Benchmark Scores Benchmarks aren't everything.
All this tells me is that GCN has untapped resources that DX12 (in this case) can take advantage of. It's probably a great example of how engines and rendering libraries in the past did a piss-poor job of utilizing resources properly. NVIDIA caught on fast and started throwing away parts of the GPU that games didn't need, crippling things like DP performance, whereas GCN has always been biased toward compute-heavy workloads. If anything, this is just another example of how DX12 might well utilize resources better than DX11 and earlier did. The real question is: as more DX12 games and benchmarks start cropping up, how many of them will show similar results?
 

qubit

Overclocked quantum bit
All this tells me is that GCN has untapped resources that DX12 (in this case) can take advantage of. It's probably a great example of how engines and rendering libraries in the past did a piss-poor job of utilizing resources properly. NVIDIA caught on fast and started throwing away parts of the GPU that games didn't need, crippling things like DP performance, whereas GCN has always been biased toward compute-heavy workloads. If anything, this is just another example of how DX12 might well utilize resources better than DX11 and earlier did. The real question is: as more DX12 games and benchmarks start cropping up, how many of them will show similar results?
What I don't want to see are games purporting to be DX12 catering to the lowest common denominator, ie graphics cards with only partial DX12 support and there's an awful lot of those about, from both brands.

If DX12 with the latest games and full GPU DX12 features (eg Pascal) doesn't have a real wow factor compelling users to upgrade then this becomes a distinct possibility. :ohwell:
 
Joined
Dec 22, 2011
Messages
3,127 (1.02/day)
System Name Zimmer Frame Rates
Processor Intel i7 920 @ Stock speeds baby
Motherboard EVGA X58 3X SLI
Cooling True 120
Memory Corsair Vengeance 12GB
Video Card(s) Palit GTX 980 Ti Super JetStream
Storage Of course
Display(s) Crossover 27Q 27" 2560x1440
Case Antec 1200
Audio Device(s) Don't be silly
Power Supply XFX 650W Core
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 10
Benchmark Scores Epic
After all this fuss let's hope Ashes of the Singularity isn't shit.
 
Joined
Apr 1, 2015
Messages
19 (0.01/day)
Location
Athens Greece
System Name The Sentinel Reloaded
Processor AMD Ryzen 7 2700X
Motherboard Asus Rog Strix X470-F Gaming
Cooling CoolerMaster MasterLiquid Lite 240
Memory 24GB Patriot Viper (2X8) + HyperX Predator 16GB (2X4) DDR4-3000MHz
Video Card(s) Sapphire Radeon RX 570 4GB Nitro+
Storage WD Μ.2 Black NVME 500Gb, Seagate Barracuda 500GB (2.5"), Seagate Firecuda 2TB (2.5")
Display(s) LG 29UM59-P
Case Lian-Li PC-011 Dynamic Black
Audio Device(s) Onboard
Power Supply Super Flower Leadex II 80 Plus Gold 1000W Black
Mouse Logitech Marathon M705
Keyboard Logitech K330
Software Windows 10 Pro
Benchmark Scores Beyond this Galaxy!
It's again a SINGLE game. Until I see more (that aren't exclusive to either camp like this one is to AMD), then I'll accept the info...
So this is what you understand from the article? That "Ashes" is exclusive to AMD? Please take off the green glasses!
 
Joined
Jan 2, 2012
Messages
1,062 (0.35/day)
Location
Indonesia
Processor AMD Ryzen 7 3700X
Motherboard ASUS STRIX X470-F
Cooling NOCTUA NH-U12A
Memory G.Skill FlareX 32 GB (4 x 8 GB) DDR4-3200
Video Card(s) PALIT RTX 2080 Super GRP
Storage 512 GB Samsung 850 Pro | 500GB Crucial MX500 SATA M.2 | 2 x 4 TB WD Black
Display(s) Dell U2717D
Case Fractal Design Define R5 Black
Power Supply Seasonic Prime 650W
Mouse Logitech G703 Hero
Keyboard KIRA EXS
I think the title is a little bit misleading. "Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for Ashes of the Singularity" would be better :D

And I think it's not surprising that AMD have the upper hand in async compute. They simply have more "muscle" for it, especially if the game devs spam the GPU with a lot of graphics and compute tasks.

As far as I understand, NVIDIA GPUs will still do async compute, BUT they're limited to 31 command queues to stay effective (a.k.a. not overloading their scheduler), whereas AMD can go up to 64 command queues and remain just as effective.

NVIDIA = 1 graphics engine, 1 shader engine, with a 32-deep command queue (32 queues total, 31 usable in graphics/compute mode)
AMD = 1 graphics engine, 8 shader engines (which they coined ACEs, or Async Compute Engines), each with an 8-deep command queue (64 queues total)

So if you spam a lot of graphics and compute commands at an NVIDIA GPU in a non-sequential way, you end up overloading its scheduler, and it then does a lot of context switching (from graphics commands to compute commands and vice versa). That results in increased latency, hence the longer processing time. This is what happened in this specific game demo (Ashes of the Singularity, AoS): they use our GPUs to process the graphics commands (to render all of those little spaceship thingies) AND the compute commands (the AI for every single spaceship), and the more spaceships there are, the more NVIDIA GPUs will suffer.

And you're all thinking: "AMD can only win in DX12, async compute rulezz!" Well, the fact is we don't know yet. We don't know how most game devs will handle the graphics and compute sides of their games - whether they think it wise to offload most compute tasks to our GPUs (freeing CPU resources, a.k.a. removing most of the CPU bottleneck), or just let the CPU do the compute tasks (less hassle to code, and especially to synchronize).

Oh, and UE4's documentation says this about the async compute implementation in their engine:

As more APIs expose the hardware feature we would like to make the system more cross-platform. Features that make use of AsyncCompute should always be able to run without it (console variable / define) to run on other platforms and for easier debugging and profiling. AsyncCompute should be used with caution as it can cause more unpredictable performance and requires more coding effort for synchronization.
From here : https://docs.unrealengine.com/latest/INT/Programming/Rendering/ShaderDevelopment/AsyncCompute/index.html
 
Joined
Oct 16, 2013
Messages
35 (0.01/day)
Processor i7 4930k
Motherboard Rampage IV Extreme
Cooling Thermalright HR-02 Macho
Memory 4 X 4096 MB G.Skill DDR3 1866 9-10-9-26
Video Card(s) Gigabyte GV-N780OC-3GD
Storage Crucial M4 128GB, M500 240GB, Samsung HD103SJ 1TB
Display(s) Planar PX2710MW 27" 1920x1080
Case Corsair 500R
Power Supply RAIDMAX RX-1200AE
Software Windows 10 64-bit
I think the title is a little bit misleading. "Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for Ashes of the Singularity" would be better :D

And I think it's not surprising that AMD have the upper hand in async compute. They simply have more "muscle" for it, especially if the game devs spam the GPU with a lot of graphics and compute tasks.

As far as I understand, NVIDIA GPUs will still do async compute, BUT they're limited to 31 command queues to stay effective (a.k.a. not overloading their scheduler), whereas AMD can go up to 64 command queues and remain just as effective.

NVIDIA = 1 graphics engine, 1 shader engine, with a 32-deep command queue (32 queues total, 31 usable in graphics/compute mode)
AMD = 1 graphics engine, 8 shader engines (which they coined ACEs, or Async Compute Engines), each with an 8-deep command queue (64 queues total)

So if you spam a lot of graphics and compute commands at an NVIDIA GPU in a non-sequential way, you end up overloading its scheduler, and it then does a lot of context switching (from graphics commands to compute commands and vice versa). That results in increased latency, hence the longer processing time. This is what happened in this specific game demo (Ashes of the Singularity, AoS): they use our GPUs to process the graphics commands (to render all of those little spaceship thingies) AND the compute commands (the AI for every single spaceship), and the more spaceships there are, the more NVIDIA GPUs will suffer.

And you're all thinking: "AMD can only win in DX12, async compute rulezz!" Well, the fact is we don't know yet. We don't know how most game devs will handle the graphics and compute sides of their games - whether they think it wise to offload most compute tasks to our GPUs (freeing CPU resources, a.k.a. removing most of the CPU bottleneck), or just let the CPU do the compute tasks (less hassle to code, and especially to synchronize).

Oh, and UE4's documentation says this about the async compute implementation in their engine:



From here : https://docs.unrealengine.com/latest/INT/Programming/Rendering/ShaderDevelopment/AsyncCompute/index.html
I don't think it's the command queues causing the problem. On AMD's side, only Hawaii, Tonga and Fiji have 8 ACEs; older GCN parts have 1 or 2 ACEs. If a huge number of command queues were the problem, then not only NVIDIA cards but also those GCN cards with fewer ACEs would show a performance downgrade in DX12. But the benchmark results don't support that argument at all.
 
Joined
Jan 2, 2012
Messages
1,062 (0.35/day)
I don't think it's the command queues causing the problem. On AMD's side, only Hawaii, Tonga and Fiji have 8 ACEs; older GCN parts have 1 or 2 ACEs. If a huge number of command queues were the problem, then not only NVIDIA cards but also those GCN cards with fewer ACEs would show a performance downgrade in DX12. But the benchmark results don't support that argument at all.
I can't find any AoS benchmark results with older GCN cards like Tahiti, Pitcairn, etc.
It would be much appreciated if you could provide one for reading purposes, thank you very much.
 
Joined
Oct 16, 2013
Messages
35 (0.01/day)
Joined
Aug 20, 2007
Messages
12,955 (2.78/day)
System Name Pioneer
Processor Intel i9 9900k
Motherboard ASRock Z390 Taichi
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory G.SKILL TridentZ Series 32GB (4 x 8GB) DDR4-3200 @ 14-14-14-34-2T
Video Card(s) AMD RX 5700 XT (XFX THICC Ultra III)
Storage Mushkin Pilot-E 2TB NVMe SSD w/ EKWB M.2 Heatsink
Display(s) 55" LG 55B9-OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) VGA HDMI->Panasonic SC-HTB20/Schiit Modi MB/Asgard 2 DAC/Amp to AKG Pro K7712 Headphones
Power Supply SeaSonic Prime 750W 80Plus Titanium
Mouse ROCCAT Kone EMP
Keyboard WASD CODE 104-Key w/ Cherry MX Green Keyswitches, Doubleshot Vortex PBT White Transluscent Keycaps
Software Windows 10 Enterprise (yes, it's legit.)
This. We all know Nvidia has cheated at benchmarks in the past.

so its like this:

Oxide: Look! we got benchmarks!
AMD: oh my we almost beat The Green Meanies
Nvidia: @oxide you cheated on the benchmarks
Oxide: did not. nyah.
Nvidia: disable competitive features so our non-async bluff works right!
Oxide: not gonna happen
Nvidia: F*** you AMD! you're not better than us! we'll fix you with our l33t bluffing skillz
AMD: *poops pants* and hides in a corner eating popcorn.
This is a more accurate summary than I'd like to admit. I always felt AMD would be more partial to eating paste than popcorn however.
 

rtwjunkie

PC Gaming Enthusiast
Supporter
Joined
Jul 25, 2008
Messages
12,997 (3.01/day)
Location
Louisiana -Laissez les bons temps rouler!
System Name Bayou Phantom
Processor Core i7-8700k 4.4Ghz @ 1.18v
Motherboard ASRock Z390 Phantom Gaming 6
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax T40F Black CPU cooler
Memory 2x 16GB Mushkin Redline DDR-4 3200
Video Card(s) MSI GTX 1080Ti Gaming X
Storage 1x 500 MX500 SSD; 1x 2TB WD Black; 2x 4TB WD Black; 1x400GB VelRptr; 1x 3TB WD Blue storage (eSATA)
Display(s) HP 27q 27" IPS @ 2560 x 1440
Case Fractal Design Define R4 Black w/Titanium front -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic X-850
Mouse Coolermaster Sentinel III (large palm grip!)
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)
So this is what you understand from the article? That "Ashes" is exclusive to AMD? Please take off the green glasses!
See, now this is the danger in making assumptions about people's hardware preferences when you know nothing about them.

What you don't know is that @RejZoR has been a long-time AMD supporter, and only recently got a 980 out of frustration.
 
Joined
Aug 20, 2007
Messages
12,955 (2.78/day)
A while back when it was being said that Kepler and Maxwell were DX12 compliant
They are. Heck, frickin' Fermi is COMPLIANT.

Compliant being the key word. They aren't FULLY SUPPORTED. Careful with your words there, man. ;)

See, now this is the danger in making assumptions about people's hardware preferences when you know nothing about them.

What you don't know is that @RejZoR has been a long time AMD supporter, and only recently got a 980 out of frustration.
Indeed. He did it shortly after the "I'll eat my shoes if AMD makes the R9 390X GCN 1.1" debacle. He was like our AMD posterboy, only he's proven he'll go green if that's what it takes to get a good game. I would not target him for assumptions, if I were you.
 

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
38,801 (8.41/day)
Location
Hyderabad, India
Processor AMD Ryzen 7 2700X
Motherboard ASUS ROG Strix B450-E Gaming
Cooling AMD Wraith Prism
Memory 2x 16GB Corsair Vengeance LPX DDR4-3000
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) Creative Sound Blaster Recon3D PCIe
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Microsoft Sidewinder X4
Software Windows 10 Pro
After all this fuss let's hope Ashes of the Singularity isn't shit.
Oh it is shit. But that's not the point.

I think the title is a little bit misleading. "Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for Ashes of the Singularity" would be better :D


Async compute is a DirectX feature, not an AotS feature.
 
Joined
Feb 18, 2011
Messages
1,250 (0.37/day)
It's about time for the console deal "with no profit in it" to pay off.

It's on NV that they are out of it. Next time they should get more involved in the gaming ecosystem if they want a say in its development, even if it doesn't bring them the big bucks immediately.
They tried with the last gen and Sony chose them (sadly Sony couldn't wait a few months for the G80); this gen would have been way more expensive without APUs, which NVIDIA lacks, so there was no real price war at all - AMD simply made a bad deal IMO.

There are bigger implications if either side is up to mischief.

If the initial game engine is developed under AMD GCN with async compute for consoles, then we PC gamers are getting even more screwed.

We are already getting bad ports: non-optimized, downgraded from improved API paths, GameWorks middleware, driver-side emulation, developers who don't care = the PC version of Batman: Arkham Knight.
Please read my two earlier posts - I suggested a slightly different conclusion in those - but to reply to you: we are already screwed, because the consoles (the main target platforms for developers) have Jaguar cores. That's why it doesn't really matter what will be supported in software or hardware; we will have so many free CPU cycles on the PC that some driver magic won't matter much. I bet even Fermi cards will do just "fine" under DX12 (and yes, Fermi DX12 support is probably coming by the end of this year, where NVIDIA will simply implement the missing features in software, just as AMD will do with their older architectures).
 
Joined
Mar 24, 2011
Messages
2,311 (0.69/day)
Location
Essex Jct, VT
Processor AMD Ryzen 5 2600
Motherboard Gigabyte B450 Aurorus Elite
Cooling Stock
Memory 16GB (2x8GB) Corsair Vengence LPX DDR4
Video Card(s) Gigabyte GTX 1060 Windforce OC 6GB
Storage Samsung EVO 850 256GB / Samsung EVO 860 500GB / WD Caviar Black 1TB
Display(s) AOC G2590FX
Case NZXT H500 Mid-Tower
Audio Device(s) Onboard
Power Supply Corsair RM650x 650W Fully Modular
Software Windows 10
Is anyone surprised AMD's architecture does better at a compute-centric task? They have been championing compute for the better part of the past five years, while Nvidia was shedding it until the technology became actually relevant. I think this is a good indicator that Pascal is going to usher in the return of a lot of compute functionality.
 
Joined
Oct 2, 2004
Messages
13,791 (2.41/day)
So this is what you understand from the article? That "Ashes" is exclusive to AMD? Please take off the green glasses!
Ahahaha, did this dude just flag me as an NVIDIA fanboy? Kiddo, you are hilarious. For the last 10 years I've owned nothing but ATi/AMD graphics cards. This is the first NVIDIA after all those years. Nice try. XD

And I never said it's "exclusive". All I said is that this very specific game has been developed in cooperation with AMD basically since day one, using Mantle. And when that was dead in the water, it moved to DX12. No one says Project Cars is NVIDIA exclusive, but we all know it was developed with NVIDIA basically since day one, and surprise surprise, it runs far better on NVIDIA than on any AMD card. Wanna call someone an AMD fanboy for that one? One single game doesn't reflect performance in ALL games.
 
Joined
Sep 25, 2007
Messages
5,842 (1.26/day)
Location
New York
Processor Core I7 3770K@4.8Ghz
Motherboard AsRock Z77 Extreme
Cooling Cooler Master Seidon 120M
Memory 12Gb G.Skill Sniper
Video Card(s) EVGA GTX 1070 FTW 2150/2240
Storage Sandisk SSD + 1TB Seagate Barracuda 7200
Display(s) IPS Asus 26inch
Case Antec 300
Audio Device(s) Xonar DG
Power Supply EVGA Supernova 650 G2
Software Windows 10/Windows 7
Not really surprised, but it's starting to look like Nvidia has not made any significant strides in async compute since Kepler (async with compute operations only, not compute + graphics).
 

the54thvoid

Moderator
Staff member
Here's a balanced opinion.

AMD have focused on Mantle to get better hardware-level implementation to suit their GCN 1.1+ architecture. From this they have lit a fire under MS and got DX12 to be closer to the metal. This focus has left Nvidia free to keep on top of things at the DX11 level.
Following Kepler, Nvidia have focused on efficiency and performance, and Maxwell has brought them that in spades with DX11. Nvidia have effectively taken the opposite gamble to AMD: Nvidia has stuck with a DX11 focus and AMD has forged on toward DX12.

So far so neutral.

They have both gambled, and they will both win and lose. AMD have gambled that DX12 adoption will be rapid, and that this will allow their GCN 1.1+ parts to provide a massive performance increase and quite likely surpass Maxwell architecture designs - possibly even, in best-case scenarios, with rebranded Hawaii matching top-level Maxwell (bravo, AMD). Nvidia have likely reckoned that DX12 implementation will not occur rapidly enough until 2016, so they have settled for Maxwell's DX11 performance efficiency. Nvidia for their part have probably 'fiddled' to pretend they have the most awesome DX12 support when in reality it's a driver thing (as AoS apparently shows).

So, if DX12 implementation is slow, Nvidia's gamble pays off. If DX12 uptake is rapid and occurs before Pascal, Nvidia lose (and will most definitely cheat, with massive developer pressure and incentives). If DX12 comes through in bits and bobs, it will come down to what games you play (as always). However, as a gamer, I'm not upgrading to W10 until MS patches the 'big brother' updating mechanisms I keep reading about.

TL;DR? Like everyone has been saying: despite AMD's GCN advantage, without a slew of top AAA titles the hardware is irrelevant. If DX11 games are still being pumped out, GCN won't help. If DX12 comes earlier, AMD win.
 
Joined
Oct 23, 2004
Messages
101 (0.02/day)
Location
Perth, Western Australia
System Name THOR
Processor Core i7 3820 @ 4.3GHz
Motherboard Asus Rampage Formula IV
Cooling Corsair H80 + 5x12cm Case Fans
Memory 16GB G.Skill 2400MHz
Video Card(s) GTX 980 + GTX 570 + GTX 680 SLI
Storage OCZ Vectror 256GB + Vertex IV 256GB + Intel 520 128GB + 1TB WD Caviar Black
Display(s) 27" Dell U2711 + 24" Dell U2410 + 24" BenQ FP241W
Case Cooler Master Cosmos II Ultra
Audio Device(s) SupremeFX III
Power Supply Corsair AX1200 1200w
Software Windows 10 Pro 64bit
From what I have read, Maxwell is capable of async compute (and async shaders), and is actually faster when it can stay within its work-order limit (1+31 queues).

The GTX 980 Ti is twice as fast as the Fury X, but only when it is under 31 simultaneous command lists.

The GTX 980 Ti performed roughly equal to the Fury X at up to 128 command lists.

This is why we need to wait for more games to be released before we jump to conclusions.
 
Joined
Nov 3, 2011
Messages
364 (0.12/day)
System Name Fractal Define R5 | Fractal Define R6
Processor AMD Ryzen 9 3900X | Intel Core i7-9900K @ 5 Ghz all cores
Motherboard ASUS ROG Strix X570 Gaming | MSI Z390 Gaming Pro Carbon AC
Cooling CORSAIR Hydro H115i, RGB | CORSAIR Hydro H150i RGB
Memory G.Skill Trident 32GB 3200 Mhz RGB| HyperX 32GB 3600 Mhz RGB
Video Card(s) MSI Geforce GTX 1080 Ti FE| MSI Geforce RTX 2080 Ti GX TRIO
Display(s) 3X Samsung 23 in LED | LG 32UL950-W 32in 4K HDR FreeSync
Case Fractal R5 tempered glass | Fractal R6 tempered glass
Audio Device(s) Creative Sound Blaster Z | Creative Sound Blaster AE-7
Power Supply Seasonic 750 watts| Seasonic 1000 watts
Mouse Bloody P95s
Keyboard Logitech G810s
Software MS Windows 10 Pro version 2004
Benchmark Scores Intel Core i7-7820X 4.5 Ghz and ASUS ROG Strix X299-E Gaming parts in storage.
From what I have read, Maxwell is capable of async compute (and async shaders), and is actually faster when it can stay within its work-order limit (1 graphics + 31 compute queues).

The GTX 980 Ti is twice as fast as the Fury X, but only when it stays under 31 simultaneous command lists.

The GTX 980 Ti performed roughly equal to the Fury X at up to 128 command lists.

This is why we need to wait for more games to be released before we jump to conclusions.
That Beyond3D benchmark is pretty simple and designed to be in-order.

Maxwell v2 is not capable of concurrent async compute + rendering without incurring context-switch penalties, and it is in this context that Oxide made its remarks.

AMD's ACE units are designed to run concurrently with rendering without context penalties, and they include out-of-order features.

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-10

From sebbbi:
The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to the CPU or wait for the GPU on the CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

To see realistic results, you should benchmark async compute in rendering tasks. For example, render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to the display instead of waiting on the CPU for the compute to finish. For result timing, use GPU timestamps, not a CPU timer. CPU-side timing of GPU work produces lots of noise and even false results because of driver-related buffering.
---------------------

An AMD APU would be king for tightly interleaved mixed CPU->GPU->CPU work, e.g. the PS4's APU was designed for exactly this kind of workload.

PS4 sports the same 8 ACE units as Tonga, Hawaii and Fury.
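sebbbi's suggested overlap can be sketched as a toy frame pipeline (illustrative only - `frame_schedule` is my own name, not a real API): while frame N's shadow map renders on the graphics queue, the tiled-lighting compute for frame N-1 runs on the compute queue, and the CPU never blocks waiting on results.

```python
# Sketch of the overlap sebbbi describes: graphics work for the current
# frame runs concurrently with compute work for the previous frame.

def frame_schedule(num_frames):
    """Yield (graphics_task, compute_task) pairs, one per GPU 'tick'."""
    for n in range(num_frames):
        graphics = f"shadow_map(frame {n})"
        # Frame 0 has no previous frame's lighting to compute yet.
        compute = f"tiled_lighting(frame {n - 1})" if n > 0 else None
        yield graphics, compute

for graphics, compute in frame_schedule(3):
    print(graphics, "||", compute)
```

This is why GPU timestamps are the right measuring tool: the two tasks in each pair finish at different times on different queues, and a CPU-side stopwatch can't see either.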


They tried with the last gen and Sony chose them (sadly they couldn't wait a few months for the G80); this gen would have been much more expensive without APUs, which Nvidia lacks, so there was no real price war at all. AMD simply made a bad deal, imo.

Please read my two earlier posts; I suggested a slightly different conclusion there. But to reply to you: we are already screwed because the consoles (the main target platforms for developers) have Jaguar cores. That's why it doesn't really matter what will be supported in software or hardware; we will have so many free CPU cycles on the PC that driver magic won't matter much. I bet even Fermi cards will do just "fine" under DX12 (and yes, Fermi DX12 support is probably coming by the end of this year, with Nvidia simply implementing the missing features in software, just as AMD will do with its older architectures).
The XBO is the baseline DirectX 12 GPU, and it has two ACE units with 8 queues per unit, as in the Radeon HD 7790 (GCN 1.1).

The older GCN 1.0 has only two ACE units with 2 queues per unit, and is less capable than GCN 1.1.

GCN 1.0 parts such as the 7970/R9 280X are still better than Fermi and Kepler in the concurrent async + render category.
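The queue counts scattered through this thread can be tallied in one place (figures as stated in the posts above - worth verifying against vendor docs; the 8-queues-per-ACE figure for the 8-ACE parts is my reading, not explicit in the thread):

```python
# Hardware queue configurations discussed in the thread.
# Each entry: (ACE units, compute queues per ACE).
gcn_aces = {
    "GCN 1.0 (HD 7970 / R9 280X)": (2, 2),
    "GCN 1.1 (HD 7790 / Xbox One)": (2, 8),
    "Tonga / Hawaii / Fiji / PS4": (8, 8),
}

# Maxwell v2's claimed work-order limit: 1 graphics + 31 compute queues.
maxwell2_compute_queues = 31

for name, (aces, per_ace) in gcn_aces.items():
    print(f"{name}: {aces * per_ace} compute queues")
```

So on these numbers even the oldest GCN part falls well under Maxwell v2's 31-queue limit; the argument is about concurrency and context penalties, not raw queue counts.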

Ahahaha, did this dude just flag me as an NVIDIA fanboy? Kiddo, you are hilarious. For the last 10 years I've owned nothing but ATi/AMD graphics cards; this is my first NVIDIA card after all those years. Nice try. XD

And I never said it's "exclusive". All I said is that this very specific game has been developed in cooperation with AMD, basically since day one, using Mantle. And when that fell through, it became DX12. No one says Project Cars is NVIDIA exclusive, but we all know it was developed with NVIDIA basically since day one, and surprise surprise, it runs far better on NVIDIA than on any AMD card. Wanna call someone an AMD fanboy for that one? One single game doesn't reflect performance in ALL games.
With Project Cars, AMD's lower DX11 draw call limit is the problem.

Read the SMS PC Lead's comment on this issue at
http://forums.guru3d.com/showpost.php?p=5116716&postcount=901

For our mix of DX11 API calls, the API call consumption rate of the AMD driver is the bottleneck.

In Project Cars the range of draw calls per frame varies from around 5-6000 with everything at low up-to 12-13000 with everything at Ultra. Depending on the single threaded performance of your CPU there will be a limit of the amount of draw calls that can be consumed and as I mentioned above, once that is exceeded GPU usage starts to reduce. On AMD/Windows 10 this threshold is much higher which is why you can run with higher settings without FPS loss.

I also mentioned 'gaps' in the GPU timeline caused by not being able to feed the GPU fast enough - these gaps are why increasing resolution (like to 4K in the Anandtech analysis) makes for a better comparison between GPU vendors... In 4K, the GPU is given more work to do, and either the gaps get filled by the extra work and are smaller... or the extra work means the GPU is now always running behind the CPU submission rate.

So, on my i7-5960k@3.0ghz the NVIDIA (Titan X) driver can consume around 11,000 draw-calls with our DX11 API call mix - the same Windows 7 System with a 290x and the AMD driver is CPU limited at around 7000 draw-calls : On Windows 10 AMD is somewhere around 8500 draw-calls before the limit is reached (I can't be exact since my Windows 10 box runs on a 3.5ghz 6Core i7)

In Patch 2.5 (next week) I did a pass to reduce small draw-calls when using the Ultra settings, as a concession to help driver thread limitations. It gains around 8% for NVIDIA and about 15% (minimum) for AMD.
...

For Project Cars the 1040 driver is easily the fastest under Windows 10 at the moment - but my focus at the moment is on the fairly large engineering task of implementing DX12 support...


----------------------------------------


Project Cars with DX12 is coming.
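Plugging the SMS dev's quoted figures into a quick back-of-envelope check (numbers from his post; the helper name is my own): a frame becomes CPU/driver-limited the moment the scene issues more draw calls than the driver can consume.

```python
# Back-of-envelope check using the figures quoted above: if the driver can
# consume at most `driver_limit` draw calls per frame on a given CPU, any
# scene issuing more than that leaves the GPU waiting on submission.

def cpu_limited(draw_calls_per_frame, driver_limit):
    """True when the driver's consumption rate becomes the bottleneck."""
    return draw_calls_per_frame > driver_limit

# DX11 driver limits from the SMS post (i7-5960K @ 3.0 GHz):
limits = {
    "NVIDIA Titan X (Win7)": 11000,
    "AMD 290X (Win7)": 7000,
    "AMD 290X (Win10)": 8500,  # approximate; measured on a different CPU
}

ultra_calls = 13000  # upper end of the quoted Ultra range (12-13000)
low_calls = 6000     # upper end of the quoted Low range (5-6000)

for driver, limit in limits.items():
    print(driver, "- Ultra CPU-limited:", cpu_limited(ultra_calls, limit))
```

On these numbers, every driver is CPU-limited at Ultra, but AMD hits the wall at much lower settings - which is exactly why the patch's draw-call reduction helped AMD roughly twice as much as NVIDIA.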
 
Last edited:
Joined
Oct 23, 2004
Messages
101 (0.02/day)
Location
Perth, Western Australia
System Name THOR
Processor Core i7 3820 @ 4.3GHz
Motherboard Asus Rampage Formula IV
Cooling Corsair H80 + 5x12cm Case Fans
Memory 16GB G.Skill 2400MHz
Video Card(s) GTX 980 + GTX 570 + GTX 680 SLI
Storage OCZ Vector 256GB + Vertex IV 256GB + Intel 520 128GB + 1TB WD Caviar Black
Display(s) 27" Dell U2711 + 24" Dell U2410 + 24" BenQ FP241W
Case Cooler Master Cosmos II Ultra
Audio Device(s) SupremeFX III
Power Supply Corsair AX1200 1200w
Software Windows 10 Pro 64bit
Maxwell v2 is not capable of concurrent async compute + rendering without incurring context-switch penalties.

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-10

From sebbbi:
The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to the CPU or wait for the GPU on the CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

To see realistic results, you should benchmark async compute in rendering tasks. For example, render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to the display instead of waiting on the CPU for the compute to finish. For result timing, use GPU timestamps, not a CPU timer. CPU-side timing of GPU work produces lots of noise and even false results because of driver-related buffering.
---------------------

An AMD APU would be king for tightly interleaved mixed CPU->GPU->CPU work.
https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/
 