
Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for DirectX 12

Joined
Sep 7, 2011
Messages
2,785 (0.61/day)
Location
New Zealand
System Name MoneySink
Processor 2600K @ 4.8
Motherboard P8Z77-V
Cooling AC NexXxos XT45 360, RayStorm, D5T+XSPC tank, Tygon R-3603, Bitspower
Memory 16GB Crucial Ballistix DDR3-1600C8
Video Card(s) GTX 780 SLI (EVGA SC ACX + Giga GHz Ed.)
Storage Kingston HyperX SSD (128) OS, WD RE4 (1TB), RE2 (1TB), Cav. Black (2 x 500GB), Red (4TB)
Display(s) Achieva Shimian QH270-IPSMS (2560x1440) S-IPS
Case NZXT Switch 810
Audio Device(s) onboard Realtek yawn edition
Power Supply Seasonic X-1050
Software Win8.1 Pro
Benchmark Scores 3.5 litres of Pale Ale in 18 minutes.
Wow.......what's up with the hostility man. Is it really you? Doesn't seem like it.
I wonder why Humansmoke hasn't commented on this news....
1. Because the whole thing is based upon a demo of an unreleased game which may - or may not, have any significant impact on PC gaming
2. Because as others have said, the time to start sweating is when DX12 games actually arrive.
3. As I've said in earlier posts, there are going to be instances where game engines favour one vendor or the other - it has always been the case, and it will very likely continue to be so. Nitrous is built for GCN. No real surprise, since Oxide's Star Swarm was the original Mantle demo poster child. AMD gets its licks in early - smart marketing move. It will be interesting to see how they react when they are at the disadvantage, and what mix of the hardware and software features available to DX12 different games end up drawing on.
4. With the previous point in mind, Epic launched UE 4.9 yesterday. The engine supports a number of features that AMD has had problems with (GameWorks), or has architectural/driver issues with. 4.9 I believe adds VXGI support and ray tracing. My guess is that the same people screaming "Nvidia SUCK IT!!!!!" will be the same people crying foul when a game emerges that leverages any of these graphical effects... of course, Unreal Engine 4 might be inconsequential WRT AAA titles, but I very much doubt it.

PC gaming benchmarks and performance - vendors win some, lose some. Wash. Rinse. Repeat. I just hope the knee-jerk comments keep on coming - I love bookmarking them (and screencapping, for those who retroactively rewrite history) for future reference.
The UE4.9 notes are pretty extensive, so here's an editor shot showing the VXGI support.
 

the54thvoid

Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
12,378 (2.37/day)
Location
Glasgow - home of formal profanity
Processor Ryzen 7800X3D
Motherboard MSI MAG Mortar B650 (wifi)
Cooling be quiet! Dark Rock Pro 4
Memory 32GB Kingston Fury
Video Card(s) Gainward RTX4070ti
Storage Seagate FireCuda 530 M.2 1TB / Samsung 960 Pro M.2 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Asus Prime AP201
Audio Device(s) On Board
Power Supply be quiet! Pure POwer M12 850w Gold (ATX3.0)
Software W10
Wow.......what's up with the hostility man. Is it really you? Doesn't seem like it.

Because I tire, to my old bones, of idiots spouting ill-thought-out shite. This bit:

haha, jokes on you Nvidia fanboy

is where my ire is focused, because my post isn't in any way Nvidia-centric. I believe I say 'kudos' to AMD for bringing the hardware-level support to the fore. This forum is all too often littered with idiotic and childish schoolyard remarks that would otherwise be met with a smack on the chops. I'm a pleasant enough chap but I'm no pacifist, and the veil of internet anonymity is just one area where cowards love to hide.

So while you decry my hostility - which was in fact a simple retort about intellectual deficit (aimed at myself as well, having the IQ of a slug) - why are you not attacking the tone of the post from the fud that laughs in my face and calls me a fanboy? I'm not turning the other cheek if someone intentionally offends me.

EDIT: where I come from, my post wasn't even a tickle near hostility.
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,866 (3.00/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K @ 4GHz
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (2 x 8GB Corsair Vengeance Black DDR3 PC3-12800 C9 1600MHz)
Video Card(s) MSI RTX 2080 SUPER Gaming X Trio
Storage Samsung 850 Pro 256GB | WD Black 4TB | WD Blue 6TB
Display(s) ASUS ROG Strix XG27UQR (4K, 144Hz, G-SYNC compatible) | Asus MG28UQ (4K, 60Hz, FreeSync compatible)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair AX1600i
Mouse Microsoft Intellimouse Pro - Black Shadow
Keyboard Yes
Software Windows 10 Pro 64-bit
A while back, when it was being said that Kepler and Maxwell were DX12 compliant, I said no way, partial at most, and that we should wait for Pascal for full compliance, since these GPUs precede the DX12 standard and hence cannot possibly fully support it. Nice to see this article prove me right on this one.

It's inevitably the case that the most significant feature of a new graphics API will require new hardware to go with it and that's what we have here.

It also doesn't surprise me that NVIDIA would pressure a dev to remove problematic code from a DX12 benchmark in order not to be shown up. :shadedshu:

What should really happen is that the benchmark points out what isn't supported when run on pre-Pascal GPUs (and pre-Fury ones for AMD), but that's not happening, is it? It should then run that part of the benchmark on AMD Fury hardware, since it does support it. However, that part of the benchmark is simply not there at all, and that's the scandal.
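
For what it's worth, D3D12 does let an application ask the hardware what it actually supports, so a benchmark could at least report its tiers up front. Here's a minimal illustrative sketch of my own (not from the AotS benchmark or any shipping title); note that async compute itself has no capability bit to query, so the tiers below are only a rough proxy for how "complete" a GPU's DX12 support is:
Code:
#include <windows.h>
#include <d3d12.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main()
{
    // Default adapter, minimum feature level D3D12 accepts; error handling omitted.
    ID3D12Device* device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Highest feature level the driver/GPU exposes.
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1 };
    D3D12_FEATURE_DATA_FEATURE_LEVELS fl = {};
    fl.NumFeatureLevels        = sizeof(requested) / sizeof(requested[0]);
    fl.pFeatureLevelsRequested = requested;
    device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS, &fl, sizeof(fl));

    // Optional-feature tiers (resource binding, tiled resources, ROVs, ...).
    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));

    std::printf("Max feature level:     0x%x\n", (unsigned)fl.MaxSupportedFeatureLevel);
    std::printf("Resource binding tier: %d\n", (int)opts.ResourceBindingTier);
    std::printf("Tiled resources tier:  %d\n", (int)opts.TiledResourcesTier);
    std::printf("ROVs supported:        %d\n", (int)opts.ROVsSupported);

    device->Release();
    return 0;
}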
 

Frick

Fishfaced Nincompoop
Joined
Feb 27, 2006
Messages
18,914 (2.86/day)
Location
Piteå
System Name Black MC in Tokyo
Processor Ryzen 5 5600
Motherboard Asrock B450M-HDV
Cooling Be Quiet! Pure Rock 2
Memory 2 x 16GB Kingston Fury 3400mhz
Video Card(s) XFX 6950XT Speedster MERC 319
Storage Kingston A400 240GB | WD Black SN750 2TB |WD Blue 1TB x 2 | Toshiba P300 2TB | Seagate Expansion 8TB
Display(s) Samsung U32J590U 4K + BenQ GL2450HT 1080p
Case Fractal Design Define R4
Audio Device(s) Line6 UX1 + some headphones, Nektar SE61 keyboard
Power Supply Corsair RM850x v3
Mouse Logitech G602
Keyboard Cherry MX Board 1.0 TKL Brown
VR HMD Acer Mixed Reality Headset
Software Windows 10 Pro
Benchmark Scores Rimworld 4K ready!

the54thvoid

Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
12,378 (2.37/day)
Incest-spawned shitface.

Love you too! But I was born of an ill-conceived ICI and brewery conglomerate. I'm a turpentine-spawned expletive.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.96/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
All this tells me is that GCN has untapped resources that DX12 (in this case) could take advantage of. It's probably a great example of how engines and rendering libraries in the past did a piss-poor job of utilizing resources properly. nVidia caught on fast and started throwing away parts of the GPU that games weren't using, while crippling things like DP performance, whereas GCN has always been biased toward compute-heavy workloads. If anything, this is just another example of how DX12 may well utilize resources better than DX11 and earlier did. The real question is, as more DX12 games and benchmarks start cropping up, how many of them will show similar results?
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,866 (3.00/day)
All this tells me is that GCN has untapped resources that DX12 (in this case) could take advantage of. It's probably a great example of how engines and rendering libraries in the past did a piss-poor job of utilizing resources properly. nVidia caught on fast and started throwing away parts of the GPU that games weren't using, while crippling things like DP performance, whereas GCN has always been biased toward compute-heavy workloads. If anything, this is just another example of how DX12 may well utilize resources better than DX11 and earlier did. The real question is, as more DX12 games and benchmarks start cropping up, how many of them will show similar results?
What I don't want to see are games purporting to be DX12 while catering to the lowest common denominator, i.e. graphics cards with only partial DX12 support - and there are an awful lot of those about, from both brands.

If DX12 with the latest games and full GPU DX12 feature support (e.g. Pascal) doesn't have a real wow factor compelling users to upgrade, then this becomes a distinct possibility. :ohwell:
 
Joined
Dec 22, 2011
Messages
3,890 (0.87/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
After all this fuss let's hope Ashes of the Singularity isn't shit.
 
Joined
Apr 1, 2015
Messages
23 (0.01/day)
Location
Athens Greece
System Name The Sentinel Reloaded
Processor AMD Ryzen 7 2700X
Motherboard Asus Rog Strix X470-F Gaming
Cooling CoolerMaster MasterLiquid Lite 240
Memory 24GB Patriot Viper (2X8) + HyperX Predator 16GB (2X4) DDR4-3000MHz
Video Card(s) Sapphire Radeon RX 570 4GB Nitro+
Storage WD M.2 Black NVME 500Gb, Seagate Barracuda 500GB (2.5"), Seagate Firecuda 2TB (2.5")
Display(s) LG 29UM59-P
Case Lian-Li PC-011 Dynamic Black
Audio Device(s) Onboard
Power Supply Super Flower Leadex II 80 Plus Gold 1000W Black
Mouse Logitech Marathon M705
Keyboard Logitech K330
Software Windows 10 Pro
Benchmark Scores Beyond this Galaxy!
It's again a SINGLE game. Until I see more (that aren't exclusive to either camp like this one is to AMD), then I'll accept the info...

So this is what you understand from the article? That "Ashes" is exclusive to AMD? Please take off the green glasses!
 
Joined
Jan 2, 2012
Messages
1,079 (0.24/day)
Location
Indonesia
Processor AMD Ryzen 7 5700X
Motherboard ASUS STRIX X570-E
Cooling NOCTUA NH-U12A
Memory G.Skill FlareX 32 GB (4 x 8 GB) DDR4-3200
Video Card(s) ASUS RTX 4070 DUAL
Storage 1 TB WD Black SN850X | 2 TB WD Blue SN570 | 10 TB WD Purple Pro
Display(s) LG 32QP880N 32"
Case Fractal Design Define R5 Black
Power Supply Seasonic Focus Gold 750W
Mouse Pulsar X2
Keyboard KIRA EXS
I think the title is a little bit misleading. "Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for Ashes of the Singularity" would be better :D

And I don't think it's surprising that AMD have the upper hand on async compute. They just have more "muscle" to do it, especially if the game devs spam the GPU with a lot of graphics and compute tasks.

As far as I understand, NVIDIA GPUs will still do async compute, BUT they are limited to 31 command queues to stay effective (a.k.a. not overloading their scheduler), whereas AMD can go up to 64 command queues and still be just as effective.

NVIDIA = 1 graphics engine, 1 compute engine with a 32-deep command queue (32 queues total, 31 usable in mixed graphics/compute mode)
AMD = 1 graphics engine, 8 compute engines (what they call ACEs, or Asynchronous Compute Engines), each with an 8-deep command queue (64 queues total)

So if you spam a lot of graphics and compute commands (in a non-sequential way) at an NVIDIA GPU, it will end up overloading its scheduler, which then does a lot of context switching (from graphics commands to compute commands and vice versa). That results in increased latency, hence the increased processing time. This is what happens in this specific game demo (Ashes of the Singularity, AotS): they use our GPUs to process the graphics commands (to render all of those little spaceship thingies) AND to process the compute commands (the AI for every single spaceship), and the more spaceships there are, the more NVIDIA GPUs will suffer.
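
To put that in code terms, here's a rough sketch of my own (illustrative only, not from AotS/Nitrous or any real engine) of how a DX12 title submits graphics and compute work to separate hardware queues. Whether the two streams actually overlap on the GPU is then up to the hardware and driver, which is exactly where Maxwell and GCN differ:
Code:
#include <windows.h>
#include <d3d12.h>

// Illustrative sketch only: one direct (graphics) queue and one compute queue,
// loosely synchronised with a fence. Device/command-list creation, error
// handling and object lifetime management are omitted for brevity.
void SubmitGraphicsAndCompute(ID3D12Device* device,
                              ID3D12CommandList* const* graphicsLists, UINT numGraphics,
                              ID3D12CommandList* const* computeLists,  UINT numCompute)
{
    ID3D12CommandQueue* directQueue  = nullptr;
    ID3D12CommandQueue* computeQueue = nullptr;

    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&directQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick the compute work off on its own queue and signal the fence when done.
    // On hardware that can run graphics and compute concurrently the two queues
    // may overlap; otherwise the driver/GPU serialises them (context switches).
    computeQueue->ExecuteCommandLists(numCompute, computeLists);
    computeQueue->Signal(fence, 1);

    // The graphics queue waits on the GPU (not the CPU) for the compute results
    // before running the work that consumes them.
    directQueue->Wait(fence, 1);
    directQueue->ExecuteCommandLists(numGraphics, graphicsLists);
}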

And you're all thinking: "AMD can only win in DX12, async compute rulezz!" Well, the fact is we don't know yet. We don't know how most game devs will deal with the graphics and compute sides of their games - whether they think it wise to offload most compute tasks to our GPUs (freeing CPU resources, a.k.a. removing most of the CPU bottleneck) or just let the CPU do the compute tasks (less hassle in coding, and especially in synchronizing).

Oh, and Epic wrote this in the UE4 documentation on their async compute implementation:

As more APIs expose the hardware feature, we would like to make the system more cross-platform. Features that make use of AsyncCompute should always be able to run without it (console variable / define), to run on other platforms and for easier debugging and profiling. AsyncCompute should be used with caution, as it can cause more unpredictable performance and requires more coding effort for synchronization.

From here : https://docs.unrealengine.com/lates...ing/ShaderDevelopment/AsyncCompute/index.html
 
Joined
Oct 16, 2013
Messages
41 (0.01/day)
Processor i7 4930k
Motherboard Rampage IV Extreme
Cooling Thermalright HR-02 Macho
Memory 4 X 4096 MB G.Skill DDR3 1866 9-10-9-26
Video Card(s) Gigabyte GV-N780OC-3GD
Storage Crucial M4 128GB, M500 240GB, Samsung HD103SJ 1TB
Display(s) Planar PX2710MW 27" 1920x1080
Case Corsair 500R
Power Supply RAIDMAX RX-1200AE
Software Windows 10 64-bit
I think the title is a little bit misleading. "Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for Ashes of the Singularity" would be better :D

And I don't think it's surprising that AMD have the upper hand on async compute. They just have more "muscle" to do it, especially if the game devs spam the GPU with a lot of graphics and compute tasks.

As far as I understand, NVIDIA GPUs will still do async compute, BUT they are limited to 31 command queues to stay effective (a.k.a. not overloading their scheduler), whereas AMD can go up to 64 command queues and still be just as effective.

NVIDIA = 1 graphics engine, 1 compute engine with a 32-deep command queue (32 queues total, 31 usable in mixed graphics/compute mode)
AMD = 1 graphics engine, 8 compute engines (what they call ACEs, or Asynchronous Compute Engines), each with an 8-deep command queue (64 queues total)

So if you spam a lot of graphics and compute commands (in a non-sequential way) at an NVIDIA GPU, it will end up overloading its scheduler, which then does a lot of context switching (from graphics commands to compute commands and vice versa). That results in increased latency, hence the increased processing time. This is what happens in this specific game demo (Ashes of the Singularity, AotS): they use our GPUs to process the graphics commands (to render all of those little spaceship thingies) AND to process the compute commands (the AI for every single spaceship), and the more spaceships there are, the more NVIDIA GPUs will suffer.

And you're all thinking: "AMD can only win in DX12, async compute rulezz!" Well, the fact is we don't know yet. We don't know how most game devs will deal with the graphics and compute sides of their games - whether they think it wise to offload most compute tasks to our GPUs (freeing CPU resources, a.k.a. removing most of the CPU bottleneck) or just let the CPU do the compute tasks (less hassle in coding, and especially in synchronizing).

Oh, and Epic wrote this in the UE4 documentation on their async compute implementation:



From here : https://docs.unrealengine.com/lates...ing/ShaderDevelopment/AsyncCompute/index.html

I don't think it's the command queues causing the problem. On AMD's side, only Hawaii, Tonga and Fiji have 8 ACEs; older GCN chips have 1 or 2 ACEs. If a huge number of command queues were the problem, then not only NVIDIA cards but also those GCN cards with fewer ACEs would show a performance downgrade in DX12. But the benchmark results don't support that argument at all.
 
Joined
Jan 2, 2012
Messages
1,079 (0.24/day)
I don't think it's the command queues causing the problem. On AMD's side, only Hawaii, Tonga and Fiji have 8 ACEs; older GCN chips have 1 or 2 ACEs. If a huge number of command queues were the problem, then not only NVIDIA cards but also those GCN cards with fewer ACEs would show a performance downgrade in DX12. But the benchmark results don't support that argument at all.

I can't find any AoS benchmark results with older GCN cards like Tahiti, Pitcairn, etc.
It would be much appreciated if you could provide one for reading purposes, thank you very much.
 
Joined
Oct 16, 2013
Messages
41 (0.01/day)
Joined
Aug 20, 2007
Messages
20,709 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches
Software Windows 11 Enterprise (legit), Gentoo Linux x64
This. We all know Nvidia has cheated at benchmarks in the past.

So it's like this:

Oxide: Look! we got benchmarks!
AMD: oh my we almost beat The Green Meanies
Nvidia: @oxide you cheated on the benchmarks
Oxide: did not. nyah.
Nvidia: disable competitive features so our non-async bluff works right!
Oxide: not gonna happen
Nvidia: F*** you AMD! you're not better than us! we'll fix you with our l33t bluffing skillz
AMD: *poops pants* and hides in a corner eating popcorn.

This is a more accurate summary than I'd like to admit. I always felt AMD would be more partial to eating paste than popcorn however.
 

rtwjunkie

PC Gaming Enthusiast
Supporter
Joined
Jul 25, 2008
Messages
13,909 (2.43/day)
Location
Louisiana -Laissez les bons temps rouler!
System Name Bayou Phantom
Processor Core i7-8700k 4.4Ghz @ 1.18v
Motherboard ASRock Z390 Phantom Gaming 6
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax T40F Black CPU cooler
Memory 2x 16GB Mushkin Redline DDR-4 3200
Video Card(s) EVGA RTX 2080 Ti Xc
Storage 1x 500 MX500 SSD; 2x 6TB WD Black; 1x 4TB WD Black; 1x400GB VelRptr; 1x 4TB WD Blue storage (eSATA)
Display(s) HP 27q 27" IPS @ 2560 x 1440
Case Fractal Design Define R4 Black w/Titanium front -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic X-850
Mouse Coolermaster Sentinel III (large palm grip!)
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)
So this is what you understand from the article? That "Ashes" is exclusive to AMD? Please take off the green glasses!

See, now this is the danger in making assumptions about people's hardware preferences when you know nothing about them.

What you don't know is that @RejZoR has been a long time AMD supporter, and only recently got a 980 out of frustration.
 
Joined
Aug 20, 2007
Messages
20,709 (3.41/day)
A while back when it was being said that Kepler and Maxwell were DX12 compliant

They are. Heck, frickin' Fermi is COMPLIANT.

Compliant being the key word. They aren't FULLY SUPPORTED. Careful with your words there man. ;)

See, now this is the danger in making assumptions about people's hardware preferences when you know nothing about them.

What you don't know is that @RejZoR has been a long time AMD supporter, and only recently got a 980 out of frustration.

Indeed. He did it shortly after the "I'll eat my shoes if AMD make the R9 390X GCN 1.1" debacle. He was like our AMD posterboy, only he's proven he'll go green if that's what it takes to get a good game. I would not target him for assumptions, if I were you.
 

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,274 (7.69/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
After all this fuss let's hope Ashes of the Singularity isn't shit.

Oh it is shit. But that's not the point.

I think the title is a little bit misleading. "Lack of Async Compute on Maxwell Makes AMD GCN Better Prepared for Ashes of Singularity" would be better :D


Async compute is a DirectX feature, not an AotS feature.
 
Joined
Feb 18, 2011
Messages
1,259 (0.26/day)
It's about time for the console deal "with no profit in it" to pay off.

It's on NV that they're out of it. Next time they should get more involved in the gaming ecosystem if they want a say in its development, even if it doesn't bring them the big bucks immediately.
They tried with the last gen and Sony chose them (sadly Sony couldn't wait a few months for the G80); this gen would have been way more expensive without APUs, which Nvidia lacks, so there was no real price war at all - AMD simply made a bad deal imo.

There are bigger implications if either side is up to mischief.

If the initial game engine is developed on AMD GCN with async compute for consoles, then we PC gamers are getting even more screwed.

We are already getting bad ports: non-optimized, downgraded from improved API paths, GameWorks middleware, driver-side emulation, developers who don't care = the PC version of Batman: Arkham Knight.
Please read my two earlier posts; I suggested a slightly different conclusion in those. But to reply to you: we are already screwed because the consoles (the main target platforms for developers) have Jaguar cores. That's why it doesn't really matter which things get supported in software or hardware - we will have so many free CPU cycles on the PC that some driver magic won't matter much. I bet even Fermi cards will do just "fine" under DX12 (and yes, Fermi DX12 support is probably coming at the end of this year, where Nvidia will simply implement the missing features in software, just as AMD will do with their older architectures).
 
Joined
Mar 24, 2011
Messages
2,356 (0.50/day)
Location
VT
Processor Intel i7-10700k
Motherboard Gigabyte Aurorus Ultra z490
Cooling Corsair H100i RGB
Memory 32GB (4x8GB) Corsair Vengeance DDR4-3200MHz
Video Card(s) MSI Gaming Trio X 3070 LHR
Display(s) ASUS MG278Q / AOC G2590FX
Case Corsair X4000 iCue
Audio Device(s) Onboard
Power Supply Corsair RM650x 650W Fully Modular
Software Windows 10
Is anyone surprised AMD's architecture does better at a compute-centric task? They have been championing compute for the better part of the past 5 years, while Nvidia was shedding it until the technology was actually relevant. I think this is a good indicator that Pascal is going to usher in the return of a lot of compute functionality.
 
Joined
Oct 2, 2004
Messages
13,791 (1.94/day)
So this is what you understand from the article? That "Ashes" is exclusive to AMD? Please take off the green glasses!

Ahahaha, did this dude just flag me as an NVIDIA fanboy? Kiddo, you are hilarious. For the last 10 years I've owned nothing but ATi/AMD graphics cards. This is my first NVIDIA card after all those years. Nice try. XD

And I never said it's "exclusive". All I said is that this very specific game has been developed in cooperation with AMD basically since day one, using Mantle. And when that fell by the wayside, it moved to DX12. No one says Project Cars is NVIDIA-exclusive, but we all know it has been developed with NVIDIA basically since day one, and surprise surprise, it runs far better on NVIDIA than on any AMD card. Wanna call someone an AMD fanboy for that one? One single game doesn't reflect performance in ALL games.
 
Joined
Sep 25, 2007
Messages
5,965 (0.99/day)
Location
New York
Processor AMD Ryzen 9 5950x, Ryzen 9 5980HX
Motherboard MSI X570 Tomahawk
Cooling Be Quiet Dark Rock Pro 4(With Noctua Fans)
Memory 32Gb Crucial 3600 Ballistix
Video Card(s) Gigabyte RTX 3080, Asus 6800M
Storage Adata SX8200 1TB NVME/WD Black 1TB NVME
Display(s) Dell 27 Inch 165Hz
Case Phanteks P500A
Audio Device(s) IFI Zen Dac/JDS Labs Atom+/SMSL Amp+Rivers Audio
Power Supply Corsair RM850x
Mouse Logitech G502 SE Hero
Keyboard Corsair K70 RGB Mk.2
VR HMD Samsung Odyssey Plus
Software Windows 10
Not really surprised, but it's starting to look like Nvidia hasn't made any significant strides in async compute since Kepler (async with compute operations only, not with compute + graphics).
 

the54thvoid

Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
12,378 (2.37/day)
Here's a balanced opinion.

AMD have focused on Mantle to get better hardware-level implementation to suit their GCN 1.1+ architecture. In doing so they have lit a fire under MS and got DX12 to be closer to the metal. This focus has left Nvidia to keep on top of things at the DX11 level.
Following Kepler, Nvidia have focused on efficiency and performance, and Maxwell has brought them that in spades with DX11. Nvidia have effectively taken the opposite gamble to AMD: Nvidia has stuck with a DX11 focus while AMD has forged on toward DX12.

So far so neutral.

They have both gambled, and they will both win and lose. AMD have gambled that DX12 adoption will be rapid, which would give their GCN 1.1+ parts a massive performance increase and quite likely let them surpass Maxwell architecture designs - possibly even, in best-case scenarios, with rebranded Hawaii matching top-level Maxwell (bravo AMD). Nvidia have likely assumed that DX12 adoption will not happen in earnest until 2016, so they have settled for Maxwell's DX11 performance efficiency. Nvidia for their part have probably 'fiddled' to pretend they have the most awesome DX12 support when in reality it's a driver thing (as AoS apparently shows).

So, if DX12 implementation is slow, Nvidia's gamble pays off. If DX12 uptake is rapid and occurs before Pascal, Nvidia lose (and will most definitely cheat with massive developer pressure and incentives). If DX12 comes through in bits and bobs, it will come down to what games you play (as always). However, as a gamer, I'm not upgrading to W10 until MS patches the 'big brother' updating mechanisms I keep reading about.

TL;DR? Like everyone has been saying - despite AMD's GCN advantage, without a slew of top AAA titles, the hardware is irrelevant. If DX11 games are still being pumped out, GCN won't help. If DX12 comes earlier, AMD win.
 
Joined
Oct 23, 2004
Messages
101 (0.01/day)
Location
Perth, Western Australia
System Name THOR
Processor Core i7 3820 @ 4.3GHz
Motherboard Asus Rampage Formula IV
Cooling Corsair H80 + 5x12cm Case Fans
Memory 16GB G.Skill 2400MHz
Video Card(s) GTX 980 + GTX 570 + GTX 680 SLI
Storage OCZ Vector 256GB + Vertex IV 256GB + Intel 520 128GB + 1TB WD Caviar Black
Display(s) 27" Dell U2711 + 24" Dell U2410 + 24" BenQ FP241W
Case Cooler Master Cosmos II Ultra
Audio Device(s) SupremeFX III
Power Supply Corsair AX1200 1200w
Software Windows 10 Pro 64bit
From what I have read, Maxwell is capable of async compute (and async shaders), and is actually faster when it can stay within its work-order limit (1+31 queues).

The GTX 980 Ti is twice as fast as the Fury X, but only when it stays under 31 simultaneous command lists.

The GTX 980 Ti performed roughly equal to the Fury X at up to 128 command lists.

This is why we need to wait for more games to be released before we jump to conclusions.
 
Joined
Nov 3, 2011
Messages
690 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H115i Elite Capellix XT
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB and Toshiba N300 NAS 10TB HDD
Display(s) 2X LG 27UL600 27in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
From what I have read, Maxwell is capable of async compute (and async shaders), and is actually faster when it can stay within its work-order limit (1+31 queues).

The GTX 980 Ti is twice as fast as the Fury X, but only when it stays under 31 simultaneous command lists.

The GTX 980 Ti performed roughly equal to the Fury X at up to 128 command lists.

This is why we need to wait for more games to be released before we jump to conclusions.
That Beyond3D benchmark is pretty simple and designed to be in-order.

Maxwell v2 is not capable of concurrent async + rendering without incurring context penalties, and it is in this context that Oxide made its remarks.

AMD's ACE units are designed to run concurrently with rendering without context penalties, and they include out-of-order features.

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-10

From sebbbi:
The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to the CPU or wait for the GPU on the CPU side. Discrete GPUs are far away from the CPU; you should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

To see realistic results, you should benchmark async compute in rendering tasks. For example, render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to the display instead of waiting for the compute to finish on the CPU. For result timing, use GPU timestamps; do not use a CPU timer. CPU-side timing of GPU work results in lots of noise and even false results because of driver-related buffering.
---------------------
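
To illustrate sebbbi's point about timing on the GPU rather than the CPU, here's a bare-bones sketch of my own (purely illustrative, not from any engine) of D3D12 timestamp queries wrapped around a piece of GPU work. It assumes a device, command queue, command list and a readback-heap buffer already exist:
Code:
#include <windows.h>
#include <d3d12.h>

// Illustrative sketch: timing a stretch of GPU work with timestamp queries
// instead of a CPU timer. Assumes 'device', 'queue', 'cmdList' and
// 'readbackBuffer' (a readback-heap buffer of at least 2 * sizeof(UINT64))
// already exist; error handling and lifetime management are omitted.
void RecordGpuTiming(ID3D12Device* device, ID3D12CommandQueue* queue,
                     ID3D12GraphicsCommandList* cmdList, ID3D12Resource* readbackBuffer)
{
    ID3D12QueryHeap* queryHeap = nullptr;
    D3D12_QUERY_HEAP_DESC qh = {};
    qh.Type  = D3D12_QUERY_HEAP_TYPE_TIMESTAMP;
    qh.Count = 2;
    device->CreateQueryHeap(&qh, IID_PPV_ARGS(&queryHeap));

    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);  // tick before
    // ... record the shadow-map pass / tiled-lighting dispatch here ...
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);  // tick after

    // Copy both ticks into the CPU-readable buffer; map it once this frame's
    // fence has signalled, then convert ticks to milliseconds.
    cmdList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0, 2,
                              readbackBuffer, 0);

    UINT64 ticksPerSecond = 0;
    queue->GetTimestampFrequency(&ticksPerSecond);
    // elapsedMs = double(tick1 - tick0) * 1000.0 / double(ticksPerSecond);
}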

An AMD APU would be king for tightly interleaved mixed CPU->GPU->CPU work, e.g. the PS4's APU was designed for exactly this kind of work.

PS4 sports the same 8 ACE units as Tonga, Hawaii and Fury.


They tried with the last gen and Sony choose them (sadly they couldn't wait a few months for the g80), this gen would have been way much more expensive without APUs, what Nvidia lacks, so there was no real price-war at all, AMD simply made a bad deal imo.

Please read my two earlier posts, I suggested a slightly different conclusion with those, but to reply to you; We are already screwed because the consoles (the main target platforms for developers) have Jaguar cores. That's why it doesn't really matter what things will be supported from software or hardware, because we will have so many free CPU cycles on the PC, some driver magic won't matter much. I bet even Fermi cards will do just "fine" under dx12 (and yes, Fermi dx12 support is coming probably the end of this year where nvidia will simply implement the missing features from software just how AMD will do with their older architectures).
The XBO is the baseline DirectX 12 GPU, and it has two ACE units with 8 queues per unit, as per the Radeon HD 7790 (GCN 1.1).

The older GCN 1.0 has two ACE units with only 2 queues per unit, and it's less capable than GCN 1.1.

GCN 1.0 parts such as the 7970/R9 280X are still better than Fermi and Kepler in the concurrent async + render category.

Ahahaha, did this dude just flag me as an NVIDIA fanboy? Kiddo, you are hilarious. For the last 10 years I've owned nothing but ATi/AMD graphics cards. This is my first NVIDIA card after all those years. Nice try. XD

And I never said it's "exclusive". All I said is that this very specific game has been developed in cooperation with AMD basically since day one, using Mantle. And when that fell by the wayside, it moved to DX12. No one says Project Cars is NVIDIA-exclusive, but we all know it has been developed with NVIDIA basically since day one, and surprise surprise, it runs far better on NVIDIA than on any AMD card. Wanna call someone an AMD fanboy for that one? One single game doesn't reflect performance in ALL games.
With Project Cars, AMD's lower DX11 draw call limit is the problem.

Read the SMS PC lead's comment on this issue:
http://forums.guru3d.com/showpost.php?p=5116716&postcount=901

For our mix of DX11 API calls, the API call consumption rate of the AMD driver is the bottleneck.

In Project Cars the number of draw calls per frame varies from around 5-6,000 with everything at Low up to 12-13,000 with everything at Ultra. Depending on the single-threaded performance of your CPU there will be a limit to the number of draw calls that can be consumed, and as I mentioned above, once that is exceeded GPU usage starts to drop. On AMD/Windows 10 this threshold is much higher, which is why you can run with higher settings without FPS loss.

I also mentioned the 'gaps' in the GPU timeline caused by not being able to feed the GPU fast enough - these gaps are why increasing resolution (like to 4K in the AnandTech analysis) makes for a better comparison between GPU vendors... In 4K, the GPU is being given more work to do, and either the gaps get filled by the extra work and become smaller, or the extra work means the GPU is now always running behind the CPU submission rate.

So, on my i7-5960K @ 3.0GHz the NVIDIA (Titan X) driver can consume around 11,000 draw calls with our DX11 API call mix - the same Windows 7 system with a 290X and the AMD driver is CPU-limited at around 7,000 draw calls. On Windows 10, AMD is somewhere around 8,500 draw calls before the limit is reached (I can't be exact since my Windows 10 box runs a 3.5GHz 6-core i7).

In Patch 2.5 (next week) I did a pass to reduce small draw-calls when using the Ultra settings, as a concession to help driver thread limitations. It gains around 8% for NVIDIA and about 15% (minimum) for AMD.
...

For Project Cars the 1040 driver is easily the fastest under Windows 10 at the moment - but my focus at the moment is on the fairly large engineering task of implementing DX12 support...


----------------------------------------


Project Cars with DX12 is coming.
 
Joined
Oct 23, 2004
Messages
101 (0.01/day)
Maxwell v2 is not capable of concurrent async + rendering without incurring context penalties.

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-10

From sebbbi:
The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to the CPU or wait for the GPU on the CPU side. Discrete GPUs are far away from the CPU; you should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

To see realistic results, you should benchmark async compute in rendering tasks. For example, render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to the display instead of waiting for the compute to finish on the CPU. For result timing, use GPU timestamps; do not use a CPU timer. CPU-side timing of GPU work results in lots of noise and even false results because of driver-related buffering.
---------------------

An AMD APU would be king for tightly interleaved mixed CPU->GPU->CPU work.

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/
 