
Futuremark Releases 3DMark Time Spy DirectX 12 Benchmark

Joined
Sep 25, 2012
Messages
2,074 (0.49/day)
Location
Jacksonhole Florida
System Name DEVIL'S ABYSS
Processor i7-4790K@4.6 GHz
Motherboard Asus Z97-Deluxe
Cooling Corsair H110 (2 x 140mm)(3 x 140mm case fans)
Memory 16GB Adata XPG V2 2400MHz
Video Card(s) EVGA 780 Ti Classified
Storage Intel 750 Series 400GB (AIC), Plextor M6e 256GB (M.2), 13 TB storage
Display(s) Crossover 27QW (27"@ 2560x1440)
Case Corsair Obsidian 750D Airflow
Audio Device(s) Realtek ALC1150
Power Supply Cooler Master V1000
Mouse Ttsports Talon Blu
Keyboard Logitech G510
Software Windows 10 Pro x64 version 1803
Benchmark Scores Passmark CPU score = 13080
A middlin' result for my 2-3 year old system (i7-4790K/GTX 780 Ti)
http://www.3dmark.com/spy/38286
I need a GPU upgrade - seriously considering 980 Ti, prices are around $400 even for hybrid water-cooled models. And they're available right now, unlike the new cards. If I wait a few months, they'll get even lower...
 
Joined
Nov 3, 2013
Messages
2,141 (0.56/day)
Location
Serbia
Processor Ryzen 5600
Motherboard X570 I Aorus Pro
Cooling Deepcool AG400
Memory HyperX Fury 2 x 8GB 3200 CL16
Video Card(s) RX 6700 10GB SWFT 309
Storage SX8200 Pro 512 / NV2 512
Display(s) 24G2U
Case NR200P
Power Supply Ion SFX 650
Mouse G703 (TTC Gold 60M)
Keyboard Keychron V1 (Akko Matcha Green) / Apex m500 (Gateron milky yellow)
Software W10
A middlin' result for my 2-3 year old system (i7-4790K/GTX 780 Ti)
http://www.3dmark.com/spy/38286
I need a GPU upgrade - seriously considering 980 Ti, prices are around $400 even for hybrid water-cooled models. And they're available right now, unlike the new cards. If I wait a few months, they'll get even lower...
I would rather go for an air-cooled 1070 than any kind of 980 Ti, if I had to choose between two NV cards.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.94/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
It's correct that AMD's architecture is vastly different (in terms of queues and scheduling) compared to Nvidia's. But the reason AMD may draw larger benefits from async shaders is that their scheduler is unable to saturate their huge core count. If you compare GTX 980 Ti to Fury X we are talking about:
GTX 980 Ti: 2816 cores, 5632 GFlop/s
Fury X: 4096 cores, 8602 GFlop/s
(The relation is similar for other comparable AMD vs. Nvidia products.)
Nvidia is getting the same performance from far fewer resources using a far more advanced scheduler. In many cases their scheduler achieves more than 95% computational utilization, and since the primary purpose of async shaders is to put idle resources to work on different tasks, there is really very little left over for compute (which mostly uses the same resources as rendering). Multiple queues are not overhead-free either, so for them to serve any purpose there has to be a significant performance gain. This is basically why AotS gave up on Nvidia hardware and just disabled the feature, and their game was fine-tuned for AMD in the first place.

It has very little to do with Direct3D 11 vs 12.


This benchmark proves Nvidia can utilize async shaders, ending the lie about lacking hardware features once and for all.

AMD is drawing larger benefits because they have more idle resources. Remember e.g. Fury X has 53% more Flop/s than 980 Ti, so there is a lot to use.

This benchmark also ends the myth that Nvidia is less fit for Direct3D 12.
Not that I disagree, but a big difference between nVidia and AMD that isn't mentioned often is the size of each CU/SM in terms of shader count, and consequently how many of them there are. nVidia's SMs each have a lot more shaders, whereas AMD tends to have more CUs with fewer shaders each. It's the same trade-off AMD made with their CPUs: they sacrificed some serial throughput to gain more parallel throughput. On top of that, nVidia's GPUs are clocked higher, so if parallel throughput isn't the rendering bottleneck, it comes down to how quick each CU/SM is with any given workload, which favors nVidia thanks to the beefier SMs and higher clocks.
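To put the numbers from both of these posts side by side: peak FP32 throughput is shaders × 2 flops (one FMA counts as two) × clock, and the shader counts decompose exactly into the SM/CU sizes described above. A quick sanity-check sketch; the clocks are the commonly cited reference figures, so treat them as assumptions:

```cpp
#include <cstdio>

int main() {
    // GTX 980 Ti (Maxwell): 22 SMs x 128 shaders = 2816 cores @ ~1000 MHz
    const int    nvSMs = 22, nvShadersPerSM = 128;
    const double nvClockGHz = 1.00;
    // Fury X (GCN): 64 CUs x 64 shaders = 4096 cores @ 1050 MHz
    const int    amdCUs = 64, amdShadersPerCU = 64;
    const double amdClockGHz = 1.05;

    // Peak FP32 throughput = shaders * 2 flops (FMA) * clock in GHz
    const double nvGFlops  = nvSMs * nvShadersPerSM   * 2 * nvClockGHz;   // ~5632
    const double amdGFlops = amdCUs * amdShadersPerCU * 2 * amdClockGHz;  // ~8602

    printf("980 Ti: %.0f GFlop/s from %d SMs of 128 shaders\n", nvGFlops, nvSMs);
    printf("Fury X: %.0f GFlop/s from %d CUs of 64 shaders (+%.0f%%)\n",
           amdGFlops, amdCUs, 100.0 * (amdGFlops / nvGFlops - 1.0));
}
```

Running it reproduces the 5632 and 8602 GFlop/s figures above, along with the ~53% gap cited later in the thread.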
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
"In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that 'the asynchronous compute workload per frame varies between 10-20%.' " Source: http://www.anandtech.com/show/10486/futuremark-releases-3dmark-time-spy-directx12-benchmark

It seems no one has noticed this. AMD cards are not shining like they did in the Vulkan Doom patch because Time Spy makes very limited use of async workloads. Nvidia cards show less gain than the AMD cards, and that is with very limited usage. Take the async workload up to 60-70% per frame and the AMD cards would show dramatic increases, just like in the Vulkan and AotS demos.

Correct me if I am misinterpreting the quote, but it appears to me this is why AMD cards are not showing the same dramatic increases we are seeing elsewhere with async.

JAT
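For context on what "async compute" means at the API level here: a D3D12 application explicitly creates a compute queue alongside the usual direct (graphics) queue and divides its submissions between the two; whether they actually overlap is then up to the driver and hardware. A minimal sketch, with error handling mostly omitted; the function name is hypothetical and `device` is assumed to have been created elsewhere:

```cpp
#include <d3d12.h>

// Sketch: create the two queues an async-compute renderer needs.
// 'device' is assumed to come from D3D12CreateDevice().
HRESULT CreateQueues(ID3D12Device* device,
                     ID3D12CommandQueue** directQueue,
                     ID3D12CommandQueue** computeQueue)
{
    // Main queue: accepts graphics, compute and copy command lists.
    D3D12_COMMAND_QUEUE_DESC direct = {};
    direct.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    HRESULT hr = device->CreateCommandQueue(&direct, IID_PPV_ARGS(directQueue));
    if (FAILED(hr)) return hr;

    // Async compute queue: compute/copy command lists only. Work queued
    // here *may* execute alongside the direct queue on capable hardware.
    D3D12_COMMAND_QUEUE_DESC compute = {};
    compute.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    return device->CreateCommandQueue(&compute, IID_PPV_ARGS(computeQueue));
}
```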
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
"In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that 'the asynchronous compute workload per frame varies between 10-20%.' " Source: http://www.anandtech.com/show/10486/futuremark-releases-3dmark-time-spy-directx12-benchmark

It seems no one has noticed this. AMD cards are not shining like they did in the Vulkan Doom patch because Time Spy makes very limited use of async workloads. Nvidia cards show less gain than the AMD cards, and that is with very limited usage. Take the async workload up to 60-70% per frame and the AMD cards would show dramatic increases, just like in the Vulkan and AotS demos.

Correct me if I am misinterpreting the quote, but it appears to me this is why AMD cards are not showing the same dramatic increases we are seeing elsewhere with async.

JAT

No, I'd say it's in fact the opposite: async compute is used heavily in Time Spy. It's worth reading the nicely detailed technical guide:

http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf

It's also interesting that it uses FL 11_0 for maximum compatibility.
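("FL 11_0" is D3D12 feature level 11_0: the benchmark runs on the D3D12 API but only requires the 11_0 hardware capability tier, which is what lets older GPUs run it. A sketch of what that means at device creation; the function name is hypothetical, and passing nullptr selects the default adapter:)

```cpp
#include <d3d12.h>

// Create a D3D12 device that only requires feature level 11_0 hardware.
ID3D12Device* CreateTimeSpyStyleDevice()
{
    ID3D12Device* device = nullptr;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));
    return device;  // nullptr on failure; error handling omitted
}
```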
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
No, I'd say it's in fact the opposite: async compute is used heavily in Time Spy. It's worth reading the nicely detailed technical guide:

http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf

It's also interesting that it uses FL 11_0 for maximum compatibility.

I have read the tech guide, but I still do not understand how this is considered "heavy usage". How can a 10-20% async workload be considered "heavy use"?

Please note I am not being argumentative, and I will happily concede the point if it is being "heavily used", but I would like someone to explain how a 10-20% workload is considered "heavy". I would assume that, like most things, even "regular" usage would be around 50%.

JAT
 
Last edited:
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
I have read the tech guide, but I still do not understand how this is considered "heavy usage". How can a 10-20% async workload be considered "heavy use"?

Please note I am not being argumentative, and I will happily concede the point if it is being "heavily used", but I would like someone to explain how a 10-20% workload is considered "heavy". I would assume that, like most things, even "regular" usage would be around 50%.

JAT

Well, as it shows, a large part of the scene illumination, and in turn things like the ambient occlusion, is done asynchronously:

For example:

Before the main illumination passes, asynchronous compute shaders are used to cull lights, evaluate illumination from prebaked environment reflections, compute screen-space ambient occlusion, and calculate unshadowed surface illumination. These tasks are started right after G-buffer rendering has finished and are executed alongside shadow rendering.

And other stuff like particles:

Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue.

Asynchronous compute is therefore fundamental to how the scene is generated and in turn rendered.

The workload is then clearly very high, as shown in the queue utilization chart in the technical guide.
So yeah, basically it's pretty fundamental to the test.
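The submission pattern those two guide excerpts describe looks roughly like this in D3D12. This is a sketch of the idea, not Futuremark's actual code; the queues, fence, fence value and command lists are all hypothetical names assumed to be created and recorded elsewhere:

```cpp
#include <d3d12.h>

// Sketch of the handoff described in the technical guide.
void SubmitFrameSlice(ID3D12CommandQueue* directQueue,
                      ID3D12CommandQueue* computeQueue,
                      ID3D12Fence* fence, UINT64& fenceValue,
                      ID3D12CommandList* gbufferList,
                      ID3D12CommandList* shadowList,
                      ID3D12CommandList* asyncComputeList)
{
    // 1. G-buffer pass on the main (direct) queue; signal when it completes.
    directQueue->ExecuteCommandLists(1, &gbufferList);
    directQueue->Signal(fence, ++fenceValue);

    // 2. The async passes (light culling, SSAO, unshadowed illumination)
    //    read the G-buffer, so the compute queue waits GPU-side for it.
    computeQueue->Wait(fence, fenceValue);
    computeQueue->ExecuteCommandLists(1, &asyncComputeList);

    // 3. Shadow-map rendering goes straight onto the main queue; it has
    //    no dependency on the async work, so the two can run alongside
    //    each other on hardware that supports it.
    directQueue->ExecuteCommandLists(1, &shadowList);
}
```

The point the guide is making is visible in the last two calls: once the fence handoff is satisfied, the compute and direct queues each have independent work in flight.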
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
Well, as it shows, a large part of the scene illumination, and in turn things like the ambient occlusion, is done asynchronously (see the technical guide excerpts on the illumination passes and particle simulation above). Asynchronous compute is therefore fundamental to how the scene is generated and in turn rendered, and the workload is clearly very high, as shown in the guide's utilization chart.

So yeah, basically it's pretty fundamental to the test.

I understand how async works and what it is being used for. I concede it IS being USED, but it appears to be very UNDER-utilized.

Please explain how a 10-20% async workload is "heavy use". That seems like a really low workload, statistically.
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
I understand how async works and what it is being used for. I concede it IS being USED, but it appears to be very UNDER-utilized.

Please explain how a 10-20% async workload is "heavy use". That seems like a really low workload, statistically.

The crossover between async and all the other tasks the GPU deals with is 10-20% per frame; the GPU has other things to deal with, you know. ;)

I get that async is the new buzzword that people cling to, but why do you think it should be 60%+? Clearly workloads can vary from app to app, but what specific compute tasks do you think this benchmark doesn't address?

Are you suggesting this test isn't stressful for modern GPUs?
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
The crossover between async and all the other tasks the GPU deals with is 10-20% per frame; the GPU has other things to deal with, you know. ;)

I get that async is the new buzzword that people cling to, but why do you think it should be 60%+? Clearly workloads can vary from app to app, but what specific compute tasks do you think this benchmark doesn't address?

Are you suggesting this test isn't stressful for modern GPUs?

I understand that the GPU is busy with other tasks as well; however, even Anandtech implies the usage is low: "In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that 'the asynchronous compute workload per frame varies between 10-20%.'"

Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue.

Doesn't this state that it is not being fully utilized? G-buffer and shadow map rendering commands are able to be executed asynchronously, but they are being submitted to the main command queue and are not being done asynchronously... why? (More on this point below.)

I am not remotely suggesting that it is not stressful on modern GPUs, but are you saying that 80-90% of all the compute units of the GPU are being used 100% of the time during the benchmark, leaving only 10-20% for compute and copy commands beyond what is used for 3D rendering commands? I do not believe that is accurate. It simply appears that async commands to the compute units are being underutilized and limited to particular instructions.

Like I said, maybe I am misinterpreting, but I haven't seen anything showing the contrary. I'm just hoping someone with more knowledge than me can explain it to me.
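On the specific question of why the G-buffer and shadow passes sit on the main queue: that part is dictated by the D3D12 API rather than by tuning. Rasterization (draw) commands are only valid on DIRECT command lists; COMPUTE command lists are limited to dispatch and copy work, so only compute passes are even eligible to go async. A small illustration; the function and command-list names are hypothetical:

```cpp
#include <d3d12.h>

// Rasterization (DrawInstanced etc.) is only valid on DIRECT command
// lists; COMPUTE command lists accept dispatch/copy work only. This is
// why the G-buffer and shadow passes must live on the main queue.
void RecordWork(ID3D12GraphicsCommandList* directList,
                ID3D12GraphicsCommandList* computeList)
{
    directList->DrawInstanced(3, 1, 0, 0);  // OK on a DIRECT list
    computeList->Dispatch(64, 1, 1);        // OK on a COMPUTE list
    // computeList->DrawInstanced(...) would be rejected by the debug
    // layer: draws are not permitted on command lists of type COMPUTE.
}
```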
 
Last edited:
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
I get where you're coming from, but where is the evidence it's too low, and relative to what exactly?

Like I said, workloads can vary drastically on an app-by-app basis; there doesn't have to be a right or wrong way. What matters is that there is another baseline to compare against. Futuremark, after all, claim most of the major parties had input into its development, and I'd put more credence in them than in some random game dev known to have pimped one brand or another in the past.
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
I get where you're coming from, but where is the evidence it's too low, and relative to what exactly?

Like I said, workloads can vary drastically on an app-by-app basis; there doesn't have to be a right or wrong way. What matters is that there is another baseline to compare against. Futuremark, after all, claim most of the major parties had input into its development, and I'd put more credence in them than in some random game dev known to have pimped one brand or another in the past.

Hahaha. Truth! I'm stoked about the new bench (scored 6551, woot!); I'm just looking for clarity on how this whole async thing works. Thanks for a little education.
 
Joined
Jul 13, 2016
Messages
2,840 (1.00/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
I get where you're coming from, but where is the evidence it's too low, and relative to what exactly?

Like I said, workloads can vary drastically on an app-by-app basis; there doesn't have to be a right or wrong way. What matters is that there is another baseline to compare against. Futuremark, after all, claim most of the major parties had input into its development, and I'd put more credence in them than in some random game dev known to have pimped one brand or another in the past.

I trust Futuremark's claims about as much as the Project CARS devs'. They have been involved in benchmark fixing in the past using Intel compilers.

Their new benchmark doesn't use async compute in many scenarios where it should be universally usable in any game. My guess is the "input" they received from Nvidia was to do as little with async as possible, as Nvidia cards only support async through drivers.

We know that proper use of async yields a large advantage for AMD cards. Every game that has utilized it correctly has shown so.
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
I trust Futuremark's claims about as much as the Project CARS devs'. They have been involved in benchmark fixing in the past using Intel compilers.

Their new benchmark doesn't use async compute in many scenarios where it should be universally usable in any game. My guess is the "input" they received from Nvidia was to do as little with async as possible, as Nvidia cards only support async through drivers.

We know that proper use of async yields a large advantage for AMD cards. Every game that has utilized it correctly has shown so.

Cool story bro.
 
Joined
Sep 15, 2011
Messages
6,471 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
A middlin' result for my 2-3 year old system (i7-4790K/GTX 780 Ti)
http://www.3dmark.com/spy/38286
I need a GPU upgrade - seriously considering 980 Ti, prices are around $400 even for hybrid water-cooled models. And they're available right now, unlike the new cards. If I wait a few months, they'll get even lower...
The demo was a slide show for me at 3440x1440, but the score was the same as yours, 3576. I don't think that's too bad...
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
The demo was a slide show for me at 3440x1440, but the score was the same as yours, 3576. I don't think that's too bad...

Demo was very choppy for me at 3440x1440 as well, but scores were good.
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
I think NV uses pre-emption through drivers, AMD does it through hardware.
As RejZoR stated, NV is doing it with brute force. As long as they can, of course, it's fine. Anand should have shown Maxwell with async on/off for comparison.
Async shaders are a feature of CUDA, and are now also used by a Direct3D 12 benchmark, proving beyond any doubt that they're supported in hardware. Async shaders have been supported since Kepler (in very limited form), greatly improved on Maxwell and refined in Pascal. It's a core feature of the architectures; anyone who has read the white papers would know that.
 
Joined
Jul 16, 2016
Messages
274 (0.10/day)
Location
Rochester, NY
System Name Xbox Series S
Processor AMD Zen2 8 core 3.6 GHz
Memory 10GB GDDR6
Video Card(s) RDNA2 with 20 CUs
Storage 512Gb SSD NVMe Internal + 8TB WD Black USB External
Display(s) Acer VG270U P 2k
Joined
Jul 16, 2016
Messages
25 (0.01/day)
This benchmark uses async, and the GTX 1080, GTX 1070 and GTX Titan X outperform everything from the red camp, according to Guru3D.

TweakTown and a few other sources all have Fury X beating Titan X:
http://www.tweaktown.com/articles/7785/3dmark-time-spy-dx12-benchmarking-masses/index3.html

Maybe this is the first time we see Nvidia cards gaining something from async. Futuremark will have some explaining to do if, a year from now, their benchmark is the only thing that shows gains on Pascal cards.
On the other hand, this is good news. If that dynamic load balancing Nvidia cooked up there works, it means developers will have NO excuse not to use async in their games, which will mean at least 5-10% better performance in ALL future titles.


Nvidia gained sizable performance in Doom's Vulkan API benches as well, particularly the 1070 and 1080 Pascal cards. That's even without Nvidia's software version of "async compute".

It's just that Maxwell doesn't really do async compute in any meaningful fashion (which Nvidia said they could improve with a driver update, about 4 months ago).

It also shows how much superior Vulkan is compared to DX12. Too bad; I doubt 3DMark will make a Vulkan API version of Time Spy.
 

Attachments

• 7785_514_3dmark-time-spy-dx12-benchmarking-masses.png (1.7 MB)
Last edited by a moderator:
Joined
Nov 3, 2013
Messages
2,141 (0.56/day)
Location
Serbia
Processor Ryzen 5600
Motherboard X570 I Aorus Pro
Cooling Deepcool AG400
Memory HyperX Fury 2 x 8GB 3200 CL16
Video Card(s) RX 6700 10GB SWFT 309
Storage SX8200 Pro 512 / NV2 512
Display(s) 24G2U
Case NR200P
Power Supply Ion SFX 650
Mouse G703 (TTC Gold 60M)
Keyboard Keychron V1 (Akko Matcha Green) / Apex m500 (Gateron milky yellow)
Software W10
Async shaders are a feature of CUDA, and are now also used by a Direct3D 12 benchmark, proving beyond any doubt that they're supported in hardware. Async shaders have been supported since Kepler (in very limited form), greatly improved on Maxwell and refined in Pascal. It's a core feature of the architectures; anyone who has read the white papers would know that.
Kepler... Maxwell??
Maxwell cards gain a 0.1% performance increase with async on.
Core feature... Yeah right, don't make me laugh.
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Kepler... Maxwell??
Maxwell cards gain a 0.1% performance increase with async on.
Core feature... Yeah right, don't make me laugh.

Since some of you still don't understand the basics, I'm saying this once again:
- The primary purpose of async shaders is to utilize different resources for different purposes simultaneously.
- Rendering and compute primarily utilize the exact same resources, so an already saturated GPU will only show minor gains.
- The fact that the Radeon 200/300/RX 400 series shows gains from utilizing the same resources for different tasks is proof that their GPUs are underutilized (which is confirmed by their low performance per GFlop). So it's a problem of their own making, which they have found a way for game developers to "partially solve". It's a testament to their own inferior architecture, not to Nvidia's "lack of features".


All of this should be obvious. But when you guys can't even be bothered to gain a basic understanding of the GPU architectures before filling the forums with this trash, you have clearly proven yourselves unqualified for a technical discussion.
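A back-of-the-envelope model of the argument above: whatever fraction of the shader array idles during normal rendering is the ceiling on what async compute can add. The utilization figures below are illustrative assumptions, not measurements:

```cpp
#include <cstdio>

// Toy model: if a GPU's shader array is busy a fraction 'u' of the time
// during rendering, filling the idle bubbles with async compute can raise
// throughput by at most (1 - u) / u.
int main() {
    const double utilizations[] = { 0.95, 0.80, 0.65 };
    for (double u : utilizations) {
        printf("baseline utilization %.0f%% -> max async gain ~%.0f%%\n",
               u * 100.0, (1.0 - u) / u * 100.0);
    }
}
```

On this model a GPU that already sits above 95% busy has almost nothing for async compute to reclaim, which is the claim being made about Nvidia here, while a GPU at 65% utilization could in principle gain up to ~54%.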
 
Joined
Nov 3, 2013
Messages
2,141 (0.56/day)
Location
Serbia
Processor Ryzen 5600
Motherboard X570 I Aorus Pro
Cooling Deepcool AG400
Memory HyperX Fury 2 x 8GB 3200 CL16
Video Card(s) RX 6700 10GB SWFT 309
Storage SX8200 Pro 512 / NV2 512
Display(s) 24G2U
Case NR200P
Power Supply Ion SFX 650
Mouse G703 (TTC Gold 60M)
Keyboard Keychron V1 (Akko Matcha Green) / Apex m500 (Gateron milky yellow)
Software W10
Digital Foundry: Will we see async compute in the PC version via Vulkan?

Billy Khan: Yes, async compute will be extensively used on the PC Vulkan version running on AMD hardware. Vulkan allows us to finally code much more to the 'metal'. The thick driver layer is eliminated with Vulkan, which will give significant performance improvements that were not achievable on OpenGL or DX.

http://www.eurogamer.net/articles/digitalfoundry-2016-doom-tech-interview
 
Joined
Nov 9, 2008
Messages
2,318 (0.41/day)
Location
Texas
System Name Mr. Reliable
Processor Ryzen R9 5950x
Motherboard MSI Meg X570s Ace Max
Cooling D5 Pump, Singularity Top/Res, 2x360mm EK P rads, EK Magnitude/Alphacool Blocks
Memory 32Gb (4x8Gb) Corsair Dominator Platinum 3600Mhz @ 16/19/20/36 1.35v
Video Card(s) MSI 3080ti with Alphacool Block
Storage 2 x Corsair Force MP400 1TB Nvme; 2 x T-Force Cardea Z340; 2 x Mushkin Reactor 1TB
Display(s) Acer 32" Z321QU 2560x1440; LG 34GP83A-B 34" 3440x1440
Case Lian Li PC-011 Dynamic XL; Synology DS218j w/ 2 x 2TB WD Red
Audio Device(s) SteelSeries Arctis Pro+
Power Supply EVGA SuperNova 850G3
Mouse Razer Basilisk V2
Keyboard Das Keyboard 6; Razer Orbweaver Chroma
Software Windows 10 Pro
Since some of you still don't understand the basics, I'm saying this once again:
- The primary purpose of async shaders is to utilize different resources for different purposes simultaneously.
- Rendering and compute primarily utilize the exact same resources, so an already saturated GPU will only show minor gains.
- The fact that the Radeon 200/300/RX 400 series shows gains from utilizing the same resources for different tasks is proof that their GPUs are underutilized (which is confirmed by their low performance per GFlop). So it's a problem of their own making, which they have found a way for game developers to "partially solve". It's a testament to their own inferior architecture, not to Nvidia's "lack of features".


All of this should be obvious. But when you guys can't even be bothered to gain a basic understanding of the GPU architectures before filling the forums with this trash, you have clearly proven yourselves unqualified for a technical discussion.

I agree with everything you stated, but I draw a different conclusion, and here is why:

- The primary purpose of async shaders is to be able to accept varied instructions from the scheduler for different purposes simultaneously.
- Rendering and compute primarily utilize the exact same resources, so an already saturated scheduler and pipeline will only show minor gains.
- The fact that the Radeon 200/300/RX 400 series shows gains from utilizing the same resources for different tasks is proof that their GPU scheduler is able to send more instructions to different shaders than the competition's, allowing them to work at full capacity (which is confirmed by their higher performance when using a more efficient API and an efficiently coded engine). So it's a solution of their own making, which they have found a way for game developers to fully utilize. It's a testament to their architecture that multiple generations are getting substantial gains when the market utilizes the given resources correctly.


Now that all of the consoles will be using compute units with a scheduler that can make full use of the shaders, I have a feeling most games will start being written to fully utilize them, and NV's arch will have to be reworked to include a larger path for the scheduler. I explained it to my son like this: imagine a grocery store with a line of people (instructions) waiting to check out, but there is only one cashier (scheduler)... what async does is open other lanes with more cashiers so that more lines of people can get out of the store faster to their cars (shaders). AMD's Async Compute Engines open LOTS of lanes, while the NV scheduler opens a few to handle certain lines of people (like the express lane in this analogy). There's a toy version of this analogy run as code below.

It appears Time Spy makes limited use of async, as only certain instructions are being routed through the async scheduler while most are being routed through the main scheduler. A 10-20% async workload is not fully utilizing the scheduler of AMD's cards, even 4 generations back.

My 2 cents.

JAT
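The checkout analogy can be run as a toy calculation (purely illustrative, nothing GPU-specific): with a fixed crowd of customers who each take one time unit, the time to empty the store shrinks with the number of open lanes:

```cpp
#include <cstdio>
#include <initializer_list>

// Toy version of the checkout analogy: customers are work items that each
// take one time unit; lanes are queues draining in parallel. Clearing the
// store takes ceil(customers / lanes) time units.
int main() {
    const int customers = 64;
    for (int lanes : { 1, 2, 8, 64 }) {
        const int timeUnits = (customers + lanes - 1) / lanes;
        printf("%2d lane(s): %2d time units to empty the store\n",
               lanes, timeUnits);
    }
}
```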
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The fact that the Radeon 200/300/RX 400 series shows gains from utilizing the same resources for different tasks is proof that their GPU scheduler is able to send more instructions to different shaders than the competition's,
No, it proves that the GPU was unable to saturate those CUs with a single task.
If parallelizing two tasks requiring the same resources yields a performance increase, then some resources had to be idling in the first place. Any alternative would be impossible.

It's a testament to their architecture that multiple generations are getting substantial gains when the market utilizes the given resources correctly.
When they need a bigger 8602 GFlop/s GPU to match a 5632 GFlop/s GPU, it's clearly an inefficient architecture. If AMD scaled as well as Nvidia, Fury X would outperform GTX 980 Ti by ~53% and AMD would kick Nvidia's ass.

Now that all of the consoles will be using compute units with a scheduler that can make full use of the shaders, I have a feeling most games will start being written to fully utilize them, and NV's arch will have to be reworked to include a larger path for the scheduler.
Even with the help of async shaders, AMD is still not able to beat Nvidia. When facing an architecture which is ~50% more efficient, async alone is not going to be enough.
 