
Futuremark Releases 3DMark Time Spy DirectX 12 Benchmark

No, it proves that the Scheduler was unable to saturate those CUs with a single task.
If parallelizing two tasks requiring the same resources yields a performance increase, then some resources had to be idling in the first place, because they were unable to get instructions from the Scheduler. Any alternative would be impossible.

The difference is in the way tasks are handed out, and the whole point is to get more instructions to idle shaders. But they are two dramatically different approaches. Nvidia does best with limited async, with instructions running in a mostly serial fashion.

So that is the way nVidia approaches multiple workloads. They have very high granularity in when they are able to switch between workloads. This approach bears similarities to time-slicing, and perhaps also SMT, as in being able to switch between contexts down to the instruction-level. This should lend itself very well for low-latency type scenarios, with a mostly serial nature. Scheduling can be done just-in-time.

AMD on the other hand seems to approach it more like a ‘multi-core’ system, where you have multiple ‘asynchronous compute engines’ or ACEs (up to 8 currently), which each processes its own queues of work. This is nice for inherently parallel/concurrent workloads, but is less flexible in terms of scheduling. It’s more of a fire-and-forget approach: once you drop your workload into the queue of a given ACE, it will be executed by that ACE, regardless of what the others are doing. So scheduling seems to be more ahead-of-time (at the high level, the ACEs take care of interleaving the code at the lower level, much like how out-of-order execution works on a conventional CPU).

And until we have a decent collection of software making use of this feature, it’s very difficult to say which approach will be best suited for the real-world. And even then, the situation may arise, where there are two equally valid workloads in widespread use, where one workload favours one architecture, and the other workload favours the other, so there is not a single answer to what the best architecture will be in practice.
Source: https://scalibq.wordpress.com/
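To put the "multiple queues" idea from the quote into API terms, here is a minimal D3D12 sketch (my own illustration, not code from the benchmark or the blog; error handling and device setup omitted). It creates a direct (graphics) queue and a separate compute queue, which is how DirectX 12 exposes the multi-engine model; whether the two actually overlap on the hardware is decided by the driver and GPU, not the API.

```cpp
// Minimal sketch: DirectX 12's multi-engine model as seen from the API.
// Error handling and device creation are omitted for brevity.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    // Direct queue: accepts graphics, compute and copy commands.
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

    // Dedicated compute queue: compute and copy only. On GCN this maps
    // naturally onto the ACEs; on Maxwell/Pascal the driver decides how
    // (and whether) its work is interleaved with the direct queue.
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```

The API only lets the application express independent streams of work; how much of it runs side by side is exactly the hardware scheduling question the quote describes.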

This is why Nvidia cards shine so well: today's APIs send out instructions in a mostly serial fashion, where preemption works relatively well. However, the new APIs can be used with inherently parallel workloads, which is where AMD cards shine.

Please bear in mind I am not bashing either approach. NV cards are pure muscle, and I love it! But that also comes with a price. AMD's approach of delivering that kind of power without needing brute force is good for everyone, and is more cost-effective when utilized correctly.
 
The difference is in the way tasks are handed out, and the whole point is to get more instructions to idle shaders. But they are two dramatically different approaches. Nvidia does best with limited async, with instructions running in a mostly serial fashion.
When AMD needs a bigger 8602 GFlop/s GPU to match a 5632 GFlop/s GPU, it's clearly an inefficient design. There is no dismissing that. Nvidia has demonstrated that they support async shaders, and it's a design feature of their CUDA architecture.
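(For a rough sense of scale, assuming those are the theoretical peak figures being compared: 8602 / 5632 ≈ 1.53, i.e. about 53% more raw shader throughput on the AMD side for comparable delivered performance.)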

Please bear in mind I am not bashing either approach. NV cards are pure muscle, and I love it! But that also comes with a price. AMD's approach of delivering that kind of power without needing brute force is good for everyone, and is more cost-effective when utilized correctly.
Actually no, AMD is using a more "brute force" approach with many more cores to do the same work, and with a much less sophisticated scheduler to keep them busy. Nvidia has made a much more refined and advanced architecture in order to scale well on any workload, and they have clearly demonstrated that with CUDA.
 
3dmark doesn't use Asynchronous Compute!!!!

http://steamcommunity.com/app/223850/discussions/0/366298942110944664/


All of the current games supporting asynchronous compute make use of parallel execution of compute and graphics tasks. 3DMark Time Spy supports concurrent execution. It is not the same asynchronous compute....

So yeah... 3DMark does not use the same type of asynchronous compute found in all of the recent game titles. Instead, 3DMark appears to be specifically tailored to show Nvidia GPUs in the best light possible. It makes use of context switches (good, because Pascal has improved preemption) as well as the dynamic load balancing on Maxwell, through the use of concurrent rather than parallel asynchronous compute tasks. If parallelism were used, we would see Maxwell taking a performance hit under Time Spy, as admitted by Nvidia in their GTX 1080 white paper and as we have seen from AotS.
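As a rough sketch of the concurrent-versus-parallel distinction being drawn here (again my own illustration, not Futuremark's code; the command lists, fence and fence value are hypothetical placeholders), the same two D3D12 queues can be fed so that compute work is free to overlap rendering, or is serialized behind it with a cross-queue fence. Both variants "use async compute" at the API level, but only the first can actually run in parallel on hardware that supports it:

```cpp
// Sketch only: two ways of feeding a graphics queue and a compute queue.
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList* gfxWork,       // hypothetical command lists
                 ID3D12CommandList* computeWork,
                 ID3D12Fence* fence,
                 UINT64& fenceValue)
{
    // Variant A: independent submission. If the GPU can execute compute
    // alongside graphics, these two batches may overlap in time.
    computeQueue->ExecuteCommandLists(1, &computeWork);
    gfxQueue->ExecuteCommandLists(1, &gfxWork);

    // Variant B (commented out): the graphics queue waits for the compute
    // queue's fence, so execution is serialized even though two queues are
    // in use. Concurrent submission, but no parallel execution across the
    // dependency.
    //
    // computeQueue->ExecuteCommandLists(1, &computeWork);
    // computeQueue->Signal(fence, ++fenceValue);
    // gfxQueue->Wait(fence, fenceValue);
    // gfxQueue->ExecuteCommandLists(1, &gfxWork);
}
```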


Sources:

https://www.reddit.com/r/Amd/comments/4t5ckj/apparently_3dmark_doesnt_really_use_any/

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/220#post_25351958

As the user "Mahigan" points out, future games will use asynchronous compute, which allows compute and graphics tasks to be executed in parallel. The surprise comes when Time Spy's own description states that asynchronous compute is used to overlap rendering passes to a large extent in order to maximize GPU utilization, which is a form of what is called concurrent computing.

Concurrent computing is a form of computation in which several calculations are executed during overlapping time periods (concurrently) rather than sequentially (one finishing before the next begins). That is clearly not the asynchronous compute that games such as DOOM boast about to exploit the real potential of an AMD Radeon GPU, in this case under the DirectX 12 API, which is where the software runs. By not using that kind of asynchronous compute, 3DMark seems to be tailored specifically to show the best possible performance on an Nvidia GPU. It relies on context switches (a positive for Pascal, with its improved preemption) as well as dynamic load balancing on Maxwell, through the use of concurrent rather than parallel asynchronous compute tasks.

Asynchronous compute on AMD: the AMD GCN architecture can not only handle these tasks, it improves even further when parallelism is used; DOOM's results under the Vulkan API are proof of that. How? By reducing per-frame latency through the parallel execution of graphics and compute tasks. A reduction in per-frame latency means each frame needs less time to be executed and processed. The net gain is a higher frame rate, but 3DMark lacks this. If 3DMark Time Spy had implemented both concurrency and parallelism, a Radeon Fury X would have matched the GeForce GTX 1070 in performance (in DOOM, the Fury X not only matches it, it beats it).

If both AMD and Nvidia were executing the same code path as Pascal, AMD would gain little or even lose performance. This is the reason Bethesda did not enable the asynchronous compute + graphics path for Pascal; instead, Pascal will have its own optimized path. It will also be called asynchronous compute, leading people to believe it is the same thing when in fact they are two completely different things. The same is happening here: not all implementations of asynchronous compute are equal.
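(As an illustrative example of the per-frame latency argument, with made-up numbers: if parallel async shaves 2 ms off a 16.7 ms frame, frame time drops to about 14.7 ms and the frame rate rises from ~60 FPS to roughly 1000 / 14.7 ≈ 68 FPS.)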

 
gonna upgrade my LAN Party rig's OS to Win 10 I guess... if I wanna bench with this new feature. =/ Will post the results 2morrow.
 
Asynchronous compute on AMD: the AMD GCN architecture can not only handle these tasks, it improves even further when parallelism is used; DOOM's results under the Vulkan API are proof of that. How? By reducing per-frame latency through the parallel execution of graphics and compute tasks. A reduction in per-frame latency means each frame needs less time to be executed and processed. The net gain is a higher frame rate, but 3DMark lacks this. If 3DMark Time Spy had implemented both concurrency and parallelism, a Radeon Fury X would have matched the GeForce GTX 1070 in performance (in DOOM, the Fury X not only matches it, it beats it).
None of that made any sense at all.
You clearly don't understand how a GPU processes a pipeline. Each queue needs to have as few dependencies as possible, otherwise they will just stall, rendering the splitting of queues pointless. Separate queues can do physics (1), particle simulations (2), texture (de)compression, video encoding, data transfer and similar. (1) and (2) are computationally intensive and do utilize the same hardware resources as rendering, and having multiple queues competing for the same resources does introduce overhead and clogging. So if a GPU is going to get a speedup for (1) or (2), it needs to have a significant amount of such resources idling. If a GPU with a certain pipeline is utilized ~98%, and the overhead for splitting off some of the "compute" tasks is ~5%, then you'll get a net loss of roughly 3% from splitting the queues. This is why most games disable async shaders for Nvidia hardware, and many will continue to do so. But in cases where more resources are idling, splitting might help a bit, as seen in the new 3DMark.
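The arithmetic in that last point can be put into a toy model (my own simplification with illustrative numbers, not anything measured from 3DMark): the potential gain from a second queue is bounded by how much of the GPU was idle, minus the overhead of splitting the work.

```cpp
// Toy model of the utilization-vs-overhead argument above.
#include <cstdio>

// Fractional net change from splitting work into an extra queue
// (negative means a slowdown).
double asyncNetGain(double utilization, double splitOverhead)
{
    double idle = 1.0 - utilization;  // resources the extra queue could fill
    return idle - splitOverhead;      // best-case gain minus the added cost
}

int main()
{
    // ~98% utilized pipeline, ~5% splitting overhead -> ~3% net loss.
    std::printf("%+.1f%%\n", 100.0 * asyncNetGain(0.98, 0.05));
    // A less saturated pipeline (say 85% utilized) -> ~10% net gain.
    std::printf("%+.1f%%\n", 100.0 * asyncNetGain(0.85, 0.05));
    return 0;
}
```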
 
Anyone else getting an error when trying to run the time spy benchmark? I am using the newest Nvidia driver with two 980ti's.
 
So what we are discussing is Nvidia paying Futuremark to release a benchmark that is more favourable to its cards by going easy on asynchronous compute functions in order to attempt to divert attention away from the fact that its cards cannot deal with asynchronous compute at the hardware level and perform worse than AMD's offerings in this regard, is that a fair summary?
 
So what we are discussing is Nvidia paying Futuremark to release a benchmark that is more favourable to its cards by going easy on asynchronous compute functions in order to attempt to divert attention away from the fact that its cards cannot deal with asynchronous compute at the hardware level and perform worse than AMD's offerings in this regard, is that a fair summary?

I suspect it more than likely boils down to AMD fans getting defensive again, it's clear you guys hate a free market.
 
I suspect it more than likely boils down to AMD fans getting defensive again, it's clear you guys hate a free market.

You are wrong on two counts: I am not an AMD fan and market dominance does not necessarily equate to better quality or a more sensible purchase.
 
I suspect it more than likely boils down to AMD fans getting defensive again, it's clear you guys hate a free market.

It's totally fair game to discuss a benchmark that supposedly uses async compute when its performance numbers don't match up with what we've seen from other DX12 and Vulkan titles. Every DX12 and Vulkan title that includes proper async compute we've seen to date has seen AMD gain a large improvement in performance.

Free market? Since when has the CPU and GPU market been free? Intel have been using monopoly tactics against AMD since the start (and were even forced to pay a small amount in court because of it), and Nvidia is only a little bit better with its GameWorks program. Screwing over its own customers and AMD video cards is "The way it's meant to be payed" by Nvidia.
 
One of the developers for it said this

FM_Jarnis said:
Yes it is. There are no "real" from-ground-up DX12 engine games out there yet. Well, except Ashes of Singularity and maybe Quantum Break (not sure about that).

Don't get too Real Housewife on us now.
 
You are wrong on two counts: I am not an AMD fan and market dominance does not necessarily equate to better quality or a more sensible purchase.

Good for you on both counts.

It's totally fair game to discuss a benchmark that supposedly uses async compute when its performance numbers don't match up with what we've seen from other DX12 and Vulkan titles. Every DX12 and Vulkan title that includes proper async compute we've seen to date has seen AMD gain a large improvement in performance.

Free market? Since when has the CPU and GPU market been free? Intel have been using monopoly tactics against AMD since the start (and were even forced to pay a small amount in court because of it), and Nvidia is only a little bit better with its GameWorks program. Screwing over its own customers and AMD video cards is "The way it's meant to be payed" by Nvidia.

You see, it's rants like this that prove me right.
 
Well, what we need now is a benchmark or software that takes advantage of multi-engine not just for performance improvements, but that uses the extra performance to let devs add additional details/visuals to the game.
 
So what we are discussing is Nvidia paying Futuremark to release a benchmark that is more favourable to its cards by going easy on asynchronous compute functions in order to attempt to divert attention away from the fact that its cards cannot deal with asynchronous compute at the hardware level and perform worse than AMD's offerings in this regard, is that a fair summary?

Basically the "A-sync" in Time Spy is not the real A-sync integrated in DX12 and Vulkan. It's a code path that can offer similar effects IN THIS BENCH, and it "happens" to work well on Nvidia's hardware. Even Maxwell can have a good time with this "A-sync" o_O
 
So what we are discussing is Nvidia paying Futuremark to release a benchmark that is more favourable to its cards by going easy on asynchronous compute functions in order to attempt to divert attention away from the fact that its cards cannot deal with asynchronous compute at the hardware level and perform worse than AMD's offerings in this regard, is that a fair summary?

As it turns out, NV disables async at the driver level, so no matter how hard the benchmark tries to push, 3DMark will never get async working, hence it's DX12 (feature level 11) on NV hardware.

So, not so much the fault of Futuremark, just the usual cover-up from the green team.
 
Well damn, now I want my $5 back from purchasing this damned Time Spy. I could have used that to play 4 more rounds of MvM in TF2!
 
I find this async thing to be a bit blown out of proportion, just like the 480's PCIe power consumption.
There is a very limited number of games that support async, and even fewer that support it for both vendors. Sure, there might be more in the next year or two, but it's irrelevant if you're only going to play 1 or 2 of those, and by the time it's relevant we'll already have at least one or two new generations of GPUs out.
I'm quite sure there are more people playing Minecraft than playing AotS, so it only makes sense to base your purchase not on synthetic benchmarks or games that you'll never play, but on what best matches your needs.
 
I find this async thing to be a bit blown out of proportion, just like the 480's PCIe power consumption.
There is a very limited number of games that support async, and even fewer that support it for both vendors. Sure, there might be more in the next year or two, but it's irrelevant if you're only going to play 1 or 2 of those, and by the time it's relevant we'll already have at least one or two new generations of GPUs out.
I'm quite sure there are more people playing Minecraft than playing AotS, so it only makes sense to base your purchase not on synthetic benchmarks or games that you'll never play, but on what best matches your needs.

Did you really just drag Minecraft into a discussion about high-end graphics cards?
 
Did you really just drag Minecraft into a discussion about high-end graphics cards?
Well to be fair, it's not just about high end graphics cards.
 
Well to be fair, it's not just about high end graphics cards.

Indeed, it's about DirectX 12 right now. I can understand that everyone is ticked off that Futuremark is selling a "DirectX 12" benchmark which actually doesn't do anything DirectX 12 related and just says "well, if we throw this workload at it, we'll let the scheduler decide, which would be kinda like DX12" (talking about FM's response on Steam).
 
Nice. Futuremark develops Time Spy to help Nvidia (in fact it is like writing code that makes a dual-core CPU and a single-core CPU with Hyper-Threading perform the same; think of it as a program where an i5 is equal to an i3, useful if you want to sell more i3s), and in DOOM Nvidia looks like it's cheating, at least in the 3.5+0.5 GB GTX 970 case.

PS. Based on Minecraft's requirements, no one should care about discrete GPUs.
 
Did you really just drag Minecraft into a discussion about high-end graphics cards?
This discussion has nothing to do with high end. Even if it were, there are plenty of games that will benefit from the raw power of the new GPUs even without all the async stuff. But that was not the point I was trying to make.
 
I find this async thing to be a bit blown out of proportion, just like the 480's PCIe power consumption.
There is a very limited number of games that support async, and even fewer that support it for both vendors. Sure, there might be more in the next year or two, but it's irrelevant if you're only going to play 1 or 2 of those, and by the time it's relevant we'll already have at least one or two new generations of GPUs out.
I'm quite sure there are more people playing Minecraft than playing AotS, so it only makes sense to base your purchase not on synthetic benchmarks or games that you'll never play, but on what best matches your needs.

isn't everything blown out of all proportion on sites like this one..

it's par for the course..

having said that.. benchmarks like Time Spy are all about high end.. people that download and run them don't do it to prove how crap their PC is.. which is why it's on my gaming desktop and not on my Atom-powered Windows 10 tablet.. he he

trog
 
someone needs to make a benchmark with a few variations:

0%/50%/100% async, for direct comparisons.
 