
Godfall Benchmark Test & Performance Analysis

I see the main "problem" with the Ampere architecture in games being that each sub-unit in an SM consists of two blocks: one block with only FP32 units and a second with a concurrent mix of INT32 and FP32 ops... and I'm assuming that without better shader compiling, games use only the first block (half of the total shader units). So in some cases we can see that the 2080 Ti with its 4352 full CUDA cores can easily outperform the 3070 with 2944 (5888/2). Godfall is a good example, as it uses tons of shader effects in materials... it was probably rushed during development and the quality of the shader code shows it.
Applications like LuxMark, 3DMark, etc. are better optimized for GPU utilization; there we can see a massive performance boost, almost the theoretical one (Turing vs. Ampere).
Nvidia is aware of all this, and that's why the MSRPs of the Ampere teraflops monsters are not higher than those of Turing GPUs.
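Just to put rough numbers on that claim (the boost clocks below are approximate reference values I'm adding, not something from the post, and real games never hit either peak): if only the pure-FP32 half of the Ampere lanes gets used, the 2080 Ti's full shader array does come out ahead on paper.

```cuda
// Back-of-the-envelope sketch of the theoretical FP32 throughput argument above.
// Clocks are approximate reference boost clocks (my assumption); core counts are from the post.
#include <cstdio>

// Peak FP32 TFLOPS = cores * 2 (an FMA counts as two ops) * clock in GHz / 1000
static double tflops(int cores, double clock_ghz) {
    return cores * 2.0 * clock_ghz / 1000.0;
}

int main() {
    printf("2080 Ti, all 4352 cores:     ~%.1f TFLOPS\n", tflops(4352, 1.545));
    printf("3070, all 5888 lanes:        ~%.1f TFLOPS\n", tflops(5888, 1.725));
    printf("3070, pure-FP32 half (2944): ~%.1f TFLOPS\n", tflops(2944, 1.725));
    return 0;
}
```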
While you are correct, this is not specific to Ampere in any way. Every generation makes architectural choices, and for each of them you can devise a workload that will act poorly on that specific architecture.
The thing is, when Nvidia or AMD make these choices, they rely on their relationships with game developers and they try to help current common usage patterns, not hinder them. That's why reviews show newer cards to always be faster than what came before ;)
 
Shouldn't it be trivial in such cases to run the 2nd block in just one mode (INT or FP)?
I am not an expert or an engineer, but I am completely satisfied with how Ampere is designed and how it works :). To me, the second block is something like Hyper-Threading in Intel CPUs (which gives us 28-30% more performance without significant architectural changes).
I'm assuming it was the better and smarter decision to create a second block with mixed functionality, because graphics in games keep moving towards pure ray tracing... so it is a good compromise.
 
I see the main "problem" with the Ampere architecture in games being that each sub-unit in an SM consists of two blocks: one block with only FP32 units and a second with a concurrent mix of INT32 and FP32 ops... and I'm assuming that without better shader compiling, games use only the first block (half of the total shader units). So in some cases we can see that the 2080 Ti with its 4352 full CUDA cores can easily outperform the 3070 with 2944 (5888/2). Godfall is a good example, as it uses tons of shader effects in materials... it was probably rushed during development and the quality of the shader code shows it.
Applications like LuxMark, 3DMark, etc. are better optimized for GPU utilization; there we can see a massive performance boost, almost the theoretical one (Turing vs. Ampere).
Nvidia is aware of all this, and that's why the MSRPs of the Ampere teraflops monsters are not higher than those of Turing GPUs.

Welcome to the forums, best first post I've seen here in a long time :)

Shouldn't it be trivial in such cases to run the 2nd block in just one mode (INT or FP)?

It doesn't quite work like that, because the decision is made entirely inside the SM; the compiler just generates the instructions as they are. There is a scheduler in each SM which decides which warps (out of hundreds of in-flight threads) get executed in a given clock cycle. The thing is that INT and FP operations are interleaved, as there is usually a need to calculate some addresses before a bunch of FP instructions can take place; however, most instructions are still going to be FP.

The most likely pattern for an Ampere SM is 64 INT + 64 FP operations in one clock cycle, followed by 128 FP operations in each of the next few clock cycles, and so on. This is still better than a Turing SM, because in a clock cycle in which 64 INT operations need to take place, the Ampere SM can still execute up to 64 FP operations on top of them.
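As a rough sketch of that mix (a hypothetical shading-style kernel, not anything from Godfall): the index math below compiles to INT32 instructions and the material math to FP32, which is exactly the interleaving described above.

```cuda
// Hypothetical kernel: the address/index arithmetic is INT32 work, the shading math is FP32.
// On both Turing and Ampere the INT32 ops can be issued alongside FP32; Ampere's extra
// trick is that in cycles with no INT32 work pending, the second datapath issues FP32 too.
__global__ void shade(const float *albedo, const float *n_dot_l, float *out,
                      int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // INT32
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // INT32
    if (x >= width || y >= height) return;
    int idx = y * width + x;                        // INT32

    // FP32: the bulk of a typical material/pixel shader
    float lit = fmaxf(n_dot_l[idx], 0.0f);
    out[idx] = albedo[idx] * lit;
}
```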
 
I am not an expert or an engineer, but I am completely satisfied with how Ampere is designed and how it works :). To me, the second block is something like Hyper-Threading in Intel CPUs (which gives us 28-30% more performance without significant architectural changes).
I'm assuming it was the better and smarter decision to create a second block with mixed functionality, because graphics in games keep moving towards pure ray tracing... so it is a good compromise.
Yeah, that's probably the driver's job, figuring out how to schedule execution in order to use the resources optimally.

And welcome to TPU, btw.
 
I never got the appeal of looter shooters; it's like playing an MMO but single-player. I guess it's the progression system that gets people hooked.
 
The GTX 970 3.5GB thing was way overblown by the vast majority of people who don't understand how this sort of stuff works.

You're kidding, right? Nvidia lost the lawsuit over this. The whole issue came from Nvidia severing a memory controller link and sharing it with the one beside it for the last 512 MB of VRAM, causing horrible performance issues whenever that last 512 MB was used. It would have been much better if Nvidia had just gone with 3.5 GB, but then the specs would say 224-bit, which doesn't look as good as 256-bit.
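For context, the rough arithmetic behind that complaint (the bandwidth figures below are the commonly reported GTX 970 numbers, not something stated in this thread) works out like this:

```cuda
// Rough sketch of the GTX 970 partition arithmetic, using the commonly reported
// 224 GB/s total over a 256-bit bus (8 x 32-bit memory controllers).
#include <cstdio>

int main() {
    const double per_controller = 224.0 / 8.0;  // ~28 GB/s per 32-bit controller
    printf("3.5 GB segment (7 controllers): ~%.0f GB/s\n", 7 * per_controller);  // ~196 GB/s
    printf("0.5 GB segment (shared link):   ~%.0f GB/s\n", 1 * per_controller);  // ~28 GB/s
    return 0;
}
```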
 
Who did the testing and how was it done? Because those results make zero sense and go against any kind of logic. Have you double- and triple-checked them? Verified them on a different system? Reached out to the game developer and GPU manufacturers for public statements? Because that is what you should do when benchmarks fall wildly outside expected behavior.

And no, it is not a VRAM issue. If it were, the 3070 (8 GB) would not pull ahead of the 2080 Ti (11 GB) at 4K, where VRAM demand is highest, given that those two cards perform pretty much the same across most games. Drivers are also a questionable explanation at best, because the 3080 and 3090 literally use the same chip and the same GDDR6X memory. Unless memory usage is well above 10 GB, there should not be a giant FPS advantage for the 3090.

It looks like a user/reviewer error or some system/driver corruption somewhere while testing those cards. I don't want to accuse you guys of failing at benchmarking, but for the sake of trustworthiness you should not just put these illogical results out there and say "this is how it is"; investigate the issue. Best case, you hold those benchmarks back until you get a response from the developers/AMD/Nvidia saying "this is expected performance because X" or "we are aware of the issue and are working on a fix".

You should strive for maximum trustworthiness and not become a second UserBenchmark putting out BS numbers.
 
Any idea why the 3080 -> 3090 delta is so high in this game? Maybe VRAM?
That 3090 performance jump over the 3080 is insane compared to typical games. What's the cause? VRAM?
This benchmark doesn't look right: the RTX 2080 Ti is about 3 FPS ahead of the RTX 3080, and then the RTX 3090, a card that is 10% faster than the 3080, has a 40% performance leap over it??!


12GB of VRAM used, so, yeah.

That is also how DF crippled Turing in the "early preview" of Ampere, so that Jensen Huang's lies would look a bit less rampant. That time it was Doom, but again a case of "doesn't fit into VRAM".

[AMD did an Nvidia to this game!]

It is a console exclusive, and consoles have only 16 GB of RAM for everything; 2 GB of that is actually reserved for the OS, and 10 GB of it is actually faster than the remaining 6 GB (on the Xbox Series X).
 
Maybe he's talking about another game?

 
Who did the testing and how was it done? Because those results make zero sense and go against any kind of logic. Have you double- and triple-checked them? Verified them on a different system? Reached out to the game developer and GPU manufacturers for public statements? Because that is what you should do when benchmarks fall wildly outside expected behavior.

And no, it is not a VRAM issue. If it were, the 3070 (8 GB) would not pull ahead of the 2080 Ti (11 GB) at 4K, where VRAM demand is highest, given that those two cards perform pretty much the same across most games. Drivers are also a questionable explanation at best, because the 3080 and 3090 literally use the same chip and the same GDDR6X memory. Unless memory usage is well above 10 GB, there should not be a giant FPS advantage for the 3090.

It looks like a user/reviewer error or some system/driver corruption somewhere while testing those cards. I don't want to accuse you guys of failing at benchmarking, but for the sake of trustworthiness you should not just put these illogical results out there and say "this is how it is"; investigate the issue. Best case, you hold those benchmarks back until you get a response from the developers/AMD/Nvidia saying "this is expected performance because X" or "we are aware of the issue and are working on a fix".

You should strive for maximum trustworthiness and not become a second UserBenchmark putting out BS numbers.
Really? Because W1zzard is just “some guy” who hasn’t been reviewing and reliably benchmarking for many years and isn’t well-respected? LOL.

Oh, and welcome to TPU. :wtf:
 
Really? Because W1zzard is just “some guy” who hasn’t been reviewing and reliably benchmarking for many years and isn’t well-respected? LOL.

Oh, and welcome to TPU. :wtf:
Hardware Unboxed just released their Godfall benchmarks, and those follow the expected performance scaling across multiple GPUs and don't show the illogical behavior seen in this review. If you compare all the cards in both reviews, you can see that most of them actually perform pretty similarly, but some (like the 3080, for example) are just way worse in this TPU benchmark. The last driver update was on November 9th, so both reviews should have used the latest drivers. This leads me to believe that something went wrong in the TPU review; I'm 99% sure there was an error, like an issue when installing drivers, for example. It can happen, and it should be caught before publishing the data, but the most important part is to fix it now. Whenever something is not performing as expected, you should be cautious and validate it.
 
I retested RTX 3080 and the results are totally different, and now in line with expectations. Charts have been updated. Not sure what happened.

Will retest 3070 and 3090, too

edit: 3070 was wrong, too. 3090, 2080 ti, 5700xt are fine
 
I retested RTX 3080 and the results are totally different, and now in line with expectations. Charts have been updated. Not sure what happened.

Will retest 3070 and 3090, too
Thank you!
Also, props to you for actually retesting and not being like "I did this hundreds of times, there is no way my results could be inaccurate!". I still think it should have been caught pre-publication, but mistakes can happen, and acknowledging and fixing them is even more important. It might not even have been an actual user error. Software can be finicky, and sometimes drivers get corrupted or partially fail their installation because Windows felt like it or decided to run some "random totally ultra important task" during a benchmark.

There is so much talk about hardware these days, in the middle of two giant GPU series launches, and so much questionable or straight-up wrong information in those discussions, that it is vital for review and benchmarking sites, which will undoubtedly be linked as sources hundreds of times in those debates, to strive for maximum accuracy. I am glad to see that we are doing just that. :clap:
 
I still think it should have been caught pre-publication
Agreed, and I did fail here indeed, because I saw the oddities and didn't double-check them. Maybe it's because I was so unimpressed with the game that I just wanted to get it over with.
 
Looks like those blaming VRAM should just blame the W1zzmeister instead, no matter how you spin it.
 
A very clear correlation between compute and gaming performance; AMD made a good choice to separate into RDNA and CDNA, as GCN failed at gaming. What bothers me with Ampere is the massive deficit in pixel fillrate against RDNA2. I know it's not 2003 anymore, but fillrate matters.
 
Absolutely, but I can't anymore because he fixed it already

And admitted he was wrong, which is so not 2020-like behavior. It's refreshing.
 
And admitted he was wrong, which is so not 2020-like behavior. It's refreshing.
Was he wrong though? I was under the impression he did what he always does, but the setup had a glitch. It was all normal upon retesting.
 
I retested RTX 3080 and the results are totally different, and now in line with expectations. Charts have been updated. Not sure what happened.

Will retest 3070 and 3090, too

edit: 3070 was wrong, too. 3090, 2080 ti, 5700xt are fine

In this and future situations where the review is updated, for whatever reason, it may be a good idea to add in the title something like " - Updated XX / YY / ZZZZ"

This way, a visitor becomes aware the contents of the review have changed since the date provided.

EDIT

Also, since the review is now updated, it's essentially "a new review", so you could add it to that new feature you introduced with the Lexar SSD review, even if "in updated form".
 
Was he wrong though? I was under the impression he did what he always does, but the setup had a glitch. It was all normal upon retesting.

That's still wrong, just not his fault.
 
The game gets horrible reviews, lol; some reviewers say they would rather watch paint dry.

Funny to see the 3070 smash the 2080 Ti at 4K though; this will happen in pretty much all new games going forward.

This means that the game does not break the 8 GB requirement at 4K like the article says; it only does on the 3090, because 24 GB means higher allocation.
You should have been telling us the VRAM usage at 4K using an 8 GB card.

Minimum FPS would have been useful too.

Funny that this game can't do 21:9 at all; a friend of mine uninstalled it after 20 minutes because of the black bars :laugh:
 