Feedback & Suggestions: Performance benchmarks.

CynicalCyanide · Dec 18, 2015

Hi.

First time poster, long time reader here. I've loved and used TechPowerUp's reviews of GPUs, CPUs, etc. for many years now, and the data from this website has been invaluable for building my own guide to picking PC hardware for full builds (I won't link it because of the Forum Guidelines, but it currently sits on my own website and on another forum, with almost 200,000 views and 2,000 replies on the latter alone). You'll just have to believe me when I say that I probably spend 10 hours a week poring over PC hardware product reviews.

I mention all of this because I've noticed a couple of things that other review sites have done a bit better than TPU. That's a shame, because TPU has by far the best and most consistent format, which makes it easy to get straight into the data, with particular kudos for providing summary pages.

However, I think that displaying only average FPS leaves out a really basic feature, in a world where most of the larger tech sites have even gone a step further than just providing average + minimum FPS by providing detailed FCAT data and/or measures of the 0.1% & 0.01% lowest framerates. Furthermore, while I understand that a 'flat' and minimalist graphic aesthetic is all the rage these days (though I personally dislike it), did we really need to remove the AA settings from the text at the top of the graphs?

While I presume that the AA settings remain the same as in previous reviews with the old graphical style, a new and/or naive reader might see the line "All games are set to their highest quality setting unless indicated otherwise." on the 'Test System' page and assume that AA is maxed out in every graph. Or that no AA is used at all (which would also be pretty silly, since who on earth with a high-end GPU doesn't play with AA turned on at 1080p, or even 2560x1440?).

A smaller, perhaps pedantic bit of feedback: The resolution charts are descending from left to right, top to bottom. But on the Summary page, it's reversed. Probably should keep it consistent instead.

I'm also not sure whether there's a particular objection to including a 'synthetic' benchmark such as 3DMark, but I think having one might be more valuable than at least one of the large selection of games in the test suite (at least one of which, I'm sure, will turn out to be a flavour-of-the-month game that quickly dies off in popularity). One that comes to mind is Futuremark's upcoming DX12 "Time Spy" benchmark, which would kill two birds with one stone by also adding a DX12 benchmark. Personally, I find synthetic benchmarks easier to compare and contrast, especially against my own hardware here at home. If you were to choose a synthetic benchmark and test a mode that is free for users, that would also give them an opportunity to make sure their (and your!) machines are running as expected, and/or to conveniently measure their own hardware against what is being reviewed.

My last idea is perhaps the most work-intensive, but perhaps also the most interesting. The basic premise is a 'bias' measurement for games. In short, the results of all hardware of a single architecture are averaged and compared, on a game-by-game basis, to the overall 'summary' performance of that architecture. Explaining how it would be done is a little confusing, so bear with me.

To explain by example: say the average performance of all the GPUs in a given game (Game 'X') is 50 FPS, and their average performance across all games is 100 FPS. The baseline for Game X would then be 50/100 = 50%. Then, if all GCN (1.0) based cards average 60 FPS in Game X but 90 FPS across all games, they would get a figure of 60/90 = 66.67% for Game X, which is then compared to the baseline. Normalising the 50% baseline to 100% gives 66.67/50 = 133.33% in Game X for GCN 1.0, which would indicate that the game is strongly biased in favour of that particular architecture (i.e. 1.33x faster than otherwise expected). Mind you, each resolution would need its own set of calculations. As long as you're storing your results in some sane format (Excel tables would do fine), it should be pretty easy to set up a formula and let it do its thing with minimal fuss.
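A minimal Python sketch of the calculation described above, assuming each input is already an average over the relevant set of cards. The function name `bias` and its parameters are hypothetical, not part of any existing TPU tooling:

```python
def bias(game_fps_arch, summary_arch, game_fps_all, summary_all):
    """Hypothetical helper: how much a game favours one architecture, in %.

    game_fps_arch: average FPS of that architecture's cards in the game
    summary_arch:  average summary score of those cards across all games
    game_fps_all:  average FPS of all tested cards in the game
    summary_all:   average summary score of all tested cards
    """
    baseline = game_fps_all / summary_all      # how the game compares to the summary overall
    arch_ratio = game_fps_arch / summary_arch  # the same ratio for one architecture
    return 100.0 * arch_ratio / baseline       # baseline normalised to 100%


# The "Game X" example: all GPUs 50 vs. 100 FPS, GCN 1.0 cards 60 vs. 90 FPS.
print(f"{bias(60, 90, 50, 100):.2f}%")  # 133.33%
```

Anything above 100% means the architecture does better in that game than its summary results would predict; anything below means worse.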

Assuming I did my rough maths right, I tried this technique out on the reference 980 Ti results from the recent Gigabyte 980 Ti Waterforce review, and picked the 4K Civ 5: Beyond Earth benchmark as 'Game X'.

The Civ 5 result for the 980 Ti was 59.7 FPS.
The Civ 5 average result was 61.99 FPS.
The 980 Ti average across all games (summary) was 79%.
The average summary result for the same set of cards tested in Civ 5 was 77.42%.

Therefore, taking the 980 Ti's Civ 5 result and dividing it by the average Civ 5 result:
59.7 / 61.99 = 96.31% as good as the average.

And taking the summary result for the 980 Ti and dividing it by the average summary result:
79% / 77.42% = 102.04% of the average, i.e. ~2% better than average.

i.e., we would expect the 980 Ti to perform ~2% better than average in Civ 5, but it actually performed WORSE than the average Civ 5 result. A better way to show this is to first obtain the baseline relative performance (Civ 5 vs. summary), then compare it to the card's own relative performance. I've thrown in the Fury X results for comparison:

Baseline: 61.99 FPS / 77.42% = 80.07%
980 Ti: 59.7 FPS / 79% = 75.57%
Fury X: 71.4 FPS / 83% = 86.02%

Then, normalising the baseline (80.07%) to 100%, we get:
Baseline: 100%.
980 Ti: 94.38%. Or, ~6% slower than expected (judging by the summary results).
Fury X: 107.4%. Or, ~7% faster than expected.
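The same arithmetic as a short Python sketch, using only the figures quoted above (the variable names are illustrative):

```python
# Figures quoted above (4K Civ 5, reference 980 Ti review data).
civ5_avg_fps = 61.99   # average Civ 5 result over all tested cards
summary_avg = 77.42    # average summary score (%) of the same cards
cards = {
    "980 Ti": (59.7, 79.0),  # (Civ 5 FPS, summary %)
    "Fury X": (71.4, 83.0),
}

baseline = civ5_avg_fps / summary_avg  # how Civ 5 compares to the summary overall
print(f"Baseline: {100 * baseline:.2f}%")
for name, (fps, summary) in cards.items():
    relative = fps / summary          # the card's own Civ 5 vs. summary ratio
    normalised = relative / baseline  # 100% = exactly as expected
    print(f"{name}: {100 * relative:.2f}% -> {100 * normalised:.2f}% of expected")
```

Running it gives a baseline of about 80.07%, with the 980 Ti at roughly 94.4% and the Fury X at roughly 107.4% of expected.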

In conclusion, for this particular example Civ 5 was a bit kind to the Fury X, and a bit cruel to the 980 Ti.

Although I didn't have time, you could first combine and average all the cards of the same architecture, rather than averaging the entire list of cards. This prevents the number of cards from influencing the result (i.e. if a game's benchmark list contains many cards from an architecture that the game favours, they will unfairly inflate the game's average score relative to the summary score, skew the baseline, and therefore distort the apparent bias). The number of cards from each architecture in the Civ 5 example was fairly even, so I didn't bother. You could probably keep it simple by broadly grouping everything into 'Maxwell', 'GCN', 'Kepler', etc. if need be. Of course, there's nothing stopping you from using the same method on CPUs or anything else as well.
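A rough sketch of the architecture-first averaging, assuming results are available as (architecture, value) pairs. The function `arch_weighted_average` and the sample numbers are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def arch_weighted_average(results):
    """Hypothetical helper: average per architecture first, then across them.

    results: list of (architecture, value) pairs, e.g. one per card.
    Each architecture then counts once, however many cards it has.
    """
    by_arch = defaultdict(list)
    for arch, value in results:
        by_arch[arch].append(value)
    return mean(mean(values) for values in by_arch.values())


# Made-up numbers: three Maxwell cards and one GCN card in some game.
game = [("Maxwell", 60), ("Maxwell", 62), ("Maxwell", 64), ("GCN", 80)]
print(mean(v for _, v in game))    # plain average: 66.5
print(arch_weighted_average(game)) # architecture-first average: 71
```

The same function would be applied to both the per-game results and the summary scores before taking the ratios, so that each architecture counts once regardless of how many of its cards were tested.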

This information could be handy for identifying which games are outliers in terms of performance bias, as well as for tracking potential improvements (or 'fixes') from patches and driver updates over time. It's up to you how you would display this information, whether on a separate page, in-line on each bar in the graph, or however else.

While I know these ideas would probably be a lot of work, I think they would provide the best, most useful & thorough data, and I think the bias analysis is interesting enough that it might even catch on.

Let me know what y'all think!

W1zzard · Dec 18, 2015

CynicalCyanide said:
gone a step further

I'd rather provide more games, more resolutions and more tested cards than other sites. I definitely don't want to provide 3 numbers (min/max/avg) if it means cutting the things I mentioned by 60% to make up for the time (and it would probably have to be even more).

CynicalCyanide said:
Did we really need to remove the AA settings from the text at the top of the graphs?

Will put it back next time I think of it; it was never intended to be removed.

CynicalCyanide said:
The resolution charts are descending from left to right, top to bottom. But on the Summary page, it's reversed. Probably should keep it consistent instead.

Probably

CynicalCyanide said:
DX12 "Time Spy" benchmark, which would kill two birds with one stone by also adding a DX12 benchmark

We'll see when it comes out

Right now the synthetics provide nothing, and I'd rather have an additional game benchmark. "Compare against own" imo will only result in people commenting on why I got lower/higher scores than they did.

CynicalCyanide said:
'bias' measurement for games

Interesting idea, probably for a single article rather than for every review. Email me at w1zzard@techpowerup.com and I can send you the raw data.

CynicalCyanide · Dec 18, 2015

W1zzard said:
I rather provide more games, more resolutions, more tested cards than other sites. I definitely don't want to provide 3 numbers (min/max/avg), but cut down the things I mentioned by 60% to make up for the time (it'll probably be even more time).

I definitely respect that, but I don't think it would really increase the amount of time much at all. You're already recording all of the data you need: the min FPS is simply the lowest value (max is unimportant). Assuming that's the case, it would just be a matter of adding a second bar, either alongside, like so: http://media.gamersnexus.net/images/media/2014/nvidia/gtx980-bench-mll-vhigh.jpg

... Or inline, like so: http://icrontic.com/uploads/features/gaming/2011/08/09.gamegpu.ru_.Battlefield3.jpg

It should only take a quick modification to whatever software you're using (I presume something like Excel, in which case it would simply be a matter of using =MIN() on the raw FPS readings), then another to whatever generates your graphs (again assuming Excel, you can use a clustered bar chart and then set the series overlap to 100% if you want the inline look). From there it should all be automated.

If simply displaying one (or two, for that matter) more metric from your data would take you the equivalent time of actually conducting 60% or MORE of your benchmarks, then you've got a serious work-efficiency problem in how you deal with your raw data. I'm by no means a wizard at Excel (pun intended), but it took me less than half an hour to record a short Fallout 4 test run (as well as simulate a couple of others), build an Excel sheet/template from scratch that imports data from a .csv file, then craft a basic graph (below). I even had a little time to mess around with the colours and fonts and whatnot. You would only need to enter each card's name once and save it as a template; then, when you've got your data, fire it up and enter it.
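For anyone not using Excel, the equivalent of =AVERAGE() and =MIN() over an imported .csv log might look like this in Python. The log layout (one row per reading, with `card` and `fps` columns), the `summarise` helper and the numbers are all made up purely for illustration:

```python
import csv
import io
from statistics import mean

# Hypothetical log format: one row per FPS reading, columns "card" and "fps".
SAMPLE_LOG = """card,fps
Card A,58
Card A,61
Card A,55
Card B,72
Card B,40
Card B,75
"""


def summarise(log_file):
    """Return {card: (average FPS, minimum FPS)} from a CSV log."""
    readings = {}
    for row in csv.DictReader(log_file):
        readings.setdefault(row["card"], []).append(float(row["fps"]))
    return {card: (mean(fps), min(fps)) for card, fps in readings.items()}


for card, (avg, low) in summarise(io.StringIO(SAMPLE_LOG)).items():
    print(f"{card}: avg {avg:.1f} FPS, min {low:.1f} FPS")
```

In this toy data, Card B wins on average FPS (about 62.3 vs. 58.0) but drops to 40 FPS at its worst, which is exactly what a second bar would reveal.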

The graph shows a classic example (even if differences of this magnitude aren't exactly common) of what I mean: despite Card B having a higher average FPS, I would argue that Card C provides the better experience.

Besides, if I were buying a card, I would prefer a smaller, thoroughly tested set of key games over additional benchmarks that could be hiding stuttering or a large variance in FPS. Dual-GPU setups are a classic example: they often perform exceptionally well in average FPS but very poorly in min FPS/frametime tests, which is why most professionals didn't (and often still don't) recommend them over single-GPU flagship cards, in spite of their impressive average results on paper. Sailing too close to VRAM limits is another example: it can cause FPS dips and stutters that don't show up strongly in average FPS but can make a game unplayable in reality. That sort of thing is why minimum FPS became the norm for reviews a long time ago, and why it is now being taken a step further with frametime metrics: because it's important information. It just seems strange to me that you would go to such lengths to provide detailed clock profile information, fan noise, temps, perf/watt... the whole nine yards, but not a key performance metric.
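To show how a frametime-based metric catches what an average hides, here is a small Python sketch. It assumes per-frame render times in milliseconds and takes "1% low" to mean the FPS equivalent of the nearest-rank 99th-percentile frametime, which is only one of several definitions sites use; the `fps_metrics` helper and both traces are invented for illustration:

```python
import math


def fps_metrics(frametimes_ms):
    """Return (average FPS, "1% low" FPS) for a list of frametimes in ms.

    Assumption: "1% low" is the FPS equivalent of the nearest-rank
    99th-percentile frametime; review sites define it in different ways.
    """
    n = len(frametimes_ms)
    avg_fps = 1000.0 * n / sum(frametimes_ms)  # frames rendered / seconds elapsed
    rank = math.ceil(99 * n / 100)             # nearest-rank percentile (1-based)
    p99_ms = sorted(frametimes_ms)[rank - 1]   # one of the slowest 1% of frames
    return avg_fps, 1000.0 / p99_ms


# Made-up traces of 100 frames each, for illustration only.
smooth = [16.7] * 100               # steady ~60 FPS
stutter = [12.0] * 95 + [50.0] * 5  # faster on average, with 50 ms hitches
for name, trace in (("smooth", smooth), ("stutter", stutter)):
    avg, low = fps_metrics(trace)
    print(f"{name}: avg {avg:.1f} FPS, 1% low {low:.1f} FPS")
```

The stuttering trace averages roughly 72 FPS against roughly 60 for the smooth one, yet its 1% low is 20 FPS versus about 60.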

W1zzard said:
We'll see when it comes out
Right now the synthetics provide nothing and I rather have an additional game benchmark. "compare against own" imo will only result in people making comments why i got lower/higher scores than they got

Well, personally I wouldn't go so far as to say they provide nothing, but fair enough. And I don't think that kind of feedback would be so bad, tbh.

W1zzard said:
Interesting idea, probably for a single article, not for every review. w1zzard@techpowerup.com I can send you the raw data

Certainly - I'll send you an email. What would you like me to do with the raw data?
