
GPU-Z readings mismatch

jasonx

New Member
Joined
Nov 8, 2012
Messages
2 (0.00/day)
GPU-Z shows two different pixel fill rates, depending on the version.

From 0.5.7 to 0.6.6: 10.1 GPixels/s
0.5.6: 23 GPixels/s

Which one is correct?

 

T4C Fantasy

CPU & GPU DB Maintainer
Staff member
Joined
May 7, 2012
Messages
2,562 (0.59/day)
Location
Rhode Island
System Name Whaaaat Kiiiiiiid!
Processor Intel Core i9-12900K @ Default
Motherboard Gigabyte Z690 AORUS Elite AX
Cooling Corsair H150i AIO Cooler
Memory Corsair Dominator Platinum 32GB DDR4-3200
Video Card(s) EVGA GeForce RTX 3080 FTW3 ULTRA @ Default
Storage Samsung 970 PRO 512GB + Crucial MX500 2TB x3 + Crucial MX500 4TB + Samsung 980 PRO 1TB
Display(s) 27" LG 27MU67-B 4K, + 27" Acer Predator XB271HU 1440P
Case Thermaltake Core X9 Snow
Audio Device(s) Logitech G935 Headset
Power Supply SeaSonic Platinum 1050W Snow Silent
Mouse Logitech G903 Lightspeed
Keyboard Logitech G915
Software Windows 11 Pro
Benchmark Scores FFXV: 19329

jasonx

New Member
Joined
Nov 8, 2012
Messages
2 (0.00/day)
Thanks for the answer and the trolling at the same time. I did search; maybe I missed the correct search query or looked in the wrong subforum. Anyway, thanks.
 

95Viper

Super Moderator
Staff member
Joined
Oct 12, 2008
Messages
12,670 (2.23/day)
Welcome to TPU, jasonx!

W1zzard gets to troll or anything W1zzard wants to do... it is his site.

However, there are a lot of other Trolls here and you can hit the little triangle, with the exclamation in it, to report them or any violation... the mods here will respond and are very fair.

Again, Welcome, and feel free to contribute.:)
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
27,037 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
Thanks for the answer and the trolling at the same time. I did search; maybe I missed the correct search query or looked in the wrong subforum. Anyway, thanks.

I must have answered that question like 20 times and keep wondering why people can't find it. No offense, and welcome to the forums.
 
Joined
Mar 6, 2008
Messages
2,753 (0.47/day)
Location
Minnesota
Perhaps add something about it in the fillrate tooltip only for Fermi? Though I'm sure it would go unnoticed by most.
 

Flickspeed

New Member
Joined
Jan 2, 2013
Messages
4 (0.00/day)
Pixel Fillrate Calculation

Hey are you sure you are using the correct way to calculate pixel fillrate in the current versions? I read the other threads and see some inconsistencies.

It seems the pixel fillrate is still not calculated properly for Fermi Cards. Are you taking into account the following information?

Each Streaming Multiprocessor(SM) in the GPU of GF100 architecture contains 32 SPs and 4 SFUs.
Each Streaming Multiprocessor(SM) in the GPU of GF104/106/108 architecture contains 48 SPs and 8 SFUs.
Each Streaming Multiprocessor(SM) in the GPU of GF110 architecture contains 32 SPs and 4 SFUs.
Each Streaming Multiprocessor(SM) in the GPU of GF114/116/118/119 architecture contains 48 SPs and 8 SFUs.

Each SP can complete up to two single-precision operations (one FMA) per clock. Each SFU can complete up to four SF operations per clock. The approximate ratio of FMA operations to SF operations is 4:1 for GF100 and 3:1 for GF104/106/108.

The theoretical single-precision shader performance [FLOPSsp, GFLOPS] of a graphics card with shader count [n] and shader frequency [f, GHz] is estimated as:
FLOPSsp ≈ f × n × 2

Alternative formula, where [m] is the SM count:
GF100: FLOPSsp ≈ f × m × (32 SPs × 2 (FMA))
GF104/106/108: FLOPSsp ≈ f × m × (48 SPs × 2 (FMA))

Total processing power:
GF100: FLOPSsp ≈ f × m × (32 SPs × 2 (FMA) + 4 × 4 SFUs), or FLOPSsp ≈ f × n × 2.5
GF104/106/108: FLOPSsp ≈ f × m × (48 SPs × 2 (FMA) + 4 × 8 SFUs), or FLOPSsp ≈ f × n × 8 / 3

Where: SP = Shader Processor (Unified Shader, CUDA Core), SFU = Special Function Unit, SM = Streaming Multiprocessor, FMA = Fused MUL+ADD.

Based on this information the current calculation method is wrong!
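
For anyone who wants to check the arithmetic in the formulas above, here is a minimal Python sketch. The names (`FermiSM`, `fma_gflops`, `total_gflops`) are made up for illustration, and the 1.544 GHz shader clock used for the GTX 580 is an assumption (twice its 772 MHz core clock). Keep in mind these formulas give shader (FLOPS) throughput only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FermiSM:
    """Per-SM unit counts, as listed in the post above."""
    sps: int   # shader processors (CUDA cores) per SM
    sfus: int  # special function units per SM


# GF100/GF110 layout and GF104/106/108 (and GF114/116/118/119) layout.
GF100_STYLE = FermiSM(sps=32, sfus=4)
GF104_STYLE = FermiSM(sps=48, sfus=8)


def fma_gflops(shader_ghz: float, sm_count: int, sm: FermiSM) -> float:
    """FLOPSsp ~ f * m * (SPs * 2): each SP does one FMA (2 ops) per clock."""
    return shader_ghz * sm_count * sm.sps * 2


def total_gflops(shader_ghz: float, sm_count: int, sm: FermiSM) -> float:
    """FLOPSsp ~ f * m * (SPs * 2 + 4 * SFUs): adds 4 SF ops per SFU per clock."""
    return shader_ghz * sm_count * (sm.sps * 2 + 4 * sm.sfus)


if __name__ == "__main__":
    # GTX 580: 16 SMs of the GF110 layout; shader clock is an assumed value.
    print("GTX 580 FMA GFLOPS:  ", fma_gflops(1.544, 16, GF100_STYLE))
    print("GTX 580 total GFLOPS:", total_gflops(1.544, 16, GF100_STYLE))
```

Dividing `total_gflops` by `f × n` gives back the 2.5 and 8/3 factors quoted above, which is a quick way to confirm the two forms of the formula agree.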
 
Last edited:

T4C Fantasy

CPU & GPU DB Maintainer
Staff member
Joined
May 7, 2012
Messages
2,562 (0.59/day)
Location
Rhode Island
Based on this information the current calculation method is wrong! Please recheck. For example the GTX 460 has 7 SM's for a total of 7*48 = 336 SP's!!!

check out the gpu database, it uses the latest known calculation for Fermi
http://www.techpowerup.com/gpudb/265/NVIDIA_GeForce_GTX_460.html
 
Joined
Mar 6, 2008
Messages
2,753 (0.47/day)
Location
Minnesota
Based on this information the current calculation method is wrong! Please recheck. For example the GTX 460 has 7 SM's for a total of 7*48 = 336 SP's!!!

What the hell are you going on about? This thread is about pixel fill rate not shader count. You are obviously trying to figure floating point performance. That is entirely out of the scope of this thread.
 

Flickspeed

New Member
Joined
Jan 2, 2013
Messages
4 (0.00/day)
What the hell are you going on about? This thread is about pixel fill rate not shader count. You are obviously trying to figure floating point performance. That is entirely out of the scope of this thread.

Maban, you can disregard anything after the last sentence in red.

I am just correcting the following post, which W1zzard used as the basis for the calculation. It has a fundamental error.

The pixel fillrate in GPU-Z is displayed incorrectly for Nvidia Fermi-based graphics cards. It seems to be calculated by multiplying the number of ROPs by the GPU clock. On Fermi GPUs, however, the pixel fillrate is generally limited not by the number of ROPs but by the number of streaming multiprocessors. Each streaming multiprocessor can process two pixels per clock. So if there are 16 SMs and 48 ROPs, as in the GeForce GTX 580, the SMs limit the pixel fillrate. This is the case for all Fermi-based graphics cards I know.
Having more ROPs than pixels that can be processed per clock helps to sustain a high pixel fillrate when using multiple samples per pixel (i.e. multisampling antialiasing), but the peak pixel fillrate is limited by the streaming multiprocessors.
Check out these benchmarks by hardware.fr (scroll down to the section 'Fillrate'): http://www.hardware.fr/articles/806-4/nvidia-geforce-gtx-580-sli.html
The measured peak pixel fillrate of the GeForce GTX 580 is 23.3 GPixel/s. Simply multiplying the 48 ROPs by the 772 MHz GPU clock would give a peak pixel fillrate of 37.1 GPixel/s. But as the pixel fillrate is limited by the streaming multiprocessors, the peak fillrate is only 16 × 2 × 772 MPixel/s = 24.7 GPixel/s. This number corresponds well to the measurement taken by hardware.fr.
If you look at non-Fermi graphics cards, you will see that the measured peak pixel fillrate corresponds well to the product of the number of ROPs and the GPU clock.

Many reviews cite the wrong peak pixel fillrate for Fermi cards, and Nvidia doesn't publish pixel fillrate numbers on its product pages. But knowing the Fermi architectural properties, you can easily calculate the right peak pixel fillrate. I hope that GPU-Z will be fixed to show the right peak pixel fillrate on Nvidia Fermi graphics cards.
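
The two competing estimates in the quoted post can be put side by side with a short Python sketch. The function names are hypothetical, and the figure of 2 pixels per SM per clock is the quoted post's claim (the very point disputed in this thread), not a value taken from an Nvidia document.

```python
def rop_limited_gpixels(rops: int, core_mhz: float) -> float:
    """Classic estimate: one pixel per ROP per core clock, in GPixel/s."""
    return rops * core_mhz / 1000.0


def sm_limited_gpixels(sm_count: int, core_mhz: float,
                       pixels_per_sm_per_clock: int = 2) -> float:
    """Estimate from the quoted post: each SM emits a fixed number of
    pixels per clock (2 is the post's claim, treated here as an assumption)."""
    return sm_count * pixels_per_sm_per_clock * core_mhz / 1000.0


def peak_gpixels(rops: int, sm_count: int, core_mhz: float) -> float:
    """Peak fillrate is whichever stage is the bottleneck."""
    return min(rop_limited_gpixels(rops, core_mhz),
               sm_limited_gpixels(sm_count, core_mhz))


if __name__ == "__main__":
    # GeForce GTX 580 figures from the post: 48 ROPs, 16 SMs, 772 MHz.
    print("ROP-limited:", rop_limited_gpixels(48, 772))
    print("SM-limited: ", sm_limited_gpixels(16, 772))
    print("Peak:       ", peak_gpixels(48, 16, 772))
```

With the GTX 580 numbers, the ROP-based figure comes out near 37.1 GPixel/s and the SM-based one near 24.7 GPixel/s; the hardware.fr measurement cited above is 23.3 GPixel/s.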

It is not the SM that is limiting anything :) That's the fundamental error; I marked it in red ;) So at the end of the day, it is still ROPs times MHz. Prove me wrong and give me a source saying an SM can only process 2 pixels per clock.
 
Last edited:
Joined
Mar 6, 2008
Messages
2,753 (0.47/day)
Location
Minnesota
Maban, you can disregard anything after the last sentence in red.

I am just correcting the following post, which W1zzard used as the basis for the calculation. It has a fundamental error.

It is not the SM that is limiting anything :) That's the fundamental error; I marked it in red ;) So at the end of the day, it is still ROPs times MHz. Prove me wrong and give me a source saying an SM can only process 2 pixels per clock.

Read the white papers.
 

Flickspeed

New Member
Joined
Jan 2, 2013
Messages
4 (0.00/day)
Read the white papers.

Show me where it says in the white papers that an SM is limited to two pixels per clock. Links, please.

I don't work for Nvidia and I am not an Nvidia fanboy. I just don't like theoretical pixel fillrate values being misreported without any valid proof. The start would be to prove that an SM can only do 2 pixels per clock. I couldn't find this anywhere on the web except in a post here :)

Maban, instead of posting useless comments, you can start here: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf :) Good luck and have fun.

Also, if there is no proof, I would like to ask W1zzard to make GPU-Z calculate theoretical pixel fillrates based on the old formula.
 
Last edited: