GPU-Z readings mismatch

jasonx · Nov 8, 2012

GPU-Z shows two different pixel fillrates in two different versions.

From 0.5.7 to 0.6.6: 10.1 GPixels/s
0.5.6: 23 GPixels/s

Which one is correct?

W1zzard · Nov 8, 2012

http://googleitfor.me/?q=gpu-z+gtx+460+fillrate

temp02 · Nov 8, 2012

That or this juicy first post.

And yes, the second screen-shot (most recent version of GPU-Z) is the correct one.

T4C Fantasy · Nov 8, 2012

W1zzard said:
http://googleitfor.me/?q=gpu-z+gtx+460+fillrate

hahahah I laughed so hard at this comment even if it wasn't meant to be funny

jasonx · Nov 8, 2012

Thanks for the answer and the trolling at the same time. I did search; maybe I missed the correct search query or looked in the wrong subforum. Anyway, thanks.

95Viper · Nov 8, 2012

Welcome to TPU, jasonx!

W1zzard gets to troll or anything W1zzard wants to do... it is his site.

However, there are a lot of other Trolls here and you can hit the little triangle, with the exclamation in it, to report them or any violation... the mods here will respond and are very fair.

Again, Welcome, and feel free to contribute.

W1zzard · Nov 8, 2012

jasonx said:
Thanks for the answer and the trolling at the same time. I did search; maybe I missed the correct search query or looked in the wrong subforum. Anyway, thanks.

I must have answered that question like 20 times and keep wondering why people can't find it. No offense, welcome to the forums.

Maban · Nov 8, 2012

Perhaps add something about it in the fillrate tooltip only for Fermi? Though I'm sure it would go unnoticed by most.

Flickspeed · Jan 2, 2013

Pixel Fillrate Calculation

Hey are you sure you are using the correct way to calculate pixel fillrate in the current versions? I read the other threads and see some inconsistencies.

It seems the pixel fillrate is still not calculated properly for Fermi Cards. Are you taking into account the following information?

Each Streaming Multiprocessor (SM) in a GPU of the GF100 architecture contains 32 SPs and 4 SFUs.
Each Streaming Multiprocessor (SM) in a GPU of the GF104/106/108 architecture contains 48 SPs and 8 SFUs.
Each Streaming Multiprocessor (SM) in a GPU of the GF110 architecture contains 32 SPs and 4 SFUs.
Each Streaming Multiprocessor (SM) in a GPU of the GF114/116/118/119 architecture contains 48 SPs and 8 SFUs.

Each SP can perform up to two single-precision operations (one FMA) per clock. Each SFU can perform up to four SF operations per clock. The approximate ratio of FMA operations to SF operations is 4:1 for GF100 and 3:1 for GF104/106/108.

The theoretical shader performance in single-precision floating point operations (FMA) [FLOPSsp, GFLOPS] of a graphics card with shader count [n] and shader frequency [f, GHz] is estimated as:

FLOPSsp ≈ f × n × 2

Alternative formula, where [m] is the SM count:

GF100: FLOPSsp ≈ f × m × (32 SPs × 2 (FMA))
GF104/106/108: FLOPSsp ≈ f × m × (48 SPs × 2 (FMA))

Total processing power:

GF100: FLOPSsp ≈ f × m × (32 SPs × 2 (FMA) + 4 × 4 SFUs)
GF104/106/108: FLOPSsp ≈ f × m × (48 SPs × 2 (FMA) + 4 × 8 SFUs)

or:

GF100: FLOPSsp ≈ f × n × 2.5
GF104/106/108: FLOPSsp ≈ f × n × 8 / 3 [16]

where:
SP - Shader Processor (Unified Shader, CUDA Core), SFU - Special Function Unit, SM - Streaming Multiprocessor, FMA - Fused MUL+ADD.

Based on this information the current calculation method is wrong!
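The formulas quoted in this post can be sketched in Python as a quick consistency check. This is only an illustration, not how GPU-Z computes anything: the function names, the `SM_LAYOUT` table, and the 1.0 GHz shader clock are assumptions made up for the example, while the per-SM unit counts and multipliers are taken from the text above. Note that these formulas estimate shader throughput in GFLOPS, which is a different quantity from pixel fillrate.

```python
import math

# Per-SM unit counts as listed in the post (SPs and SFUs per SM).
SM_LAYOUT = {
    "GF100": {"sps": 32, "sfus": 4},
    "GF110": {"sps": 32, "sfus": 4},
    "GF104": {"sps": 48, "sfus": 8},
    "GF114": {"sps": 48, "sfus": 8},
}


def fma_gflops(shader_ghz, sm_count, chip):
    """FLOPSsp ~ f * m * (SPs per SM * 2), counting FMA work only."""
    layout = SM_LAYOUT[chip]
    return shader_ghz * sm_count * (layout["sps"] * 2)


def total_gflops(shader_ghz, sm_count, chip):
    """FLOPSsp ~ f * m * (SPs per SM * 2 + 4 * SFUs per SM)."""
    layout = SM_LAYOUT[chip]
    return shader_ghz * sm_count * (layout["sps"] * 2 + 4 * layout["sfus"])


# Hypothetical 1.0 GHz shader clock, chosen only to keep the numbers readable.
f = 1.0
m = 7                                  # SM count of the GTX 460 per the thread
n = m * SM_LAYOUT["GF104"]["sps"]      # 7 * 48 = 336 SPs

# The per-SM form and the per-shader shortcuts should agree.
assert math.isclose(fma_gflops(f, m, "GF104"), f * n * 2)
assert math.isclose(total_gflops(f, m, "GF104"), f * n * 8 / 3)
```

Swapping in a real shader clock for `f` scales every result linearly, so the agreement between the two forms does not depend on the placeholder value.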

T4C Fantasy · Jan 2, 2013

Flickspeed said:
Hey are you sure you are using the correct way to calculate pixel fillrate in the current versions? I read the other threads and see some inconsistencies.

It seems the pixel fillrate is still not calculated properly for Fermi Cards. Are you taking into account the following information?

Each Streaming Multiprocessor (SM) in a GPU of the GF100 architecture contains 32 SPs and 4 SFUs.
Each Streaming Multiprocessor (SM) in a GPU of the GF104/106/108 architecture contains 48 SPs and 8 SFUs.
Each Streaming Multiprocessor (SM) in a GPU of the GF110 architecture contains 32 SPs and 4 SFUs.
Each Streaming Multiprocessor (SM) in a GPU of the GF114/116/118/119 architecture contains 48 SPs and 8 SFUs.

Each SP can perform up to two single-precision operations (one FMA) per clock. Each SFU can perform up to four SF operations per clock. The approximate ratio of FMA operations to SF operations is 4:1 for GF100 and 3:1 for GF104/106/108.

The theoretical shader performance in single-precision floating point operations (FMA) [FLOPSsp, GFLOPS] of a graphics card with shader count [n] and shader frequency [f, GHz] is estimated as:

FLOPSsp ≈ f × n × 2

Alternative formula, where [m] is the SM count:

GF100: FLOPSsp ≈ f × m × (32 SPs × 2 (FMA))
GF104/106/108: FLOPSsp ≈ f × m × (48 SPs × 2 (FMA))

Total processing power:

GF100: FLOPSsp ≈ f × m × (32 SPs × 2 (FMA) + 4 × 4 SFUs)
GF104/106/108: FLOPSsp ≈ f × m × (48 SPs × 2 (FMA) + 4 × 8 SFUs)

or:

GF100: FLOPSsp ≈ f × n × 2.5
GF104/106/108: FLOPSsp ≈ f × n × 8 / 3 [16]

where:
SP - Shader Processor (Unified Shader, CUDA Core), SFU - Special Function Unit, SM - Streaming Multiprocessor, FMA - Fused MUL+ADD.

Based on this information the current calculation method is wrong! Please recheck. For example, the GTX 460 has 7 SMs for a total of 7*48 = 336 SPs!

check out the gpu database, it uses the latest known calculation for Fermi
http://www.techpowerup.com/gpudb/265/NVIDIA_GeForce_GTX_460.html

Flickspeed · Jan 2, 2013

T4C Fantasy said:
check out the gpu database, it uses the latest known calculation for Fermi
http://www.techpowerup.com/gpudb/265/NVIDIA_GeForce_GTX_460.html

Yep, and I say that's wrong, because the assumption was that each SM could do 2 operations, but it's not each SM, it's each SP.

Maban · Jan 2, 2013

Flickspeed said:
Hey are you sure you are using the correct way to calculate pixel fillrate in the current versions? I read the other threads and see some inconsistencies. [...]

What the hell are you going on about? This thread is about pixel fill rate not shader count. You are obviously trying to figure floating point performance. That is entirely out of the scope of this thread.

Flickspeed · Jan 2, 2013

Maban said:
What the hell are you going on about? This thread is about pixel fill rate not shader count. You are obviously trying to figure floating point performance. That is entirely out of the scope of this thread.

Maban, you can disregard anything after the last sentence in red.

I am just making corrections to the following post, which W1zzard used as the basis for the calculation. It has a fundamental error.

3dc_member said:
The pixel fillrate in GPU-Z is displayed wrong for Nvidia Fermi based graphics cards. The pixel fillrate seems to be calculated by multiplying the number of ROPs and the GPU clock. But in the case of Fermi GPUs the pixel fillrate is generally not limited by the number of ROPs but by the number of streaming multiprocessors. Each streaming multiprocessor is capable of processing two pixels per clock. So if there are 16 SMs and 48 ROPs, as in the GeForce GTX 580, the SMs limit the pixel fillrate. This is the case for all Fermi based graphics cards I know.
Having more ROPs than pixels that can be processed per clock helps to sustain a high pixel fillrate when using multiple samples per pixel (i.e. multisampling antialiasing), but the peak pixel fillrate is limited by the streaming multiprocessors.
Check out these benchmarks by hardware.fr (scroll down to section 'Fillrate'): http://www.hardware.fr/articles/806-4/nvidia-geforce-gtx-580-sli.html.
The measured peak pixel fillrate of the GeForce GTX 580 is 23.3 GPixel/s. Simply multiplying the 48 ROPs by the 772 MHz GPU clock would give you a peak pixel fillrate of 37.1 GPixel/s. But as the pixel fillrate is limited by the streaming multiprocessors, the peak fillrate is only 16*2*772 MPixel/s = 24.7 GPixel/s. This number corresponds well to the measurement taken by hardware.fr.
If you look at non-Fermi graphics cards you will see that the measured peak pixel fillrate corresponds well to the product of the number of ROPs and the GPU clock.

Many reviews cite the wrong peak pixel fillrate for Fermi cards and Nvidia doesn't publish pixel fillrate numbers on the product pages. But knowing the Fermi architectural properties you can easily calculate the right peak pixel fillrate. I hope that GPU-Z will be fixed in a way to show the right peak pixel fillrate on Nvidia Fermi graphics cards.
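The arithmetic in the quoted post can be sketched in Python to compare the two candidate formulas side by side. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the figure of two pixels per SM per clock is the quoted poster's claim (the very point disputed in this thread), not a confirmed specification. The GTX 580 inputs (48 ROPs, 16 SMs, 772 MHz) and the 23.3 GPixel/s measurement are the ones cited in the quote.

```python
def rop_limited_gpixels(rops, core_mhz):
    """Older estimate: ROP count times GPU clock, in GPixel/s."""
    return rops * core_mhz / 1000.0


def sm_limited_gpixels(sm_count, core_mhz, pixels_per_sm=2):
    """SM-bound estimate; pixels_per_sm=2 is the quoted claim, not a verified spec."""
    return sm_count * pixels_per_sm * core_mhz / 1000.0


def fermi_peak_gpixels(rops, sm_count, core_mhz, pixels_per_sm=2):
    """Peak fillrate taken as whichever of the two limits is lower."""
    return min(
        rop_limited_gpixels(rops, core_mhz),
        sm_limited_gpixels(sm_count, core_mhz, pixels_per_sm),
    )


# GeForce GTX 580 figures as given in the quoted post.
rop_bound = rop_limited_gpixels(48, 772)      # 48 * 772 MHz
sm_bound = sm_limited_gpixels(16, 772)        # 16 * 2 * 772 MHz
peak = fermi_peak_gpixels(48, 16, 772)

print(f"ROP-limited: {rop_bound:.1f} GPixel/s")   # 37.1
print(f"SM-limited:  {sm_bound:.1f} GPixel/s")    # 24.7
print(f"Peak (min):  {peak:.1f} GPixel/s")        # 24.7
```

Taking the minimum of the two bounds means the sketch falls back to plain ROPs times clock on any card where the ROP count is the tighter limit.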

It is not the SM that is limiting anything.

That's the fundamental error; I marked it in red.

So at the end of the day, it is still ROPs times MHz. Prove me wrong and give me the source saying an SM can only process 2 pixels per clock.

Maban · Jan 2, 2013

Flickspeed said:
Maban, you can disregard anything after the last sentence in red.

I am just making corrections to the following post, which W1zzard used as the basis for the calculation. It has a fundamental error.

It is not the SM that is limiting anything. That's the fundamental error; I marked it in red. So at the end of the day, it is still ROPs times MHz. Prove me wrong and give me the source saying an SM can only process 2 pixels per clock.

Read the white papers.

Flickspeed · Jan 2, 2013

Maban said:
Read the white papers.

Show me where it says in the white papers that an SM is limited to two pixels per clock. Links, please.

I don't work for NVIDIA and I am not an NVIDIA fanboy; I just don't like theoretical pixel fillrate values being misreported without any valid proof. A start would be to prove that an SM can only do 2 pixels per clock. I couldn't find this anywhere on the web except in some post here.

Maban, instead of posting useless comments you can start here: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf

Good luck and have fun.

Also, if there is no proof, I would like to ask W1zzard to make GPU-Z calculate theoretical pixel fillrates based on the old formula.
