
Pixel Fillrate and Fermi

Joined
Mar 31, 2010
Messages
15 (0.01/day)
Likes
1
#1
The pixel fillrate that GPU-Z displays is wrong for Nvidia Fermi-based graphics cards. It seems to be calculated by multiplying the number of ROPs by the GPU clock. On Fermi GPUs, however, the pixel fillrate is generally limited not by the number of ROPs but by the number of streaming multiprocessors (SMs). Each SM can process two pixels per clock. So with 16 SMs and 48 ROPs, as in the GeForce GTX 580, the SMs limit the pixel fillrate. This is the case for all Fermi-based graphics cards I know of.
Having more ROPs than pixels that can be processed per clock helps to sustain a high pixel fillrate when using multiple samples per pixel (e.g. multisample antialiasing), but the peak pixel fillrate is still limited by the streaming multiprocessors.
Check out these benchmarks by hardware.fr (scroll down to section 'Fillrate'): http://www.hardware.fr/articles/806-4/nvidia-geforce-gtx-580-sli.html.
The measured peak pixel fillrate of the GeForce GTX 580 is 23.3 GPixel/s. Simply multiplying the 48 ROPs by the 772 MHz GPU clock would give a peak pixel fillrate of 37.1 GPixel/s. But because the pixel fillrate is limited by the streaming multiprocessors, the peak fillrate is only 16*2*772 MPixel/s = 24.7 GPixel/s. This number corresponds well to the measurement taken by hardware.fr.
If you look at non-Fermi graphics cards, you will see that the measured peak pixel fillrate corresponds well to the product of the number of ROPs and the GPU clock.
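
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation described above. The function name and the `pixels_per_sm` parameter are made up for illustration and have nothing to do with how GPU-Z works internally; the only inputs are the GTX 580 figures quoted in this post.

```python
def peak_pixel_fillrate_mpix(rops: int, sms: int, clock_mhz: float,
                             pixels_per_sm: int = 2) -> float:
    """Theoretical peak pixel fillrate in MPixel/s for a Fermi-style GPU.

    The rate is capped by whichever is smaller: the number of ROPs or the
    number of pixels the SMs can deliver per clock.
    """
    pixels_per_clock = min(rops, sms * pixels_per_sm)
    return pixels_per_clock * clock_mhz


# GeForce GTX 580 figures from the post: 48 ROPs, 16 SMs, 772 MHz.
rop_only = 48 * 772                                  # naive ROPs x clock
sm_limited = peak_pixel_fillrate_mpix(48, 16, 772)   # capped by the SMs

print(f"ROPs x clock: {rop_only / 1000:.1f} GPixel/s")
print(f"SM-limited:   {sm_limited / 1000:.1f} GPixel/s")
```

With these inputs the sketch gives 37.1 GPixel/s for the naive product and 24.7 GPixel/s for the SM-limited value, matching the numbers above.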

Many reviews cite the wrong peak pixel fillrate for Fermi cards, and Nvidia doesn't publish pixel fillrate numbers on its product pages. But knowing the architectural properties of Fermi, you can easily calculate the right peak pixel fillrate. I hope GPU-Z will be fixed to show the right peak pixel fillrate on Nvidia Fermi graphics cards.
 

Athlon2K15

HyperVtX™
Joined
Sep 27, 2006
Messages
7,848 (1.91/day)
Likes
2,305
Location
O-H-I-O
Processor AMD Ryzen 7 1800x
Motherboard Asus Crosshair VI Hero
Cooling CH6 EK MonoBlock
Memory TridentZ 16GB DDR4 3600
Video Card(s) GTX 1080Ti EK Full Cover Block
Storage Samsung 960 Pro
Display(s) LG 34UC88 Curved Ultrawide
Case EVGA DG86
Power Supply Corsair RM850x
Mouse Asus Strix Evolve
Keyboard Asus Strix Claymore
#2
the almighty w1zzard should stop by in a moment
 
Joined
Mar 31, 2010
Messages
15 (0.01/day)
Likes
1
#3
Some additional background information on this topic:

Now, you may be thinking: no more than 8 fragments can be rasterised per GPC per base clock, thus it'd take 4 base clocks to fill a fragment warp, thus apparent rate would be 8 fragments per GPC per clock and thus 32 across the entire chip – why so many ROPs (6 of them equate to a theoretical maximum of 48-fragments per base clock)? Two reasons, at least in our opinion: first, the memory controller-to-ROP connection is so tight that it would have been quite intrusive to remove the extra ROPs, and second, atomics.
http://www.beyond3d.com/content/reviews/55/9

As we said, juicy! It's obvious that maximum ROP throughput equals ~16.8 GPixels/s, which is eerily close to 16.996 GPixels/s and exactly what we'd expect if there were only 28 ROPs on-chip, except we know that there are 40 of them. This is the point where we urge you to look upstream, at the ROP analysis, where we had already told you so.
http://www.beyond3d.com/content/reviews/55/13

If you are looking for the correct peak pixel fillrates of reference GeForces have a look at the German Wikipedia:

The discrepancy between the real peak fillrate and the fillrate displayed in GPU-Z is especially large for the lower mainstream GeForces. Take the GeForce GT 540M, for example:

GPU-Z shows 10.8 GPixel/s while the correct value is 2.7 GPixel/s. 2 SMs each processing 2 pixels per clock give you 4 pixels/clock. At 672 MHz this gives a peak pixel fillrate of 672*4 MPixel/s = 2688 MPixel/s.
 

Athlon2K15

#5
It does, and I'm a bit surprised W1zzard hasn't said anything
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
17,070 (3.44/day)
Likes
17,985
Processor Core i7-4790K
Memory 16 GB
Video Card(s) GTX 1080
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 7
#6
ah i forgot about this, i'll look into it over the weekend
 

W1zzard

#7
3dc_member, please confirm that this logic needs to be applied to the following gpus:

GF100 (32 shaders per SM)
GF104 (48 shaders per SM)
GF106 (48 shaders per SM)
GF108 (48 shaders per SM)

GF110 (32 shaders per SM)
GF114 (48 shaders per SM)
GF116 (48 shaders per SM)
GF118 (48 shaders per SM) (even though i've never seen any credible evidence that it exists, seems to be GF108)
GF119 (48 shaders per SM)

all numbers correct? any other gpus?
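
A rough Python sketch of how the list above could feed the calculation: derive the SM count from the number of active shaders, then apply the two pixels per SM per clock mentioned earlier in the thread. All function and variable names are hypothetical and this is not GPU-Z's actual code; GF118 is left out because its existence is doubted above, and the ROP cap is ignored for brevity.

```python
# Shaders per SM as listed in the post above.
SHADERS_PER_SM = {
    "GF100": 32, "GF110": 32,
    "GF104": 48, "GF106": 48, "GF108": 48,
    "GF114": 48, "GF116": 48, "GF119": 48,
}
PIXELS_PER_SM_PER_CLOCK = 2  # figure given earlier in this thread


def sm_count(gpu: str, active_shaders: int) -> int:
    """Number of active SMs, derived from the active shader count."""
    per_sm = SHADERS_PER_SM[gpu]
    if active_shaders % per_sm != 0:
        raise ValueError(f"{active_shaders} shaders is not a multiple of {per_sm}")
    return active_shaders // per_sm


def sm_limited_fillrate_mpix(gpu: str, active_shaders: int, clock_mhz: float) -> float:
    """SM-limited peak pixel fillrate in MPixel/s (ROP cap not considered)."""
    return sm_count(gpu, active_shaders) * PIXELS_PER_SM_PER_CLOCK * clock_mhz


# GTX 580 (GF110): 16 SMs x 32 shaders = 512 active shaders, 772 MHz.
print(sm_count("GF110", 512))
print(sm_limited_fillrate_mpix("GF110", 512, 772) / 1000, "GPixel/s")
```

Using the active shader count rather than the full chip configuration means cut-down cards with disabled SMs would get a correspondingly lower value.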
 
Joined
Mar 31, 2010
Messages
15 (0.01/day)
Likes
1
#8
Based on the hardware.fr benchmarks, I can confirm this for at least GF100, GF104, GF106, GF110, and GF114:

I can't find corresponding benchmarks for the low-end parts, but I don't think they behave any differently.
 

W1zzard

#9
please check if the attached build gives you the correct results
 


Joined
Mar 31, 2010
Messages
15 (0.01/day)
Likes
1
#10
I've got no Fermi-GPU available, so other users with a GeForce 400/500 should check it out and compare with those values (the ones with GPixel/s as unit):
 
Joined
Mar 31, 2010
Messages
15 (0.01/day)
Likes
1
#11
A user at the forums of 3DCenter.org tried your testbuild with a GeForce GTX 460 and it seems to work:
http://www.forum-3dcenter.org/vbulletin/showpost.php?p=9055220&postcount=359
He also measured the pixel fillrate which gave a peak of around 9.5 GPixel/s for color fills while your testbuild of GPU-Z displays 11.2 GPixel/s (which is the correct theoretical value in my opinion).

I noticed that you used the testbuild for the reviews of the GeForce GTX 560 Ti 448 cores.
 

W1zzard

#12
He also measured the pixel fillrate which gave a peak of around 9.5 GPixel/s for color fills while your testbuild of GPU-Z displays 11.2 GPixel/s (which is the correct theoretical value in my opinion).
gpu-z displays theoretical values. thanks

I noticed that you used the testbuild for the reviews of the GeForce GTX 560 Ti 448 cores.
nice catch, didnt notice myself :) the vga test system has a shortcut to the latest debug build on my work pc. so when i took those screenshots it used that build.
 
Joined
Mar 31, 2010
Messages
15 (0.01/day)
Likes
1
#13
Just to confuse you: the peak pixel fillrate can also be limited by memory bandwidth if the ROPs don't have enough write cache with sufficient bandwidth available.
For example, say you have 17 GB/s of memory bandwidth (~DDR3-1066 @ 128 bit) and 4 ROPs @ 2000 MHz (an overclocked Intel HD Graphics 3000). Going by the ROPs, the theoretical fillrate should be 8 GPixel/s. But writing 8 GPixel/s at 32 bits per pixel = 4 bytes per pixel requires a memory bandwidth of 8*4 GByte/s = 32 GByte/s. So 17 GByte/s is not sufficient for 8 GPixel/s, and the peak pixel fillrate is limited to 17/4 = 4.25 GPixel/s.
As an example: http://www.forum-3dcenter.org/vbulletin/showpost.php?p=8497061&postcount=112.
DDR3-1066 -> 3.58 GPixel/s measured
DDR3-1333 -> 4.48 GPixel/s measured
DDR3-1400 -> 4.54 GPixel/s measured
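
The bandwidth limit in the example above can be sketched in a few lines of Python. The function names are hypothetical, and the model is deliberately simple: it assumes every pixel is written straight to memory at 4 bytes per pixel, with no write cache or compression helping out.

```python
def ddr3_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int = 128) -> float:
    """Theoretical memory bandwidth in GB/s from transfer rate and bus width."""
    return transfers_mt_s * (bus_width_bits / 8) / 1000.0


def bandwidth_limited_fillrate_gpix(rops: int, clock_mhz: float,
                                    bandwidth_gb_s: float,
                                    bytes_per_pixel: int = 4) -> float:
    """Peak fillrate in GPixel/s: the lower of the ROP limit and the bandwidth limit."""
    rop_limit = rops * clock_mhz / 1000.0
    bandwidth_limit = bandwidth_gb_s / bytes_per_pixel
    return min(rop_limit, bandwidth_limit)


# Example from the post: 4 ROPs @ 2000 MHz with 17 GB/s of memory bandwidth.
print(bandwidth_limited_fillrate_gpix(4, 2000, 17.0))  # 4.25

# Theoretical limits for the memory speeds in the measurements above.
for speed in (1066, 1333, 1400):
    bw = ddr3_bandwidth_gb_s(speed)
    limit = bandwidth_limited_fillrate_gpix(4, 2000, bw)
    print(f"DDR3-{speed}: {bw:.1f} GB/s -> at most {limit:.2f} GPixel/s")
```

The measured values sit somewhat below these theoretical limits, which is plausible since the full theoretical bandwidth is rarely reached in practice.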

As long as your write cache is large enough for the workload and has enough bandwidth, there won't be a limitation of the peak pixel fillrate.

I wouldn't consider bandwidth limitations of the peak pixel fillrate in GPU-Z. ;)
 
Joined
Aug 11, 2013
Messages
82 (0.05/day)
Likes
9
Location
Oshawa, Canada
System Name Old but Good.
Processor Intel Core i5 2500K at 4.8GHz (1.425)
Motherboard Asus P8Z68-V\GEN3 BIOS 3402
Cooling Cooler Master EVO +212, 4 120mm case fans.
Memory 4x4GB Kingston ValueRam DDR3-1333/1600 at 1600, 1.6v
Video Card(s) 2 eVGA GTX 760 2GB at 1214/7800 ACX SLI
Storage 2TB, 3TB SG Barracuda SATA 6 and a SG Expansion 3TB USB 3.0 drive
Display(s) 27" Acer G276HL LED Monitor 1920x1080@66Htz
Case Cooler Master
Audio Device(s) Realtek on-board 5.1 Surround
Power Supply 750w Cooler Master GXII SLI/XFire-ready
Mouse Logitech G500 Gamin' Mouse (8 Buttons)
Keyboard Mad Catz S.T.R.I.K.E.R. 3
Software Windows 10 Pro Build 10565
Benchmark Scores 3DMark2006-77891 3DMark Vantage: 64,558
#14
It makes sense. A GTX 760 2GB is over twice as fast as a GTX 460 1GB with the same number of ROPs (32). My GTX 460 1GBs are limited to 11.7 GPixel/s whereas the GTX 760s are not limited.

I guess I'm going to upgrade to 2 eVGA GTX 760 2GB SC w/ ACX coolers SLI.