
TPU's GPU Database Portal & Updates

For Nvidia, it is clear that the information is easier to find, since they communicate a lot about their GPUs.

But AMD is really heartbreaking, especially since within the same architecture generation we can have different ratios, as is the case with Bristol Ridge against others of its version such as Fiji or Tonga.
True, I really need more info on Fermi though; it's CUDA 2.0 for GF100/110 and CUDA 2.1 for the rest.
 
No double precision at all for Tesla G8x, G9x and GT215/216/218. Only GT200, at 1:8.

Two examples from me:
 

Attachments

  • GeForce GTX 280.png
  • GeForce 8800 ULTRA.png
For Fermi: (From me also)

1:8 for GF100/110 GeForce
1:2 for GF100/110 Quadro/Tesla
1:12 for other Fermi GeForce and Quadro
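To put those ratios in numbers, here is a minimal sketch of how the peak FP64 rate follows from the FP32 peak and the ratio. The shader counts and clocks below are the public specs of two GF110 cards, used here as illustrative inputs (they are not from the posts above):

```python
def peak_gflops(shaders, shader_clock_ghz, fp64_ratio):
    """Peak FP32 GFLOPS = shaders x 2 (FMA) x shader clock (GHz); FP64 = FP32 x ratio."""
    fp32 = shaders * 2 * shader_clock_ghz
    return fp32, fp32 * fp64_ratio

# GeForce GTX 580 (GF110): 512 shaders @ 1544 MHz, FP64 capped at 1:8
gf_fp32, gf_fp64 = peak_gflops(512, 1.544, 1 / 8)

# Tesla M2090 (same GF110 silicon): 512 shaders @ 1301 MHz, full 1:2
tesla_fp32, tesla_fp64 = peak_gflops(512, 1.301, 1 / 2)

print(f"GTX 580: {gf_fp32:.0f} FP32 / {gf_fp64:.0f} FP64 GFLOPS")
print(f"M2090:   {tesla_fp32:.0f} FP32 / {tesla_fp64:.0f} FP64 GFLOPS")
```

Same silicon, very different FP64 peaks (~198 vs ~666 GFLOPS), purely from the ratio.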
 

Attachments

  • GTX 560 Ti.png
  • GTX 580.png
  • Quadro 2000.png
Yes, but remember this isn't always correct. Nvidia seems to go by CUDA version, not desktop vs. workstation/server, since Fermi shares the same chips and the same CUDA versions.

GF100/110 is CUDA 2.0, and 2.0 is 1:2.

But if you can find a chart that says 2.0 is an exception, I'll change it xD
 

CUDA version, OK, but GeForce is for the public; there's no need to have FP64 at max speed. The situation of Fermi is the same as Hawaii. Or GK110: it's CUDA 3.5, but 1:3 for Quadro/Tesla and Titan, and 1:24 for GeForce GTX 780 (Ti). Same situation.
 
In the CUDA documents it shows 3.5 as 1:3 only; there is no 1:24.

I fixed it.
 
The CUDA version gives the general specifications of a chip series and the maximum of its potential. However, the functions of a particular card model are defined by the card's BIOS.

Physically, the GF100 and GK110 are identical between Quadro and GeForce, with the same number of FP64 units. But the BIOS defines how software can access the functions and instructions in the GPU. The Quadro and Tesla BIOSes give 100% access to the FP64 units, but the GeForce ones give half, if not less.

There are two methods for limiting FP64: either by disabling FP64-specific instructions, or by decreasing the frequency of the FP64 shaders when they are called. This is the case of the GTX 780/780 Ti, where the frequencies are simply decreased to the 30-40 MHz range for FP64.

The CUDA version gives the maximum possible capabilities, but it's the BIOS that sets the priorities, not just the frequencies, voltages or whatever (as some people think).

If we could examine and compare the BIOS of a Quadro 7000 with that of a GTX 580, we would see differences there.
 
I already fixed it.
 
I have information for old GPUs:

This is still a bit difficult (information about these generations is less complete than for current ones), but it seems that the GeForce FX series was capable of FP32 calculations (for 3D, of course). The true values remain to be found, since these are non-unified architectures. I know it's vector VLIW, but I don't know what type (like VLIW4 for pixel shaders and VLIW3 for vertex shaders). You also need to know how a GeForce 7900 GTX (650 MHz) can reach 300 GFLOP/s FP32 (source: Nvidia slide).

In any case, all this started with the GeForce FX 5000 series and the Radeon R300.
 
Look at my RSX GPU in the database to see how FP32 is calculated with pixel + vertex shaders.
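The ~300 GFLOP/s figure for the 7900 GTX can indeed be reconstructed from the split pixel/vertex shaders. A sketch, assuming the commonly cited G71 layout (24 pixel shaders with two vec4 MADD ALUs each, and 8 vertex shaders with a vec4 + scalar MADD each; these unit counts are assumptions of this example, not from the posts above):

```python
clock_ghz = 0.65  # GeForce 7900 GTX core clock

# A MADD (multiply-add) counts as 2 flops per lane.
pixel_flops_per_clk = 24 * 2 * 4 * 2    # 24 shaders x 2 ALUs x vec4 x MADD
vertex_flops_per_clk = 8 * (4 + 1) * 2  # 8 shaders x (vec4 + scalar) x MADD

total_gflops = (pixel_flops_per_clk + vertex_flops_per_clk) * clock_ghz
print(f"{total_gflops:.1f} GFLOPS FP32")  # lands right on Nvidia's ~300 figure
```

That gives 464 flops per clock, or about 301.6 GFLOPS at 650 MHz, so the slide's number is consistent with counting both shader types.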
 
I could also find the actual speed of GPUs in texture filtering (with the TMUs):

All Pascal/Maxwell/Kepler/Fermi: INT8-INT16 (1:1) - FP16 (1:2) - FP32 (1:4)
Tesla GT2xx/G9x/G8x: INT8-INT16 (1:1) - FP16 (1:2) - FP32 (1:4)
All AMD (GCN/TeraScale): INT8 (1:1) - INT16 (1:2) - FP16 (1:2) - FP32 (1:4)

(Source: techreport.com)
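Those ratios turn into filtered-texel rates like this. A sketch, with an illustrative Pascal-class configuration (160 TMUs at 1607 MHz; those two figures are assumptions of this example):

```python
# Filtered texel rate = TMU count x core clock x per-format ratio
# (ratios from the list above, Nvidia Fermi-and-later column).
FORMAT_RATIO = {"INT8": 1.0, "INT16": 1.0, "FP16": 0.5, "FP32": 0.25}

def filter_rate_gtexels(tmus, clock_ghz, fmt):
    return tmus * clock_ghz * FORMAT_RATIO[fmt]

for fmt in ("INT8", "FP16", "FP32"):
    rate = filter_rate_gtexels(160, 1.607, fmt)
    print(f"{fmt}: {rate:.1f} GTexel/s")
```

So the headline texture fill rate in spec sheets is the INT8 number; FP16 filtering runs at half that, FP32 at a quarter.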
 
Some generation sorting improvements will be coming soon! :)
 
Hi, new information for Nvidia Turing: FP16 is fast, 2:1 of FP32 :)
 
I was discussing Vega10 die size with someone in a thread. AMD news release contains official die sizes for both Vega10 as well as Vega20:
http://ir.amd.com/news-releases/news-release-details/amd-takes-high-performance-datacenter-computing-next-horizon said:
Radeon Instinct™ MI60 contains 13.2 billion transistors on a package size of 331.46mm2, while the previous generation Radeon Instinct™ MI25 had 12.5 billion transistors on a package size of 494.8mm2 – a 58% improvement in number of transistors per mm2.

(vs inner/outer die size of Vega10:
die-size physical vs official)
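The 58% figure checks out against the numbers quoted in the release; a quick sanity check:

```python
# Transistors per mm^2, using the package sizes quoted in AMD's release.
mi60_density = 13.2e9 / 331.46  # Vega 20 (Radeon Instinct MI60)
mi25_density = 12.5e9 / 494.8   # Vega 10 (Radeon Instinct MI25)

improvement = mi60_density / mi25_density - 1
print(f"{improvement:.0%} more transistors per mm^2")  # matches AMD's 58%
```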
 
So basically, if you say die size and don't specify, then 510 mm² technically isn't incorrect, since it's still part of the die?
 
Vega 20 already has 128 ROPs? o_O
 
I find myself going to that GPU Database site every week, lol; I love it. Props to the staff for making it, it comes in handy.
Question though: which is faster, my W5000 or the V7900?
https://www.techpowerup.com/gpu-specs/firepro-v7900.c580
https://www.techpowerup.com/gpu-specs/firepro-w5000.c588

Might be a dumb question because the W5000 is newer, but the specs on the V7900 are much higher. Just curious; I'm really aiming for the W7000 4GB version (if I can get a deal).
The W5000 is much more power efficient and still supported in drivers.

Added USB-C to the outputs picture.

Added AMD and Nvidia graphics IP.
 
Wait a second, you said the W5000 is more power efficient, but the V7900's specs are way higher than the W5000's. Which is faster in overall performance? Even though the W5000 probably runs cooler, I'm sure.
 
GCN vs. TeraScale, 28 nm vs. 40 nm, so it's more efficient by that alone.
 