
TPU's GPU Database Portal & Updates

Hello @eidairaman1
I would love to upload the vBIOS of my new office graphics card.
It's an NVIDIA GeForce GT 1030 4GHD4 LP OC with 4 GB of DDR4 VRAM.

It's already in the GPU database, but its vBIOS is not in the Video BIOS Collection.

Thanks
Upload it with GPU-Z; this submits some additional information that helps us categorize the BIOS correctly.
 
So I guess 50 series BIOS extraction is still in the works? There's no mention of it in the GPU-Z release notes, but it doesn't seem to be working.
 
Hi,

I saw that you don't have the Chinese MTT video cards; I was able to gather some information on them :)

| | MTT S80 | MTT S70 | MTT S50 | MTT S30 | MTT S10 |
|---|---|---|---|---|---|
| Architecture | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao |
| Process size | 12 nm | 12 nm | 12 nm | 12 nm | 12 nm |
| Die size | ? | ? | ? | ? | ? |
| Transistors | ~22,000M | ~22,000M | ? | ? | ? |
| MPC (NVIDIA GPC equivalent) | 8 | 7 | 4 | 2 | 2 |
| MPX (NVIDIA TPC equivalent) | 16 | 14 | 8 | 4 | 4 |
| MP (SM/CU equivalent) | 32 | 28 | 16 | 8 | 8 |
| ALUs (FP32 / INT32) | 4096 / 1024 | 3584 / 896 | 2048 / 512 | 1024 / 256 | 1024 / 256 |
| ALUs (FP64) | 64 | 56 | 32 | 16 | 16 |
| SFUs | 32 | 28 | 16 | 8 | 8 |
| Matrix ALUs | 32 | 28 | 16 | 8 | 8 |
| TMUs | 256 | 224 | 128 | 64 | 64 |
| ROPs | 256 | 224 | 128 | 64 | 64 |
| L0 cache | ?/MP | ?/MP | ?/MP | ?/MP | ?/MP |
| L1 cache | ? | ? | ? | ? | ? |
| L2 cache | 4 MB | 3.5 MB | ? | ? | ? |
| Clock | 1800 MHz | 1600 MHz | 1200 MHz | 1300 MHz | 1000 MHz |
| FP32 | 14.74 TFLOPs | 11.46 TFLOPs | 4.91 TFLOPs | 2.66 TFLOPs | 2.04 TFLOPs |
| FP16 (2:1) | 29.49 TFLOPs | 22.93 TFLOPs | 9.83 TFLOPs | 5.32 TFLOPs | 4.09 TFLOPs |
| FP64 (1:64) | 230.4 GFLOPs | 179.2 GFLOPs | 76.8 GFLOPs | 41.6 GFLOPs | 32.0 GFLOPs |
| VRAM | 16 GB GDDR6 | 7 GB GDDR6 | 8 GB GDDR6 | 4 GB GDDR6 | 2 GB GDDR6 |
| VRAM speed | 14 Gbps (1750 MHz) | 14 Gbps (1750 MHz) | ? | ? | ? |
| Bus width | 256-bit | 224-bit | 256-bit | 128-bit | 64-bit |
| Bandwidth | 448 GB/s | 392 GB/s | ? | ? | ? |
| PCIe | x16 5.0 | x16 4.0 | x16 3.0 | x8 4.0 | x8 4.0 |
| TBP | 255 W (1x 8-pin) | 220 W (1x 8-pin) | 85 W (1x 6-pin) | 40 W | 30 W |
| Outputs | 3x DP 1.4a + 1x HDMI 2.1 | 3x DP 1.4a + 1x HDMI 2.1 | 2x DP 1.4a + 1x HDMI 2.0 | 1x HDMI + 1x VGA | 1x HDMI + 1x VGA |
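The throughput and bandwidth figures above follow from the standard formulas (an FMA counts as 2 ops per clock; bandwidth = bus width × per-pin data rate ÷ 8). A quick sanity-check sketch in Python, using the table's own numbers:

```python
# Sanity check for the MTT spec table above. Assumes the usual conventions:
# vector throughput = ALUs * 2 ops (FMA) * clock, bandwidth = bus bits * Gbps / 8.

def tflops(alus: int, clock_mhz: int) -> float:
    """Vector throughput in TFLOPS from ALU count and clock in MHz."""
    return alus * 2 * clock_mhz / 1e6

def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_bits * gbps_per_pin / 8

# MTT S80: 4096 FP32 ALUs @ 1800 MHz -> ~14.75 TFLOPS (table: 14.74, truncated)
s80_fp32 = tflops(4096, 1800)
# MTT S80: 64 FP64 ALUs @ 1800 MHz -> 230.4 GFLOPS, i.e. the 1:64 rate
s80_fp64_gflops = tflops(64, 1800) * 1000
# MTT S80: 256-bit bus @ 14 Gbps -> 448 GB/s
s80_bw = bandwidth_gb_s(256, 14)
```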
 
@W1zzard

Still waiting for it to be added, will add once implemented
 
I added a couple; I need more info on formal naming, encode/decode support, and APIs.

 
I'll take this opportunity to add more detailed information about GPU architectures: SIMD organization :)

=>NVIDIA Blackwell: 4xSIMD32 (FP32/INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Ada: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Hopper: 4xSIMD32 (FP32) + 4xSIMD16 (INT32) + 4xSIMD16 (FP64) + 4xSIMD4 (SFU) + 4xMatrix ALU / SM
=>NVIDIA Ampere GA10x: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Ampere GA100: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD8 (FP64) + 4xSIMD4 (SFU) + 4xMatrix ALU / SM
=>NVIDIA Turing: 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Volta: 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD8 (FP64) + 4xSIMD4 (SFU) + 4xMatrix ALU / SM
=>NVIDIA Pascal GP100: 4xSIMD16 (FP32/INT32) + 4xSIMD8 (FP64) + 4xSIMD8 (SFU) / SM
=>NVIDIA Pascal GP10x: 4xSIMD32 (FP32/INT32) + 4xSIMD8 (SFU) + 4xALU FP64 / SM
=>NVIDIA Maxwell: 4xSIMD32 (FP32/INT32) + 4xSIMD8 (SFU) + 4xALU FP64 / SM
=>NVIDIA Kepler: 6xSIMD32 (FP32/INT32) + 4xSIMD8 (SFU) + 4xALU FP64 / SM


=>AMD RDNA3: 2xSIMD32 (FP32/INT32/WMMA) + 2xSIMD32 (FP32/WMMA) + 2xSIMD8 (SFU) + 2xALU FP64 / CU
=>AMD RDNA2/1: 2xSIMD32 (FP32/INT32) + 2xSIMD8 (SFU) + 2xALU FP64 / CU
=>AMD CDNA: 4xSIMD16 (FP32/INT32) + 4xSIMD8 (SFU) + 4xMatrix ALU / CU
=>AMD GCN: 4xSIMD16 (FP32/INT32) + 4xSIMD8 (SFU) / CU


=>INTEL Xe2-Battlemage: 8xSIMD16 (FP32) + 8xSIMD16 (INT32) + 8xSIMD4 (SFU) + 8xSIMD2 (FP64) / Xe
=>INTEL Xe-Alchemist: 16xSIMD8 (FP32) + 16xSIMD8 (INT32) + 16xSIMD2 (SFU) / Xe

=>MTT MUSA: 1xSIMD128 (FP32) + 1xSIMD32 (INT32) + 1xALU SFU + 1xMatrix ALU + 2xALU FP64 / MP
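To make the notation concrete: each "NxSIMDW" term is N groups of W lanes, so the per-SM/CU ALU count is the sum of N×W. A tiny sketch (the descriptor format here is my own, not something TPU uses):

```python
# Sum SIMD lane counts from (group_count, simd_width) pairs, as in the list above.
def lanes(groups: list[tuple[int, int]]) -> int:
    """Total ALU lanes: sum of group_count * simd_width."""
    return sum(n * w for n, w in groups)

# NVIDIA Ada SM FP32: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32)
ada_fp32 = lanes([(4, 16), (4, 16)])    # 128 shading units per SM
# AMD RDNA3 CU FP32: 2xSIMD32 + 2xSIMD32 (dual-issue pairs)
rdna3_fp32 = lanes([(2, 32), (2, 32)])  # 128 per CU
# MTT MUSA MP FP32: 1xSIMD128
musa_fp32 = lanes([(1, 128)])           # 128 per MP
```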
 
Maybe this is a better place for feedback for the GPU Database:

Could not find a newer feedback thread for the GPU database, and wasn't in the mood to send an email:

For Navi III GPUs (e.g. the 7800 XT), FP32 performance is[1] calculated with the following formula (boost clock in MHz, result in MFLOPS):
FP32 = Shading Units * Boost Clock * 4
Examples:
7900 XTX: 6144 * 2498 * 4 = 61,390,848 MFLOPS -> 61.39 TFLOPS
7800 XT: 3840 * 2430 * 4 = 37,324,800 MFLOPS -> 37.32 TFLOPS

However, for the preliminary Navi IV entries (e.g. the 9070 XT), FP32 performance is calculated with the following formula:
FP32 = Shading Units * Boost Clock * 2
9070 XT: 4096 * 2970 * 2 = 24,330,240 MFLOPS -> 24.33 TFLOPS

This seems like an error to me. Or is it intended?


[1] just a guess, but the math checks out
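To make the discrepancy explicit, here is the same arithmetic in a short Python sketch (the factor is ops per shader per clock: 4 for the Navi III entries, 2 for the preliminary Navi IV ones):

```python
# Reproduce the GPU database FP32 figures quoted above.
def fp32_tflops(shading_units: int, boost_mhz: int, ops_per_clock: int) -> float:
    """FP32 throughput in TFLOPS: units * clock (MHz) * ops, scaled from MFLOPS."""
    return shading_units * boost_mhz * ops_per_clock / 1e6

# Navi III entries use a factor of 4 (dual-issue FMA):
rx7900xtx = fp32_tflops(6144, 2498, 4)  # -> 61.39
rx7800xt = fp32_tflops(3840, 2430, 4)   # -> 37.32
# The preliminary 9070 XT entry uses a factor of 2 instead:
rx9070xt = fp32_tflops(4096, 2970, 2)   # -> 24.33 (would be 48.66 with factor 4)
```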

Question: If I were to provide values for a 'Theoretical Tensor Performance' section (FP4/8/16, BF16, INT4/8), would that be of interest?
I have collected them here: https://ethercalc.net/ih5riaqsy7i1 . Please let me know if a different format or additional graphics cards (such as AMD or Quadro) would be more useful.

Question2: Should there be anything explaining the max FP32 to max INT32 (non-Tensor) performance ratio, like it is already done for FP64?
A table is worth 1000 words:
[Attached table is unfortunately in German: "reine" -> "pure/mere", "Einheiten pro" -> "units per"]
[attachment: 1740383421506.png]


Maybe "SIMD organization" by @TRINITAS already covers Question2.
 
[attachment: 1740427527490.png]


Full info on RTX GPUs (I don't have info on Blackwell INT vector calculations yet) :)
 
@TRINITAS How does Volta (full GV100) fit into all of this?
 
@TRINITAS How does Volta (full GV100) fit into all of this?
| | BLACKWELL (PRO) | BLACKWELL (RTX) | HOPPER | ADA | AMPERE (PRO) | AMPERE (RTX) | TURING (RTX) | VOLTA |
|---|---|---|---|---|---|---|---|---|
| Chipset example | GB100 | GB202 | GH100 | AD102 | GA100 | GA102 | TU102 | GV100 |
| Partitions | ? | 12 GPCs | 8 GPCs | 12 GPCs | 8 GPCs | 7 GPCs | 6 GPCs | 6 GPCs |
| Clusters | ? | 96 TPCs | 72 TPCs | 72 TPCs | 64 TPCs | 42 TPCs | 36 TPCs | 42 TPCs |
| Cores | ? | 192 SM | 144 SM | 144 SM | 128 SM | 84 SM | 72 SM | 84 SM |
| SIMD (per SM) | ? | 4xSIMD32 (FP32/INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD32 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xSIMD16 (FP64) | 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xSIMD8 (FP64) | 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xSIMD8 (FP64) |
| Max ALU vector | ? | 28032 (24576 FP32/INT32 + 3072 SFU + 384 FP64) | 39168 (18432 FP32 + 9216 INT32 + 2304 SFU + 9216 FP64) | 21024 (9216 FP32 + 9216 FP32/INT32 + 2304 SFU + 288 FP64) | 22528 (8192 FP32 + 8192 INT32 + 2048 SFU + 4096 FP64) | 12264 (5376 FP32 + 5376 FP32/INT32 + 1344 SFU + 168 FP64) | 10512 (4608 FP32 + 4608 INT32 + 1152 SFU + 144 FP64) | 14784 (5376 FP32 + 5376 INT32 + 1344 SFU + 2688 FP64) |
| Matrix ALU | ? | 768 Gen5 | 576 Gen4 | 576 Gen4 | 512 Gen3 | 336 Gen3 | 576 Gen2 | 672 Gen1 |
| RTU | - | 192 Gen4 | - | 144 Gen3 | - | 84 Gen2 | 72 Gen1 | - |
| Scalar ALU | ? | 768 (4/SM) | 576 (4/SM) | 576 (4/SM) | 512 (4/SM) | 336 (4/SM) | 288 (4/SM) | 336 (4/SM) |
| Raster engines | ? | 12 | 8 | 12 | 8 | 7 | 6 | 6 |
| Tessellators | ? | 96 | 72 | 72 | 64 | 42 | 36 | 84 |
| TMU | ? | 768 | 576 | 576 | 512 | 336 | 288 | 336 |
| ROP | ? | 192 | 24 | 192 | 192 | 112 | 96 | 128 |
| Clock max | ? | 2407 MHz | 1980 MHz | 2520 MHz | 1440 MHz | 1860 MHz | 1770 MHz | 1627 MHz |
| INT8 vector (TOPs) | ? | 473.23 | ? | 185.79 | 94.37 | 79.99 | 65.25 | 69.97 |
| INT16 vector (TOPs) | ? | ? | ? | ? | ? | ? | ? | ? |
| INT24 vector (TOPs) | ? | ? | ? | 46.48 | 23.59 | 19.99 | 16.31 | 17.49 |
| INT32 vector (TOPs) | ? | ? | ? | 46.48 | 23.59 | 19.99 | 16.31 | 17.49 |
| INT64 vector (TOPs) | ? | ? | ? | 11.61 | 5.89 | 4.99 | 4.08 | 4.37 |
| BF16 vector (TFLOPs) | ? | 118.30 | 145.98 | 92.89 or 46.48 | 47.18 | 39.99 or 19.99 | - | - |
| FP16 vector (TFLOPs) | ? | 118.30 | 145.98 | 92.89 or 46.48 | 94.37 | 39.99 or 19.99 | 32.62 | 34.98 |
| FP32 vector (TFLOPs) | ? | 118.30 | 72.99 | 92.89 or 46.48 | 23.59 | 39.99 or 19.99 | 16.31 | 17.49 |
| FP64 vector | ? | 1.85 TFLOPs | 36.49 TFLOPs | 1.45 TFLOPs | 11.79 TFLOPs | 624.9 GFLOPs | 509.76 GFLOPs | 8.74 TFLOPs |
| Transcendental vector (TFLOPs) | ? | 14.79 | 9.12 | 11.61 | 5.89 | 4.99 | 4.08 | 4.37 |
| INT4 matrix, TOPs (sparsity) | ? | - | - | 1486.35 (2972.71) | 1509.94 (3019.89) | 639.95 (1279.91) | 521.99 | - |
| INT8 matrix, TOPs (sparsity) | ? | 946.47 (1892.94) | - | 743.17 (1486.35) | 754.97 (1509.94) | 319.97 (639.95) | 260.99 | - |
| FP4 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 1892.94 (3785.88) | - | - | - | - | - | - |
| FP8 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 946.47 (1892.94) | 1751.77 (3503.55) | 743.17 (1486.35) | - | - | - | - |
| FP8 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 473.23 (946.47) | 1751.77 (3503.55) | 743.17 (1486.35) | - | - | - | - |
| FP16 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 473.23 (946.47) | 875.88 (1751.77) | 371.58 (743.17) | 377.48 (754.97) | 159.98 (319.97) | 130.49 | - |
| FP16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 236.62 (473.23) | 875.88 (1751.77) | 185.79 (371.58) | 377.48 (754.97) | 79.99 (159.98) | 130.49 | 139.94 |
| BF16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 236.62 (473.23) | 875.88 (1751.77) | 185.79 (371.58) | 377.48 (754.97) | 79.99 (159.98) | - | - |
| TF32 matrix, TFLOPs (sparsity) | ? | 118.30 (236.62) | 437.94 (875.88) | 92.89 (185.79) | 188.74 (377.48) | 39.99 (79.99) | - | - |
| FP64 matrix (TFLOPs) | ? | - | 72.99 | - | 23.59 | - | - | - |

Here :)

I added Volta, Ampere PRO, and Hopper. For Blackwell GB100, I have no information.
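A note on the paired "X or Y" FP32 vector figures for Ada and Ampere RTX in the table above: half of the FP32 lanes are shared with INT32, so the higher number applies only when no INT work is being issued. A small sketch with AD102's lane counts and clock from the table:

```python
# Why Ada/Ampere RTX list two FP32 vector numbers: one set of lanes is dedicated
# FP32, the other is shared FP32/INT32. All-FP32 code uses both sets; mixed
# FP32+INT32 code leaves the shared half busy with integer work. FMA = 2 ops.
def vector_tflops(lanes: int, clock_mhz: int) -> float:
    return lanes * 2 * clock_mhz / 1e6

AD102 = {"fp32_dedicated": 9216, "fp32_int32_shared": 9216, "clock_mhz": 2520}

peak = vector_tflops(AD102["fp32_dedicated"] + AD102["fp32_int32_shared"],
                     AD102["clock_mhz"])                        # ~92.9 TFLOPS
mixed = vector_tflops(AD102["fp32_dedicated"], AD102["clock_mhz"])  # ~46.4
```

The same split produces GA102's 39.99-or-19.99 pair.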
 
I love the tables. I have data tables with partial data in the first post under Graphics IP; feel free to add on to them.


@TRINITAS How does Volta (full GV100) fit into all of this?
I need to revamp the L2 cache stat in the chip database and GPU database for Ada and Blackwell. Can you help me out?
 
For Ada: 96 MB for AD102, 64 MB for AD103, 48 MB for AD104, 32 MB for AD106/107
For Blackwell RTX: 128 MB for GB202, 64 MB for GB203, 48 MB for GB205, 32 MB for GB206/207
For Blackwell GB100: no information
 
Thank you !

Q: Shouldn't the tessellator count be linked to the TPC count (42) instead of the SM count (84) on Volta?
 
Oh yes, sorry.
It's 42 indeed :)

| | CDNA4 | CDNA3 | CDNA2 | CDNA |
|---|---|---|---|---|
| Chipset example | ? | AQUA VANJARAM | ALDEBARAN | ARCTURUS |
| Partitions | ? | 32 Shader Engines | 8 Shader Engines | 4 Shader Engines |
| Clusters | ? | - | - | - |
| Cores | ? | 320 CU | 240 CU | 128 CU |
| SIMD (per CU) | ? | 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) | 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) | 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) |
| Max ALU vector | ? | 25600 (20480 FP32/INT32 + 5120 SFU) | 19200 (15360 FP32/INT32 + 3840 SFU) | 10240 (8192 FP32/INT32 + 2048 SFU) |
| Matrix ALU | ? | 1280 Gen3 | 960 Gen2 | 512 Gen1 |
| RTU | ? | - | - | - |
| Scalar ALU | ? | 320 (1/CU) | 240 (1/CU) | 128 (1/CU) |
| Raster engines | ? | - | - | - |
| Tessellators | ? | - | - | - |
| TMU | ? | - | - | - |
| ROP | ? | - | - | - |
| Clock max | ? | 2100 MHz | 1700 MHz | 1500 MHz |
| INT4 vector (TOPs) | ? | 344.06 | 208.89 | 98.30 |
| INT8 vector (TOPs) | ? | 172.03 | 104.44 | 49.15 |
| INT16 vector (TOPs) | ? | 172.03 | 104.44 | 49.15 |
| INT24 vector (TOPs) | ? | 86.01 | 52.22 | 24.57 |
| INT32 vector (TOPs) | ? | 86.01 | 52.22 | 24.57 |
| INT64 vector (TOPs) | ? | 21.50 | 13.05 | 6.14 |
| BF16 vector | ? | - | - | - |
| FP16 vector, TFLOPs (with packed math) | ? | 344.06 | 104.44 (208.89) | 49.15 |
| FP32 vector, TFLOPs (with packed math) | ? | 172.03 | 52.22 (104.44) | 24.57 |
| FP64 vector (TFLOPs) | ? | 86.01 | 52.22 | 12.28 |
| Transcendental vector (TFLOPs) | ? | 21.50 | 13.05 | 6.14 |
| INT4 matrix, TOPs (sparsity) | ? | - | - | - |
| INT8 matrix, TOPs (sparsity) | ? | 2752.51 (5505.02) | 417.79 | - |
| FP4 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | - | - | - |
| FP8 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 2752.51 (5505.02) | - | - |
| FP8 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 2752.51 (5505.02) | - | - |
| FP16 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 1376.25 (2752.51) | 417.79 | 196.60 |
| FP16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 1376.25 (2752.51) | 417.79 | 196.60 |
| BF16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 1376.25 (2752.51) | 417.79 | 98.30 |
| FP32 matrix, TFLOPs (sparsity) | ? | 172.03 | 104.44 | 49.15 |
| TF32 matrix, TFLOPs (sparsity) | ? | 688.12 (1376.25) | - | - |
| FP64 matrix (TFLOPs) | ? | 172.03 | 104.44 | - |

For AMD Instinct CDNA :)
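The "(with packed math)" entries in the CDNA table reflect 2-wide packed operations, which double the vector rate. A quick sketch reproducing the Aldebaran (CDNA2) figures from the table:

```python
# Vector throughput with an optional packed-math factor (2 for packed ops).
def vector_tflops(lanes: int, clock_mhz: int, packed: int = 1) -> float:
    """FMA = 2 ops per clock; packed math multiplies the rate."""
    return lanes * 2 * clock_mhz * packed / 1e6

# Aldebaran (CDNA2): 15360 FP32/INT32 lanes @ 1700 MHz
cdna2_fp32 = vector_tflops(15360, 1700)         # ~52.2 TFLOPS
cdna2_fp32_pk = vector_tflops(15360, 1700, 2)   # ~104.4 TFLOPS with packed math
```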
 
96 MB for AD102
128 MB for GB202

It's crazy how NVIDIA feels the need to shrink the L2 cache on all the new x90 variants! Those GPUs cost a fortune (even at MSRP), and they cheap out everywhere they can... L2 cache, GDDR6X/GDDR7 speeds (never the fastest bins), fewer shunt resistors than the 3090 Ti, etc.
 