
TPU's GPU Database Portal & Updates

Hello @eidairaman1
I would love to upload the vBIOS of my new office graphics card.
It's an NVIDIA GeForce GT 1030 4GHD4 LP OC with 4 GB of DDR4 VRAM.

It's already in the GPU database, but its vBIOS is not in the Video BIOS Collection.

Thanks
Upload it with GPU-Z; this submits some additional information that helps us categorize the BIOS correctly.
 
So I guess 50 series BIOS extraction is still in the works? There's no mention of it in the GPU-Z release notes, but it doesn't seem to be working.
 
Hi,

I saw that you don't have the Chinese MTT video cards; I was able to gather some information on them :)

| | MTT S80 | MTT S70 | MTT S50 | MTT S30 | MTT S10 |
|---|---|---|---|---|---|
| Architecture | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao | MTT MUSA-Chunxiao |
| Process size | 12 nm | 12 nm | 12 nm | 12 nm | 12 nm |
| Die size | ? | ? | ? | ? | ? |
| Transistors | ~22,000M | ~22,000M | ? | ? | ? |
| MPC (NVIDIA GPC equivalent) | 8 | 7 | 4 | 2 | 2 |
| MPX (NVIDIA TPC equivalent) | 16 | 14 | 8 | 4 | 4 |
| MP (SM/CU equivalent) | 32 | 28 | 16 | 8 | 8 |
| ALUs (FP32 / INT32) | 4096 / 1024 | 3584 / 896 | 2048 / 512 | 1024 / 256 | 1024 / 256 |
| ALUs (FP64) | 64 | 56 | 32 | 16 | 16 |
| SFUs | 32 | 28 | 16 | 8 | 8 |
| Matrix ALUs | 32 | 28 | 16 | 8 | 8 |
| TMUs | 256 | 224 | 128 | 64 | 64 |
| ROPs | 256 | 224 | 128 | 64 | 64 |
| L0 cache | ?/MP | ?/MP | ?/MP | ?/MP | ?/MP |
| L1 cache | ? | ? | ? | ? | ? |
| L2 cache | 4 MB | 3.5 MB | ? | ? | ? |
| Clock | 1800 MHz | 1600 MHz | 1200 MHz | 1300 MHz | 1000 MHz |
| FP32 | 14.74 TFLOPs | 11.46 TFLOPs | 4.91 TFLOPs | 2.66 TFLOPs | 2.04 TFLOPs |
| FP16 (2:1) | 29.49 TFLOPs | 22.93 TFLOPs | 9.83 TFLOPs | 5.32 TFLOPs | 4.09 TFLOPs |
| FP64 (1:64) | 230.4 GFLOPs | 179.2 GFLOPs | 76.8 GFLOPs | 41.6 GFLOPs | 32.0 GFLOPs |
| VRAM | 16 GB GDDR6 | 7 GB GDDR6 | 8 GB GDDR6 | 4 GB GDDR6 | 2 GB GDDR6 |
| VRAM speed | 14 Gbps (1750 MHz) | 14 Gbps (1750 MHz) | ? | ? | ? |
| Bus width | 256-bit | 224-bit | 256-bit | 128-bit | 64-bit |
| Bandwidth | 448 GB/s | 392 GB/s | ? | ? | ? |
| PCIe | x16 5.0 | x16 4.0 | x16 3.0 | x8 4.0 | x8 4.0 |
| TBP | 255 W (1x 8-pin) | 220 W (1x 8-pin) | 85 W (1x 6-pin) | 40 W | 30 W |
| Outputs | 3x DP 1.4a + 1x HDMI 2.1 | 3x DP 1.4a + 1x HDMI 2.1 | 2x DP 1.4a + 1x HDMI 2.0 | 1x HDMI + 1x VGA | 1x HDMI + 1x VGA |
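The throughput and bandwidth figures above follow from the standard formulas (an FMA counts as 2 ops per clock; bandwidth = bus width × per-pin data rate ÷ 8). A quick sanity-check sketch in Python, using the table's own numbers:

```python
# Sanity check for the MTT spec table above. Assumes the usual conventions:
# vector throughput = ALUs * 2 ops (FMA) * clock, bandwidth = bus bits * Gbps / 8.

def tflops(alus: int, clock_mhz: int) -> float:
    """Vector throughput in TFLOPS from ALU count and clock in MHz."""
    return alus * 2 * clock_mhz / 1e6

def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_bits * gbps_per_pin / 8

# MTT S80: 4096 FP32 ALUs @ 1800 MHz -> ~14.75 TFLOPS (table: 14.74, truncated)
s80_fp32 = tflops(4096, 1800)
# MTT S80: 64 FP64 ALUs @ 1800 MHz -> 230.4 GFLOPS, i.e. the 1:64 rate
s80_fp64_gflops = tflops(64, 1800) * 1000
# MTT S80: 256-bit bus @ 14 Gbps -> 448 GB/s
s80_bw = bandwidth_gb_s(256, 14)
```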
 
@W1zzard

Still waiting for it to be added, will add once implemented
 
I added a couple; I need more info on formal naming, encode/decode support, and APIs.

 
I'll take this opportunity to add more detailed information about GPU architectures: SIMD organization :)

=>NVIDIA Blackwell: 4xSIMD32 (FP32/INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Ada: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Hopper: 4xSIMD32 (FP32) + 4xSIMD16 (INT32) + 4xSIMD16 (FP64) + 4xSIMD4 (SFU) + 4xMatrix ALU / SM
=>NVIDIA Ampere GA10x: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Ampere GA100: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD8 (FP64) + 4xSIMD4 (SFU) + 4xMatrix ALU / SM
=>NVIDIA Turing: 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xMatrix ALU + 2xALU FP64 / SM
=>NVIDIA Volta: 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD8 (FP64) + 4xSIMD4 (SFU) + 4xMatrix ALU / SM
=>NVIDIA Pascal GP100: 4xSIMD16 (FP32/INT32) + 4xSIMD8 (FP64) + 4xSIMD8 (SFU) / SM
=>NVIDIA Pascal GP10x: 4xSIMD32 (FP32/INT32) + 4xSIMD8 (SFU) + 4xALU FP64 / SM
=>NVIDIA Maxwell: 4xSIMD32 (FP32/INT32) + 4xSIMD8 (SFU) + 4xALU FP64 / SM
=>NVIDIA Kepler: 6xSIMD32 (FP32/INT32) + 4xSIMD8 (SFU) + 4xALU FP64 / SM


=>AMD RDNA3: 2xSIMD32 (FP32/INT32/WMMA) + 2xSIMD32 (FP32/WMMA) + 2xSIMD8 (SFU) + 2xALU FP64 / CU
=>AMD RDNA2/1: 2xSIMD32 (FP32/INT32) + 2xSIMD8 (SFU) + 2xALU FP64 / CU
=>AMD CDNA: 4xSIMD16 (FP32/INT32) + 4xSIMD8 (SFU) + 4xMatrix ALU / CU
=>AMD GCN: 4xSIMD16 (FP32/INT32) + 4xSIMD8 (SFU) / CU


=>INTEL Xe2-Battlemage: 8xSIMD16 (FP32) + 8xSIMD16 (INT32) + 8xSIMD4 (SFU) + 8xSIMD2 (FP64) / Xe
=>INTEL Xe-Alchemist: 16xSIMD8 (FP32) + 16xSIMD8 (INT32) + 16xSIMD2 (SFU) / Xe

=>MTT MUSA: 1xSIMD128 (FP32) + 1xSIMD32 (INT32) + 1xALU SFU + 1xMatrix ALU + 2xALU FP64 / MP
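To make the notation concrete: each "NxSIMDW" term is N groups of W lanes, so the per-SM/CU ALU count is the sum of N×W. A tiny sketch (the descriptor format here is my own, not something TPU uses):

```python
# Sum SIMD lane counts from (group_count, simd_width) pairs, as in the list above.
def lanes(groups: list[tuple[int, int]]) -> int:
    """Total ALU lanes: sum of group_count * simd_width."""
    return sum(n * w for n, w in groups)

# NVIDIA Ada SM FP32: 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32)
ada_fp32 = lanes([(4, 16), (4, 16)])    # 128 shading units per SM
# AMD RDNA3 CU FP32: 2xSIMD32 + 2xSIMD32 (dual-issue pairs)
rdna3_fp32 = lanes([(2, 32), (2, 32)])  # 128 per CU
# MTT MUSA MP FP32: 1xSIMD128
musa_fp32 = lanes([(1, 128)])           # 128 per MP
```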
 
Maybe this is a better place for feedback for the GPU Database:

Could not find a newer feedback thread for the GPU database, and wasn't in the mood to send an email:

For Navi III GPUs (e.g. the 7800 XT), FP32 performance is[1] calculated with the following formula (boost clock in MHz, result in MFLOPS):
FP32 = Shading Units * Boost Clock * 4
Examples:
7900 XTX: 6144 * 2498 * 4 = 61,390,848 MFLOPS -> 61.39 TFLOPS
7800 XT: 3840 * 2430 * 4 = 37,324,800 MFLOPS -> 37.32 TFLOPS

However, for the preliminary Navi IV entries (e.g. the 9070 XT), FP32 performance is calculated with the following formula:
FP32 = Shading Units * Boost Clock * 2
9070 XT: 4096 * 2970 * 2 = 24,330,240 MFLOPS -> 24.33 TFLOPS

This seems like an error to me. Or is it intended?


[1] just a guess, but the math checks out
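To make the discrepancy explicit, here is the same arithmetic in a short Python sketch (the factor is ops per shader per clock: 4 for the Navi III entries, 2 for the preliminary Navi IV ones):

```python
# Reproduce the GPU database FP32 figures quoted above.
def fp32_tflops(shading_units: int, boost_mhz: int, ops_per_clock: int) -> float:
    """FP32 throughput in TFLOPS: units * clock (MHz) * ops, scaled from MFLOPS."""
    return shading_units * boost_mhz * ops_per_clock / 1e6

# Navi III entries use a factor of 4 (dual-issue FMA):
rx7900xtx = fp32_tflops(6144, 2498, 4)  # -> 61.39
rx7800xt = fp32_tflops(3840, 2430, 4)   # -> 37.32
# The preliminary 9070 XT entry uses a factor of 2 instead:
rx9070xt = fp32_tflops(4096, 2970, 2)   # -> 24.33 (would be 48.66 with factor 4)
```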

Question: If I were to provide values for a 'Theoretical Tensor Performance' section (FP4/8/16, BF16, INT4/8), would that be of interest?
I have collected them here: https://ethercalc.net/ih5riaqsy7i1 . Please let me know if a different format or additional graphics cards (such as AMD or Quadro) would be more useful.

Question2: Should there be anything explaining the max FP32 to max INT32 (non-Tensor) performance ratio, like it is already done for FP64?
A table is worth 1000 words:
[Attached table is unfortunately in German: "reine" -> "pure/mere", "Einheiten pro" -> "units per"]
[attachment: 1740383421506.png]


Maybe "SIMD organization" by @TRINITAS already covers Question2.
 
[attachment: 1740427527490.png]


Full info on RTX GPUs (I don't have info on Blackwell INT vector calculations yet) :)
 
@TRINITAS How does Volta (full GV100) fit into all of this?
 
@TRINITAS How does Volta (full GV100) fit into all of this?
| | BLACKWELL (PRO) | BLACKWELL (RTX) | HOPPER | ADA | AMPERE (PRO) | AMPERE (RTX) | TURING (RTX) | VOLTA |
|---|---|---|---|---|---|---|---|---|
| Chipset example | GB100 | GB202 | GH100 | AD102 | GA100 | GA102 | TU102 | GV100 |
| Partitions | ? | 12 GPCs | 8 GPCs | 12 GPCs | 8 GPCs | 7 GPCs | 6 GPCs | 6 GPCs |
| Clusters | ? | 96 TPCs | 72 TPCs | 72 TPCs | 64 TPCs | 42 TPCs | 36 TPCs | 42 TPCs |
| Cores | ? | 192 SM | 144 SM | 144 SM | 128 SM | 84 SM | 72 SM | 84 SM |
| SIMD (per SM) | ? | 4xSIMD32 (FP32/INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD32 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xSIMD16 (FP64) | 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xSIMD8 (FP64) | 4xSIMD16 (FP32) + 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 2xFP64 | 4xSIMD16 (FP32) + 4xSIMD16 (INT32) + 4xSIMD4 (SFU) + 4xSIMD8 (FP64) |
| Max ALU vector | ? | 28032 (24576 FP32/INT32 + 3072 SFU + 384 FP64) | 39168 (18432 FP32 + 9216 INT32 + 2304 SFU + 9216 FP64) | 21024 (9216 FP32 + 9216 FP32/INT32 + 2304 SFU + 288 FP64) | 22528 (8192 FP32 + 8192 INT32 + 2048 SFU + 4096 FP64) | 12264 (5376 FP32 + 5376 FP32/INT32 + 1344 SFU + 168 FP64) | 10512 (4608 FP32 + 4608 INT32 + 1152 SFU + 144 FP64) | 14784 (5376 FP32 + 5376 INT32 + 1344 SFU + 2688 FP64) |
| Matrix ALU | ? | 768 Gen5 | 576 Gen4 | 576 Gen4 | 512 Gen3 | 336 Gen3 | 576 Gen2 | 672 Gen1 |
| RTU | - | 192 Gen4 | - | 144 Gen3 | - | 84 Gen2 | 72 Gen1 | - |
| Scalar ALU | ? | 768 (4/SM) | 576 (4/SM) | 576 (4/SM) | 512 (4/SM) | 336 (4/SM) | 288 (4/SM) | 336 (4/SM) |
| Raster engines | ? | 12 | 8 | 12 | 8 | 7 | 6 | 6 |
| Tessellators | ? | 96 | 72 | 72 | 64 | 42 | 36 | 84 |
| TMU | ? | 768 | 576 | 576 | 512 | 336 | 288 | 336 |
| ROP | ? | 192 | 24 | 192 | 192 | 112 | 96 | 128 |
| Clock max | ? | 2407 MHz | 1980 MHz | 2520 MHz | 1440 MHz | 1860 MHz | 1770 MHz | 1627 MHz |
| INT8 vector (TOPs) | ? | 473.23 | ? | 185.79 | 94.37 | 79.99 | 65.25 | 69.97 |
| INT16 vector (TOPs) | ? | ? | ? | ? | ? | ? | ? | ? |
| INT24 vector (TOPs) | ? | ? | ? | 46.48 | 23.59 | 19.99 | 16.31 | 17.49 |
| INT32 vector (TOPs) | ? | ? | ? | 46.48 | 23.59 | 19.99 | 16.31 | 17.49 |
| INT64 vector (TOPs) | ? | ? | ? | 11.61 | 5.89 | 4.99 | 4.08 | 4.37 |
| BF16 vector (TFLOPs) | ? | 118.30 | 145.98 | 92.89 or 46.48 | 47.18 | 39.99 or 19.99 | - | - |
| FP16 vector (TFLOPs) | ? | 118.30 | 145.98 | 92.89 or 46.48 | 94.37 | 39.99 or 19.99 | 32.62 | 34.98 |
| FP32 vector (TFLOPs) | ? | 118.30 | 72.99 | 92.89 or 46.48 | 23.59 | 39.99 or 19.99 | 16.31 | 17.49 |
| FP64 vector | ? | 1.85 TFLOPs | 36.49 TFLOPs | 1.45 TFLOPs | 11.79 TFLOPs | 624.9 GFLOPs | 509.76 GFLOPs | 8.74 TFLOPs |
| Transcendental vector (TFLOPs) | ? | 14.79 | 9.12 | 11.61 | 5.89 | 4.99 | 4.08 | 4.37 |
| INT4 matrix, TOPs (sparsity) | ? | - | - | 1486.35 (2972.71) | 1509.94 (3019.89) | 639.95 (1279.91) | 521.99 | - |
| INT8 matrix, TOPs (sparsity) | ? | 946.47 (1892.94) | - | 743.17 (1486.35) | 754.97 (1509.94) | 319.97 (639.95) | 260.99 | - |
| FP4 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 1892.94 (3785.88) | - | - | - | - | - | - |
| FP8 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 946.47 (1892.94) | 1751.77 (3503.55) | 743.17 (1486.35) | - | - | - | - |
| FP8 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 473.23 (946.47) | 1751.77 (3503.55) | 743.17 (1486.35) | - | - | - | - |
| FP16 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 473.23 (946.47) | 875.88 (1751.77) | 371.58 (743.17) | 377.48 (754.97) | 159.98 (319.97) | 130.49 | - |
| FP16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 236.62 (473.23) | 875.88 (1751.77) | 185.79 (371.58) | 377.48 (754.97) | 79.99 (159.98) | 130.49 | 139.94 |
| BF16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 236.62 (473.23) | 875.88 (1751.77) | 185.79 (371.58) | 377.48 (754.97) | 79.99 (159.98) | - | - |
| TF32 matrix, TFLOPs (sparsity) | ? | 118.30 (236.62) | 437.94 (875.88) | 92.89 (185.79) | 188.74 (377.48) | 39.99 (79.99) | - | - |
| FP64 matrix (TFLOPs) | ? | - | 72.99 | - | 23.59 | - | - | - |

Here :)

I added Volta, Ampere PRO, and Hopper. For Blackwell GB100, I have no information.
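A note on the paired "X or Y" FP32 vector figures for Ada and Ampere RTX in the table above: half of the FP32 lanes are shared with INT32, so the higher number applies only when no INT work is being issued. A small sketch with AD102's lane counts and clock from the table:

```python
# Why Ada/Ampere RTX list two FP32 vector numbers: one set of lanes is dedicated
# FP32, the other is shared FP32/INT32. All-FP32 code uses both sets; mixed
# FP32+INT32 code leaves the shared half busy with integer work. FMA = 2 ops.
def vector_tflops(lanes: int, clock_mhz: int) -> float:
    return lanes * 2 * clock_mhz / 1e6

AD102 = {"fp32_dedicated": 9216, "fp32_int32_shared": 9216, "clock_mhz": 2520}

peak = vector_tflops(AD102["fp32_dedicated"] + AD102["fp32_int32_shared"],
                     AD102["clock_mhz"])                        # ~92.9 TFLOPS
mixed = vector_tflops(AD102["fp32_dedicated"], AD102["clock_mhz"])  # ~46.4
```

The same split produces GA102's 39.99-or-19.99 pair.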
 
I love the tables. I have data tables with partial data in the first post under Graphics IP; feel free to add on to them.


@TRINITAS How does Volta (full GV100) fit into all of this?
I need to revamp the L2 cache stat in the chip database and GPU database for Ada and Blackwell. Can you help me out?
 
For Ada: 96 MB for AD102, 64 MB for AD103, 48 MB for AD104, 32 MB for AD106/107
For Blackwell RTX: 128 MB for GB202, 64 MB for GB203, 48 MB for GB205, 32 MB for GB206/207
For Blackwell GB100: no information
 
Thank you !

Q: Shouldn't the tessellator count be linked to the TPC count (42) instead of the SM count (84) on Volta?
 
Oh yes, sorry.
It's 42 indeed :)

| | CDNA4 | CDNA3 | CDNA2 | CDNA |
|---|---|---|---|---|
| Chipset example | ? | AQUA VANJARAM | ALDEBARAN | ARCTURUS |
| Partitions | ? | 32 Shader Engines | 8 Shader Engines | 4 Shader Engines |
| Clusters | ? | - | - | - |
| Cores | ? | 320 CU | 240 CU | 128 CU |
| SIMD (per CU) | ? | 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) | 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) | 4xSIMD16 (FP32/INT32) + 4xSIMD4 (SFU) |
| Max ALU vector | ? | 25600 (20480 FP32/INT32 + 5120 SFU) | 19200 (15360 FP32/INT32 + 3840 SFU) | 10240 (8192 FP32/INT32 + 2048 SFU) |
| Matrix ALU | ? | 1280 Gen3 | 960 Gen2 | 512 Gen1 |
| RTU | ? | - | - | - |
| Scalar ALU | ? | 320 (1/CU) | 240 (1/CU) | 128 (1/CU) |
| Raster engines | ? | - | - | - |
| Tessellators | ? | - | - | - |
| TMU | ? | - | - | - |
| ROP | ? | - | - | - |
| Clock max | ? | 2100 MHz | 1700 MHz | 1500 MHz |
| INT4 vector (TOPs) | ? | 344.06 | 208.89 | 98.30 |
| INT8 vector (TOPs) | ? | 172.03 | 104.44 | 49.15 |
| INT16 vector (TOPs) | ? | 172.03 | 104.44 | 49.15 |
| INT24 vector (TOPs) | ? | 86.01 | 52.22 | 24.57 |
| INT32 vector (TOPs) | ? | 86.01 | 52.22 | 24.57 |
| INT64 vector (TOPs) | ? | 21.50 | 13.05 | 6.14 |
| BF16 vector | ? | - | - | - |
| FP16 vector, TFLOPs (with packed math) | ? | 344.06 | 104.44 (208.89) | 49.15 |
| FP32 vector, TFLOPs (with packed math) | ? | 172.03 | 52.22 (104.44) | 24.57 |
| FP64 vector (TFLOPs) | ? | 86.01 | 52.22 | 12.28 |
| Transcendental vector (TFLOPs) | ? | 21.50 | 13.05 | 6.14 |
| INT4 matrix, TOPs (sparsity) | ? | - | - | - |
| INT8 matrix, TOPs (sparsity) | ? | 2752.51 (5505.02) | 417.79 | - |
| FP4 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | - | - | - |
| FP8 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 2752.51 (5505.02) | - | - |
| FP8 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 2752.51 (5505.02) | - | - |
| FP16 w/ FP16 acc. matrix, TFLOPs (sparsity) | ? | 1376.25 (2752.51) | 417.79 | 196.60 |
| FP16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 1376.25 (2752.51) | 417.79 | 196.60 |
| BF16 w/ FP32 acc. matrix, TFLOPs (sparsity) | ? | 1376.25 (2752.51) | 417.79 | 98.30 |
| FP32 matrix, TFLOPs (sparsity) | ? | 172.03 | 104.44 | 49.15 |
| TF32 matrix, TFLOPs (sparsity) | ? | 688.12 (1376.25) | - | - |
| FP64 matrix (TFLOPs) | ? | 172.03 | 104.44 | - |

For AMD Instinct CDNA :)
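The "(with packed math)" entries in the CDNA table reflect 2-wide packed operations, which double the vector rate. A quick sketch reproducing the Aldebaran (CDNA2) figures from the table:

```python
# Vector throughput with an optional packed-math factor (2 for packed ops).
def vector_tflops(lanes: int, clock_mhz: int, packed: int = 1) -> float:
    """FMA = 2 ops per clock; packed math multiplies the rate."""
    return lanes * 2 * clock_mhz * packed / 1e6

# Aldebaran (CDNA2): 15360 FP32/INT32 lanes @ 1700 MHz
cdna2_fp32 = vector_tflops(15360, 1700)         # ~52.2 TFLOPS
cdna2_fp32_pk = vector_tflops(15360, 1700, 2)   # ~104.4 TFLOPS with packed math
```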
 
96 MB for AD102
128 MB for GB202

It's crazy how NVIDIA feels the need to shrink the L2 cache on all the new x90 variants! Those GPUs cost a fortune (even at MSRP), and they cheap out everywhere they can... L2 cache, GDDR6X/GDDR7 speeds (never the fastest bins), fewer shunt resistors than the 3090 Ti, etc.
 