
NVIDIA GA100 Scalar Processor Specs Sheet Released

T4C Fantasy

CPU & GPU DB Maintainer
Staff member
Joined
May 7, 2012
Messages
2,562 (0.59/day)
Location
Rhode Island
System Name Whaaaat Kiiiiiiid!
Processor Intel Core i9-12900K @ Default
Motherboard Gigabyte Z690 AORUS Elite AX
Cooling Corsair H150i AIO Cooler
Memory Corsair Dominator Platinum 32GB DDR4-3200
Video Card(s) EVGA GeForce RTX 3080 FTW3 ULTRA @ Default
Storage Samsung 970 PRO 512GB + Crucial MX500 2TB x3 + Crucial MX500 4TB + Samsung 980 PRO 1TB
Display(s) 27" LG 27MU67-B 4K, + 27" Acer Predator XB271HU 1440P
Case Thermaltake Core X9 Snow
Audio Device(s) Logitech G935 Headset
Power Supply SeaSonic Platinum 1050W Snow Silent
Mouse Logitech G903 Lightspeed
Keyboard Logitech G915
Software Windows 11 Pro
Benchmark Scores FFXV: 19329
Joined
Nov 24, 2017
Messages
853 (0.37/day)
Location
Asia
Processor Intel Core i5 4590
Motherboard Gigabyte Z97x Gaming 3
Cooling Intel Stock Cooler
Memory 8GiB(2x4GiB) DDR3-1600 [800MHz]
Video Card(s) XFX RX 560D 4GiB
Storage Transcend SSD370S 128GB; Toshiba DT01ACA100 1TB HDD
Display(s) Samsung S20D300 20" 768p TN
Case Cooler Master MasterBox E501L
Audio Device(s) Realtek ALC1150
Power Supply Corsair VS450
Mouse A4Tech N-70FX
Software Windows 10 Pro
Benchmark Scores BaseMark GPU : 250 Point in HD 4600
400W??? Isn't Nvidia supposed to be efficient??
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
There is no difference, A100 is just the Tesla name; it uses a GA100.
I don't know about no difference: one's cut down and the price will vary, yet they carry the same name. Weird.
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11

T4C Fantasy

CPU & GPU DB Maintainer
Staff member
Joined
May 7, 2012
Messages
2,562 (0.59/day)
Location
Rhode Island
I don't know about no difference: one's cut down and the price will vary, yet they carry the same name. Weird.
The different one will be GA102: no HBM, but just as many CUDA cores, more or less.

But there won't be two different 100s; technically, that is what the 102 is.
 
Joined
Nov 24, 2017
Messages
853 (0.37/day)
Location
Asia
Compared to what exactly?
Compared to AMD. Nvidia's 12nm GPUs have the same efficiency as AMD's 7nm GPUs; as a result, Nvidia's 7nm GPUs should be more efficient.
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Compared to AMD. Nvidia's 12nm GPUs have the same efficiency as AMD's 7nm GPUs; as a result, Nvidia's 7nm GPUs should be more efficient.

So you're comparing a 10.3 billion transistor Navi to a 54 billion transistor 40GB HBM2 HPC AI compute monster.

Got ya.
 
Last edited:
Joined
Oct 28, 2012
Messages
1,159 (0.28/day)
Processor AMD Ryzen 3700x
Motherboard asus ROG Strix B-350I Gaming
Cooling Deepcool LS520 SE
Memory crucial ballistix 32Gb DDR4
Video Card(s) RTX 3070 FE
Storage WD sn550 1To/WD ssd sata 1To /WD black sn750 1To/Seagate 2To/WD book 4 To back-up
Display(s) LG GL850
Case Dan A4 H2O
Audio Device(s) sennheiser HD58X
Power Supply Corsair SF600
Mouse MX master 3
Keyboard Master Key Mx
Software win 11 pro
400W??? Isn't Nvidia supposed to be efficient??
For what it's supposed to be, the perf/watt ratio is actually great. A single rack of DGX A100 can replace several old racks.
From this :
1589478488398.png


To this:

1589478529647.png
 
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
A picture paints a thousand words, thank you.
 
Joined
Jan 8, 2017
Messages
8,926 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
But there won't be 2 different 100s, technically that is what the 102 is.

But this one has an entire GPC disabled due to horrendous yields, I presume, and probably because a full chip would throw even that eye-watering 400W TDP out the window. There has to be one fully enabled chip, right? One would assume there would be different 100s.

To be honest, this is borderline Thermi 2.0: a great compute architecture that can barely be implemented in actual silicon due to power and yields. These aren't exactly Nvidia's brightest hours in terms of chip design. It seems like they bit off more than they could chew, and the chip was probably cut down in a last-minute decision.

Suffice it to say, I doubt we'll see the full 8192 shaders in any GPU this generation. I doubt they could realistically fit that in a 250W power envelope, and it seems like GA100 runs at 1.4 GHz, no change from Volta nor probably from Turing. Let's see: 35% more shaders than Volta, but 60% more power at the same clocks. It's not shaping up to be the "50% more efficient and 50% faster per SM" some hoped for.
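The arithmetic behind that claim can be written out as a rough sketch. It uses only figures quoted in this thread (35% more shaders, 400W against a 250W V100) and assumes identical clocks, so shader count stands in for FP32 throughput; it says nothing about tensor or mixed-precision workloads.

```python
# Rough shaders-per-watt comparison using only figures quoted in this thread.
# Assumption: clocks are identical, so shader count is a proxy for FP32 throughput.

def relative_efficiency(shader_ratio: float, power_ratio: float) -> float:
    """Return the new chip's throughput per watt relative to the old chip."""
    return shader_ratio / power_ratio

shader_ratio = 1.35        # "35% more shaders than Volta"
power_ratio = 400 / 250    # 400 W versus a 250 W V100, i.e. 60% more power
ratio = relative_efficiency(shader_ratio, power_ratio)

print(f"Shaders per watt relative to V100: {ratio:.2f}x")
```

A result below 1.0 means fewer shaders per watt than the older chip, which is the point being argued here.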
 
Last edited:
Joined
Dec 22, 2011
Messages
3,890 (0.86/day)
Well, for starters they can scrap the FP64 performance, which is about what the 5700 XT offers in FP32, so that is a bonus. With the TU102 at 18.6 billion transistors, I'd suggest they have wiggle room. Just a thought.
 
Joined
Oct 4, 2017
Messages
695 (0.29/day)
Location
France
Processor RYZEN 7 5800X3D
Motherboard Aorus B-550I Pro AX
Cooling HEATKILLER IV PRO , EKWB Vector FTW3 3080/3090 , Barrow res + Xylem DDC 4.2, SE 240 + Dabel 20b 240
Memory Viper Steel 4000 PVS416G400C6K
Video Card(s) EVGA 3080Ti FTW3
Storage XPG SX8200 Pro 512 GB NVMe + Samsung 980 1TB
Display(s) Dell S2721DGF
Case NR 200
Power Supply CORSAIR SF750
Mouse Logitech G PRO
Keyboard Meletrix Zoom 75 GT Silver
Software Windows 11 22H2
Some of what you're saying is wrong. It takes up quite a lot of die space, relatively, hence Nvidia's large die sizes, which are added to by the extra cache resources and hardware needed to keep the special units busy.

I'm afraid you are wrong. The myth that larger die sizes are correlated with fixed-function hardware has already been debunked. I'm trying to find the source (it might be TPU, Anandtech, or YouTube), but it might take time, so I will link it here ASAP.

There is no real correlation between die size increase and fixed-function hardware, as the latter eats relatively little die space. More likely than not, the larger die size in Turing is explained by the fact that it has more SMs.

This is further backed up by GA100, which has an increased die size compared to GV100 (826mm^2 vs 815mm^2) but a significantly lower Tensor Core count (432 vs 640). So it is pretty obvious that fixed-function hardware is not responsible for the die size expansion!

The other reason being because they can, and to make more money. It's not rocket science, just business; people should have chosen with their wallets.

This was exactly my point: the only tangible argument that justifies higher prices for Turing (other than the increased silicon size) is that the lack of competition allows them to do so.
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
These aren't exactly Nvidia's brightest hours in terms of chip design

These are exactly Nvidia's brightest hours in terms of chip design.
The A100 packs 54 billion transistors, 2.5 times as much as a V100, and those transistors aren't there for nothing.
You can't just compare SM counts and base stupid assumptions upon that. The A100 is clearly a much more efficient solution for what it's been designed for.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
I'm afraid you are wrong. The myth that larger die sizes are correlated with fixed-function hardware has already been debunked. I'm trying to find the source (it might be TPU, Anandtech, or YouTube), but it might take time, so I will link it here ASAP.

There is no real correlation between die size increase and fixed-function hardware, as the latter eats relatively little die space. More likely than not, the larger die size in Turing is explained by the fact that it has more SMs.

This is further backed up by GA100, which has an increased die size compared to GV100 (826mm^2 vs 815mm^2) but a significantly lower Tensor Core count (432 vs 640). So it is pretty obvious that fixed-function hardware is not responsible for the die size expansion!

This was exactly my point: the only tangible argument that justifies higher prices for Turing (other than the increased silicon size) is that the lack of competition allows them to do so.
We disagree, so be it.
 
Joined
Jan 8, 2017
Messages
8,926 (3.36/day)
These are exactly Nvidia's brightest hours in terms of chip design.

Why is almost 20% of the chip disabled then ? That's great design, right ?

You can't just compare SM counts and base stupid assumptions upon that.

Comparing SM counts and power is a totally legit way of inferring efficiency; how else would you do it? The SMs aren't the same, but that's the point: efficiency wouldn't come just from the node.

those transistors aren't there for nothing.

Guess what buddy, some of them are for nothing, I'd say about 8-9 billion give or take.

Let's face reality, they couldn't enable the entire chip because of power constraints. Making a chip like that isn't desirable, it's painfully obvious they've missed their target by miles.
 
Last edited by a moderator:

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Why is almost 20% of the chip disabled then ? That's great design, right ?



Comparing SM counts and power is a totally legit way of inferring efficiency; how else would you do it, smart ass? The SMs aren't the same, but that's the point: efficiency wouldn't come just from the node.



Guess what buddy, some of them are for nothing, I'd say about 9 billion give or take.

Look at this clueless person acting like he really knows how to design GPUs better than a $200 billion company that has been designing GPUs for ages.
So, based on your logic, the Vega 56 is a more efficient GPU than AMD's latest and greatest 5700 XT because it has more TFLOPS and many more compute units and consumes a similar amount of power, right?
Based on the density figures, I think Nvidia is using TSMC's high-density version of the 7nm node, not the high-performance one, which was not the case with previous generations.
They could have just used the normal high-performance version and scaled up the GV100 chip, but they clearly needed more density for their design goals.
What I'm saying is that you have to see how the chip performs in applications that actually matter and base efficiency figures on that, not just on some raw numbers.
 
Last edited:
Joined
Jan 8, 2017
Messages
8,926 (3.36/day)
Look at this clueless person acting like he really knows how to design GPUs better than a $200 billion company that has been designing GPUs for ages.
So, based on your logic, the Vega 56 is a more efficient GPU than AMD's latest and greatest 5700 XT because it has more TFLOPS and many more compute units and consumes a similar amount of power,

:roll:

Because you do know how to design a GPU, right ? Sorry your GPU architect badge must have fallen off.

So, based on your logic, the Vega 56 is a more efficient GPU than AMD's latest and greatest 5700 XT because it has more TFLOPS and many more compute units and consumes a similar amount of power,

Nope, that's based on your logic. Your understanding of what I said was obviously severely limited.

First of all, Vega 56 uses more power and runs at lower clocks. A legendary GPU architect like yourself would know that a larger processor at lower clocks runs more efficiently, because power scales relatively linearly with shader count, whereas a change in clocks incurs a change in voltage, and power does not scale linearly with voltage. In other words, if, let's say, we have a GPU with N/2 shaders at 2 GHz, it will generally consume more power than a GPU with N shaders at 1 GHz.
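The wide-and-slow versus narrow-and-fast argument can be sketched with the usual dynamic power approximation, where power is proportional to shader count times voltage squared times frequency. The voltage curve below (`v_base`, `v_per_ghz`) is a hypothetical linear model chosen purely for illustration; it is not measured data for any real GPU.

```python
# Dynamic power approximation: P ~ N * V^2 * f (arbitrary units).
# Assumption: voltage rises linearly with clock; v_base and v_per_ghz are
# hypothetical illustration values, not figures for any real chip.

def dynamic_power(shaders: int, freq_ghz: float,
                  v_base: float = 0.6, v_per_ghz: float = 0.2) -> float:
    """Estimate relative dynamic power for a given shader count and clock."""
    voltage = v_base + v_per_ghz * freq_ghz
    return shaders * voltage ** 2 * freq_ghz

# Both configurations have the same shaders * clock product (same raw throughput).
wide_slow = dynamic_power(shaders=4096, freq_ghz=1.0)    # N shaders at 1 GHz
narrow_fast = dynamic_power(shaders=2048, freq_ghz=2.0)  # N/2 shaders at 2 GHz

print(f"N shaders at 1 GHz:   {wide_slow:.0f}")
print(f"N/2 shaders at 2 GHz: {narrow_fast:.0f}")
print(f"Narrow/fast uses {narrow_fast / wide_slow:.2f}x the power")
```

With any voltage curve that rises with frequency, the narrow and fast configuration comes out more power hungry for the same throughput; only the size of the gap depends on the assumed numbers.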

Let's couple that with how Navi works: the RX 5700 XT runs at considerably higher voltages and clocks and has way fewer shaders, and yet it generates a similar amount of FP32 compute with less power. It's obviously way more efficient architecturally, but as I already mentioned, I am sure a world-renowned GPU architect such as yourself knew all that.

On the other hand, Volta and Ampere run at pretty much the same frequency and likely similar voltages, since TSMC's 7nm doesn't seem to change that in any significant manner (in fact, all 7nm CPUs/GPUs up until now seem to run at the same or even higher voltages). GA100 has 35% more shaders compared to V100 but also consumes 60% more power. It doesn't take much to see that efficiency isn't that great. It's not that hard to infer these things; don't overestimate their complexity.

Yes, I am sure when you factor in Nvidia's novel floating point formats it looks great, but if you look just at FP32, it doesn't look great. It's rather mediocre. Do you not find it strange that our boy Jensen never once mentioned FP32 performance?

I never said I knew how to design it better; stop projecting made-up stuff onto me. I said it was obvious they failed to do what they originally set out to do, hence why a considerable portion of the chip is fused off. They've done it in the past too.
 
Last edited:
Joined
Oct 22, 2014
Messages
13,210 (3.81/day)
Location
Sunshine Coast
System Name Black Box
Processor Intel Xeon E3-1260L v5
Motherboard MSI E3 KRAIT Gaming v5
Cooling Tt tower + 120mm Tt fan
Memory G.Skill 16GB 3600 C18
Video Card(s) Asus GTX 970 Mini
Storage Kingston A2000 512Gb NVME
Display(s) AOC 24" Freesync 1m.s. 75Hz
Case Corsair 450D High Air Flow.
Audio Device(s) No need.
Power Supply FSP Aurum 650W
Mouse Yes
Keyboard Of course
Software W10 Pro 64 bit
By the way I've just noticed the power :), 400W, that's 150W over V100. Ouch, 7nm hasn't been kind, I was right that this is a power hungry monster.
Plot twist.
Jensen wasn't baking it in his oven, he used them to heat his oven.
 
Joined
Nov 23, 2010
Messages
313 (0.06/day)
I think this is exactly what data center customers want and have been asking for, these will sell like hot cakes to the big cloud operators.
 
Joined
Dec 18, 2015
Messages
142 (0.05/day)
System Name Avell old monster - Workstation T1 - HTPC
Processor i7-3630QM\i7-5960x\Ryzen 3 2200G
Cooling Stock.
Memory 2x4Gb @ 1600Mhz
Video Card(s) HD 7970M \ EVGA GTX 980\ Vega 8
Storage SSD Sandisk Ultra li - 480 GB + 1 TB 5400 RPM WD - 960gb SDD + 2TB HDD
I'm afraid you are wrong. The myth that larger die sizes are correlated with fixed-function hardware has already been debunked. I'm trying to find the source (it might be TPU, Anandtech, or YouTube), but it might take time, so I will link it here ASAP.

There is no real correlation between die size increase and fixed-function hardware, as the latter eats relatively little die space. More likely than not, the larger die size in Turing is explained by the fact that it has more SMs.

This is further backed up by GA100, which has an increased die size compared to GV100 (826mm^2 vs 815mm^2) but a significantly lower Tensor Core count (432 vs 640). So it is pretty obvious that fixed-function hardware is not responsible for the die size expansion!

This was exactly my point: the only tangible argument that justifies higher prices for Turing (other than the increased silicon size) is that the lack of competition allows them to do so.

How do these huge tensor cores not take up space and increase the die size? Maybe this will help to understand the relationship between die size, yields and GPU cost.

https://www.reddit.com/r/nvidia/comments/99r2x3
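As a rough illustration of why die size matters for yield and cost, here is a minimal sketch using the textbook Poisson yield model, yield = exp(-area x defect density). The defect density `D0` and the 250 mm^2 comparison die are hypothetical values picked for illustration; they are not TSMC or Nvidia figures. Only the 815 and 826 mm^2 areas come from this thread.

```python
import math

# Poisson yield model: fraction of defect-free dies = exp(-A * D0).
# Assumption: D0 is a hypothetical defect density, not a real foundry figure.
D0 = 0.1  # defects per cm^2 (illustrative only)

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Return the expected fraction of dies with zero defects."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

def cost_index(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Relative silicon cost per good die: area divided by yield."""
    return die_area_mm2 / poisson_yield(die_area_mm2, defects_per_cm2)

# 815 and 826 mm^2 are the GV100 and GA100 areas quoted above;
# 250 mm^2 is a hypothetical mid-size die for comparison.
for area in (250, 815, 826):
    y = poisson_yield(area, D0)
    print(f"{area} mm^2: yield {y:.1%}, cost index {cost_index(area, D0):.0f}")
```

Under this model, cost per good die grows faster than area, which is one reason very large dies are often sold partially disabled.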
 
Joined
Mar 26, 2009
Messages
175 (0.03/day)
Very unimpressive FP32 and FP64 performance; I was way off in my estimations. Again, it's a case of optimizing for way too many things. So much silicon is dedicated to non-traditional performance metrics that I wonder if it makes sense trying to shove everything in one package.
GA 100 is 20X faster than V100 in AI workloads and 2.5X in FP64 workloads; that's a generational leap like no other. This is an AI-optimized chip: it has no RT cores, no encoders and no display connectors. Its focus is mainly on AI training and inference, for which it provides stellar performance that crushes any hope of competition in the near future. And you are comparing regular crap like FP32 and FP64?

Alright, A100 provides 156 TF of TF32 tensor throughput compared to only 15 TF of FP32 in V100. That alone is a 10X increase for FP32 code without the need to change anything. They can extend that lead to 20X through sparse network optimizations, to 312 TF, again without code changes.

In FP16 the increase is also 2.5X in non-optimized code and 6X in optimized code, and the same goes for the INT8 and INT4 numbers, so A100 is really many times faster than V100 in any AI workload.

1589531819074.png


Also, the 400W of power consumption is nothing relative to the size of this monster: you have 40GB of HBM2, loads of NVLink connections, and loads of tensor cores that take up die area, heat and power. The chip is also cut down (which means lost power consumption). Also, the trend in data centers and AI is to open power consumption up to allow for more comfortable performance; V100 reached 350W in its second iteration and 450W in its third iteration.

You seem to lack any ounce of data center experience, so I just suggest you stick to the analysis of consumer GPUs. This isn't your area.
 
Last edited:
Joined
Jan 8, 2017
Messages
8,926 (3.36/day)
You seem to lack any ounce of data center experience, so I just suggest you stick to the analysis of consumer GPUs.

I'll stick with whatever the hell I want, thanks. You copy-pasting boilerplate from Nvidia's website can be considered anything but an "analysis". What are you, a salesman? You're barking up the wrong tree, buddy.

GA 100 is 20X faster than V100 in AI workloads and 2.5X in FP64 workloads

It turns out I overestimated your ability to copy paste information, you can't even do that :

d.png


9.7 / 7.8 = 1.24X (FP64)

Or maybe Jensen did a good job deceiving the less tech literate with their fine print by mixing together FP64 with FP64 TF.

Nice paint skills by the way.
 
Last edited:
Joined
Oct 15, 2010
Messages
208 (0.04/day)
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing.
They cheaped out: instead of offering 6 modules of 8 GB for a total of 48 GB, they went for higher margins. They will offer an improved version with the full 48 GB of memory and 25 MHz more on core and memory for 5000 dollars more. Don't worry about it.

That's so typical of Nvidia.
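The capacity arithmetic can be checked with a small sketch. It enumerates which combinations of active stacks and per-stack capacity add up to a given total, assuming at most six stack sites and a hypothetical list of per-stack sizes (4, 8 and 16 GB); neither assumption is confirmed in this thread.

```python
# Which (active stacks, GB per stack) combinations reach a given total?
# Assumptions: at most six stack sites on the package, and a hypothetical
# set of per-stack capacities used only for illustration.

def stack_configs(total_gb: int, max_sites: int = 6,
                  stack_sizes_gb: tuple = (4, 8, 16)) -> list:
    """Return (active_stacks, gb_per_stack) pairs that sum to total_gb."""
    return [
        (n, size)
        for size in stack_sizes_gb
        for n in range(1, max_sites + 1)
        if n * size == total_gb
    ]

print(stack_configs(40))  # [(5, 8)]
print(stack_configs(48))  # [(6, 8), (3, 16)]
```

Under these assumptions, 40 GB only works out as five active 8 GB stacks, which would mean one of the six sites is not in use, while 48 GB would need all six.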

 
Joined
Mar 26, 2009
Messages
175 (0.03/day)
9.7 / 7.8 = 1.24X (FP64)

Or maybe Jensen did a good job deceiving the less tech literate with their fine print.
Hey genius, I already provided you with a chart explaining all the metrics, good to know you can't read.

FP64 from Tensor cores is 19.5TF. Which is a 2.5X increase over V100. FP64 from CUDA cores is 9.7TF. If you can use both at the same time you will get about 30TF of FP64 for AI actually.
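The two FP64 figures being argued over can be put side by side in a short sketch. It uses only the peak numbers quoted in this thread (7.8 TF for V100, 9.7 TF from A100 CUDA cores, 19.5 TF from A100 Tensor cores); these are peak rates, not measured application performance.

```python
# Peak FP64 figures as quoted in this thread (TFLOPS).
V100_FP64_TF = 7.8
A100_FP64_CUDA_TF = 9.7      # conventional FP64 on CUDA cores
A100_FP64_TENSOR_TF = 19.5   # FP64 via Tensor cores

def speedup(new_tf: float, old_tf: float) -> float:
    """Return the ratio of new peak throughput to old peak throughput."""
    return new_tf / old_tf

cuda_speedup = speedup(A100_FP64_CUDA_TF, V100_FP64_TF)
tensor_speedup = speedup(A100_FP64_TENSOR_TF, V100_FP64_TF)

print(f"CUDA-core FP64:   {cuda_speedup:.2f}x over V100")
print(f"Tensor-core FP64: {tensor_speedup:.2f}x over V100")
```

Which ratio applies depends on whether a given FP64 workload can actually run on the Tensor cores, and that is exactly the point of disagreement here.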

You, copy pasting boiler plate from Nvidia's website can be considered anything but an "analysis"
It's much more meaningful than the ignorant job you did, analysing regular FP32/FP64 in an AI GPU. Talk about an extreme case of stuff that is way over your head.
 
Joined
Jan 8, 2017
Messages
8,926 (3.36/day)
Hey genius, I already provided you with a chart explaining all the metrics, good to know you can't read.

FP64 from Tensor cores is 19.5TF. Which is a 2.5X increase over V100. FP64 from CUDA cores is 9.7TF. If you can use both at the same time you will get about 30TF of FP64 for AI actually.

You're so cute when you try to explain your utter lack of understanding about these metrics.

You wrote "FP64 workloads", you genius. That's pure FP64 not tensor ops, you're clueless and stubborn.
 