
NVIDIA GA100 Scalar Processor Specs Sheet Released

btarunr

Editor & Senior Moderator
Staff member
NVIDIA today kicked off GTC 2020 as an online event, and the centerpiece of it all is the GA100 scalar processor GPU, which debuts the "Ampere" graphics architecture. Sifting through a mountain of content, we finally found the slide that matters the most - the specifications sheet of the GA100. The GA100 is a multi-chip module that has the 7 nm GPU die at the center, and six HBM2E memory stacks, three on either side of it. The GPU die is built on the TSMC N7P 7 nm silicon fabrication process, measures 826 mm², and packs an unfathomable 54 billion transistors - and that's not even counting the transistors on the HBM2E stacks sitting on the interposer.

The GA100 packs 6,912 FP32 CUDA cores and 3,456 independent FP64 (double-precision) CUDA cores. It has 432 third-generation tensor cores with FP64 capability. The three are spread across a gargantuan 108 streaming multiprocessors. The GPU has 40 GB of total memory across a 6144-bit wide HBM2E memory interface, with 1.6 TB/s of total memory bandwidth. It has two interconnects: PCI-Express 4.0 x16 (64 GB/s) and NVLink (600 GB/s). The compute throughput values are mind-blowing: 19.5 TFLOPs classic FP32, 9.7 TFLOPs classic FP64, and 19.5 TFLOPs FP64 through the tensor cores; 156 TFLOPs single-precision TF32 (312 TFLOPs with neural-net sparsity enabled); 312 TFLOPs BFLOAT16 throughput (doubled with sparsity enabled); 312 TFLOPs FP16; 624 TOPS INT8; and 1,248 TOPS INT4. The GPU has a typical power draw of 400 W in the SXM form-factor. We also found the architecture diagram, which reveals the GA100 to be two almost-independent GPUs placed on a single slab of silicon. We also have our first view of the "Ampere" streaming multiprocessor with its FP32 and FP64 CUDA cores and 3rd-gen tensor cores. The GeForce version of this SM could feature 2nd-gen RT cores.
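For those who want to sanity-check the headline figures, the classic FP32 and FP64 numbers fall straight out of cores × 2 FLOPs per clock × clock speed. Here's a minimal sketch, assuming a boost clock of roughly 1.41 GHz (the clock speed is not on the published slide):

```python
# Rough throughput check for the GA100 figures quoted above.
fp32_cores = 6912
fp64_cores = 3456
boost_clock_ghz = 1.41   # assumed boost clock; not part of the published spec sheet

# Each CUDA core can retire one fused multiply-add (2 FLOPs) per clock.
fp32_tflops = fp32_cores * 2 * boost_clock_ghz / 1000
fp64_tflops = fp64_cores * 2 * boost_clock_ghz / 1000

print(round(fp32_tflops, 1), round(fp64_tflops, 1))  # ~19.5 and ~9.7
```

The tensor-core figures scale the same way, with each third-generation tensor core contributing far more FLOPs per clock than a CUDA core.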



View at TechPowerUp Main Site
 
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing. Some sites are reporting one or more stacks are disabled. Seems weird. My guess is that the stacks have different memory amounts. For instance, 4 stacks could have 8 GB each and 2 stacks could have 4 GB each.

Edit: Is it possible to have 1.6 GB per wafer in a stack?

Edit #2: Oh, and on an unrelated note, it was just announced that Nvidia is dropping Xeon and going with Epyc for its complete server solutions.
 
Last edited:
If this translates to consumer GPUs, then the next gen might be comparable to what happened with the 8800 GTX... with another price bump if AMD can't answer.
 
Can't wait to see what an FP16- and FP32-optimised SM will look like. This might be the biggest generational leap we have ever experienced! So the rumors of a 3060-class GPU running circles around the 2080 Ti, especially in RTX, might not be that far off.
 
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing. Some sites are reporting one or more stacks are disabled. Seems weird. My guess is that the stacks have different memory amounts. For instance, 4 stacks could have 8 GB each and 2 stacks could have 4 GB each.

Edit: Is it possible to have 1.6 GB per wafer in a stack?

Edit #2: Oh, and on an unrelated note, it was just announced that Nvidia is dropping Xeon and going with Epyc for its complete server solutions.

I'm wondering that myself. Maybe it's just a typo and it should've said 48 GB?
 
Why would anyone benefit from not having healthy competition ?
Are you willing to pay $1,500 for a potential RTX 3080 Ti, or would you prefer AMD to release a $700 Navi 21 that is as fast as an RTX 3080 Ti?

Even better, AMD releases a $300 Navi 21 that is as fast as an RTX 3080 Ti... yeah, let's not dream too much!

Jokes aside, of course I get what you mean and I agree 100% that competition is always good, but let's wait and see what each company has to offer this time around before forming unrealistic expectations.
 
Very unimpressive FP32 and FP64 performance, I was way off in my estimations. Again, it's a case of optimizing for way too many things. So much silicon is dedicated to non-traditional performance metrics that I wonder if it makes sense trying to shove everything into one package.

Here's hoping that Ampere as it is in this instance won't power any consumer graphics, because the outlook would be grim; we would be looking at another barely incremental performance increase.
 
Last edited:
Can't wait to see what an FP16- and FP32-optimised SM will look like. This might be the biggest generational leap we have ever experienced! So the rumors of a 3060-class GPU running circles around the 2080 Ti, especially in RTX, might not be that far off.

Wasn't having ASICs/shaders dedicated to specific tasks the mistake that led to the absurd prices of the Turing gen? Let's see how much Nvidia can keep this interesting for gamers, with huge and expensive dies with low yields per wafer. It is interesting that AMD's strategy with RDNA2 is precisely the opposite. :)
 
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing. Some sites are reporting one or more stacks are disabled. Seems weird. My guess is that the stacks have different memory amounts. For instance, 4 stacks could have 8 GB each and 2 stacks could have 4 GB each.

Edit: Is it possible to have 1.6 GB per wafer in a stack?

Edit #2: Oh, and on an unrelated note, it was just announced that Nvidia is dropping Xeon and going with Epyc for its complete server solutions.

One stack is probably disabled, because it seems to be a 5120-bit bus; that works out to 1,555 GB/s of bandwidth @ 1215 MHz.
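That lines up with some quick arithmetic. A minimal sketch, assuming five active 8 GB stacks, a 1024-bit interface per stack, and the 1215 MHz double-data-rate memory clock mentioned above (none of these figures are on NVIDIA's slide):

```python
# Back-of-the-envelope HBM2E capacity and bandwidth, assuming 5 of 6 stacks active.
active_stacks = 5
gb_per_stack = 8                      # assumed 8 GB per HBM2E stack
bus_bits = active_stacks * 1024       # 1024-bit interface per stack -> 5120-bit

clock_mhz = 1215                      # assumed memory clock, per the post above
gbps_per_pin = 2 * clock_mhz / 1000   # double data rate -> 2.43 Gbps per pin

capacity_gb = active_stacks * gb_per_stack    # 40 GB
bandwidth_gbs = bus_bits * gbps_per_pin / 8   # ~1555 GB/s

print(capacity_gb, round(bandwidth_gbs))      # 40 1555
```

That would also explain the 40 GB figure without mixed-capacity stacks: five 8 GB stacks enabled out of six.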
 
Very unimpressive FP32 and FP64 performance, I was way off in my estimations. Again, it's a case of optimizing for way too many things.

Here's hoping that Ampere as it is in this instance won't power any consumer graphics, because the outlook would be grim; we would be looking at another barely incremental performance increase.
It will:
 
It will:

Sad reacts only; all those "RTX 3060 as fast as a 2080 Ti" claims seem out of this world right now. I am still hoping they'll actually increase the shader count and not just cut away the HPC-oriented stuff. But damn, it just doesn't look all that enticing.

By the way, I've just noticed the power :) 400 W, that's 150 W over the V100. Ouch, 7 nm hasn't been kind; I was right that this is a power-hungry monster.
 
Last edited:
Wonder whether Nvidia will introduce the A100 as a standalone Titan. As nice as the DGX is, it is way out of budget for any single research lab. A few Titan A100s might be a good option for researchers to tinker with.
 
A 7 nm shrink of the 2080 Ti plus tensor cores would be a ~330 mm² chip: a 3080 with 10 GB, +20% clock, and +10% IPC.
 
Wasn't having ASICs/shaders dedicated to specific tasks the mistake that led to the absurd prices of the Turing gen?

For starters, having fixed-function silicon dedicated to specific tasks was anything but a mistake! Today Nvidia is leading in raytracing, and they are managing to alleviate a big part of the penalties that come with it (reduced frame rates) thanks to techniques such as DLSS 2.0 that rely on said fixed-function silicon. This is the future of graphics, and if anything we should expect more/improved fixed-function silicon in upcoming gens, not the opposite, so calling it a "mistake" means you are not understanding what the future of GPUs is, despite it being right in front of your eyes!

As for the absurd prices of Turing (which we all agree are absurd), nothing indicates that fixed-function silicon is the cause. Quite the opposite: from what we know, fixed-function silicon takes very little die space, so what makes you objectively believe this is the reason for said absurd prices? The only objective reason is the lack of competition from AMD!

It is interesting that AMD's strategy with RDNA2 is precisely the opposite. :)

:shadedshu: Sorry, what? RDNA2 is going to follow the same route by implementing fixed-function silicon; what are you talking about?
 
For starters, having fixed-function silicon dedicated to specific tasks was anything but a mistake! Today Nvidia is leading in raytracing, and they are managing to alleviate a big part of the penalties that come with it (reduced frame rates) thanks to techniques such as DLSS 2.0 that rely on said fixed-function silicon. This is the future of graphics, and if anything we should expect more/improved fixed-function silicon in upcoming gens, not the opposite, so calling it a "mistake" means you are not understanding what the future of GPUs is, despite it being right in front of your eyes!

As for the absurd prices of Turing (which we all agree are absurd), nothing indicates that fixed-function silicon is the cause. Quite the opposite: from what we know, fixed-function silicon takes very little die space, so what makes you objectively believe this is the reason for said absurd prices? The only objective reason is the lack of competition from AMD!



:shadedshu: Sorry, what? RDNA2 is going to follow the same route by implementing fixed-function silicon; what are you talking about?
While I have no stake in the argument against fixed-function hardware, it will be tested by time, but it can certainly do the job, and efficiently.
Some of what you're saying is wrong: it takes up quite a lot of die space, relatively speaking, hence Nvidia's large die sizes, which are added to by the extra cache resources and hardware needed to keep the special units busy.

The other reason is simply because they can, and to make more money; it's not rocket science, just business. People should have voted with their wallets.

Any talk of their competition's version is hearsay at this point; no argument required.
 
I think we'd all prefer this; however, Navi 21 isn't competing (performance-wise) with an RTX 3080 Ti. It will be lucky to best the 2080 Ti by a worthy (10%) margin.

Yeah, it would be nice to have a 2080 Ti competitor first.
 
I hope we all can agree that this "leak" was a bunch of BS at least.

Nvidia-Ampere-leak-768x433.png
 
I hope we all can agree that this "leak" was a bunch of BS at least.

Nvidia-Ampere-leak-768x433.png
Yeah, GA100 is exclusive to servers/AI.
GA102 will be the gaming/consumer version.
 
I hope we all can agree that this "leak" was a bunch of BS at least.

Nvidia-Ampere-leak-768x433.png
They're not releasing GA100 to consumers in any RTX form though, that's for sure.
They could still be legit future specs if you count the CUDA cores equally (the FP64 and FP32 cores), perhaps.
 
Yeah, this is pure HPC/AI focused, and regarding that leak... TBF, the full GA100 does have 8,192 FP32 CUDA cores, so not a bad guess. No RT cores at all though, so again, not aimed at gamers...
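To put the 8,192 figure in context: assuming 64 FP32 cores per SM, as the Ampere SM diagram suggests, the full die's 128 SMs give 8,192 cores, while the 108 SMs enabled on the shipping A100 give the 6,912 quoted in the spec sheet. A quick sketch of that arithmetic:

```python
# GA100 CUDA-core arithmetic; per-SM count assumed from the Ampere SM diagram.
fp32_per_sm = 64
full_die_sms = 128   # physical SMs on the full GA100 die
a100_sms = 108       # SMs enabled on the shipping A100 product

print(full_die_sms * fp32_per_sm)  # 8192 FP32 cores on the full die
print(a100_sms * fp32_per_sm)      # 6912 FP32 cores on the A100
```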

 
For starters, having fixed-function silicon dedicated to specific tasks was anything but a mistake! Today Nvidia is leading in raytracing, and they are managing to alleviate a big part of the penalties that come with it (reduced frame rates) thanks to techniques such as DLSS 2.0 that rely on said fixed-function silicon. This is the future of graphics, and if anything we should expect more/improved fixed-function silicon in upcoming gens, not the opposite, so calling it a "mistake" means you are not understanding what the future of GPUs is, despite it being right in front of your eyes!

As for the absurd prices of Turing (which we all agree are absurd), nothing indicates that fixed-function silicon is the cause. Quite the opposite: from what we know, fixed-function silicon takes very little die space, so what makes you objectively believe this is the reason for said absurd prices? The only objective reason is the lack of competition from AMD!



:shadedshu: Sorry, what? RDNA2 is going to follow the same route by implementing fixed-function silicon; what are you talking about?

In fact, big dies reduce yields (functional chips per wafer), which increases the cost of each chip, and Nvidia tries to maintain profit margins to satisfy shareholders; this combination leads to high prices. I know Nvidia has managed to take advantage of the error itself, but that's questionable, as for now the results are not always consistent.

What I understand is that raytracing on RDNA2 uses part of the regular shaders. I honestly haven't seen anything about dedicated hardware.
 
From what others say, this is the A100, not the full GA100.

The GA100 is the full-fat 8,192-core GPU.
 