
NVIDIA GA100 Scalar Processor Specs Sheet Released

btarunr

Editor & Senior Moderator
Staff member
NVIDIA today kicked off GTC 2020 as an online event, and the centerpiece of it all is the GA100 scalar processor GPU, which debuts the "Ampere" graphics architecture. Sifting through a mountain of content, we finally found the slide that matters the most - the specifications sheet of the GA100. The GA100 is a multi-chip module that has the 7 nm GPU die at the center, and six HBM2E memory stacks, three on either side of it. The GPU die is built on the TSMC N7P 7 nm silicon fabrication process, measures 826 mm², and packs an unfathomable 54 billion transistors - and that's not even counting the transistors on the HBM2E stacks sitting on the interposer.

The GA100 packs 6,912 FP32 CUDA cores and 3,456 independent FP64 (double-precision) CUDA cores. It has 432 third-generation tensor cores with FP64 capability. The three are spread across a gargantuan 108 streaming multiprocessors. The GPU has 40 GB of total memory across a 6144-bit wide HBM2E memory interface, with 1.6 TB/s of total memory bandwidth. It has two interconnects: PCI-Express 4.0 x16 (64 GB/s) and NVLink (600 GB/s). The compute throughput values are mind-blowing: 19.5 TFLOPs classic FP32, 9.7 TFLOPs classic FP64, and 19.5 TFLOPs FP64 through the tensor cores; 156 TFLOPs single-precision TF32 (312 TFLOPs with neural-net sparsity enabled); 312 TFLOPs BFLOAT16 throughput (doubled with sparsity enabled); 312 TFLOPs FP16; 624 TOPS INT8; and 1,248 TOPS INT4. The GPU has a typical power draw of 400 W in the SXM form-factor. We also found the architecture diagram, which reveals the GA100 to be two almost-independent GPUs placed on a single slab of silicon. We also have our first view of the "Ampere" streaming multiprocessor with its FP32 and FP64 CUDA cores and 3rd-gen tensor cores. The GeForce version of this SM could feature 2nd-gen RT cores.
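For those who want to sanity-check the headline figures, the classic FP32 and FP64 numbers fall straight out of cores × 2 FLOPs per clock × clock speed. Here's a minimal sketch, assuming a boost clock of roughly 1.41 GHz (the clock speed is not on the published slide):

```python
# Rough throughput check for the GA100 figures quoted above.
fp32_cores = 6912
fp64_cores = 3456
boost_clock_ghz = 1.41   # assumed boost clock; not part of the published spec sheet

# Each CUDA core can retire one fused multiply-add (2 FLOPs) per clock.
fp32_tflops = fp32_cores * 2 * boost_clock_ghz / 1000
fp64_tflops = fp64_cores * 2 * boost_clock_ghz / 1000

print(round(fp32_tflops, 1), round(fp64_tflops, 1))  # ~19.5 and ~9.7
```

The tensor-core figures scale the same way, with each third-generation tensor core contributing far more FLOPs per clock than a CUDA core.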



View at TechPowerUp Main Site
 
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing. Some sites are reporting one or more stacks are disabled. Seems weird. My guess is that the stacks have different memory amounts. For instance, 4 stacks could have 8 GB each and 2 stacks could have 4 GB each.

Edit: Is it possible to have 1.6 GB per wafer in a stack?

Edit #2: Oh, and on an unrelated note, it was just announced that Nvidia is dropping Xeon and going with Epyc for its complete server solutions.
 
Last edited:
If this translates to consumer GPUs, then the next gen might be comparable to what happened with the 8800 GTX... with another price bump if AMD can't answer.
 
Can't wait to see what an FP16- and FP32-optimised SM will look like. This might be the biggest generational leap we have ever experienced! So the rumors of a 3060-class GPU running circles around the 2080 Ti, especially in RTX, might not be that far off.
 
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing. Some sites are reporting one or more stacks are disabled. Seems weird. My guess is that the stacks have different memory amounts. For instance, 4 stacks could have 8 GB each and 2 stacks could have 4 GB each.

Edit: Is it possible to have 1.6 GB per wafer in a stack?

Edit #2: Oh, and on an unrelated note, it was just announced that Nvidia is dropping Xeon and going with Epyc for its complete server solutions.

I'm wondering that myself. Maybe it's just a typo and it should've said 48 GB?
 
Why would anyone benefit from not having healthy competition ?
Are you willing to pay $1,500 for a potential RTX 3080 Ti, or would you prefer AMD to release a $700 Navi 21 that is as fast as an RTX 3080 Ti?

Even better, AMD releases a $300 Navi 21 that is as fast as an RTX 3080 Ti... yeah, let's not dream too much!

Jokes aside, of course I get what you mean and I agree 100% that competition is always good, but let's wait and see what each company has to offer this time around before forming unrealistic expectations.
 
Very unimpressive FP32 and FP64 performance, I was way off in my estimations. Again, it's a case of optimizing for way too many things. So much silicon is dedicated to non-traditional performance metrics that I wonder if it makes sense trying to shove everything into one package.

Here's hoping that Ampere as it is in this instance won't power any consumer graphics, because the outlook would be grim; we would be looking at another barely incremental performance increase.
 
Last edited:
Can't wait to see what an FP16- and FP32-optimised SM will look like. This might be the biggest generational leap we have ever experienced! So the rumors of a 3060-class GPU running circles around the 2080 Ti, especially in RTX, might not be that far off.

Wasn't having ASICs/shaders dedicated to specific tasks the mistake that led to the absurd prices of the Turing gen? Let's see how much Nvidia can keep this interesting for gamers, with huge and expensive dies with low yields per wafer. It is interesting that AMD's strategy with RDNA2 is precisely the opposite. :)
 
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing. Some sites are reporting one or more stacks are disabled. Seems weird. My guess is that the stacks have different memory amounts. For instance, 4 stacks could have 8 GB each and 2 stacks could have 4 GB each.

Edit: Is it possible to have 1.6 GB per wafer in a stack?

Edit #2: Oh, and on an unrelated note, it was just announced that Nvidia is dropping Xeon and going with Epyc for its complete server solutions.

One stack is probably disabled, because it seems to be a 5120-bit bus; that works out to 1,555 GB/s of bandwidth @ 1215 MHz.
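That lines up with some quick arithmetic. A minimal sketch, assuming five active 8 GB stacks, a 1024-bit interface per stack, and the 1215 MHz double-data-rate memory clock mentioned above (none of these figures are on NVIDIA's slide):

```python
# Back-of-the-envelope HBM2E capacity and bandwidth, assuming 5 of 6 stacks active.
active_stacks = 5
gb_per_stack = 8                      # assumed 8 GB per HBM2E stack
bus_bits = active_stacks * 1024       # 1024-bit interface per stack -> 5120-bit

clock_mhz = 1215                      # assumed memory clock, per the post above
gbps_per_pin = 2 * clock_mhz / 1000   # double data rate -> 2.43 Gbps per pin

capacity_gb = active_stacks * gb_per_stack    # 40 GB
bandwidth_gbs = bus_bits * gbps_per_pin / 8   # ~1555 GB/s

print(capacity_gb, round(bandwidth_gbs))      # 40 1555
```

That would also explain the 40 GB figure without mixed-capacity stacks: five 8 GB stacks enabled out of six.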
 
Very unimpressive FP32 and FP64 performance, I was way off in my estimations. Again, it's a case of optimizing for way too many things.

Here's hoping that Ampere as it is in this instance won't power any consumer graphics, because the outlook would be grim; we would be looking at another barely incremental performance increase.
It will:
 
It will:

Sad reacts only; all those "RTX 3060 as fast as a 2080 Ti" claims seem out of this world right now. I am still hoping they'll actually increase the shader count and not just cut away the HPC-oriented stuff. But damn, it just doesn't look all that enticing.

By the way, I've just noticed the power :) 400 W, that's 150 W over the V100. Ouch, 7 nm hasn't been kind; I was right that this is a power-hungry monster.
 
Last edited:
Wonder whether Nvidia will introduce the A100 as a standalone Titan. As nice as the DGX is, it is way out of budget for any single research lab. A few Titan A100s might be a good option for researchers to tinker with.
 
A 7 nm shrink of the 2080 Ti plus tensor cores would be a ~330 mm² chip: a 3080 with 10 GB, +20% clock, and +10% IPC.
 
Wasn't having ASICs/shaders dedicated to specific tasks the mistake that led to the absurd prices of the Turing gen?

For starters, having fixed-function silicon dedicated to specific tasks was anything but a mistake! Today Nvidia is leading in raytracing, and they are managing to alleviate a big part of the penalties that come with it (reduced frame rates) thanks to techniques such as DLSS 2.0 that rely on said fixed-function silicon. This is the future of graphics, and if anything we should expect more/improved fixed-function silicon in upcoming gens, not the opposite, so calling it a "mistake" means you are not understanding what the future of GPUs is, despite it being right in front of your eyes!

As for the absurd prices of Turing (which we all agree are absurd), nothing indicates that fixed-function silicon is the cause. Quite the opposite: from what we know, fixed-function silicon takes very little die space, so what makes you objectively believe this is the reason for said absurd prices? The only objective reason is the lack of competition from AMD!

It is interesting that AMD's strategy with RDNA2 is precisely the opposite. :)

:shadedshu: Sorry, what? RDNA2 is going to follow the same route by implementing fixed-function silicon; what are you talking about?
 
For starters, having fixed-function silicon dedicated to specific tasks was anything but a mistake! Today Nvidia is leading in raytracing, and they are managing to alleviate a big part of the penalties that come with it (reduced frame rates) thanks to techniques such as DLSS 2.0 that rely on said fixed-function silicon. This is the future of graphics, and if anything we should expect more/improved fixed-function silicon in upcoming gens, not the opposite, so calling it a "mistake" means you are not understanding what the future of GPUs is, despite it being right in front of your eyes!

As for the absurd prices of Turing (which we all agree are absurd), nothing indicates that fixed-function silicon is the cause. Quite the opposite: from what we know, fixed-function silicon takes very little die space, so what makes you objectively believe this is the reason for said absurd prices? The only objective reason is the lack of competition from AMD!



:shadedshu: Sorry, what? RDNA2 is going to follow the same route by implementing fixed-function silicon; what are you talking about?
While I have no stake in the argument against fixed-function hardware, it will be tested by time, but it can certainly do the job, and efficiently.
Some of what you're saying is wrong: it takes up quite a lot of die space, relatively speaking, hence Nvidia's large die sizes, which are added to by the extra cache resources and hardware needed to keep the special units busy.

The other reason is simply because they can, and to make more money; it's not rocket science, just business. People should have voted with their wallets.

Any talk of their competition's version is hearsay at this point; no argument required.
 
I think we'd all prefer this; however, Navi 21 isn't competing (performance-wise) with an RTX 3080 Ti. It will be lucky to best the 2080 Ti by a worthy (10%) margin.

Yeah, it would be nice to have a 2080 Ti competitor first.
 
I hope we all can agree that this "leak" was a bunch of BS at least.

Nvidia-Ampere-leak-768x433.png
 
I hope we all can agree that this "leak" was a bunch of BS at least.

Nvidia-Ampere-leak-768x433.png
Yeah, GA100 is exclusive to servers/AI.
GA102 will be the gaming/consumer version.
 
I hope we all can agree that this "leak" was a bunch of BS at least.

Nvidia-Ampere-leak-768x433.png
They're not releasing GA100 to consumers in any RTX form though, that's for sure.
They could still be legit future specs if you count the CUDA cores equally (the FP64 and FP32 cores), perhaps.
 
Yeah, this is pure HPC/AI focused, and regarding that leak... TBF, the full GA100 does have 8,192 FP32 CUDA cores, so not a bad guess. No RT cores at all though, so again, not aimed at gamers...
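To put the 8,192 figure in context: assuming 64 FP32 cores per SM, as the Ampere SM diagram suggests, the full die's 128 SMs give 8,192 cores, while the 108 SMs enabled on the shipping A100 give the 6,912 quoted in the spec sheet. A quick sketch of that arithmetic:

```python
# GA100 CUDA-core arithmetic; per-SM count assumed from the Ampere SM diagram.
fp32_per_sm = 64
full_die_sms = 128   # physical SMs on the full GA100 die
a100_sms = 108       # SMs enabled on the shipping A100 product

print(full_die_sms * fp32_per_sm)  # 8192 FP32 cores on the full die
print(a100_sms * fp32_per_sm)      # 6912 FP32 cores on the A100
```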

 
For starters, having fixed-function silicon dedicated to specific tasks was anything but a mistake! Today Nvidia is leading in raytracing, and they are managing to alleviate a big part of the penalties that come with it (reduced frame rates) thanks to techniques such as DLSS 2.0 that rely on said fixed-function silicon. This is the future of graphics, and if anything we should expect more/improved fixed-function silicon in upcoming gens, not the opposite, so calling it a "mistake" means you are not understanding what the future of GPUs is, despite it being right in front of your eyes!

As for the absurd prices of Turing (which we all agree are absurd), nothing indicates that fixed-function silicon is the cause. Quite the opposite: from what we know, fixed-function silicon takes very little die space, so what makes you objectively believe this is the reason for said absurd prices? The only objective reason is the lack of competition from AMD!



:shadedshu: Sorry, what? RDNA2 is going to follow the same route by implementing fixed-function silicon; what are you talking about?

In fact, big dies reduce yields (functional chips per wafer), which increases the cost of each chip, and Nvidia tries to maintain profit margins to satisfy shareholders; this combination leads to high prices. I know Nvidia has managed to take advantage of the error itself, but that's questionable, as for now the results are not always consistent.

What I understand is that raytracing on RDNA2 uses part of the regular shaders. I honestly haven't seen anything about dedicated hardware.
 
From what others say, this is the A100, not the full GA100.

The GA100 is the full-fat 8,192-core GPU.
 