
Ampere A100 vs A102

Joined
Mar 2, 2019
Messages
166 (0.09/day)
System Name My PC
Processor AMD 2700 x @ 4.1Ghz
Motherboard Gigabyte X470 Aorus Gaming
Cooling Zalman CNPS20X
Memory Corsair Vengeance LPX Black 32GB DDR4
Video Card(s) Sapphire Radeon RX 570 PULSE
Storage Adata Ultimate SU800
Case Phanteks Eclipse P500A
Audio Device(s) Logitech G51
Power Supply Seasonic Focus GX, 80+ Gold, 550W
Keyboard Roccat Vulcan 121
A100 has 6912 shaders while GA102 has 10496.

I really cannot explain why the larger chip has fewer shaders even though it is made on a more advanced node.


 
Joined
Dec 18, 2017
Messages
18 (0.01/day)
Location
Somewhere in Melbourne Australia
System Name Main PC, Project Many Drives.
Processor i5-4690 @ stock, Core 2 Quad QX6800 @ 3.2GHZ
Motherboard Gigabyte GA-Z97X-UD3H, Intel D975XBX1
Cooling Cooler Master Hyper 212 evo, Cooler Master TX3
Memory 4 x 4GB DDR3 Kingston blue stuff with heatsinks, 4x 2gb Random DDR2 800
Video Card(s) EVGA Geforce GTX 770 2GB ACX OC, Modded Sapphire Radeon 7850 with a heatsink from a GV-NX96T512HP
Storage Samsung 120GB, WD 2TB, Sansumg 1.5TB, TOSHIBA 1TB. 3x WD 7.2k, 5x Seagate 7.2k, 2x Maxtor 10K, 1 SSD
Display(s) HP W2371d, HP20wd. 23" 1680x1050 Proview thing, 19" 1280x1024 dell ultrasharp thing
Case Antec P180 (or P182 idk), some boring coolermaster case with 3 120mm fans
Audio Device(s) Soundblaster XFI XtremeGamer Fatal1ty Pro, built in audio
Power Supply SilverStone 1KW 80+ silver thing, Antec HCG-750 non modular. (runs hot and loud AF)
Mouse Logitech G400S, Microsoft optical Mouse
Keyboard Microsoft Multimedia Keyboard 1.0A, IBM SK-8820
Software Too much software to be listed here.
Benchmark Scores Fast enough for me XD
Maybe each CUDA core is massively scaled up compared to GA102's CUDA cores? (and, in turn, those of the rest of the Ampere GPUs)
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
7,460 (2.33/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃B550-I Strix
Cooling PA120+T30┃AXP120x67
Memory 64GB 6000CL30┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Display(s) 43" QN90B / 32" M32Q / 27" S2721DGF
Case Caselabs S3┃Lone Industries L5
Power Supply Corsair HX1000┃HDPlex
It's not even intended to output video. The A100 is marketed for deep learning. Roughly 30% more tensor cores that allegedly matter a lot in AI deep learning and other compute workloads, 75% more ROPs and, well, HBM2E. GA100 and GA102 are probably only architecturally related in the loosest possible sense of the word "architecture".

Also, I don't think GA100 has any RT hardware on die. For good reason, given its branding and intended usage/customer base, as well as the need to cram in more things that matter (tensor cores, tensor cores, and tensor cores). Nvidia really seems to be driving the point home that its new tensor cores can do just about anything with the new instructions they've worked into them.
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
The A100 puts heavy emphasis on its Tensor cores; that is why.
 
Joined
Mar 2, 2019
Messages
166 (0.09/day)
System Name My PC
Processor AMD 2700 x @ 4.1Ghz
Motherboard Gigabyte X470 Aorus Gaming
Cooling Zalman CNPS20X
Memory Corsair Vengeance LPX Black 32GB DDR4
Video Card(s) Sapphire Radeon RX 570 PULSE
Storage Adata Ultimate SU800
Case Phanteks Eclipse P500A
Audio Device(s) Logitech G51
Power Supply Seasonic Focus GX, 80+ Gold, 550W
Keyboard Roccat Vulcan 121
If you click on the links you will see that A100 has 432 Tensor cores vs. 328. Not a big difference.
 
Joined
Jan 25, 2006
Messages
1,470 (0.22/day)
Processor Ryzen 1600AF @4.2Ghz 1.35v
Motherboard MSI B450M PRO-A-MAX
Cooling Deepcool Gammaxx L120t
Memory 16GB Team Group Dark Pro Sammy-B-die 3400mhz 14.15.14.30-1.4v
Video Card(s) XFX RX 5600 XT THICC II PRO
Storage 240GB Brave eagle SSD/ 2TB Seagate Barracuda
Display(s) Dell SE2719HR
Case MSI Mag Vampiric 011C AMD Ryzen Edition
Power Supply EVGA 600W 80+
Software Windows 10 Pro
5120-bit HBM2 memory bus also o_O
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
7,460 (2.33/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃B550-I Strix
Cooling PA120+T30┃AXP120x67
Memory 64GB 6000CL30┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Display(s) 43" QN90B / 32" M32Q / 27" S2721DGF
Case Caselabs S3┃Lone Industries L5
Power Supply Corsair HX1000┃HDPlex
If you click on the links you will see that A100 has 432 Tensor cores vs. 328. Not a big difference.

I don't think GA102 has any of the FP64 hardware that GA100 has. From these simplistic SM diagrams it's hard to tell exactly how much space everything takes up. But ever since Turing the RT core has been located outside of the "immediate" SM, separated from it by L1 cache, so it doesn't affect real estate inside the actual "core". I imagine Nvidia found a way to fill up the space vacated by removing FP64.

931-sm-diagram.jpg 813-sm-diagram.jpg

If you contrast it with TU102, it honestly doesn't look that different from what we expect GA102 to look like. The RT core sits on the other side of the L1 cache, and the memory controllers lie on the edges of the die.

It's a little misleading how the naming scheme makes it sound like GA100 and GA102 belong to the same product family, as they appear to differ from the SM on up, and generally GPUs of one family share the same SM layout as part of their "architecture".

Long story short, Nvidia calls FP32 units "CUDA cores", so they probably removed all the FP64 hardware and filled it with more FP32 and called them "cores". We'll know for sure once the consumer Amperes have their SM structure released.
 
Joined
Jan 8, 2017
Messages
8,863 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
A100 has 6912 shaders while GA102 has 10496.

I really cannot explain why the larger chip has fewer shaders even though it is made on a more advanced node.

It's more accurate to look at the SM count, which doesn't differ that much once you put it in perspective relative to die size and transistor count. The only thing a CUDA core means is an FP32 unit, that's it. There was a time when GPUs could only do FP32 and had little or no support for FP64, so they called those units "CUDA cores" or "stream processors". But times have changed: there are now many more kinds of units, in different ratios, inside an SM, and the "CUDA core" count means almost nothing.

And the thing is they aren't really cores anyway; it's just a marketing thing. Functionally it's the SM which is the "core" in a GPU. I wish they'd stop calling them that and use the SM count instead, which doesn't mean much either but is a bit better. It's getting really stupid, they'll soon need scientific notation for this stuff.

they probably removed all the FP64 hardware

Nah, they'll keep some for compatibility purposes so that it doesn't become completely crippled, maybe a 1 to 32 ratio of FP64 units. Every generation since Fermi has had support for it.
 
Joined
Mar 2, 2019
Messages
166 (0.09/day)
System Name My PC
Processor AMD 2700 x @ 4.1Ghz
Motherboard Gigabyte X470 Aorus Gaming
Cooling Zalman CNPS20X
Memory Corsair Vengeance LPX Black 32GB DDR4
Video Card(s) Sapphire Radeon RX 570 PULSE
Storage Adata Ultimate SU800
Case Phanteks Eclipse P500A
Audio Device(s) Logitech G51
Power Supply Seasonic Focus GX, 80+ Gold, 550W
Keyboard Roccat Vulcan 121
I did not say what I wanted to say in my initial post. I will say it now.

BUT ... Nvidia is doing some magic counting with the number of shaders in the 3000 series.

The 3090's 10496 shaders are in fact equivalent to half that number, 5248, when counted the way A100 and the 2000 series count them, as was leaked by the Gainward slides: https://videocardz.com/newz/gainward-confirms-geforce-rtx-3090-and-rtx-3080-phoenix-graphics-cards. They are just saying the shaders deliver double the performance, which has not yet been tested in actual gaming.

Most analysts are arriving at the same conclusion but are not saying it out loud yet. When the cards are tested, more will come out.
 

newtekie1

Semi-Retired Folder
Joined
Nov 22, 2005
Messages
28,472 (4.25/day)
Location
Indiana, USA
Processor Intel Core i7 10850K@5.2GHz
Motherboard AsRock Z470 Taichi
Cooling Corsair H115i Pro w/ Noctua NF-A14 Fans
Memory 32GB DDR4-3600
Video Card(s) RTX 2070 Super
Storage 500GB SX8200 Pro + 8TB with 1TB SSD Cache
Display(s) Acer Nitro VG280K 4K 28"
Case Fractal Design Define S
Audio Device(s) Onboard is good enough for me
Power Supply eVGA SuperNOVA 1000w G3
Software Windows 10 Pro x64
If you click on the links you will see that A100 has 432 Tensor cores vs. 328. Not a big difference.


GA100 also has 48MB of L2 cache compared to 6MB of L2 on GA102.
 
Joined
May 12, 2006
Messages
1,554 (0.24/day)
Location
The Gulag Casino
System Name ROG 7900X3d By purecain
Processor AMD Ryzen 7 7900X3D
Motherboard ASUS Crosshair X670E Hero
Cooling Noctua NH U12A
Memory 64Gb G.Skill Trident Z5 neo RGB 6400@6000mhz@1.41v
Video Card(s) Aorus RTX4090 Extreme Waterforce
Storage 990Pro2Tb-1TbSamsung Evo M.2/ 2TbSamsung QVO/ 1TbSamsung Evo780/ 120gbKingston Now
Display(s) LG 65UN85006LA 65" Smart 4K Ultra HD HDR LED TV
Case Thermaltake CoreX71 Limited Edition Etched Tempered Glass Door
Audio Device(s) On board/NIcomplete audio 6
Power Supply Seasonic FOCUS 1000w 80+
Mouse M65 RGB Elite
Keyboard K95 RGB Platinum
Software Windows11pro
Benchmark Scores [url=https://valid.x86.fr/gtle1y][img]https://valid.x86.fr/cache/banner/gtle1y-6.png[/img][/url]
I'm shocked, I thought the A100 would have been the daddy chip like what we've seen in the past.
I'm glad though, because I was disappointed with the core count. :toast:
 
Joined
Jan 8, 2017
Messages
8,863 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I did not say what I wanted to say in my initial post. I will say it now.

BUT ... Nvidia is doing some magic counting with the number of shaders in the 3000 series.

The 3090's 10496 shaders are in fact equivalent to half that number, 5248, when counted the way A100 and the 2000 series count them, as was leaked by the Gainward slides: https://videocardz.com/newz/gainward-confirms-geforce-rtx-3090-and-rtx-3080-phoenix-graphics-cards. They are just saying the shaders deliver double the performance, which has not yet been tested in actual gaming.

Most analysts are arriving at the same conclusion but are not saying it out loud yet. When the cards are tested, more will come out.

I think the count is correct but there is something else. Contrary to popular belief, adding FP32 execution units isn't that expensive silicon-wise; in fact they occupy very little space. See, these GPUs don't execute instructions independently, that's the trick. They execute them in groups of 32, which simplifies things tremendously; that's why GPUs have many more cores (SMs) than CPUs.

But control logic is still expensive, so they do the following trick: they put more units (not just FP32) in one SM to share the control logic, and the more units you have sharing it, the more constraints you have. Ever wondered why, for instance, P100 had 64 shaders per SM rather than the 128 of the gaming chips? That's why: they used the cheaper, less capable design for gaming products and the more complex one for compute cards. I bet you they did the same here; it wouldn't surprise me if they went back to 128 units per SM or if the SM itself has more constraints.

The performance hasn't increased almost linearly like it usually does, so clearly there is a regression in some areas.
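As a rough check on the per-SM reasoning above, the shader totals quoted in this thread can be divided by each chip's SM count. This is only a sketch: the SM counts (108 for A100, 82 for RTX 3090) are not stated anywhere in the thread and are assumed here from Nvidia's public spec sheets, and the helper name `fp32_per_sm` is made up for illustration.

```python
# Shader totals are the ones quoted in the thread; the SM counts are an
# assumption taken from Nvidia's public specifications, not from this thread.
CHIPS = {
    "A100 (GA100)": {"sms": 108, "fp32_units": 6912},
    "RTX 3090 (GA102)": {"sms": 82, "fp32_units": 10496},
}


def fp32_per_sm(sms: int, fp32_units: int) -> int:
    """Return FP32 units ("CUDA cores") per SM.

    Raises ValueError if the total is not a whole multiple of the SM count,
    which would mean the input figures are inconsistent.
    """
    per_sm, remainder = divmod(fp32_units, sms)
    if remainder:
        raise ValueError("FP32 unit count is not a multiple of the SM count")
    return per_sm


if __name__ == "__main__":
    for name, chip in CHIPS.items():
        print(f"{name}: {chip['sms']} SMs x {fp32_per_sm(**chip)} FP32 units per SM")
```

Under those assumed SM counts, the larger "CUDA core" total on the gaming chip comes from more FP32 units per SM rather than from more SMs, which is in line with the guess above about a return to 128 units per SM.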
 
Joined
May 12, 2006
Messages
1,554 (0.24/day)
Location
The Gulag Casino
System Name ROG 7900X3d By purecain
Processor AMD Ryzen 7 7900X3D
Motherboard ASUS Crosshair X670E Hero
Cooling Noctua NH U12A
Memory 64Gb G.Skill Trident Z5 neo RGB 6400@6000mhz@1.41v
Video Card(s) Aorus RTX4090 Extreme Waterforce
Storage 990Pro2Tb-1TbSamsung Evo M.2/ 2TbSamsung QVO/ 1TbSamsung Evo780/ 120gbKingston Now
Display(s) LG 65UN85006LA 65" Smart 4K Ultra HD HDR LED TV
Case Thermaltake CoreX71 Limited Edition Etched Tempered Glass Door
Audio Device(s) On board/NIcomplete audio 6
Power Supply Seasonic FOCUS 1000w 80+
Mouse M65 RGB Elite
Keyboard K95 RGB Platinum
Software Windows11pro
Benchmark Scores [url=https://valid.x86.fr/gtle1y][img]https://valid.x86.fr/cache/banner/gtle1y-6.png[/img][/url]
Perfectly stated, Vya Domus. It's going to be an interesting few months in our world of hardware!
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
7,460 (2.33/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃B550-I Strix
Cooling PA120+T30┃AXP120x67
Memory 64GB 6000CL30┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Display(s) 43" QN90B / 32" M32Q / 27" S2721DGF
Case Caselabs S3┃Lone Industries L5
Power Supply Corsair HX1000┃HDPlex
Like I said, pretty much just filling in with CUDA cores the empty spaces from removing the dedicated FP64 hardware in A100.

1599254668255.png


I guess the remaining FP64 hardware isn't numerous enough to show on the diagram, just as they omitted the few remaining FP64 cores on GP104 from its SM diagram altogether, whereas with compute dies like GP100 and A100 all the FP64 is included visually. And/or Nvidia is using the Tensor cores to pick up the slack with FP64 needs, something they've been bragging about with their new Tensor cores for some time now.
 
Joined
Jan 8, 2017
Messages
8,863 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Another thing would be that I bet this chip doesn't have anywhere near the amount of cache A100 has, so there is more room for SMs. Data locality is always important but especially for ML, so that's why there is so much of it on A100.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Seems to me like any game using INT extensively might reduce perceived shader performance by 25%. How much is INT used in games?
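The question above can be explored with a toy throughput model. This is a sketch under loudly stated assumptions, not a measurement: it assumes each GA102 SM has 64 lanes that run FP32 only plus 64 lanes that run either FP32 or INT32 in a given cycle, that work can be scheduled perfectly, and that the workload issues about 36 integer instructions per 100 floating-point ones (a figure often quoted from Nvidia's Turing-era material; treat it as illustrative). The function name is hypothetical.

```python
def effective_fp32_fraction(int_per_100_fp: float) -> float:
    """Fraction of peak FP32 throughput left once integer work shares the SM.

    Toy model of one SM (assumed layout): 64 FP32-only lanes plus 64 lanes
    that execute either FP32 or INT32. Perfect scheduling is assumed, so the
    result is an upper bound on what real code would achieve.
    """
    fp_ops = 100.0
    int_ops = float(int_per_100_fp)
    fp_only_lanes = 64
    shared_lanes = 64
    all_lanes = fp_only_lanes + shared_lanes

    # INT work fits only on the shared lanes; all work combined can use every lane.
    cycles = max(int_ops / shared_lanes, (fp_ops + int_ops) / all_lanes)
    # Cycles the same FP work would need if no integer work were present.
    peak_cycles = fp_ops / all_lanes
    return peak_cycles / cycles


if __name__ == "__main__":
    for mix in (0, 36, 100):
        share = effective_fp32_fraction(mix)
        print(f"{mix} INT per 100 FP -> {share:.1%} of peak FP32 rate")
```

With the assumed 36-per-100 mix the model leaves roughly three quarters of the peak FP32 rate, so the 25% figure above is plausible as an order of magnitude, though real games will differ.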
 

ParhamXT

New Member
Joined
Sep 19, 2020
Messages
1 (0.00/day)
GA102 has 6 MB of L2 cache compared to GA100's 48 MB.
Most of the transistors in any chip are used by the cache, and the cores use fewer transistors.
That's why GA100 has 54200 million transistors compared to GA102's 28400 million.
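A back-of-envelope estimate helps put the cache figures above in proportion. The sketch below assumes a standard 6-transistor SRAM cell per bit and counts only the L2 data arrays; tags, ECC, peripheral logic, and all other on-chip SRAM (register files, L1/shared memory) are ignored, so the real cache-related share is higher. The function names are made up for illustration.

```python
def l2_data_transistors(l2_megabytes: int, transistors_per_bit: int = 6) -> int:
    """Transistors in the L2 data arrays alone, assuming 6T SRAM cells."""
    bits = l2_megabytes * 1024 * 1024 * 8
    return bits * transistors_per_bit


def l2_share(l2_megabytes: int, total_transistors: int) -> float:
    """Estimated fraction of the chip's transistors spent on L2 data arrays."""
    return l2_data_transistors(l2_megabytes) / total_transistors


if __name__ == "__main__":
    # L2 sizes and transistor totals are the figures quoted in the post above.
    chips = {
        "GA100": (48, 54_200_000_000),
        "GA102": (6, 28_400_000_000),
    }
    for name, (l2_mb, total) in chips.items():
        print(f"{name}: L2 data arrays are about {l2_share(l2_mb, total):.1%} of all transistors")
```

Under these assumptions the L2 data arrays come to a few percent of GA100's transistor budget, so the extra L2 on its own does not seem to account for most of the difference between the two chips.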
 
Joined
Jan 6, 2016
Messages
69 (0.02/day)
Location
Algeria
System Name HyPerioN
Processor 5900X
Motherboard Asus X570 Strix-F
Cooling Arctic Freezer II 360 Rev 5
Memory 32GB 3000Mhz G.Skill Rijaw V
Video Card(s) GTX 2070
Storage Intel 760p+Samsung 860 EVO+2TB Toshiba 7200rpm HDD+2TB Seagate Barracuda
Display(s) LG UltraGear GL850
Case Asus Tuf G501
Power Supply EVGA 850W G6
Mouse Razer Viper Mini
Keyboard Corsair K70
So, more than one year after the release of the new RTX 3000 series, what do you guys think about this whole topic?
Yes, this is an old thread and I am sorry for reviving it, but I would rather revive this one than make a new post about it.
Thank you, and sorry for the thread revival.
 
Joined
Oct 7, 2022
Messages
38 (0.07/day)
I got your answer here:
1669513061771.png

1669512471719.png
 