
NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

btarunr

Editor & Senior Moderator
Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We've yet to receive a technical briefing about the architecture itself and the various hardware components that make up the silicon, but NVIDIA's website gives us a first look at the key number-crunching components of "Ada": the Ada CUDA core, the 4th-generation Tensor core, and the 3rd-generation RT core. Besides generational IPC and clock-speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM- or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.
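SER itself lives in hardware and the driver, but the scheduling idea it exploits can be sketched in a few lines: group pending work by which shader it will run, so SIMD lanes execute coherently instead of diverging. Everything below (`Thread`, `reorder_for_coherence`, the shader names) is invented for illustration, not NVIDIA's API:

```python
from collections import namedtuple

# A toy "pending thread": a ray that hit something and now needs to run
# the shader for that material.
Thread = namedtuple("Thread", ["ray_id", "shader_id"])

def reorder_for_coherence(threads):
    """Sort pending threads so identical shaders execute together,
    instead of in the divergent order the rays happened to finish in."""
    return sorted(threads, key=lambda t: t.shader_id)

# Divergent hit results, as a ray tracer might produce them:
pending = [Thread(0, "glass"), Thread(1, "metal"), Thread(2, "glass"),
           Thread(3, "foliage"), Thread(4, "metal")]

for t in reorder_for_coherence(pending):
    print(t.ray_id, t.shader_id)
```

After reordering, the two "glass" threads run back to back, as do the two "metal" ones — on real hardware that means fewer partially-filled warps per shader.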

Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on CUDA cores and the CPU for a handful of tasks, and here NVIDIA claims that SER contributes to a 3X ray tracing performance uplift (the performance contribution of the CUDA cores). With traditional raster graphics, SER contributes a meaty 25% performance uplift. With Ada, NVIDIA is introducing its 4th generation of Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which delivers up to 5X the AI inference performance of the previous-generation Ampere Tensor core (which itself delivered a similar leap by leveraging sparsity).
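The FP8 speedup comes from trading precision for throughput: an E4M3 number keeps only 3 mantissa bits, so values snap to a much coarser grid than FP16. A minimal sketch of that mantissa rounding — note it deliberately skips the parts of real E4M3 that don't matter for the illustration (exponent clamping to roughly ±448, NaN encodings), and `quantize_fp8_mantissa` is an invented helper, not any library's API:

```python
import math

def quantize_fp8_mantissa(x, man_bits=3):
    """Round x to the nearest value whose mantissa fits in man_bits bits
    (plus the implicit leading 1), as in FP8 E4M3."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    m, e = math.frexp(abs(x))          # abs(x) == m * 2**e, 0.5 <= m < 1
    steps = 2 ** (man_bits + 1)        # representable mantissa positions
    return sign * math.ldexp(round(m * steps) / steps, e)

print(quantize_fp8_mantissa(3.3))      # snaps to 3.25
```

Weights and activations tolerate that coarseness surprisingly well, which is why the Transformer Engine can halve storage and double math rate versus FP16.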



The third-generation RT core introduced with Ada offers twice the ray-triangle intersection performance of the "Ampere" RT core, and introduces two new hardware components: the Opacity Micromap (OMM) Engine and the Displaced Micro-Mesh (DMM) Engine. The OMM accelerates alpha-tested textures often used for elements such as foliage, particles, and fences, while the DMM accelerates BVH build times by a stunning 10X. DLSS 3 will be exclusive to Ada, as it relies on the 4th-gen Tensor cores and the Optical Flow Accelerator component of Ada GPUs to deliver on the promise of drawing new frames purely using AI, without involving the main graphics rendering pipeline.
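The opacity-micromap idea is easy to sketch: pre-classify each micro-triangle of an alpha-tested surface as fully opaque, fully transparent, or mixed, so that ray hits on the first two classes never have to invoke the expensive alpha-test shader. The states and functions below are illustrative only, not the actual OMM format:

```python
OPAQUE, TRANSPARENT, UNKNOWN = 0, 1, 2

def classify(alpha_samples, threshold=0.5):
    """Bake a micro-triangle's alpha-texture samples into one state."""
    if all(a >= threshold for a in alpha_samples):
        return OPAQUE
    if all(a < threshold for a in alpha_samples):
        return TRANSPARENT
    return UNKNOWN                     # mixed: must fall back to the shader

def ray_hit(state, run_alpha_shader):
    """Resolve a ray hit against a pre-classified micro-triangle."""
    if state == OPAQUE:
        return True                    # accept the hit, no shader call
    if state == TRANSPARENT:
        return False                   # ignore the hit, no shader call
    return run_alpha_shader()          # only mixed regions pay the cost
```

For something like a fence or a leaf, most micro-triangles land in the two "known" buckets, which is where the claimed speedup on alpha-tested geometry comes from.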

We'll give you a more detailed run-down of the Ada architecture as soon as we can.

 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
and excessive power consumption requirements.
 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
With traditional raster graphics, SER contributes a meaty 25% performance uplift.
:wtf:
 
Oh they gained 25% raster IPC you think? Per halved shader compared to the past? Interesting!

so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
 
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s
 
so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
How would you demo that? Build a special Ada GPU that didn't include SER? Because you can't control that from software any more than you can control micro-op reordering on an Intel or AMD CPU.
The way to judge that is to wait for reviews and see where performance lands. (Yes, I'm not taking 25% at face value either. But even if inflated, it still points to some beefy generational improvements that @Vayra86 claimed were nowhere to be seen.)
 
More waste of silicon. Why can't they just create gaming GPUs without all the nonsense, and professional ones (or bring back the Titan series) with all these technologies?
 
There seems to be some ambiguity around the ROP count. Is there an official number yet?
 
Saw this on Reddit. So the 20 and 30 series are not getting DLSS 3.0. What kind of crap is this? Is there any technical reason, or are we being tricked again?

Edit: This is the official answer, and I don't get it. So a 4050 will be able to use it, but a 3090 Ti won't? That doesn't seem right. The difference would have to be insane.

Edit 2: It's also confirmed that the cut-down 4080 does not use the same die as its 16 GB big brother.
 
No surprise at all. There are still mountains of 30-series stock, hence the crazy 40-series prices.

Nvidia has to try to sell the 40 series somehow, so making features 40-series exclusive is an obvious move. That technical explanation is likely BS.

Personally I doubt it'll work. Not enough consumers care enough to pay the price.

We'll probably see Nvidia rip up their guidance in a few months.
 
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s

Haha, you're not wrong, though the "3rd gen full stack" (whatever the hell that means :confused: ) also includes everything from the previous generations, so the extra pictures could be replaced with that. Oh well, whatever :D

Saw this on Reddit. So the 20 and 30 series are not getting DLSS 3.0. What kind of crap is this? Is there any technical reason, or are we being tricked again?

Edit: This is the official answer, and I don't get it. So a 4050 will be able to use it, but a 3090 Ti won't? That doesn't seem right. The difference would have to be insane.

Edit 2: It's also confirmed that the cut-down 4080 does not use the same die as its 16 GB big brother.

They want to force 40-series sales, but will only further contribute to the demise of DLSS. Why invest in optimizing for a technology that only a couple percent of the market can use, when the competing FSR and (soonTM) XeSS support almost the entire market!?

Regarding the 12 GB 4080: it's a 4070 with extra marketing shenanigans on top :nutkick:
 
Frame interpolation is an essential AI-based graphics feature; it isn't a gimmick.
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and 4 years later AMD still hasn't incorporated matrix processors into their designs (RDNA3 logically will), nor is their ray tracing performance (in msec needed) near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
 
I’m interested to see what the mesh and opacity hardware can do. The biggest fail of RT is the lack of realistic lighting on objects, something that has been done on traditional hardware with ease for years now. Not everything is a perfect mirror, and RT calculations are compute-intensive enough that without dedicated hardware for "prebaked" opacity/translucency it still doesn't look real. How many more ms does it add, since it is still another step in the pipeline, or is it a shared resource that is stored so that a lookup is all that is required?
 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
Maybe the engineers have actually hit a wall? DLSS doesn't seem like a small R&D thing, and everyone ended up doing similar tech (even Apple, with MetalFX).
 
Frame interpolation is an essential AI-based graphics feature; it isn't a gimmick.
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and 4 years later AMD still hasn't incorporated matrix processors into their designs (RDNA3 logically will), nor is their ray tracing performance (in msec needed) near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
I'm curious: what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame-rate numbers artificially? The 4090 is apparently up to 4 times faster with DLSS 3.
 
View attachment 262475

saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.

View attachment 262476

edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb

Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.
 
Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.

Their argument seems to be that it will run slower than on newer cards. So an old card runs new games with new technologies slower; what else is new!? :D
 
Frame interpolation is an essential AI-based graphics feature; it isn't a gimmick.
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and 4 years later AMD still hasn't incorporated matrix processors into their designs (RDNA3 logically will), nor is their ray tracing performance (in msec needed) near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
If it works how I think it does, then it is just a useless gimmick. I think it needs at least two frames' worth of buffer to generate fake frames - you need to generate fake frames from something, and you will get fewer artifacts this way, because it is relatively easy to generate intermediate frames between two known frames, compared to predicting how a future frame will look when you only have one frame to work with. You will have 300 interpolated fps, but the input latency of 30 fps (assuming that is what the GPU can do natively without frame interpolation), because the additional frames are generated outside of the game engine, and thus outside the input and world-state update loop. An increase in performance comes not only from more frames but also from reduced latency; Nvidia only gives you the increased frames without the lower latency. Would love to be wrong about this, of course.
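The two-frame-buffer argument above can be put into rough numbers. This is a hedged back-of-envelope model, assuming (as the post does) that one generated frame is inserted between each pair of fully rendered frames; the function and the model are illustrative, not NVIDIA's documented pipeline:

```python
def interpolation_latency_ms(native_fps):
    """Toy latency model for between-frame interpolation."""
    frame_time_ms = 1000.0 / native_fps
    # Frame N can only be interpolated once frame N+1 has finished
    # rendering, so display is delayed by at least one native frame time.
    displayed_fps = native_fps * 2      # every real frame plus one generated
    added_latency_ms = frame_time_ms    # waiting for the next real frame
    return displayed_fps, added_latency_ms

fps, extra = interpolation_latency_ms(30)
print(fps, round(extra, 1))   # 60 33.3
```

So under this model the counter shows 60 fps while input still responds on a roughly 30 fps cadence, plus an extra frame of buffering delay, which is exactly the objection being raised.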
 
I'm curious: what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame-rate numbers artificially? The 4090 is apparently up to 4 times faster with DLSS 3.
Latency increase, and independent testing of image quality/stability, for example.
DLSS 3.0 is pumping the frame rate artificially (I know you didn't mean it like that); there is no rendering or CPU involved. It just takes information from the previous frame history, motion vectors, etc., and, using the trained Tensor cores, generates a made-up image, lol. (I expect first-gen issues, but eventually it will all pan out.)
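The "previous frame history plus motion vectors" idea can be sketched in a few lines. A toy 1-D version, purely illustrative (real DLSS 3 pairs a hardware optical-flow accelerator with a neural network; `warp_frame` and its inputs are invented for this example):

```python
def warp_frame(prev_frame, motion):
    """Shift each 'pixel' of a 1-D frame along its per-pixel motion vector
    to synthesize the next frame."""
    out = [0] * len(prev_frame)
    for x, value in enumerate(prev_frame):
        nx = x + motion[x]
        if 0 <= nx < len(out):
            out[nx] = value    # disoccluded gaps stay 0: the hard part

    return out

frame  = [10, 20, 30, 40]
motion = [1, 1, 1, 1]              # the whole scene slides one pixel right
print(warp_frame(frame, motion))   # [0, 10, 20, 30]
```

The zero left behind at index 0 is a disocclusion — a region with no source pixel — and filling such holes plausibly is the part that needs the trained network rather than a plain warp.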

If it works how I think it does, then it is just a useless gimmick. I think it needs at least two frames' worth of buffer to generate fake frames - you need to generate fake frames from something, and you will get fewer artifacts this way, because it is relatively easy to generate intermediate frames between two known frames, compared to predicting how a future frame will look when you only have one frame to work with. You will have 300 interpolated fps, but the input latency of 30 fps (assuming that is what the GPU can do natively without frame interpolation), because the additional frames are generated outside of the game engine, and thus outside the input and world-state update loop. An increase in performance comes not only from more frames but also from reduced latency; Nvidia only gives you the increased frames without the lower latency. Would love to be wrong about this, of course.
I don't know the exact implementation or how Nvidia combats the incurred latency, so I will wait for the white paper, but if I understood correctly, it seems that using DLSS 3.0 also necessitates the use of Reflex.
I expect some issues as this is a first-gen implementation, but eventually it will pan out.
Also, regarding adoption: it will be prebuilt into Unreal and Unity, so in time we are going to see more and more games using it.
 