
NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

btarunr

Editor & Senior Moderator
Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We've yet to receive a technical briefing about the architecture itself and the various hardware components that make up the silicon, but NVIDIA's website gives us a first look at the key number-crunching components of "Ada": the Ada CUDA core, the 4th-generation Tensor core, and the 3rd-generation RT core. Besides generational IPC and clock-speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM- or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.
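SER itself lives in hardware and the driver, but the scheduling idea it exploits can be sketched in a few lines: group pending work by which shader it will run, so SIMD lanes execute coherently instead of diverging. Everything below (`Thread`, `reorder_for_coherence`, the shader names) is invented for illustration, not NVIDIA's API:

```python
from collections import namedtuple

# A toy "pending thread": a ray that hit something and now needs to run
# the shader for that material.
Thread = namedtuple("Thread", ["ray_id", "shader_id"])

def reorder_for_coherence(threads):
    """Sort pending threads so identical shaders execute together,
    instead of in the divergent order the rays happened to finish in."""
    return sorted(threads, key=lambda t: t.shader_id)

# Divergent hit results, as a ray tracer might produce them:
pending = [Thread(0, "glass"), Thread(1, "metal"), Thread(2, "glass"),
           Thread(3, "foliage"), Thread(4, "metal")]

for t in reorder_for_coherence(pending):
    print(t.ray_id, t.shader_id)
```

After reordering, the two "glass" threads run back to back, as do the two "metal" ones — on real hardware that means fewer partially-filled warps per shader.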

Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on CUDA cores and the CPU for a handful of tasks, and here NVIDIA claims that SER contributes to a 3X ray tracing performance uplift (the performance contribution of the CUDA cores). With traditional raster graphics, SER contributes a meaty 25% performance uplift. With Ada, NVIDIA is introducing its 4th generation of Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which delivers up to 5X the AI inference performance of the previous-generation Ampere Tensor core (which itself delivered a similar leap by leveraging sparsity).
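The FP8 speedup comes from trading precision for throughput: an E4M3 number keeps only 3 mantissa bits, so values snap to a much coarser grid than FP16. A minimal sketch of that mantissa rounding — note it deliberately skips the parts of real E4M3 that don't matter for the illustration (exponent clamping to roughly ±448, NaN encodings), and `quantize_fp8_mantissa` is an invented helper, not any library's API:

```python
import math

def quantize_fp8_mantissa(x, man_bits=3):
    """Round x to the nearest value whose mantissa fits in man_bits bits
    (plus the implicit leading 1), as in FP8 E4M3."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    m, e = math.frexp(abs(x))          # abs(x) == m * 2**e, 0.5 <= m < 1
    steps = 2 ** (man_bits + 1)        # representable mantissa positions
    return sign * math.ldexp(round(m * steps) / steps, e)

print(quantize_fp8_mantissa(3.3))      # snaps to 3.25
```

Weights and activations tolerate that coarseness surprisingly well, which is why the Transformer Engine can halve storage and double math rate versus FP16.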



The third-generation RT core introduced with Ada offers twice the ray-triangle intersection performance of the "Ampere" RT core, and introduces two new hardware components: the Opacity Micromap (OMM) Engine and the Displaced Micro-Mesh (DMM) Engine. The OMM accelerates alpha-tested textures often used for elements such as foliage, particles, and fences, while the DMM accelerates BVH build times by a stunning 10X. DLSS 3 will be exclusive to Ada, as it relies on the 4th-gen Tensor cores and the Optical Flow Accelerator component of Ada GPUs to deliver on the promise of drawing new frames purely using AI, without involving the main graphics rendering pipeline.
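The opacity-micromap idea is easy to sketch: pre-classify each micro-triangle of an alpha-tested surface as fully opaque, fully transparent, or mixed, so that ray hits on the first two classes never have to invoke the expensive alpha-test shader. The states and functions below are illustrative only, not the actual OMM format:

```python
OPAQUE, TRANSPARENT, UNKNOWN = 0, 1, 2

def classify(alpha_samples, threshold=0.5):
    """Bake a micro-triangle's alpha-texture samples into one state."""
    if all(a >= threshold for a in alpha_samples):
        return OPAQUE
    if all(a < threshold for a in alpha_samples):
        return TRANSPARENT
    return UNKNOWN                     # mixed: must fall back to the shader

def ray_hit(state, run_alpha_shader):
    """Resolve a ray hit against a pre-classified micro-triangle."""
    if state == OPAQUE:
        return True                    # accept the hit, no shader call
    if state == TRANSPARENT:
        return False                   # ignore the hit, no shader call
    return run_alpha_shader()          # only mixed regions pay the cost
```

For something like a fence or a leaf, most micro-triangles land in the two "known" buckets, which is where the claimed speedup on alpha-tested geometry comes from.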

We'll give you a more detailed run-down of the Ada architecture as soon as we can.

 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
and excessive power consumption requirements.
 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
With traditional raster graphics, SER contributes a meaty 25% performance uplift.
:wtf:
 
Oh they gained 25% raster IPC you think? Per halved shader compared to the past? Interesting!

so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
 
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s
 
so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
How would you demo that? Build a special Ada GPU that didn't include SER? Because you can't control that from software any more than you can control micro-op reordering on an Intel or AMD CPU.
The way to judge that is to wait for reviews and see where performance lands. (Yes, I'm not taking 25% at face value either. But even if inflated, it still points to some beefy generational improvements that @Vayra86 claimed were nowhere to be seen.)
 
More waste of silicon. Why can't they just create gaming GPUs without all the nonsense, and professional ones (or bring back the Titan series) with all these technologies?
 
There seems to be some ambiguity around the ROP count. Is there an official number yet?
 
Saw this on Reddit. So the 20 and 30 series are not getting DLSS 3.0. What kind of crap is this? Is there any technical reason, or are we being tricked again?

Edit: This is the official answer, and I don't get it. So a 4050 will be able to use it, but a 3090 Ti won't? That doesn't seem right. The difference would have to be insane.

Edit 2: It's also confirmed that the cut-down 4080 does not use the same die as its 16 GB big brother.
 
No surprise at all. There are still mountains of 30-series stock, hence the crazy 40-series prices.

Nvidia has to try to sell the 40 series somehow, so making features 40-series exclusive is an obvious move. That technical explanation is likely BS.

Personally I doubt it'll work. Not enough consumers care enough to pay the price.

We'll probably see Nvidia rip up their guidance in a few months.
 
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s

Haha, you're not wrong, though the "3rd gen full stack" (whatever the hell that means :confused: ) also includes everything from the previous generations, so the extra pictures could be replaced with that. Oh well, whatever :D

Saw this on Reddit. So the 20 and 30 series are not getting DLSS 3.0. What kind of crap is this? Is there any technical reason, or are we being tricked again?

Edit: This is the official answer, and I don't get it. So a 4050 will be able to use it, but a 3090 Ti won't? That doesn't seem right. The difference would have to be insane.

Edit 2: It's also confirmed that the cut-down 4080 does not use the same die as its 16 GB big brother.

They want to force 40-series sales, but will only further contribute to the demise of DLSS. Why invest in optimizing for a technology that only a couple percent of the market can use, when the competing FSR and (soonTM) XeSS support almost the entire market!?

Regarding the 12 GB 4080: it's a 4070 with extra marketing shenanigans on top :nutkick:
 
Frame interpolation is an essential AI-based graphics feature; it isn't a gimmick.
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and 4 years later AMD still hasn't incorporated matrix processors into their designs (RDNA3 logically will), nor is their ray tracing performance (in msec needed) near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
 
I’m interested to see what the mesh and opacity hardware can do. The biggest fail of RT is the lack of realistic lighting on objects, something that has been done on traditional hardware with ease for years now. Not everything is a perfect mirror, and RT calculations are compute-intensive enough that without dedicated hardware for "prebaked" opacity/translucency it still doesn't look real. How many more ms does it add, since it is still another step in the pipeline, or is it a shared resource that is stored so that a lookup is all that is required?
 
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
Maybe the engineers have actually hit a wall? DLSS doesn't seem like a small R&D thing, and everyone ended up doing similar tech (even Apple, with MetalFX).
 
Frame interpolation is an essential AI-based graphics feature; it isn't a gimmick.
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and 4 years later AMD still hasn't incorporated matrix processors into their designs (RDNA3 logically will), nor is their ray tracing performance (in msec needed) near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
I'm curious: what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame-rate numbers artificially? The 4090 is apparently up to 4 times faster with DLSS 3.
 
View attachment 262475

saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.

View attachment 262476

edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb

Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.
 
Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.

Their argument seems to be that it will run slower than on newer cards. So an old card runs new games with new technologies slower; what else is new!? :D
 
Frame interpolation is an essential AI-based graphics feature; it isn't a gimmick.
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and 4 years later AMD still hasn't incorporated matrix processors into their designs (RDNA3 logically will), nor is their ray tracing performance (in msec needed) near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
If it works how I think it does, then it is just a useless gimmick. I think it needs at least two frames' worth of buffer to generate fake frames - you need to generate fake frames from something, and you will get fewer artifacts this way, because it is relatively easy to generate intermediate frames between two known frames, compared to predicting how a future frame will look when you only have one frame to work with. You will have 300 interpolated fps, but the input latency of 30 fps (assuming that is what the GPU can do natively without frame interpolation), because the additional frames are generated outside of the game engine, and thus outside the input and world-state update loop. An increase in performance comes not only from more frames but also from reduced latency; Nvidia only gives you the increased frames without the lower latency. Would love to be wrong about this, of course.
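The two-frame-buffer argument above can be put into rough numbers. This is a hedged back-of-envelope model, assuming (as the post does) that one generated frame is inserted between each pair of fully rendered frames; the function and the model are illustrative, not NVIDIA's documented pipeline:

```python
def interpolation_latency_ms(native_fps):
    """Toy latency model for between-frame interpolation."""
    frame_time_ms = 1000.0 / native_fps
    # Frame N can only be interpolated once frame N+1 has finished
    # rendering, so display is delayed by at least one native frame time.
    displayed_fps = native_fps * 2      # every real frame plus one generated
    added_latency_ms = frame_time_ms    # waiting for the next real frame
    return displayed_fps, added_latency_ms

fps, extra = interpolation_latency_ms(30)
print(fps, round(extra, 1))   # 60 33.3
```

So under this model the counter shows 60 fps while input still responds on a roughly 30 fps cadence, plus an extra frame of buffering delay, which is exactly the objection being raised.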
 
I'm curious: what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame-rate numbers artificially? The 4090 is apparently up to 4 times faster with DLSS 3.
Latency increase, and independent testing of image quality/stability, for example.
DLSS 3.0 is pumping the frame rate artificially (I know you didn't mean it like that); there is no rendering or CPU involved. It just takes information from the previous frame history, motion vectors, etc., and, using the trained Tensor cores, generates a made-up image, lol. (I expect first-gen issues, but eventually it will all pan out.)
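The "previous frame history plus motion vectors" idea can be sketched in a few lines. A toy 1-D version, purely illustrative (real DLSS 3 pairs a hardware optical-flow accelerator with a neural network; `warp_frame` and its inputs are invented for this example):

```python
def warp_frame(prev_frame, motion):
    """Shift each 'pixel' of a 1-D frame along its per-pixel motion vector
    to synthesize the next frame."""
    out = [0] * len(prev_frame)
    for x, value in enumerate(prev_frame):
        nx = x + motion[x]
        if 0 <= nx < len(out):
            out[nx] = value    # disoccluded gaps stay 0: the hard part

    return out

frame  = [10, 20, 30, 40]
motion = [1, 1, 1, 1]              # the whole scene slides one pixel right
print(warp_frame(frame, motion))   # [0, 10, 20, 30]
```

The zero left behind at index 0 is a disocclusion — a region with no source pixel — and filling such holes plausibly is the part that needs the trained network rather than a plain warp.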

If it works how I think it does, then it is just a useless gimmick. I think it needs at least two frames' worth of buffer to generate fake frames - you need to generate fake frames from something, and you will get fewer artifacts this way, because it is relatively easy to generate intermediate frames between two known frames, compared to predicting how a future frame will look when you only have one frame to work with. You will have 300 interpolated fps, but the input latency of 30 fps (assuming that is what the GPU can do natively without frame interpolation), because the additional frames are generated outside of the game engine, and thus outside the input and world-state update loop. An increase in performance comes not only from more frames but also from reduced latency; Nvidia only gives you the increased frames without the lower latency. Would love to be wrong about this, of course.
I don't know the exact implementation or how Nvidia combats the incurred latency, so I will wait for the white paper, but if I understood correctly, it seems that using DLSS 3.0 also necessitates the use of Reflex.
I expect some issues as this is a first-gen implementation, but eventually it will pan out.
Also, regarding adoption: it will be prebuilt into Unreal and Unity, so in time we are going to see more and more games using it.
 