Wednesday, September 21st 2022

NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We have yet to receive a technical briefing on the architecture itself and the various hardware components that make up the silicon, but NVIDIA's website gives us a first look at the key number-crunching components of "Ada," namely the Ada CUDA core, the 4th generation Tensor core, and the 3rd generation RT core. Besides generational IPC and clock speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM- or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.
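NVIDIA has not yet detailed how SER works in hardware. As a rough mental model only, the sketch below treats SER as regrouping ray-hit threads by the shader they are about to run, so that each 32-thread warp does coherent work instead of serializing through several divergent shaders. The warp cost model (one pass per distinct shader in a warp), the sort-by-shader-ID key, and the function names are all assumptions for illustration, not NVIDIA's implementation.

```python
import random

WARP_SIZE = 32  # threads that execute in lockstep on NVIDIA GPUs


def serialized_passes(shader_ids, warp_size=WARP_SIZE):
    """Count shader passes across all warps.

    Simplified model (an assumption): a warp whose threads need k different
    hit shaders executes k serialized passes; a fully coherent warp needs one.
    """
    passes = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        passes += len(set(warp))
    return passes


def reorder_for_coherence(shader_ids):
    """Regroup threads so that threads invoking the same shader share warps.

    Sorting by a coherence key (here simply the shader ID) stands in for
    whatever grouping the SER hardware actually performs.
    """
    return sorted(shader_ids)


if __name__ == "__main__":
    rng = random.Random(0)
    NUM_RAYS, NUM_SHADERS = 4096, 16  # hypothetical workload
    hits = [rng.randrange(NUM_SHADERS) for _ in range(NUM_RAYS)]
    before = serialized_passes(hits)
    after = serialized_passes(reorder_for_coherence(hits))
    print(f"passes before reordering: {before}, after: {after}")
```

With 16 hypothetical hit shaders scattered randomly across 4,096 rays, the reordered list needs far fewer serialized passes. Real-world gains would depend on how divergent a game's shading actually is, which is why NVIDIA's figures are quoted as "up to."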

Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on CUDA cores and the CPU for a handful of tasks, and here NVIDIA claims that SER delivers up to a 3X performance uplift for the portion of ray tracing work handled by the CUDA cores. With traditional raster graphics, SER contributes a meaty 25% performance uplift.

With Ada, NVIDIA is introducing its 4th generation Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which NVIDIA says delivers up to 5X the AI inference performance of the previous-generation Ampere Tensor core (which itself delivered a similar leap by leveraging sparsity).
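NVIDIA's page does not say which FP8 layout Ada's Tensor cores use; Hopper's Transformer Engine supports two, E4M3 and E5M2. Assuming E4M3 (1 sign bit, 4 exponent bits with a bias of 7, 3 mantissa bits, no infinities), the sketch below decodes every byte value and rounds a float to the nearest representable value. It shows how coarse the format is: a maximum magnitude of 448 and only eight steps per power of two. The function names are hypothetical, and ties are broken toward the smaller value rather than with the round-to-nearest-even that hardware would typically use.

```python
import math

E4M3_BIAS = 7


def e4m3_decode(code: int) -> float:
    """Decode one FP8 E4M3 byte: 1 sign, 4 exponent, 3 mantissa bits."""
    sign = -1.0 if (code >> 7) & 1 else 1.0
    exponent = (code >> 3) & 0xF
    mantissa = code & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # E4M3 has no infinities; only S.1111.111 is NaN
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias
        return sign * (mantissa / 8) * 2.0 ** (1 - E4M3_BIAS)
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - E4M3_BIAS)


# Every finite value the format can represent (+0 and -0 collapse to one entry)
E4M3_VALUES = sorted(
    {v for v in (e4m3_decode(c) for c in range(256)) if not math.isnan(v)}
)
E4M3_MAX = max(E4M3_VALUES)  # 448.0


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating at +/-448.

    min() returns the first of two equally close candidates, so ties go to
    the smaller value (a simplification).
    """
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    return min(E4M3_VALUES, key=lambda v: abs(v - x))


if __name__ == "__main__":
    for x in (0.3, 1.0, 3.14159, 1000.0):
        print(f"{x} -> {quantize_e4m3(x)}")
```

Running it shows, for example, that pi lands on 3.25 and anything above 448 saturates, which is why FP8 inference relies on per-tensor scaling to keep values in range.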

The third-generation RT core introduced with Ada offers twice the ray-triangle intersection performance of the "Ampere" RT core, and adds two new hardware components: the Opacity Micromap (OMM) Engine and the Displaced Micro-Mesh (DMM) Engine. OMM accelerates ray tracing against alpha-tested textures, which are often used for elements such as foliage, particles, and fences, while the DMM Engine speeds up BVH builds by a stunning 10X. DLSS 3 will be exclusive to Ada, as it relies on the 4th Gen Tensor cores and the Optical Flow Accelerator on Ada GPUs to deliver on the promise of drawing new frames purely using AI, without involving the main graphics rendering pipeline.
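Opacity micromaps subdivide a triangle into micro-triangles and store a baked opacity state for each one, so the RT core can accept or reject most hits on alpha-tested geometry by itself and only call the alpha-test (any-hit) shader where a micro-triangle is partially covered. The sketch below is a simplified model under stated assumptions: three states rather than the hardware's actual encodings, micro-triangles given as plain lists of alpha texels, and hypothetical function names. The mapping from barycentric coordinates to a micro-triangle index is omitted.

```python
from enum import Enum


class OpacityState(Enum):
    TRANSPARENT = 0  # hit can be discarded without running any shader
    OPAQUE = 1       # hit can be accepted without running any shader
    UNKNOWN = 2      # partially covered: fall back to the alpha-test shader


def classify_micro_triangle(alpha_samples, cutoff=0.5):
    """Bake one micro-triangle's state from the alpha texels it covers."""
    if all(a >= cutoff for a in alpha_samples):
        return OpacityState.OPAQUE
    if all(a < cutoff for a in alpha_samples):
        return OpacityState.TRANSPARENT
    return OpacityState.UNKNOWN


def bake_micromap(texels_per_micro_triangle, cutoff=0.5):
    """Offline step: one state per micro-triangle of a base triangle."""
    return [classify_micro_triangle(t, cutoff) for t in texels_per_micro_triangle]


def resolve_hits(micromap, hit_micro_triangles, alpha_test):
    """Traversal step: returns (accepted hits, alpha-test shader calls)."""
    accepted, shader_calls = 0, 0
    for idx in hit_micro_triangles:
        state = micromap[idx]
        if state is OpacityState.OPAQUE:
            accepted += 1
        elif state is OpacityState.UNKNOWN:
            shader_calls += 1  # only these hits pay for a shader invocation
            if alpha_test(idx):
                accepted += 1
        # TRANSPARENT hits are skipped without any shader work
    return accepted, shader_calls
```

For a foliage texture where most micro-triangles are fully opaque or fully transparent, only the few UNKNOWN entries trigger shader calls, which is where the acceleration for alpha-tested geometry would come from.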

We'll give you a more detailed run-down of the Ada architecture as soon as we can.

22 Comments on NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

#1
Vayra86
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
#2
ratirt
Vayra86 wrote:
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
and excessive power consumption requirements.
#3
bug
Vayra86 wrote:
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
With traditional raster graphics, SER contributes a meaty 25% performance uplift.
:wtf:
#4
Vayra86
bug wrote:
:wtf:
Oh they gained 25% raster IPC you think? Per halved shader compared to the past? Interesting!
#5
Garrus
Vayra86 wrote:
Oh they gained 25% raster IPC you think? Per halved shader compared to the past? Interesting!
so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
#6
Owen1982
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s
#7
pavle
Raytracing till you plotz. :rolleyes:
#8
bug
Garrus wrote:
so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
How would you demo that? Build a special Ada GPU that didn't include SER? Because you can't control that from software any more than you can control micro-op reordering on an Intel or AMD CPU.
The way to judge that is to wait for reviews and see where performance lands. (Yes, I'm not taking 25% at face value either. But even if inflated, it still points to some beefy generational improvements that @Vayra86 claimed were nowhere to be seen.)
#10
Unregistered
More waste of silicon. Why can't they just make gaming GPUs without all the nonsense, and professional ones (or bring back the Titan series) with all these technologies?
#11
Daven
There seems to be some ambiguity around the ROP count. Is there an official number yet?
#12
Bomby569


saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.



edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb
videocardz.com/newz/galax-confirms-ad102-300-ad103-300-and-ad104-400-gpus-for-geforce-rtx-4090-4080-series
#13
mb194dc
No surprise at all. There are still mountains of 30-series stock, hence the crazy 40-series prices.

Nvidia has got to try to sell the 40 series somehow, so making features 40-series exclusive is an obvious move. That technical explanation is likely to be BS.

Personally, I doubt it'll work. Not enough consumers care enough to pay the price.

We'll probably see Nvidia rip up their guidance in a few months.
#14
trsttte
Owen1982 wrote:
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s
Haha you're not wrong, though the "3rd gen full stack" (whatever the hell that means :confused: ) also includes everything from the previous generations so the extra pictures could be replaced with that, oh well whatever :D
Bomby569 wrote:

saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.



edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb
videocardz.com/newz/galax-confirms-ad102-300-ad103-300-and-ad104-400-gpus-for-geforce-rtx-4090-4080-series
They want to force 4000 series sales but will only further contribute to the demise of DLSS. Why invest in optimizing for a technology that only a couple percent of the market can use when the competitors' FSR and (soon™) XeSS support almost the entire market!?

Regarding the 12gb 4080, it's a 4070 with extra marketing shenanigans on top :nutkick:
#15
ModEl4
Frame interpolation is an essential AI based graphics feature, it isn't a gimmick.
Everyone will use it in the future not just Nvidia.
Turing launched in Q3 2018 and still 4 years later AMD haven't incorporated matrix processors in their designs (RDNA3 logically will have) nor their raytracing performance (msec needed) is near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way in Turing/Ampere cards but on this I tend to believe what Nvidia's saying.
If AMD RDNA3 matrix implementation is at a similar level as Turing/Ampere maybe there will be a hack like DLSS2.0/FSR 2.0 hacks and we can test performance and quality in a future FSR 3.0/4.0 technology, so we will know (who knows maybe there will be a hack earlier for Turing/Ampere also)
#16
Steevo
I’m interested to see what the mesh and opacity hardware can do. The biggest fail of RT is the lack of realistic lighting on objects, which has been done on traditional hardware with ease for years now. Not everything is a perfect mirror, and RT calculations are compute intensive enough that, without dedicated hardware or “prebaked goods” for opacity/translucency, it still doesn’t look real. How many more ms does it add, since it is still another step in the pipeline? Or is it a shared resource that is stored, so that a lookup is all that is required?
#17
dyonoctis
Vayra86 wrote:
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
Maybe the engineers have actually reached a block? DLSS doesn't seem like a small R&D thing, and everyone ended up doing a similar technology (even Apple with MetalFX).
#18
Fasola
ModEl4 wrote:
Frame interpolation is an essential AI based graphics feature, it isn't a gimmick.
Everyone will use it in the future not just Nvidia.
Turing launched in Q3 2018 and still 4 years later AMD haven't incorporated matrix processors in their designs (RDNA3 logically will have) nor their raytracing performance (msec needed) is near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way in Turing/Ampere cards but on this I tend to believe what Nvidia's saying.
If AMD RDNA3 matrix implementation is at a similar level as Turing/Ampere maybe there will be a hack like DLSS2.0/FSR 2.0 hacks and we can test performance and quality in a future FSR 3.0/4.0 technology, so we will know (who knows maybe there will be a hack earlier for Turing/Ampere also)
I'm curious, but what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame rate numbers artificially? The 4090 is up to 4 times faster apparently with DLSS 3.
#19
Legacy-ZA
Bomby569 wrote:

saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.



edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb
videocardz.com/newz/galax-confirms-ad102-300-ad103-300-and-ad104-400-gpus-for-geforce-rtx-4090-4080-series
Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.
#20
trsttte
Legacy-ZA wrote:
Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.
Their argument seems to be that it will run slower than on newer cards, so an old card is running new games with new technologies slower, what else is new!? :D
#21
tehehe
ModEl4 wrote:
Frame interpolation is an essential AI based graphics feature, it isn't a gimmick.
Everyone will use it in the future not just Nvidia.
Turing launched in Q3 2018 and still 4 years later AMD haven't incorporated matrix processors in their designs (RDNA3 logically will have) nor their raytracing performance (msec needed) is near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way in Turing/Ampere cards but on this I tend to believe what Nvidia's saying.
If AMD RDNA3 matrix implementation is at a similar level as Turing/Ampere maybe there will be a hack like DLSS2.0/FSR 2.0 hacks and we can test performance and quality in a future FSR 3.0/4.0 technology, so we will know (who knows maybe there will be a hack earlier for Turing/Ampere also)
If it works how I think it does, then it is just a useless gimmick. I think it needs at least two frames' worth of buffer to generate fake frames: you need to generate fake frames from something, and you will get fewer artifacts this way, because it is relatively easy to generate intermediate frames between known frames compared to predicting what a future frame will look like when you only have one frame to work with. You will have 300 interpolated fps but the input latency of 30 fps (assuming that is what the GPU can do natively without frame interpolation), because the additional frames are generated outside of the game engine and thus outside the input and world-state update loop. An increase in performance comes not only from more frames but also from reduced latency. Nvidia only gives you more frames without lower latency. Would love to be wrong about this of course.
#22
ModEl4
Fasola wrote:
I'm curious, but what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame rate numbers artificially? The 4090 is up to 4 times faster apparently with DLSS 3.
latency increase, independent testing regarding image quality/stability for example.
DLSS 3.0 is pumping the frame rate artificially (I know you didn't mean it like that): there is no rendering or CPU involved, it just takes information from the previous frame history, motion vectors, etc., and uses the Tensor cores running a trained network to generate a made-up image lol. (I expect first-gen issues, but eventually it will all pan out.)
tehehe wrote:
If it works how I think it does, then it is just a useless gimmick. I think it needs at least two frames' worth of buffer to generate fake frames: you need to generate fake frames from something, and you will get fewer artifacts this way, because it is relatively easy to generate intermediate frames between known frames compared to predicting what a future frame will look like when you only have one frame to work with. You will have 300 interpolated fps but the input latency of 30 fps (assuming that is what the GPU can do natively without frame interpolation), because the additional frames are generated outside of the game engine and thus outside the input and world-state update loop. An increase in performance comes not only from more frames but also from reduced latency. Nvidia only gives you more frames without lower latency. Would love to be wrong about this of course.
I don't know exactly how it is implemented or how Nvidia combats the added latency, so I will wait for the white paper, but if I understood correctly, using DLSS 3.0 also requires Reflex.
I expect some issues as this is first gen implementation, but eventually it will pan out.
Also regarding adoption it will be prebuilt in Unreal and Unity so in time we are going to see possibly more and more games using it.