
NVIDIA Turing GeForce RTX Technology & Architecture

It should start with sending developers the tools (cards) well before launching them to market... you don't lay down rail tracks without having trains...
Sadly, it does not really work if you do it like that either. Take tessellation, for example. AMD has had the hardware since the DX10 generation. When tessellation became part of the DX11 API spec, did we see many games using tessellation? In the end, the GPU maker still needs to create some hype around a new feature so consumers are aware of it and want it in their games. Remember that to support a certain feature in their games, the developer and publisher also need to gauge consumer interest first before they dedicate more R&D to it. Otherwise that money is most likely better spent on creating extra content for the game so they can monetize it better.

So the 2070 really is going to use the TU106. Except for the price, it sounds more like an x60: what would traditionally be a mid-range chip, with no SLI.
Just looking at the naming alone, it should be a mid-range chip. But from what I've heard from leaks, the TU106 die size is very big for a mid-range chip, even bigger than the usual size for a Gx104 chip. That extra hardware costs a lot of die area.
 
Phew, all the pages are finally written up in case you guys wanted to go over it now.
 
I don't really understand why they implemented RT so fast, as there are basically only a few RT games now and a few more being released next year...
But it will get the party going. Now there's at least a reason to utilize RT in games.
Also, they just needed a showcase of this architecture before they make a dedicated RT accelerator. It was the same with Tensor Cores.
Sadly, it does not really work if you do it like that either. Take tessellation, for example. AMD has had the hardware since the DX10 generation. When tessellation became part of the DX11 API spec, did we see many games using tessellation?
Isn't tessellation a standard part of the DX11 pipeline? That would mean all games use tessellation.
Honestly, I'm more into rendering than games, so I might be missing something. Care to explain?
 
Deep learning anti-aliasing x4 looks exactly the same as TAA x4... did I miss something? Was I supposed to be impressed by yet another new anti-aliasing gimmick? Ugh...

It looks the same while providing much higher fps, that's what you're missing!

DLSS is supposed to give you the same image quality as TAA but with much higher fps.
DLSS 2X is supposed to give you higher image quality than TAA with similar fps.
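To give a rough idea of where the extra fps comes from, here's some back-of-the-envelope math (the ~1440p internal render resolution for a 4K output is my assumption, based on how the technique has generally been described):

```python
# Rough estimate of the shading work saved by rendering at a lower internal
# resolution and upscaling, which is how DLSS is described to work.
# The internal resolution below is an assumption (~1440p for a 4K output).

output_px = 3840 * 2160      # 4K output resolution
internal_px = 2560 * 1440    # assumed internal render resolution

savings = 1 - internal_px / output_px
print(f"Shaded pixels per frame: {internal_px:,} vs {output_px:,}")
print(f"Roughly {savings:.0%} fewer pixels shaded before upscaling")
```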
 
I wish @W1zzard would write more in-depth articles like this.

I also wish he would ban all the whining morons who don't bother to read the article and understand what NVIDIA is aiming for with RTX, but just jump into the comments section to post incredibly witty things like "ngreedia". Little boys, nobody cares if you think the cards are overpriced.
 
Isn't tessellation a standard part of the DX11 pipeline? That would mean all games use tessellation.
The two tessellation shader stages and the geometry shader are optional when rendering, even though they're standard features.
 
I wish @W1zzard would write more in-depth articles like this.

I also wish he would ban all the whining morons who don't bother to read the article and understand what NVIDIA is aiming for with RTX, but just jump into the comments section to post incredibly witty things like "ngreedia". Little boys, nobody cares if you think the cards are overpriced.
You expect people like this to bring something else to the discussion NOW? For most of them, writing child-like stuff has literally been the whole reason for posting anything.
 
Well, videocardz posted Nvidia's performance numbers from the review guide...
If this turns out to be correct, the top cards are at +50% compared to last gen even in games that weren't optimized to use RT or Tensor cores. This means that Nvidia either managed to push some calculations onto the new hardware anyway (awesome) or the CUDA cores, fewer than they could have been, manage these numbers on their own (still very good).
The two tessellation shader stages and the geometry shader are optional when rendering, even though they're standard features.
Well... maybe they simply aren't very good, or are hard to use? Or maybe AMD is simply very difficult for game developers to work with?
It's their fault either way.
 
Well... maybe they simply aren't very good, or are hard to use? Or maybe AMD is simply very difficult for game developers to work with?

It's their fault either way.
Tessellation is primarily a technique to generate smooth transitions between different levels of detail in meshes, and it brings savings in memory bandwidth and vertex processing as well. It's not meant to be used like in Unigine Heaven. Let's say you model something in Blender, then you subdivide it and add finer details, but this high-detail model has a ridiculously high polygon count. You then generate a displacement map of the difference between the low- and high-detail models. Finally, at runtime you can tessellate the low-detail model using the displacement map to render something very close to your original Blender model, with much lower performance requirements. Additionally, the tessellation engine can interpolate a smooth transition between the two detail levels. The tessellation engine is a fixed-function piece of hardware that sits between the vertex shader and the geometry shader in the pipeline, so the GPU processes the vertices as a low-resolution mesh, subdivides it in the tessellator, and the high-resolution mesh is discarded after rasterization. So this high-resolution model only exists for a brief moment.

But, what was the question?

Why is it not used extensively?
Because it requires the models to be crafted in a special way. It's extra effort for developers, development time is usually a constraint.

Why does AMD perform worse in tessellation?
AMD made several attempts before Direct3D 11. They had something very basic in their Direct3D 9 hardware, and a much more advanced one in their Direct3D 10 hardware. Nvidia also had an implementation in their Direct3D 10 hardware. The performance comes down to their tessellation engine, and this is not the fault of the game developers; this is a hardware feature, not a driver or game issue.
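
To make the displacement-map workflow described above a bit more concrete, here's a toy CPU-side sketch in Python (not real shader code; the vertex data and the little displacement function are made up for illustration). The real tessellator does this in fixed-function hardware between the vertex and geometry shader stages:

```python
# Toy illustration of displacement-mapped tessellation on the CPU.
# The vertex data and "displacement map" are made-up examples.

def lerp(a, b, t):
    return tuple(a[i] + (b[i] - a[i]) * t for i in range(3))

def tessellate_edge(v0, v1, n0, n1, disp_map, levels):
    """Subdivide the edge v0-v1 into `levels` segments and displace
    each new vertex along its interpolated normal."""
    verts = []
    for i in range(levels + 1):
        t = i / levels
        pos = lerp(v0, v1, t)
        normal = lerp(n0, n1, t)
        height = disp_map(t)  # sample the displacement map along the edge
        verts.append(tuple(p + n * height for p, n in zip(pos, normal)))
    return verts

# Low-detail edge: two vertices with their normals (all made up)
v0, v1 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
n0 = n1 = (0.0, 1.0, 0.0)

# A fake 1D "displacement map": a small bump in the middle of the edge
disp = lambda t: 0.1 * (1 - abs(2 * t - 1))

for level in (1, 4):
    print(f"tessellation level {level}: {tessellate_edge(v0, v1, n0, n1, disp, level)}")
```

At level 1 you just get the original edge back; at higher levels the new vertices pop out to follow the bump, which is the whole trick: the detailed geometry is generated on the fly and never stored.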
 
But it will get the party going. Now there's at least a reason to utilize RT in games.
Also, they just needed a showcase of this architecture before they make a dedicated RT accelerator. It was the same with Tensor Cores.

Isn't tessellation a standard part of the DX11 pipeline? That would mean all games use tessellation.
Honestly, I'm more into rendering than games, so I might be missing something. Care to explain?

No, the developer still needs to support the feature in their games. Even in DX11, certain games still toggle the tessellation effect on and off, meaning tessellation is not something that is automatically applied to a game. It is similar to async compute in DX12: you can turn it on and off even with DX12. When DX11 was first introduced, some developers decided to port their games just for the sake of the increased performance of DX11 vs DX9/10.
 
So, no SLI on the 2070 at all? That seems a little weird. Maybe this is the wrong place to ask, but why is Nvidia still using SLI link adaptors between cards when AMD hasn't had to use Crossfire link connectors since the R9 290/290X?
Asking for a friend.....XD

The article touches on what the SLI (and Crossfire) links did up until Turing changed it. In the past, all the SLI/Crossfire link did was transfer the completed rendered frame from the secondary card(s) to the primary card, so the primary card could then send it out to the display.

The problem is that as resolutions increased, the amount of data that needed to be sent over the link increased as well. AMD's Crossfire link ran out of bandwidth first; it was not fast enough to transfer 4K frames. So they had a choice: develop another Crossfire link (that would have made their third Crossfire link version) or just transfer the data over the PCI-E bus on the motherboard. They decided to go with transferring the data over the PCI-E bus. There are pros and cons to this. Obviously the big pro is that AMD saved a lot of money on R&D. The big con is that when both cards aren't operating at x16/x16, transferring the Crossfire data over the PCI-E bus uses bandwidth where there may not be bandwidth to spare. On the nVidia side, their SLI connector had enough bandwidth for 4K, so there was really no need to change. However, going beyond 4K, the old SLI connector doesn't have enough bandwidth. So nVidia is finally in the same situation AMD was, but they chose to go the other route and develop a new SLI connector. They didn't really have to develop much, though, because they are just re-using an already developed connection that they needed for the server market anyway.

And, IMO, they were smart about it. The NVLink connector is basically a PCI-E x8 form factor, so I believe they can use off the shelf components to build the bridges. And the protocol it uses to communicate is basically PCI-E. So the two cards have a direct PCI-E x16(or x8 depending on the card) connection between each other. That is a crap ton of bandwidth, and not likely to run out any time soon. Plus, as the PCI-E versions increase, so will the bandwidth they can build into their NVLink connection.
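
To put some rough numbers behind the bandwidth argument (the link figures are the commonly cited ones, so treat them as approximate; the frame-size math is just arithmetic):

```python
# Back-of-the-envelope check of how much bandwidth AFR frame transfers need,
# versus what the links offer. Bridge/NVLink figures are the commonly cited
# ones and should be treated as approximate.

def frame_transfer_gbs(width, height, fps, bytes_per_pixel=4):
    """GB/s needed to ship every finished frame from one card to the other."""
    return width * height * bytes_per_pixel * fps / 1e9

links = {
    "classic SLI bridge (approx.)": 1.0,          # GB/s, rough figure
    "NVLink x8, RTX 2080 (per direction)": 25.0,
    "NVLink 2x x8, RTX 2080 Ti (per direction)": 50.0,
}

for res, fps in [((1920, 1080), 60), ((3840, 2160), 60), ((3840, 2160), 144)]:
    need = frame_transfer_gbs(*res, fps)
    print(f"{res[0]}x{res[1]} @ {fps} Hz needs ~{need:.1f} GB/s")

for name, bw in links.items():
    print(f"{name}: ~{bw:.0f} GB/s")
```

4K at 60 Hz already needs roughly 2 GB/s just for finished frames, which is about where the older links tap out, while the NVLink connection has headroom to spare.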

Just looking at the naming alone, it should be a mid-range chip. But from what I've heard from leaks, the TU106 die size is very big for a mid-range chip, even bigger than the usual size for a Gx104 chip. That extra hardware costs a lot of die area.

Yeah, just going by the internal naming alone, TU106 should be in a mid-range card, but TU106 is a different design than we're used to seeing from a mid-range chip.

Traditionally, the second chip in the stack is about a third smaller than the top chip, and we see that trend continue here. However, the third chip, the TU106 equivalent, is usually about 50% the size of the second chip. In the case of TU106, though, it is instead 50% smaller than the biggest chip. So it has the same memory bus as the second-biggest chip (TU104), but has 25% fewer shaders than TU104, which is about how much they usually disable to get the xx70 card.
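
For reference, here are the full-die CUDA core counts and the die sizes reported at launch (the Pascal figures are from memory, so treat them as approximate), which show how unusually large TU106 is for a third-in-the-stack chip:

```python
# Full-die CUDA core counts and launch-reported die sizes, to show how
# TU106 sits in the stack compared to the Pascal equivalents.
chips = {
    # name: (full CUDA cores, die size in mm^2)
    "TU102": (4608, 754),
    "TU104": (3072, 545),
    "TU106": (2304, 445),
    "GP102": (3840, 471),
    "GP104": (2560, 314),
    "GP106": (1280, 200),
}

for small, big in [("TU104", "TU102"), ("TU106", "TU102"), ("TU106", "TU104"),
                   ("GP106", "GP102")]:
    c_ratio = chips[small][0] / chips[big][0]
    a_ratio = chips[small][1] / chips[big][1]
    print(f"{small} vs {big}: {c_ratio:.0%} of the cores, {a_ratio:.0%} of the area")
```

TU106 lands at exactly half the cores of TU102 and around 60% of its area, whereas the previous x06 chip was closer to a third of the big die.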
 
But, what was the question?
There was no technical question. I know very well what tessellation is. :-)
I don't know why AMD's solution didn't get traction. Maybe they don't cooperate well with developers?
Or maybe it's simply really bad? Do we consider such a possibility? Are there any reference values for the possible performance gain?
Because it requires the models to be crafted in a special way. It's extra effort for developers, development time is usually a constraint.
So? RTRT also requires developers to spend more time. :-)
AMD made several attempts before Direct3D 11.
We could say their first hardware "attempt" came right at the beginning, since ATI already had a hardware solution. :-)
The performance comes down to their tessellation engine, and this is not the fault of the game developers; this is a hardware feature, not a driver or game issue.
So? Again: RT and Tensor Cores are also hardware features.
Under the hood there is no "ray tracing" or "tessellation". There are only simple mathematical operations. :-)
Hardware is not good at "tessellation". It's good at particular mathematical operations. I don't understand why these "tessellators" can't be used otherwise.
Tensor cores can be used for faster AA. It doesn't happen automatically. Someone has simply done the work needed to implement it.
I'm sure we'll learn many uses for RT cores as well.

No, the developer still needs to support the feature in their games.
I think we're mixing two things (just like with RTRT).
When you say "tessellation", do you mean the operation performed in the rendering pipeline, or the hardware implementation AMD offers?
 
I don't know why AMD's solution didn't get traction. Maybe they don't cooperate well with developers?
I don't understand, are you talking about the tessellation support before Direct3D 11? In Direct3D 10 and OpenGL 3.x, AMD and Nvidia created their own extensions that developers had to support, but all of these were late to the party, so no game developers bothered to utilize them.

When it comes to tessellation support in Direct3D 11, OpenGL 4 and Vulkan, it's vendor independent. If you develop a game you don't add support for AMD's hardware; the APIs are standardized, which, you know, is kind of the point.

The performance comes down to their tessellation engine, and this is not the fault of the game developers; this is a hardware feature, not a driver or game issue.
So? Again: RT and Tensor Cores are also hardware features.

Under the hood there is no "ray tracing" or "tessellation". There are only simple mathematical operations. :)

Hardware is not good at "tessellation". It's good at particular mathematical operations. I don't understand why these "tessellators" can't be used otherwise.

Tensor cores can be used for faster AA. It doesn't happen automatically. Someone has simply done the work needed to implement it.

I'm sure we'll learn many uses for RT cores as well.
GPUs are more than just parallel math processors; there are a number of dedicated hardware resources, like TMUs, ROPs, etc. Turing adds RT cores, which are fixed-function raytracing cores. The process of accelerated raytracing on Turing uses these along with the FPUs, ALUs and tensor cores to do the rendering, but there is dedicated raytracing hardware to do part of the heavy lifting. Similarly, GPUs have dedicated hardware to spawn extra vertices for tessellation. This is not a purely FPU-based process. You could do tessellation in software, but that would require creating the vertices on the driver side.

This is how Nvidia described it back in the day:
GeForce GTX 400 GPUs are built with up to fifteen tessellation units, each with dedicated hardware for vertex fetch, tessellation, and coordinate transformations. They operate with four parallel raster engines which transform newly tessellated triangles into a fine stream of pixels for shading. The result is a breakthrough in tessellation performance—over 1.6 billion triangles per second in sustained performance.
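
To put that 1.6 billion triangles per second figure in perspective, simple arithmetic on Nvidia's own number gives the per-frame budget:

```python
# What Nvidia's quoted sustained tessellation throughput means per frame.
tris_per_second = 1.6e9   # Nvidia's sustained figure for GeForce GTX 400

for fps in (30, 60, 144):
    per_frame = tris_per_second / fps / 1e6
    print(f"At {fps} fps: ~{per_frame:.0f} million tessellated triangles per frame")
```

Roughly 27 million triangles per frame at 60 fps, which is why the fixed-function tessellator is not the bottleneck in typical games.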
 
Not sure if this has been posted anywhere on the forum yet, but apparently NVIDIA posted on their forums Friday that the 2080 Ti has been delayed a week.

https://forums.geforce.com/default/...ries/geforce-rtx-2080-ti-availability-update/

ericnvidia80 said:
Hi Everyone,

Wanted to give you an update on the GeForce RTX 2080 Ti availability.

GeForce RTX 2080 Ti general availability has shifted to September 27th, a one week delay. We expect pre-orders to arrive between September 20th and September 27th.

There is no change to GeForce RTX 2080 general availability, which is September 20th.

We’re eager for you to enjoy the new GeForce RTX family! Thanks for your patience.
 
So now that there are RT cores, will Tensor Cores be used for raytracing too, or just the RT cores? If just the RT cores, then Nvidia should prioritize more RT cores and take out the Tensor cores for more RTX performance. It's not ideal to push both RT and Tensor at the same time.
 
RT and Tensor take half the space that would have been occupied by CUDA cores.
We could have had 4608 cores instead of 2304 on the 2070. But it is what it is. I have no use for AA and RT at 1440p, so it is really hopeless.
We get tensor-core-accelerated DLSS and 1 RT core per SM of 64, for 60 FPS 720p gaming now.
I read that tensor cores can also do accelerated denoising to clean up real-time raytraced rendering.

I hope the next gen will include 1 RT core per SM of 32 CUDA cores, 6144 in total.
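
Putting numbers on that layout (1 RT core per SM and 64 CUDA cores per SM on Turing; the last line is just the hypothetical configuration from the post above):

```python
# RT cores per card, given Turing's layout of 64 CUDA cores and 1 RT core
# per SM, plus the hoped-for next-gen layout (32 CUDA cores per SM,
# 6144 cores total) from the post above -- the latter is purely hypothetical.

def rt_cores(cuda_cores, cuda_per_sm):
    return cuda_cores // cuda_per_sm   # one RT core per SM

turing = {"RTX 2070": 2304, "RTX 2080": 2944, "RTX 2080 Ti": 4352}
for card, cores in turing.items():
    print(f"{card}: {cores} CUDA cores -> {rt_cores(cores, 64)} RT cores")

# Hypothetical next gen: 1 RT core per SM of 32, 6144 CUDA cores total
print(f"Hypothetical next gen: 6144 CUDA cores -> {rt_cores(6144, 32)} RT cores")
```

That works out to 36, 46 and 68 RT cores on the current lineup, versus 192 in that hypothetical configuration.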
 
RT and Tensor take half the space that would have been occupied by CUDA cores.

We could have had 4608 cores instead of 2304 on the 2070.
No, not really. The chip relies heavily on power gating; all of the resources can't run at maximum speed at once. The alternative would be to not have them at all.
 
Wonder if we're gonna see a huge uplift in RTRT performance once 7nm is here and helps lift some of those power constraints.
 
RT and Tensor take half the space that would have been occupied by CUDA cores.
We could have had 4608 cores instead of 2304 on the 2070.
Your dream is fairly unlikely. A chip like that would pull 300W. :-D
That isn't happening in the middle segment.
 
No, not really. The chip relies heavily on power gating; all of the resources can't run at maximum speed at once. The alternative would be to not have them at all.

I thought that some CUDA rasterization is done while the RT cores do the raytracing and the Tensor cores do the denoising, all at the same time, because not all of the rendering is ray-traced when using RTX.

Your dream is fairly unlikely. A chip like that would pull 300W. :-D
That isn't happening in the middle segment.

The 2080 Ti has almost that number of CUDA cores plus RT and Tensor, and it's 250 W.
 
Always neat to read more about the technological advances being made - I still feel the pricing is out of this world ridiculous though. I'll probably be waiting for the 3000 series before I look at my next upgrade.
 
The 2080 Ti has almost that number of CUDA cores plus RT and Tensor, and it's 250 W.
300W, 250W... similarly unacceptable. They wanted to keep the 2070 around the 1070's 150W; 175W is already fairly high. I bet we'll see more frugal versions of this card.
It's not AMD. Nvidia really cares about power consumption and heat. You want 300W? You can get the 2080 Ti, or whatever answer the Red team is going to give.
 
300W, 250W... similarly unacceptable. They wanted to keep the 2070 around the 1070's 150W; 175W is already fairly high. I bet we'll see more frugal versions of this card.
It's not AMD. Nvidia really cares about power consumption and heat. You want 300W? You can get the 2080 Ti, or whatever answer the Red team is going to give.

Well, it depends on how much performance you want; power consumption increases along with it. If you prefer lower power consumption, then pick a slower card. If the 2070 is still too hungry, wait for the 2060.
 
So the specs I was thinking about are going to be true, and now, seeing this "one click overclock", we can expect Turing in the 2150-2200 MHz range, a small jump from Pascal and Volta's 2000-2100 MHz.
Then I will translate the CUDA core counts into ROPs and TMUs for us hihi, add the theoretical TFLOPS at the expected overclock of 2150 MHz, and finally compare it to the previous gen (at 2000 MHz) if you want too ^^

rtx 2080ti: 88 ROPs 272 TMUs 18.7 Tflops
rtx 2080: 64 ROPs 184 TMUs 12.7 Tflops
rtx 2070: 64 ROPs 144 TMUs 9.9 Tflops

gtx 1080ti: 88 ROPs 224 TMUs 14.3 Tflops
gtx 1080: 64 ROPs 160 TMUs 10.2 Tflops
gtx 1070: 64 ROPs 120 TMUs 7.7 Tflops



gtx 1070 to rtx 2070: 29 % more performance
gtx 1080 to rtx 2080: 23.63 % more performance
gtx 1080 ti to rtx 2080 ti: 30.54 % more performance

Footnotes:
If the Quadro is overclocked to the same level as the other RTX cards, it will be near Titan V performance.
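
For anyone who wants to check the arithmetic, the TFLOPS and percentage figures above follow directly from cores x 2 FLOPs per clock x clock speed, using the assumed 2150 MHz for Turing and 2000 MHz for Pascal:

```python
# Reproduces the TFLOPS and relative-gain figures above:
# TFLOPS = CUDA cores * 2 FLOPs per clock * clock (GHz) / 1000,
# using the assumed overclocks (2.15 GHz Turing, 2.0 GHz Pascal).

cards = {
    "RTX 2080 Ti": (4352, 2.15), "RTX 2080": (2944, 2.15), "RTX 2070": (2304, 2.15),
    "GTX 1080 Ti": (3584, 2.00), "GTX 1080": (2560, 2.00), "GTX 1070": (1920, 2.00),
}

def tflops(cores, ghz):
    return cores * 2 * ghz / 1000

for name, (cores, ghz) in cards.items():
    print(f"{name}: {tflops(cores, ghz):.1f} TFLOPS")

for new, old in [("RTX 2070", "GTX 1070"), ("RTX 2080", "GTX 1080"),
                 ("RTX 2080 Ti", "GTX 1080 Ti")]:
    gain = tflops(*cards[new]) / tflops(*cards[old]) - 1
    print(f"{old} -> {new}: {gain:.1%} more theoretical throughput")
```

Note that these are theoretical shader-throughput gains only; they say nothing about RT or DLSS performance.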
 