
NVIDIA Turing GeForce RTX Technology & Architecture

It should start with sending developers the tools (cards) well before launching them to market... you don't lay down rail tracks without having trains...
Sadly, it does not really work if you do it like that either. Take tessellation, for example. AMD has had the hardware since the DX10 generation. When tessellation became part of the DX11 API spec, did we see many games using tessellation? In the end, the GPU maker still needs to create some hype around a new feature so consumers are aware of it and want it in their games. Remember that to support a certain feature in their games, the developer and publisher also need to gauge consumer interest first before they dedicate more R&D to it. Otherwise that money is most likely better spent on creating extra content for the game so they can monetize it better.

So the 2070 really is going to use the TU106. Except for the price, it sounds more like an x60: what would traditionally be a mid-range chip, with no SLI.
Just looking at the naming alone, it should be a mid-range chip. But from what I've heard from leaks, the TU106 die size is very big for a mid-range chip, even bigger than the usual size for a Gx104 chip. That extra hardware costs a lot of die area.
 
Phew, all the pages are finally written up in case you guys wanted to go over it now.
 
I don't really understand why they implemented RT so fast, as there are basically only a few RT games now and a few more being released next year...
But it will get the party going. Now there's at least a reason to utilize RT in games.
Also, they just needed a showcase of this architecture before they make a dedicated RT accelerator. It was the same with Tensor Cores.
Sadly, it does not really work if you do it like that either. Take tessellation, for example. AMD has had the hardware since the DX10 generation. When tessellation became part of the DX11 API spec, did we see many games using tessellation?
Isn't tessellation a standard part of the DX11 pipeline? That would mean all games use tessellation.
Honestly, I'm more into rendering than games, so I might be missing something. Care to explain?
 
Deep learning anti-aliasing x4 looks exactly the same as TAA x4... did I miss something? Was I supposed to be impressed by yet another new anti-aliasing gimmick? Ugh...

It looks the same while providing much higher fps, that's what you're missing!

DLSS is supposed to give you the same image quality as TAA but with much higher fps.
DLSS 2X is supposed to give you higher image quality than TAA with similar fps.
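To give a rough idea of where the extra fps comes from, here's some back-of-the-envelope math (the ~1440p internal render resolution for a 4K output is my assumption, based on how the technique has generally been described):

```python
# Rough estimate of the shading work saved by rendering at a lower internal
# resolution and upscaling, which is how DLSS is described to work.
# The internal resolution below is an assumption (~1440p for a 4K output).

output_px = 3840 * 2160      # 4K output resolution
internal_px = 2560 * 1440    # assumed internal render resolution

savings = 1 - internal_px / output_px
print(f"Shaded pixels per frame: {internal_px:,} vs {output_px:,}")
print(f"Roughly {savings:.0%} fewer pixels shaded before upscaling")
```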
 
I wish @W1zzard would write more in-depth articles like this.

I also wish he would ban all the whining morons who don't bother to read the article and understand what NVIDIA is aiming for with RTX, but just jump into the comments section to post incredibly witty things like "ngreedia". Little boys, nobody cares if you think the cards are overpriced.
 
Isn't tessellation a standard part of the DX11 pipeline? That would mean all games use tessellation.
The two tessellation shader stages and the geometry shader are optional when rendering, even though they're standard features.
 
I wish @W1zzard would write more in-depth articles like this.

I also wish he would ban all the whining morons who don't bother to read the article and understand what NVIDIA is aiming for with RTX, but just jump into the comments section to post incredibly witty things like "ngreedia". Little boys, nobody cares if you think the cards are overpriced.
You expect people like this to bring something else to the discussion NOW? For most of them, writing child-like stuff has literally been the whole reason for posting anything.
 
Well, videocardz posted Nvidia's performance numbers from the review guide...
If this turns out to be correct, the top cards are at +50% compared to last gen even in games that weren't optimized to use RT or Tensor cores. This means that Nvidia either managed to push some calculations onto the new hardware anyway (awesome) or the CUDA cores, fewer than they could have been, manage these numbers on their own (still very good).
The two tessellation shader stages and the geometry shader are optional when rendering, even though they're standard features.
Well... maybe they simply aren't very good, or are hard to use? Or maybe AMD is simply very difficult for game developers to work with?
It's their fault either way.
 
Well... maybe they simply aren't very good, or are hard to use? Or maybe AMD is simply very difficult for game developers to work with?

It's their fault either way.
Tessellation is primarily a technique to generate smooth transitions between different levels of detail in meshes, and it brings savings in memory bandwidth and vertex processing as well. It's not meant to be used like in Unigine Heaven. Let's say you model something in Blender, then you subdivide it and add finer details, but this high-detail model has a ridiculously high polygon count. You then generate a displacement map of the difference between the low- and high-detail models. Finally, at runtime you can tessellate the low-detail model using the displacement map to render something very close to your original Blender model, with much lower performance requirements. Additionally, the tessellation engine can interpolate a smooth transition between the two detail levels. The tessellation engine is a fixed-function piece of hardware that sits between the vertex shader and the geometry shader in the pipeline, so the GPU processes the vertices as a low-resolution mesh, subdivides it in the tessellator, and the high-resolution mesh is discarded after rasterization. So this high-resolution model only exists for a brief moment.

But, what was the question?

Why is it not used extensively?
Because it requires the models to be crafted in a special way. It's extra effort for developers, development time is usually a constraint.

Why does AMD perform worse in tessellation?
AMD made several attempts before Direct3D 11. They had something very basic in their Direct3D 9 hardware, and a much more advanced one in their Direct3D 10 hardware. Nvidia also had an implementation in their Direct3D 10 hardware. The performance comes down to their tessellation engine, and this is not the fault of the game developers; this is a hardware feature, not a driver or game issue.
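
To make the displacement-map workflow described above a bit more concrete, here's a toy CPU-side sketch in Python (not real shader code; the vertex data and the little displacement function are made up for illustration). The real tessellator does this in fixed-function hardware between the vertex and geometry shader stages:

```python
# Toy illustration of displacement-mapped tessellation on the CPU.
# The vertex data and "displacement map" are made-up examples.

def lerp(a, b, t):
    return tuple(a[i] + (b[i] - a[i]) * t for i in range(3))

def tessellate_edge(v0, v1, n0, n1, disp_map, levels):
    """Subdivide the edge v0-v1 into `levels` segments and displace
    each new vertex along its interpolated normal."""
    verts = []
    for i in range(levels + 1):
        t = i / levels
        pos = lerp(v0, v1, t)
        normal = lerp(n0, n1, t)
        height = disp_map(t)  # sample the displacement map along the edge
        verts.append(tuple(p + n * height for p, n in zip(pos, normal)))
    return verts

# Low-detail edge: two vertices with their normals (all made up)
v0, v1 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
n0 = n1 = (0.0, 1.0, 0.0)

# A fake 1D "displacement map": a small bump in the middle of the edge
disp = lambda t: 0.1 * (1 - abs(2 * t - 1))

for level in (1, 4):
    print(f"tessellation level {level}: {tessellate_edge(v0, v1, n0, n1, disp, level)}")
```

At level 1 you just get the original edge back; at higher levels the new vertices pop out to follow the bump, which is the whole trick: the detailed geometry is generated on the fly and never stored.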
 
But it will get the party going. Now there's at least a reason to utilize RT in games.
Also, they just needed a showcase of this architecture before they make a dedicated RT accelerator. It was the same with Tensor Cores.

Isn't tessellation a standard part of the DX11 pipeline? That would mean all games use tessellation.
Honestly, I'm more into rendering than games, so I might be missing something. Care to explain?

No, the developer still needs to support the feature in their games. Even in DX11, certain games still toggle the tessellation effect on and off, meaning tessellation is not something that is automatically applied to a game. It is similar to async compute in DX12: you can turn it on and off even with DX12. When DX11 was first introduced, some developers decided to port their games just for the sake of the increased performance of DX11 vs DX9/10.
 
So, no SLI on the 2070 at all? That seems a little weird. Maybe this is the wrong place to ask, but why is Nvidia still using SLI link adaptors between cards when AMD hasn't had to use Crossfire link connectors since the R9 290/290X?
Asking for a friend.....XD

The article touches on what the SLI (and Crossfire) links did up until Turing changed it. In the past, all the SLI/Crossfire link did was transfer the completed rendered frame from the secondary card(s) to the primary card, so the primary card could then send it out to the display.

The problem is that as resolutions increased, the amount of data that needed to be sent over the link increased as well. AMD's Crossfire link ran out of bandwidth first; it was not fast enough to transfer 4K frames. So they had a choice: develop another Crossfire link (that would have made their third Crossfire link version) or just transfer the data over the PCI-E bus on the motherboard. They decided to go with transferring the data over the PCI-E bus. There are pros and cons to this. Obviously the big pro is that AMD saved a lot of money on R&D. The big con is that when both cards aren't operating at x16/x16, transferring the Crossfire data over the PCI-E bus uses bandwidth where there may not be bandwidth to spare. On the nVidia side, their SLI connector had enough bandwidth for 4K, so there was really no need to change. However, going beyond 4K, the old SLI connector doesn't have enough bandwidth. So nVidia is finally in the same situation AMD was, but they chose to go the other route and develop a new SLI connector. They didn't really have to develop much, though, because they are just re-using an already developed connection that they needed for the server market anyway.

And, IMO, they were smart about it. The NVLink connector is basically a PCI-E x8 form factor, so I believe they can use off the shelf components to build the bridges. And the protocol it uses to communicate is basically PCI-E. So the two cards have a direct PCI-E x16(or x8 depending on the card) connection between each other. That is a crap ton of bandwidth, and not likely to run out any time soon. Plus, as the PCI-E versions increase, so will the bandwidth they can build into their NVLink connection.
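
To put some rough numbers behind the bandwidth argument (the link figures are the commonly cited ones, so treat them as approximate; the frame-size math is just arithmetic):

```python
# Back-of-the-envelope check of how much bandwidth AFR frame transfers need,
# versus what the links offer. Bridge/NVLink figures are the commonly cited
# ones and should be treated as approximate.

def frame_transfer_gbs(width, height, fps, bytes_per_pixel=4):
    """GB/s needed to ship every finished frame from one card to the other."""
    return width * height * bytes_per_pixel * fps / 1e9

links = {
    "classic SLI bridge (approx.)": 1.0,          # GB/s, rough figure
    "NVLink x8, RTX 2080 (per direction)": 25.0,
    "NVLink 2x x8, RTX 2080 Ti (per direction)": 50.0,
}

for res, fps in [((1920, 1080), 60), ((3840, 2160), 60), ((3840, 2160), 144)]:
    need = frame_transfer_gbs(*res, fps)
    print(f"{res[0]}x{res[1]} @ {fps} Hz needs ~{need:.1f} GB/s")

for name, bw in links.items():
    print(f"{name}: ~{bw:.0f} GB/s")
```

4K at 60 Hz already needs roughly 2 GB/s just for finished frames, which is about where the older links tap out, while the NVLink connection has headroom to spare.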

Just looking at the naming alone, it should be a mid-range chip. But from what I've heard from leaks, the TU106 die size is very big for a mid-range chip, even bigger than the usual size for a Gx104 chip. That extra hardware costs a lot of die area.

Yeah, just going by the internal naming alone, TU106 should be in a mid-range card, but TU106 is a different design than we're used to seeing from a mid-range chip.

Traditionally, the second chip in the stack is about a third smaller than the top chip, and we see that trend continue here. However, the third chip, the TU106 equivalent, is usually about 50% the size of the second chip. In the case of TU106, though, it is instead 50% smaller than the biggest chip. So it has the same memory bus as the second-biggest chip (TU104), but has 25% fewer shaders than TU104, which is about how much they usually disable to get the xx70 card.
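
For reference, here are the full-die CUDA core counts and the die sizes reported at launch (the Pascal figures are from memory, so treat them as approximate), which show how unusually large TU106 is for a third-in-the-stack chip:

```python
# Full-die CUDA core counts and launch-reported die sizes, to show how
# TU106 sits in the stack compared to the Pascal equivalents.
chips = {
    # name: (full CUDA cores, die size in mm^2)
    "TU102": (4608, 754),
    "TU104": (3072, 545),
    "TU106": (2304, 445),
    "GP102": (3840, 471),
    "GP104": (2560, 314),
    "GP106": (1280, 200),
}

for small, big in [("TU104", "TU102"), ("TU106", "TU102"), ("TU106", "TU104"),
                   ("GP106", "GP102")]:
    c_ratio = chips[small][0] / chips[big][0]
    a_ratio = chips[small][1] / chips[big][1]
    print(f"{small} vs {big}: {c_ratio:.0%} of the cores, {a_ratio:.0%} of the area")
```

TU106 lands at exactly half the cores of TU102 and around 60% of its area, whereas the previous x06 chip was closer to a third of the big die.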
 
But, what was the question?
There was no technical question. I know very well what tessellation is. :-)
I don't know why AMD's solution didn't get traction. Maybe they don't cooperate well with developers?
Or maybe it's simply really bad? Do we consider such a possibility? Are there any reference values for the possible performance gain?
Because it requires the models to be crafted in a special way. It's extra effort for developers, development time is usually a constraint.
So? RTRT also requires developers to spend more time. :-)
AMD made several attempts before Direct3D 11.
We could say their first hardware "attempt" came right at the beginning, since ATI already had a hardware solution. :-)
The performance comes down to their tessellation engine, and this is not the fault of the game developers; this is a hardware feature, not a driver or game issue.
So? Again: RT and Tensor Cores are also hardware features.
Under the hood there is no "ray tracing" or "tessellation". There are only simple mathematical operations. :-)
Hardware is not good at "tessellation". It's good at particular mathematical operations. I don't understand why these "tessellators" can't be used otherwise.
Tensor cores can be used for faster AA. It doesn't happen automatically. Someone has simply done the work needed to implement it.
I'm sure we'll learn many uses for RT cores as well.

No, the developer still needs to support the feature in their games.
I think we're mixing two things (just like with RTRT).
When you say "tessellation", do you mean the operation performed in the rendering pipeline, or the hardware implementation AMD offers?
 
I don't know why AMD's solution didn't get traction. Maybe they don't cooperate well with developers?
I don't understand, are you talking about the tessellation support before Direct3D 11? In Direct3D 10 and OpenGL 3.x, AMD and Nvidia created their own extensions that developers had to support, but all of these were late to the party, so no game developers bothered to utilize them.

When it comes to tessellation support in Direct3D 11, OpenGL 4 and Vulkan, it's vendor independent. If you develop a game you don't add support for AMD's hardware; the APIs are standardized, which, you know, is kind of the point.

The performance comes down to their tessellation engine, and this is not the fault of the game developers; this is a hardware feature, not a driver or game issue.
So? Again: RT and Tensor Cores are also hardware features.

Under the hood there is no "ray tracing" or "tessellation". There are only simple mathematical operations. :)

Hardware is not good at "tessellation". It's good at particular mathematical operations. I don't understand why these "tessellators" can't be used otherwise.

Tensor cores can be used for faster AA. It doesn't happen automatically. Someone has simply done the work needed to implement it.

I'm sure we'll learn many uses for RT cores as well.
GPUs are more than just parallel math processors; there are a number of dedicated hardware resources, like TMUs, ROPs, etc. Turing adds RT cores, which are fixed-function raytracing cores. The process of accelerated raytracing on Turing uses these along with the FPUs, ALUs and tensor cores to do the rendering, but there is dedicated raytracing hardware to do part of the heavy lifting. Similarly, GPUs have dedicated hardware to spawn extra vertices for tessellation. This is not a purely FPU-based process. You could do tessellation in software, but that would require creating the vertices on the driver side.

This is how Nvidia described it back in the day:
GeForce GTX 400 GPUs are built with up to fifteen tessellation units, each with dedicated hardware for vertex fetch, tessellation, and coordinate transformations. They operate with four parallel raster engines which transform newly tessellated triangles into a fine stream of pixels for shading. The result is a breakthrough in tessellation performance—over 1.6 billion triangles per second in sustained performance.
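
To put that 1.6 billion triangles per second figure in perspective, simple arithmetic on Nvidia's own number gives the per-frame budget:

```python
# What Nvidia's quoted sustained tessellation throughput means per frame.
tris_per_second = 1.6e9   # Nvidia's sustained figure for GeForce GTX 400

for fps in (30, 60, 144):
    per_frame = tris_per_second / fps / 1e6
    print(f"At {fps} fps: ~{per_frame:.0f} million tessellated triangles per frame")
```

Roughly 27 million triangles per frame at 60 fps, which is why the fixed-function tessellator is not the bottleneck in typical games.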
 
Not sure if this has been posted anywhere on the forum yet, but apparently NVIDIA posted on their forums Friday that the 2080 Ti has been delayed a week.

https://forums.geforce.com/default/...ries/geforce-rtx-2080-ti-availability-update/

ericnvidia80 said:
Hi Everyone,

Wanted to give you an update on the GeForce RTX 2080 Ti availability.

GeForce RTX 2080 Ti general availability has shifted to September 27th, a one week delay. We expect pre-orders to arrive between September 20th and September 27th.

There is no change to GeForce RTX 2080 general availability, which is September 20th.

We’re eager for you to enjoy the new GeForce RTX family! Thanks for your patience.
 
So now that there are RT cores, will Tensor Cores be used for raytracing too, or just the RT cores? If just the RT cores, then Nvidia should prioritize more RT cores and take out the Tensor cores for more RTX performance. It's not ideal to push both RT and Tensor at the same time.
 
RT and Tensor take half the space that would have been occupied by CUDA cores.
We could have had 4608 cores instead of 2304 on the 2070. But it is what it is. I have no use for AA and RT at 1440p, so it is really hopeless.
We get tensor-core-accelerated DLSS and 1 RT core per SM of 64, for 60 FPS 720p gaming now.
I read that tensor cores can also do accelerated denoising to clean up real-time raytraced rendering.

I hope the next gen will include 1 RT core per SM of 32 CUDA cores, 6144 in total.
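
Putting numbers on that layout (1 RT core per SM and 64 CUDA cores per SM on Turing; the last line is just the hypothetical configuration from the post above):

```python
# RT cores per card, given Turing's layout of 64 CUDA cores and 1 RT core
# per SM, plus the hoped-for next-gen layout (32 CUDA cores per SM,
# 6144 cores total) from the post above -- the latter is purely hypothetical.

def rt_cores(cuda_cores, cuda_per_sm):
    return cuda_cores // cuda_per_sm   # one RT core per SM

turing = {"RTX 2070": 2304, "RTX 2080": 2944, "RTX 2080 Ti": 4352}
for card, cores in turing.items():
    print(f"{card}: {cores} CUDA cores -> {rt_cores(cores, 64)} RT cores")

# Hypothetical next gen: 1 RT core per SM of 32, 6144 CUDA cores total
print(f"Hypothetical next gen: 6144 CUDA cores -> {rt_cores(6144, 32)} RT cores")
```

That works out to 36, 46 and 68 RT cores on the current lineup, versus 192 in that hypothetical configuration.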
 
RT and Tensor take half the space that would have been occupied by CUDA cores.

We could have had 4608 cores instead of 2304 on the 2070.
No, not really. The chip relies heavily on power gating; all of the resources can't run at maximum speed at once. The alternative would be to not have them at all.
 
Wonder if we're gonna see a huge uplift in RTRT performance once 7nm is here and helps lift some of those power constraints.
 
RT and Tensor take half the space that would have been occupied by CUDA cores.
We could have had 4608 cores instead of 2304 on the 2070.
Your dream is fairly unlikely. A chip like that would pull 300W. :-D
That isn't happening in the middle segment.
 
No, not really. The chip relies heavily on power gating; all of the resources can't run at maximum speed at once. The alternative would be to not have them at all.

I thought that some CUDA rasterization is done while the RT cores do the raytracing and the Tensor cores do the denoising, all at the same time, because not all of the rendering is ray-traced when using RTX.

Your dream is fairly unlikely. A chip like that would pull 300W. :-D
That isn't happening in the middle segment.

The 2080 Ti has almost that number of CUDA cores plus RT and Tensor, and it's 250 W.
 
Always neat to read more about the technological advances being made - I still feel the pricing is out of this world ridiculous though. I'll probably be waiting for the 3000 series before I look at my next upgrade.
 
The 2080 Ti has almost that number of CUDA cores plus RT and Tensor, and it's 250 W.
300W, 250W... similarly unacceptable. They wanted to keep the 2070 around the 1070's 150W; 175W is already fairly high. I bet we'll see more frugal versions of this card.
It's not AMD. Nvidia really cares about power consumption and heat. You want 300W? You can get the 2080 Ti, or whatever answer the Red team is going to give.
 
300W, 250W... similarly unacceptable. They wanted to keep the 2070 around the 1070's 150W; 175W is already fairly high. I bet we'll see more frugal versions of this card.
It's not AMD. Nvidia really cares about power consumption and heat. You want 300W? You can get the 2080 Ti, or whatever answer the Red team is going to give.

Well, it depends on how much performance you want; power consumption increases along with it. If you prefer lower power consumption, then pick a slower card. If the 2070 is still too hungry, wait for the 2060.
 
So the specs I was thinking about are going to be true, and now, seeing this "one click overclock", we can expect Turing in the 2150-2200 MHz range, a small jump from Pascal and Volta's 2000-2100 MHz.
Then I will translate the CUDA core counts into ROPs and TMUs for us hihi, add the theoretical TFLOPS at the expected overclock of 2150 MHz, and finally compare it to the previous gen (at 2000 MHz) if you want too ^^

rtx 2080ti: 88 ROPs 272 TMUs 18.7 Tflops
rtx 2080: 64 ROPs 184 TMUs 12.7 Tflops
rtx 2070: 64 ROPs 144 TMUs 9.9 Tflops

gtx 1080ti: 88 ROPs 224 TMUs 14.3 Tflops
gtx 1080: 64 ROPs 160 TMUs 10.2 Tflops
gtx 1070: 64 ROPs 120 TMUs 7.7 Tflops



gtx 1070 to rtx 2070: 29 % more performance
gtx 1080 to rtx 2080: 23.63 % more performance
gtx 1080 ti to rtx 2080 ti: 30.54 % more performance

Footnotes:
If the Quadro is overclocked to the same level as the other RTX cards, it will be near Titan V performance.
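
For anyone who wants to check the arithmetic, the TFLOPS and percentage figures above follow directly from cores x 2 FLOPs per clock x clock speed, using the assumed 2150 MHz for Turing and 2000 MHz for Pascal:

```python
# Reproduces the TFLOPS and relative-gain figures above:
# TFLOPS = CUDA cores * 2 FLOPs per clock * clock (GHz) / 1000,
# using the assumed overclocks (2.15 GHz Turing, 2.0 GHz Pascal).

cards = {
    "RTX 2080 Ti": (4352, 2.15), "RTX 2080": (2944, 2.15), "RTX 2070": (2304, 2.15),
    "GTX 1080 Ti": (3584, 2.00), "GTX 1080": (2560, 2.00), "GTX 1070": (1920, 2.00),
}

def tflops(cores, ghz):
    return cores * 2 * ghz / 1000

for name, (cores, ghz) in cards.items():
    print(f"{name}: {tflops(cores, ghz):.1f} TFLOPS")

for new, old in [("RTX 2070", "GTX 1070"), ("RTX 2080", "GTX 1080"),
                 ("RTX 2080 Ti", "GTX 1080 Ti")]:
    gain = tflops(*cards[new]) / tflops(*cards[old]) - 1
    print(f"{old} -> {new}: {gain:.1%} more theoretical throughput")
```

Note that these are theoretical shader-throughput gains only; they say nothing about RT or DLSS performance.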
 