
NVIDIA Extends DirectX Raytracing (DXR) Support to Many GeForce GTX GPUs

bug

Joined
May 22, 2015
Messages
13,225 (4.06/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Bringing ultra-competitive, low-margin automotive into this is beyond ridiculous.
And reducing the whole Turing line to the ridiculously expensive 2080Ti isn't?
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Turing seems to be far better than Pascal when it comes to Async Compute. Are you sure the deficiencies of Pascal in this area apply to Turing overall?
Remember that RTX 2060 has 10.8 billion transistors versus GTX 1080 Ti's 11.8 billion, so they're pretty well matched hardware-wise. Running DXR, the 2060 does about double the 1080 Ti. The explanation for this is actually simple: the 2060 can do FP16 and FP32 simultaneously, whereas the 1080 Ti can only do FP32. The 1080 Ti, therefore, has to spend twice as much time to get the same result; ergo, half the FPS.

Polaris and down might experience the same problem GTX 1080 Ti does, but Vega (12.5 billion transistors) may be able to keep pace with RTX 2060 in DXR if they're able to get FP32 out of some of the cores and 2xFP16 out of other cores.
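The "twice as much time, ergo half the FPS" step assumes the two kinds of work are perfectly balanced. A toy model makes the arithmetic explicit: if two instruction streams must share one pipeline they run back to back, while on two pipes that issue concurrently the longer stream sets the pace. This is a sketch under heavy simplifying assumptions (no memory stalls, perfect scheduling; the function names are hypothetical), not a model of either GPU.

```python
def serial_time(a: float, b: float) -> float:
    """Both workloads share one pipeline, so they execute back to back."""
    return a + b


def concurrent_time(a: float, b: float) -> float:
    """Each workload has its own pipe and both issue at once: the longer one sets the pace."""
    return max(a, b)


def concurrency_speedup(a: float, b: float) -> float:
    """Ideal speedup from running the two workloads concurrently instead of serially."""
    return serial_time(a, b) / concurrent_time(a, b)


if __name__ == "__main__":
    # Perfectly balanced mix: the best case.
    print(concurrency_speedup(100, 100))  # 2.0
    # Unbalanced mix, e.g. 36 units of one type per 100 of the other.
    print(concurrency_speedup(100, 36))   # 1.36
```

Under this model, concurrent issue gives at most 2x, and only when both workloads are the same size; any imbalance yields less.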

Not gonna stop with the per-game launch articles

Which game would you choose instead of Metro, and why?
My two cents: it's too early to be benchmarking DXR because all results are going to be biased towards NVIDIA RTX cards by design. We really need to see AMD's response before it's worth testing. I mean, to benchmark now is just going to state the obvious (NVIDIA's product stack makes it clear which should perform better than the next).
 
Joined
Sep 17, 2014
Messages
20,934 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
My two cents: it's too early to be benchmarking DXR because all results are going to be biased towards NVIDIA RTX cards by design. We really need to see AMD's response before it's worth testing. I mean, to benchmark now is just going to state the obvious (NVIDIA's product stack makes it clear which should perform better than the next).

:toast:
 
Joined
Mar 10, 2014
Messages
1,793 (0.48/day)
Remember that RTX 2060 has 10.8 billion transistors versus GTX 1080 Ti's 11.8 billion, so they're pretty well matched hardware-wise. Running DXR, the 2060 does about double the 1080 Ti. The explanation for this is actually simple: the 2060 can do FP16 and FP32 simultaneously, whereas the 1080 Ti can only do FP32. The 1080 Ti, therefore, has to spend twice as much time to get the same result; ergo, half the FPS.

Polaris and down might experience the same problem GTX 1080 Ti does, but Vega (12.5 billion transistors) may be able to keep pace with RTX 2060 in DXR if they're able to get FP32 out of some of the cores and 2xFP16 out of other cores.


My two cents: it's too early to be benchmarking DXR because all results are going to be biased towards NVIDIA RTX cards by design. We really need to see AMD's response before it's worth testing. I mean, to benchmark now is just going to state the obvious (NVIDIA's product stack makes it clear which should perform better than the next).

It can do integer and FP32 math at the same time. I'm not sure whether it can also do FP32 and FP16 at the same time. To my understanding, RTX Turing always does FP16 math through the Tensor cores, so maybe it can.
 

bug

My two cents: it's too early to be benchmarking DXR because all results are going to be biased towards NVIDIA RTX cards by design. We really need to see AMD's response before it's worth testing. I mean, to benchmark now is just going to state the obvious (NVIDIA's product stack makes it clear which should perform better than the next).
So the ground rule should be "start benchmarking only if AMD looks good"? Geez...
At this point, I don't think benchmarking DXR/RTX is meant to compare Nvidia and AMD (the thought never crossed my mind until I read your post), but rather to give an idea of what you're getting if you're willing to foot the bill for an RTX-enabled Turing card. The fact that a sample size of one isn't representative of anything should be more than obvious.
 
Joined
Apr 30, 2012
Messages
3,881 (0.89/day)
So the ground rule should be "start benchmarking only if AMD looks good"? Geez...
At this point, I don't think benchmarking DXR/RTX is meant to compare Nvidia and AMD (the thought never crossed my mind until I read your post), but rather to give an idea of what you're getting if you're willing to foot the bill for an RTX-enabled Turing card. The fact that a sample size of one isn't representative of anything should be more than obvious.

Isn't that already covered in the individual game reviews? Adding it to GPU comparisons, where the majority of cards don't have that feature, introduces an alternative visual setting.

It's more work for @W1zzard, but if he's going that route, he might as well pick a game that can do DX11/DX12/DXR, compare all three, and introduce the % lows. Also give a bit more insight into the test scene, the run, and its duration.
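Outlets define "% lows" differently. A minimal sketch, assuming the definition "average FPS over the slowest N% of frames" (another common one converts the 99th-percentile frame time to FPS instead); the function name and the sample numbers are hypothetical. The same summary could be computed per API (DX11, DX12, DXR) from one captured run each.

```python
import math


def fps_summary(frame_times_ms: list[float], low_percent: float = 1.0) -> tuple[float, float]:
    """Return (average FPS, N% low FPS) from per-frame render times in milliseconds.

    "N% low" here means the average FPS across the slowest N% of frames,
    rounded up to at least one frame.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    # Average FPS = frames rendered / total seconds elapsed.
    avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    # The slowest frames are the ones with the longest frame times.
    slowest_first = sorted(frame_times_ms, reverse=True)
    n_low = max(1, math.ceil(len(slowest_first) * low_percent / 100.0))
    worst = slowest_first[:n_low]
    low_fps = 1000.0 * n_low / sum(worst)
    return avg_fps, low_fps


# Hypothetical run: 99 smooth 10 ms frames plus one 50 ms stutter.
avg, low = fps_summary([10.0] * 99 + [50.0])
print(f"avg {avg:.1f} FPS, 1% low {low:.1f} FPS")  # avg 96.2 FPS, 1% low 20.0 FPS
```

The average barely notices the single stutter, while the 1% low exposes it, which is why reporting both gives more insight than averages alone.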
 

FordGT90Concept

"I go fast!1!11!1!"
It can do integer and FP32 math at the same time. I'm not sure whether it can also do FP32 and FP16 at the same time. To my understanding, RTX Turing always does FP16 math through the Tensor cores, so maybe it can.
It can't do INT32 and FP32 simultaneously, presumably because the INT32 units aid the FP32 units when performing operations. FP16 math on RTX cards is done in the tensor cores. FP16 math in non-RTX Turing is done on FP16 units in each SM (replacing the tensor cores). All Turing cards can do FP16 + FP32 or FP16 + INT32.

So the ground rule should be "start benchmarking only if AMD looks good"? Geez...
No, "start benchmarking when there's something meaningful to test." If someone actually values RTX, they're going to buy the best RTX card they can afford from NVIDIA's product stack, knowing the more they spend, the better job it will do. It also goes without saying that non-RTX cards do a pretty terrible job at RTX, so if RTX is really your aim, then RTX is what you should be buying. An RTX benchmark at this point is like testing whether water is wet.

Benchmarks are important in DX11/DX12/Vulkan/OGL because there are multiple competing product stacks, and there's no way to know which performs better unless it's tested. Until that is also true of DXR/VRT, I fail to see the point.

Isn't that already covered in the individual game reviews? Adding it to GPU comparisons, where the majority of cards don't have that feature, introduces an alternative visual setting.
Also that. Not many games support RTX, and the game review itself can differentiate the cards in terms of RTX performance. In a review of many cards across a variety of games, RTX really has no value because so few cards are even worth trying.
 
Joined
Aug 2, 2011
Messages
1,451 (0.31/day)
Processor Ryzen 9 7950X3D
Motherboard MSI X670E MPG Carbon Wifi
Cooling Custom loop, 2x360mm radiator,Lian Li UNI, EK XRes140,EK Velocity2
Memory 2x16GB G.Skill DDR5-6400 @ 6400MHz C32
Video Card(s) EVGA RTX 3080 Ti FTW3 Ultra OC Scanner core +750 mem
Storage MP600 2TB,960 EVO 1TB,XPG SX8200 Pro 1TB,Micron 1100 2TB,1.5TB Caviar Green
Display(s) Acer X34S, Acer XB270HU
Case LianLi O11 Dynamic White
Audio Device(s) Logitech G-Pro X Wireless
Power Supply EVGA P3 1200W
Mouse Logitech G502 Lightspeed
Keyboard Logitech G512 Carbon w/ GX Brown
VR HMD HP Reverb G2 (V2)
Software Win 11
It can't do INT32 and FP32 simultaneously, presumably because the INT32 units aid the FP32 units when performing operations. FP16 math on RTX cards is done in the tensor cores. FP16 math in non-RTX Turing is done on FP16 units in each SM (replacing the tensor cores). All Turing cards can do FP16 + FP32 or FP16 + INT32.


No, "start benchmarking when there's something meaningful to test." If someone actually values RTX, they're going to buy the best RTX card they can afford from NVIDIA's product stack, knowing the more they spend, the better job it will do. It also goes without saying that non-RTX cards do a pretty terrible job at RTX, so if RTX is really your aim, then RTX is what you should be buying. An RTX benchmark at this point is like testing whether water is wet.

Benchmarks are important in DX11/DX12/Vulkan/OGL because there are multiple competing product stacks, and there's no way to know which performs better unless it's tested. Until that is also true of DXR/VRT, I fail to see the point.


Also that. Not many games support RTX, and the game review itself can differentiate the cards in terms of RTX performance. In a review of many cards across a variety of games, RTX really has no value because so few cards are even worth trying.

https://www.nvidia.com/content/dam/...eforce-rtx-gtx-dxr-one-metro-exodus-frame.png

It can do INT32 and FP32 concurrently.

Give this a read and a watch. Tony does a good job explaining the architecture and how it is suited to accelerating real-time ray tracing.

https://www.nvidia.com/en-us/geforce/news/geforce-gtx-dxr-ray-tracing-available-now/
 

FordGT90Concept

"I go fast!1!11!1!"
The AnandTech article starts off describing them as exclusive, but then, in a separate section below, it mirrors what NVIDIA says. Pretty shoddy journalism, that.

Keep in mind that all GPU architectures have INT32 units for addressing memory. The only thing unique about Turing is that they're directly addressable. What I find very interesting about that PNG you referenced is how the INT32 units aren't heavily tasked when the RT core is enabled but are when it is not. Obviously they're doing a lot of RT operations in INT32, which raises the question: is the RT core really just a dense integer ASIC with intersection detection? Integer math explains the apparent performance boost from such a tiny part of the silicon. It also explains why Radeon Rays has much lower performance: it uses FP32 or FP16 (Vega) math. And it explains why RTX has such a bad noise problem: their rays are imprecise.

Considering all of this, it's impossible to know what approach AMD will take with DXR. NVIDIA is cutting so many corners, and AMD has never been a fan of doing that. I think it's entirely possible AMD will just ramp up the FP16 capabilities and forgo exposing the INT32 addressability. I don't know that they'll do it via tensor cores, though. AMD has always been in favor of bringing sledgehammers to fistfights. Why? Because a crapload of FP16 units can do all sorts of things, whereas tensor cores and RT cores are fixed-function.
 
Joined
Feb 3, 2017
Messages
3,481 (1.32/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
The AnandTech article starts off describing them as exclusive, but then, in a separate section below, it mirrors what NVIDIA says. Pretty shoddy journalism, that.
Have you considered that Nvidia is fairly open about what it is doing with its cards technically? That means even an objective story will either use Nvidia's own slides or reproduce imagery and text that ends up very closely following the Nvidia marketing line. This is true not only for Nvidia, by the way.

Keep in mind that all GPU architectures have INT32 units for addressing memory. The only thing unique about Turing is that they're directly addressable. What I find very interesting about that PNG you referenced is how the INT32 units aren't heavily tasked when the RT core is enabled but are when it is not. Obviously they're doing a lot of RT operations in INT32, which raises the question: is the RT core really just a dense integer ASIC with intersection detection? Integer math explains the apparent performance boost from such a tiny part of the silicon. It also explains why Radeon Rays has much lower performance: it uses FP32 or FP16 (Vega) math. And it explains why RTX has such a bad noise problem: their rays are imprecise.
Considering all of this, it's impossible to know what approach AMD will take with DXR. NVIDIA is cutting so many corners, and AMD has never been a fan of doing that. I think it's entirely possible AMD will just ramp up the FP16 capabilities and forgo exposing the INT32 addressability. I don't know that they'll do it via tensor cores, though. AMD has always been in favor of bringing sledgehammers to fistfights. Why? Because a crapload of FP16 units can do all sorts of things, whereas tensor cores and RT cores are fixed-function.
- AGUs (address generation units) usually support fewer operations, on top of not being directly exposed.
- The same slide clearly shows INT32 units being intermittently tasked throughout the frame. RT is computation-heavy and fairly lenient about what type of compute is used, so INT32 cores are more useful than usual. Note that FP compute is also very heavy and consistent during the RT part of the frame.
- The RT core is a dense, specialized ASIC. According to Nvidia (and at least indirectly confirmed by developers and by the operations exposed in APIs), RT cores do ray-triangle intersection and BVH traversal.
- RT is not only INT work; it involves both INT and FP. The share of each depends on a bunch of things: the algorithm, which part of the RT is being done, etc. The RT cores in Turing are more specialized than simply generic INT compute. That is actually very visible empirically from the same frame rendering comparison.
- Radeon Rays has selectable precision. FP16 is implemented for it because it gives a very significant speed increase over FP32. For RTRT (or otherwise quick RT), precision has little meaning when rays are sparse and denoised anyway. The denoising algorithm, along with ray placement, plays a much larger role here.
- As for AMD's approach, this is not easy to say. The short-term solution would be Radeon Rays implemented for DXR. When and if AMD wants to come out with that is in question, but I suppose the answer is when it is inevitable. Today, AMD has no reason to get into this, as DXR and RTRT are too new, with too few games/demos. This matches what they have said, along with the fact that only AMD's Vega cards are likely to be effective enough for it (RX 5x0 lacks RPM, i.e. Rapid Packed Math for FP16). Long term, and I am speculating here, I am willing to bet that AMD will also do an implementation with specialized hardware.
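RT cores are described as doing ray-triangle intersection and BVH traversal, but how Turing implements these in hardware is not public. What follows is only a software illustration of the two textbook operations, assuming standard algorithms: the slab test used to check a ray against a BVH node's bounding box, and the Möller-Trumbore ray-triangle test. To illustrate the FP16 precision point, the triangle test rounds every intermediate result through a caller-supplied function; `round_f16`/`round_f32` emulate binary16/binary32 arithmetic via Python's `struct` module. All names are hypothetical, and per-operation rounding only approximates real half-precision hardware.

```python
import struct
from typing import Callable, Optional, Tuple

Vec = Tuple[float, float, float]
Round = Callable[[float], float]


def round_f64(x: float) -> float:
    """No extra rounding: plain Python (binary64) arithmetic."""
    return x


def round_f32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_f16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (raises OverflowError above 65504)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sub(a: Vec, b: Vec, r: Round) -> Vec:
    return (r(a[0] - b[0]), r(a[1] - b[1]), r(a[2] - b[2]))


def dot(a: Vec, b: Vec, r: Round) -> float:
    acc = 0.0
    for x, y in zip(a, b):
        acc = r(acc + r(x * y))  # round after every multiply and every add
    return acc


def cross(a: Vec, b: Vec, r: Round) -> Vec:
    return (r(r(a[1] * b[2]) - r(a[2] * b[1])),
            r(r(a[2] * b[0]) - r(a[0] * b[2])),
            r(r(a[0] * b[1]) - r(a[1] * b[0])))


def ray_aabb_hit(orig: Vec, dirn: Vec, box_min: Vec, box_max: Vec) -> bool:
    """Slab test for one BVH node's box. Assumes every component of dirn is non-zero."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(orig, dirn, box_min, box_max):
        inv = 1.0 / d
        t1, t2 = (lo - o) * inv, (hi - o) * inv
        t_near = max(t_near, min(t1, t2))  # latest entry across the three slabs
        t_far = min(t_far, max(t1, t2))    # earliest exit across the three slabs
    return t_near <= t_far


def ray_triangle(orig: Vec, dirn: Vec, v0: Vec, v1: Vec, v2: Vec,
                 r: Round = round_f64, eps: float = 1e-7) -> Optional[float]:
    """Moller-Trumbore test: distance t along dirn to the hit point, or None on a miss."""
    e1, e2 = sub(v1, v0, r), sub(v2, v0, r)
    p = cross(dirn, e2, r)
    det = dot(e1, p, r)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = r(1.0 / det)
    s = sub(orig, v0, r)
    u = r(dot(s, p, r) * inv_det)  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1, r)
    v = r(dot(dirn, q, r) * inv_det)  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = r(dot(e2, q, r) * inv_det)
    return t if t > eps else None


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # unit triangle in z = 0
    down = (0.0, 0.0, -1.0)
    for name, rnd in (("f64", round_f64), ("f32", round_f32), ("f16", round_f16)):
        t = ray_triangle((0.3, 0.2, 1.7), down, *tri, r=rnd)
        print(name, t, abs(t - 1.7))
```

With these inputs the binary32 result lands within about 5e-8 of the true distance 1.7, while the binary16 result is off by about 2e-4; whether an error of that size matters for sparse, denoised rays is the judgment call discussed in the thread.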
 
Joined
Jul 9, 2015
Messages
3,413 (1.06/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
And reducing the whole Turing line to the ridiculously expensive 2080Ti isn't?
The 2080 Ti is not the only ridiculously expensive card in the lineup, as NVDA's skyrocketing income hints.

A GSync module would not physically fit in a notebook. Besides, isn't using established standards exactly what is encouraged? :)
Gsync notebooks using years-old eDP to do the "gsync" in "gsync" notebooks is the shortest way to describe what "gsync" really was.
 
Joined
Feb 3, 2017
Gsync notebooks using years-old eDP to do the "gsync" in "gsync" notebooks is the shortest way to describe what "gsync" really was.
In notebooks, Nvidia used the standard eDP Adaptive Sync for its Mobile GSync implementation. By the way, this was in 2015.
Adaptive Sync on desktop was different. There was no standard.

Timeline:
- October 2013 at Nvidia Montreal Event: GSync was announced with availability in Q1 2014.
- January 2014 at CES: Freesync was announced and demonstrated on Toshiba laptops using eDP.
- May 2014: Adaptive Sync was added to the DisplayPort 1.2a specification. The DisplayPort 1.2a spec dated from January 2013 but did not include Adaptive Sync until then.
- June 2014 at Computex: Freesync prototype monitors were demoed.
- November 2014 at AMD's Future of Computing Event: Freesync monitors were announced, with availability in Q1 2015.

Yeah, Nvidia is evil for doing proprietary stuff and not pushing for a standard. However, you must admit it makes sense from a business perspective. They control the availability and quality of the product, the latter of which was a serious problem in early Freesync monitors. Gsync led Freesync by an entire year on the market. Freesync was a clear knee-jerk reaction from AMD. There was simply no way they could avoid responding.
 