Raster/ray traced performance

ravenhold · Jun 28, 2022

I don't understand why GPU vendors don't treat raster and ray tracing performance the same way. For example, if a GPU reaches a given frame rate at 1080p in raster, it should reach the same frame rate with ray tracing ON.
So it would have exactly as many ray tracing cores (or whatever kind of cores are dedicated to ray tracing) as main shader cores.

My point is that on lower-end GPUs, you would get ray tracing performance that scales the same way raster performance does.

cvaldes · Jun 28, 2022

GPU manufacturers design chips to function in a variety of usage cases. They don't design them for one person's usage case. It's not your mother in the kitchen cooking you your favorite breakfast.

All of these things are a balance of features, compromises, etc. Remember that it's a finite amount of silicon wafer space and they need to consider how frequently any given type of transistor is going to be used -- raster cores, ray tracing cores, machine learning cores, etc. -- in a wide variety of real world situations, not just one random guy living in his mother's basement running benchmarks.

If running applications that benefit from the presence of ray tracing cores becomes more popular, I'm guessing that these GPU manufacturers will include more of them. There's not much incentive for them to be included if most of the time they go unused. That was AMD's philosophy until RDNA2 and, unsurprisingly, ray tracing performance on the RDNA2 generation GPUs is inferior to NVIDIA's Ampere generation GPUs.

It's not like Jensen or Dr. Su can wave a magic wand and add 5x RT cores to a die for free.

Adding more ray tracing cores to a given GPU die means subtracting other transistors elsewhere. That might mean lower raster performance in exchange for better ray tracing performance. Is that something you'd be interested in?
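
The trade-off described above can be put into a toy calculation. This is only a sketch: the die budget, the per-block areas, and the fixed overhead are made-up numbers chosen for illustration, not measurements of any real GPU, and the function name is hypothetical.

```python
# Toy model of a fixed die-area budget shared between block types.
# All numbers below are hypothetical and only illustrate the trade-off.
DIE_BUDGET_MM2 = 400.0            # assumed total die area
FIXED_OVERHEAD_MM2 = 120.0        # assumed area for memory controllers, cache, I/O
AREA_PER_SHADER_BLOCK_MM2 = 4.0   # assumed area of one raster/shader block
AREA_PER_RT_BLOCK_MM2 = 1.0       # assumed area of one ray tracing block


def shader_blocks_that_fit(rt_blocks: int) -> int:
    """Return how many shader blocks fit once rt_blocks RT blocks are placed."""
    remaining = DIE_BUDGET_MM2 - FIXED_OVERHEAD_MM2 - rt_blocks * AREA_PER_RT_BLOCK_MM2
    if remaining < 0:
        raise ValueError("RT blocks alone exceed the die budget")
    return int(remaining // AREA_PER_SHADER_BLOCK_MM2)


if __name__ == "__main__":
    for rt_blocks in (40, 80, 200):
        print(f"{rt_blocks} RT blocks -> {shader_blocks_that_fit(rt_blocks)} shader blocks")
```

With these assumed figures, every four extra RT blocks cost one shader block. On a real die the ratio would be different, but with a fixed area the direction of the trade is the same.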

ravenhold · Jun 28, 2022

What about a new GPU architecture approach where raster and ray tracing cores are modular, so designers could add more ray tracing cores without affecting raster performance?

eazen · Jun 28, 2022

That’s way too soon. First we need efficient code to deal with RT in games and software, then all GPUs can transition to full or more RT hardware and you can get what you want. Essentially the long-term goal is full RT, but we’re still far away from it.

Vayra86 · Jun 29, 2022

ravenhold said:
What about a new GPU architecture approach where raster and ray tracing cores are modular, so designers could add more ray tracing cores without affecting raster performance?

They're radically different kinds of calculations.

If you look at the papers on Ampere and Turing you will see that INT (integer) functions were added and the new cores carry those functions, because they're built to handle them faster. Efficiency in GPUs is often obtained by reducing the functionality of cores, making them more single-purpose. An example of that is how Nvidia doesn't offer high-precision floating point on GeForce; it did on Titan, but eventually killed that too.

But in the history of GPU development there is always a shift back and forth, as new functionality is added, and refined, and at some point becomes a known quantity. There's always a phase where the resources a GPU gets are not perfectly aligned with what games want. That's why there are differences in performance between engines/GPU families/games.

Pascal to Turing: less performance per shader, lower perf per clock, too, AND lower clocks, but the functionality was expanded. Ampere iterated on that with a further refinement of shader count, and the sacrifice was made in TDP to remain competitive in raster performance.

Either way, yes, I do agree the ideal situation is one where you don't waste die space on cores that are going to be idle at any point in time. We have yet to see if that is feasible.

eazen · Jun 29, 2022

Vayra86 said:
Pascal to Turing: less performance per shader, lower perf per clock, too, AND lower clocks, but the functionality was expanded. Ampere iterated on that with a further refinement of shader count, and the sacrifice was made in TDP to remain competitive in raster performance.

I’m curious how you would come to the conclusion that Nvidia made the shaders worse with a new architecture? Turing has higher IPC than Pascal in every way, and the clocks are the same, more or less. The only GPU with straight lower clocks is the 2080 Ti, because it was a huge GPU (still the biggest gaming GPU ever) and had to make do with a “low” TDP of just 280W compared to its size.

ravenhold · Jul 25, 2022

Is there a future for GPU architecture where we ditch raster altogether and use only ray tracing cores? If not the RTX 5000 series, then the RTX 6000 series maybe?

If that happens, would it require rewriting the complete shader system/DirectX/Vulkan?

ratirt · Jul 25, 2022

eazen said:
I’m curious how you would come to the conclusion that Nvidia made the shaders worse with a new architecture? Turing has higher IPC than Pascal in every way, and the clocks are the same, more or less. The only GPU with straight lower clocks is the 2080 Ti, because it was a huge GPU (still the biggest gaming GPU ever) and had to make do with a “low” TDP of just 280W compared to its size.

Look at the 2080 Ti and the 3060 Ti. The 3060 Ti has higher clocks and more cores and yet is around 15% slower. Obviously the 2080 Ti is the bigger chip, since it's made on a 12nm node while the 3060 Ti is made on 8nm and so is smaller. Going by cores and clocks the 3060 Ti should have been faster, but it isn't. Maybe the memory plays a role here, but I doubt it. So I suppose there are some sort of limitations implemented.
Does the same go for Pascal?
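
The "cores and clocks" comparison above is the usual paper-spec estimate of peak FP32 throughput: 2 operations (one fused multiply-add) per shader core per clock. A minimal sketch follows. The core counts and reference boost clocks are the commonly published spec-sheet figures as best recalled, so treat them as assumptions, and `peak_fp32_tflops` is a hypothetical helper name.

```python
def peak_fp32_tflops(shader_cores: int, boost_mhz: float) -> float:
    """Paper peak FP32 throughput in TFLOPS: 2 FLOPs (one FMA) per core per clock."""
    return 2 * shader_cores * boost_mhz * 1e6 / 1e12


# (shader cores, reference boost clock in MHz); assumed spec-sheet values.
CARDS = {
    "RTX 2080 Ti": (4352, 1545),
    "RTX 3060 Ti": (4864, 1665),
}

if __name__ == "__main__":
    for name, (cores, mhz) in CARDS.items():
        print(f"{name}: {peak_fp32_tflops(cores, mhz):.2f} TFLOPS (paper peak)")
```

Under these assumptions the 3060 Ti comes out ahead on paper, which is exactly why the lower measured performance looks odd. The estimate only counts shader lanes and clock speed. It says nothing about how often those lanes are actually kept busy, about memory bandwidth, or about what a "core" means in each architecture, so it is not a reliable way to compare cards across generations.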

Vayra86 · Aug 2, 2022

eazen said:
I’m curious how you would come to the conclusion that Nvidia made the shaders worse with a new architecture? Turing has higher IPC than Pascal in every way, and the clocks are the same, more or less. The only GPU with straight lower clocks is the 2080 Ti, because it was a huge GPU (still the biggest gaming GPU ever) and had to make do with a “low” TDP of just 280W compared to its size.

You are correct, I mixed things up. Pascal clocks higher overall. A good 100 MHz boost advantage is common, often more.
