
NVIDIA Develops Tile-based Multi-GPU Rendering Technique Called CFR

btarunr

Editor & Senior Moderator
NVIDIA remains invested in multi-GPU development, specifically SLI over NVLink, and has developed a new multi-GPU rendering technique that appears to be inspired by tile-based rendering. Implemented at the single-GPU level, tile-based rendering has been one of NVIDIA's secret sauces for improving performance since its "Maxwell" family of GPUs. 3DCenter.org discovered that NVIDIA is working on a multi-GPU adaptation of it, called CFR, which could be short for "checkerboard frame rendering" or "checkered frame rendering." The method is already quietly deployed in current NVIDIA drivers, although it is not documented for developers to implement.

In CFR, the frame is divided into tiny square tiles, like a checkerboard. Odd-numbered tiles are rendered by one GPU, and even-numbered ones by the other. Unlike AFR (alternate frame rendering), in which each GPU's dedicated memory holds a copy of all the resources needed to render the frame, methods like CFR and SFR (split frame rendering) optimize resource allocation. CFR also purportedly suffers less micro-stutter than AFR. 3DCenter also detailed the features and requirements of CFR. To begin with, the method is only compatible with DirectX (including DirectX 12, 11, and 10), not OpenGL or Vulkan. For now it is "Turing"-exclusive, since NVLink is required (its bandwidth is probably needed to virtualize the tile buffer). Tools like NVIDIA Profile Inspector allow forcing CFR on, provided the other hardware and API requirements are met. It still has many compatibility problems, and remains practically undocumented by NVIDIA.
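To make the checkerboard idea concrete, here is a minimal sketch of how tiles could be assigned to two GPUs by parity. The tile size and resolution are illustrative assumptions; NVIDIA has not documented CFR's actual parameters.

```python
# Illustrative sketch of checkerboard (CFR-style) tile assignment.
# TILE size and frame resolution are assumptions, not NVIDIA's real values.

TILE = 64                # hypothetical square tile size in pixels
WIDTH, HEIGHT = 1920, 1080

def assign_tiles(width, height, tile):
    """Split the frame into a grid of tiles and assign each tile to a GPU
    by checkerboard parity: (row + col) even -> GPU 0, odd -> GPU 1."""
    cols = (width + tile - 1) // tile   # round up to cover the frame edge
    rows = (height + tile - 1) // tile
    assignment = {}
    for row in range(rows):
        for col in range(cols):
            assignment[(row, col)] = (row + col) % 2   # GPU index
    return assignment

tiles = assign_tiles(WIDTH, HEIGHT, TILE)
gpu0 = sum(1 for g in tiles.values() if g == 0)
gpu1 = len(tiles) - gpu0
# Parity assignment keeps the per-GPU tile counts nearly equal,
# and neighbouring tiles always belong to different GPUs.
```

The appeal over SFR's large top/bottom split is that small interleaved tiles naturally average out scene complexity across both GPUs.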



 
This was explained back when Crossfire and SLI were making their debut, IIRC. It's not exactly new, is it? Or am I missing something?

In all the cases I recall, these techniques sucked because they mandated that each GPU still render the complete scene geometry, and only helped with fill rate.
 

Yeah, I too had a lot of deja vu writing this, and had a long chat with W1zzard. Maybe it's some kind of TBR extrapolation for multi-GPU which they finally got right.
 

I sometimes swear they are selling us the same darn tech with new buzzwords...

Maybe the matrix is just glitching again...
 
It seems they are leveraging their (single-GPU) tiled-rendering hardware in the silicon to split up the image for CFR, possibly with non-50/50 splits that could change dynamically at runtime to spread the load better.
 
Did either AMD or Nvidia ever manage to get dynamic splitting to work reliably? As far as I remember, all the attempts were eventually abandoned because the solutions came with their own set of problems, primarily uneven frame times and stuttering.

Single-GPU tiled-rendering hardware implies tiles of a static size, but playing around with the tile count per GPU might work?
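A toy sketch of that "vary the tile count, not the tile size" idea: keep tiles fixed and nudge each GPU's share of them based on last frame's measured times. The controller, gain, and clamps are all made up for illustration.

```python
# Hypothetical load balancer: tile size stays fixed (as the hardware dictates),
# only the *fraction* of tiles given to each GPU changes between frames.

def rebalance(share_gpu0, t0, t1, gain=0.25):
    """share_gpu0: fraction of tiles GPU 0 rendered last frame.
    t0, t1: measured frame times (ms) of GPU 0 and GPU 1.
    If GPU 0 was slower (t0 > t1), shift some tiles toward GPU 1."""
    imbalance = (t0 - t1) / (t0 + t1)       # ranges from -1 to +1
    new_share = share_gpu0 - gain * imbalance
    return min(0.9, max(0.1, new_share))    # clamp so both GPUs stay busy

# GPU 0 took 12 ms, GPU 1 took 8 ms with a 50/50 split:
share = rebalance(0.5, 12.0, 8.0)           # share drops to 0.45
```

The stuttering problem mentioned above shows up here too: if the gain is too high the shares oscillate frame to frame, which is exactly the uneven-frame-time failure mode that killed earlier dynamic-split attempts.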
 
I'm wondering if that new tile-based technique will introduce artifacts in the picture, just like with tearing in SFR?
 
Not when doing RTRT, which is likely the reason they're developing this (and mostly for game-streaming services, not local GPUs).

Did either AMD or Nvidia ever manage to get dynamic splitting to work reliably? As far as I remember, all the attempts were eventually abandoned because the solutions came with their own set of problems, primarily uneven frame times and stuttering.

Well, actually this is a problem that RTRT solves automatically.
In legacy game-rendering techniques, the input consists of instructions that must be run. There's little control over time: the GPU has to complete (almost) everything or there's no image at all.
So rendering time is a result (not a parameter), and each frame has to wait for the last tile.

In RTRT, frame rendering time (i.e. the number of rays) is the primary input parameter. It's not relevant how you split the frame. This is perfectly fine:

[image attachment]
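As a toy illustration of the ray-budget point above: because the number of rays is an input, a fixed per-frame budget can be split across GPUs in any proportion so that all of them finish at the same time. All numbers here are made up for illustration, not NVIDIA's actual scheme.

```python
# Sketch: in ray tracing the ray count is the input, so frame time can be
# held fixed and the work split arbitrarily across GPUs.

RAYS_PER_FRAME = 2_000_000   # hypothetical budget chosen for a target frame time

def split_budget(total_rays, throughputs):
    """Split a fixed ray budget across GPUs in proportion to their measured
    rays-per-second throughput, so every GPU finishes at roughly the same time."""
    total_tp = sum(throughputs)
    return [total_rays * tp // total_tp for tp in throughputs]

# Two GPUs, one 50% faster than the other:
budgets = split_budget(RAYS_PER_FRAME, [9_000_000, 6_000_000])
# budgets -> [1200000, 800000]; both GPUs take ~0.133 s
```

With rasterization you cannot do this, because the work per screen region is dictated by the scene, not chosen by the scheduler.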
 
So, Nvidia is now using the technique that UK-based PowerVR developed, a company that Nvidia effectively forced out of the PC GPU market with its dirty tricks in the early 2000s... :rolleyes:
Tile-based rendering is a straightforward, natural approach. It's commonly used in non-gaming rendering engines (you can see it happening in Cinebench). PowerVR didn't invent it; they may just have been the first to implement it in hardware.
 
ATi implemented a Crossfire mode called Super Tiling back in the early X800/X850 days, though it required a 'Master' card with a dedicated compositing engine to combine the output of the two cards, plus a dongle.


 
Don't overthink it. We needed SLI in DirectX 12, and now we have it. The trick is running render targets separately despite having to apply equal post-processing weight across the whole screen, which is why it has been difficult to scale up SFR performance. Since there is no unified dynamic lighting load in RTX mode, this might work.
 
A question for the community; Would a VBIOS update be enough to enable crossfire on the 5700 cards?
 
Here's a crazy idea: why not work with M$/AMD to optimize DX12/Vulkan? Hell, Vulkan has an open-source SDK; it doesn't even need special cooperation with anyone.
Also, back when DX12 launched there was a lot of hype about how well it would perform with multi-GPU setups using async technologies (independent chips & manufacturers): https://wccftech.com/dx12-nvidia-amd-asynchronous-multigpu/
Seems like everyone forgot about it...
 
Would this have anything to do with MCM GPUs? I hope AMD beats Nvidia to an MCM GPU (multiple GPU chiplets, not just a GPU and HBM). I'm not an AMD fanboy in the least; I just dislike Nvidia and want them cut down to size like Intel has been, solely because Intel getting its ass whooped has benefited consumers, and the same happening to Nvidia would probably benefit us all.
 
So, Nvidia is now using the technique that UK-based PowerVR developed, a company that Nvidia effectively forced out of the PC GPU market with its dirty tricks in the early 2000s... :rolleyes:
Wow, I thought people had forgotten about them. Nvidia was also trashing them back then and saying tile based rendering sucked...
 
Would this have anything to do with MCM GPUs?

That's exactly what I thought! I don't see them revamping SLI after killing it. To me this has more to do with future MCM designs, and it might be a good indication that an MCM-based gaming GPU from Nvidia is closer than most of us believe.
 
Nvidia was also trashing them back then and saying tile based rendering sucked...
:)

Edit:
For a bit of background, this was a presentation to OEMs.
Kyro was technically new and interesting, but as an actual gaming GPU on desktop cards it sucked, due to both spotty support and lackluster performance. It definitely had its bright moments, but they were too few and far between. PowerVR could not develop their tech fast enough to compete with Nvidia and ATi at the time.
PowerVR itself got along just fine; the same architecture series was (or is) a strong contender in mobile GPUs.
 
Last edited:
I see MCM as the way forward, not just another version of SLI. For one thing, the cost of buying two cards must be higher than that of a single MCM card that could rival multi-GPU performance. Granted, the MCM will cost more than a regular GPU, but with SLI you have to buy two of everything: two PCBs, two GPUs, two sets of VRAM, two of every component on the PCB, two shrouds, two coolers, two boxes, etc.
 
Last edited:
They are trying to justify the RTX lineup, lol.
 
Also, back when DX12 launched there was a lot of hype about how well it would perform with multi-GPU setups using async technologies (independent chips & manufacturers): https://wccftech.com/dx12-nvidia-amd-asynchronous-multigpu/
Seems like everyone forgot about it...
With good reason. For it to really work, the programmer would need to optimize every time for a specific system. If I'm writing GPGPU software for a solution sold bundled with a computer (whose hardware I get to specify), it could be worth the effort. For games that can run on any combination of rendering hardware? Eh, no thanks. The world is just much simpler when we only have to think about two identical GPUs and can balance the workload equally. Even then we get cans of worms thrown in our faces from time to time.
 
It seems they are leveraging their (single GPU) tiled-rendering hardware in the silicon to split up the image for CFR, possibly with non 50/50 splits that could possibly dynamically change during runtime to spread the load better.


A good idea, with complexity: how is full-screen AA processed if only half the resources are on each card?

Or could this possibly be a Zen-like chiplet design, to save money and reduce losses on the newest node?

With NVLink as the communication fabric, and if only half the resources are actually required, maybe I'm out in left field, but put 12 GB or 6 GB on each chiplet and interleave the memory.
 
AA would probably be one of the post-processing steps done at the end of rendering a frame.

You can't get away with shared memory like that. You are still going to need a sizable share of the assets accessible by both/all GPUs. Any memory far away from a GPU is evil, and even a fast interconnect like NVLink won't replace local memory. GPUs are very bandwidth-constrained, so sharing memory access through something like Zen 2's IO die is not likely to work on GPUs at this time. With a big HBM cache for each GPU, maybe, but that is effectively still each GPU having its own VRAM :)

Chiplet design has been the end goal for a while, and all the GPU makers have been trying their hand at it. So far, unsuccessfully. As @Apocalypsee already noted, even tiled distribution of work is not new.
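Some rough numbers behind the bandwidth argument, taken from public Turing specs (approximate figures, not a measurement):

```python
# Back-of-the-envelope comparison of local VRAM vs NVLink bandwidth.
# Figures are approximate public specs for an RTX 2080 Ti.

local_vram_gbps = 616   # ~616 GB/s from the card's own GDDR6
nvlink_gbps = 100       # ~100 GB/s bidirectional over the Turing NVLink bridge

ratio = local_vram_gbps / nvlink_gbps
# Local memory is roughly 6x faster than the interconnect, before even
# counting latency - hence each GPU still wants its hot assets in its own VRAM.
```

And that ratio only compares bandwidth; the latency gap between on-card GDDR6 and a link hop makes remote memory even less attractive for texture and geometry fetches.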
 
:)

Edit:
For a bit of background, this was a presentation to OEMs.
Kyro was technically new and interesting, but as an actual gaming GPU on desktop cards it sucked, due to both spotty support and lackluster performance. It definitely had its bright moments, but they were too few and far between. PowerVR could not develop their tech fast enough to compete with Nvidia and ATi at the time.
PowerVR itself got along just fine; the same architecture series was (or is) a strong contender in mobile GPUs.

It wasn't that bad; I tested the cards myself at the time. In fact, I'm still on good terms with Imagination Technologies' PR director, who back then used to come by the office with things to test. But yes, they did have driver issues, which was one of the big flaws; performance, though, wasn't as terrible as that old Nvidia presentation makes it out to be.
 

Of course Nvidia will commit libel, just like Intel.
 