NVIDIA has published a research paper on
DLSS version 4, its AI rendering technology for real-time graphics performance. The system integrates advancements in frame generation, ray reconstruction, and latency reduction. The flagship Multi-Frame Generation feature produces three additional frames for every natively rendered frame, and DLSS 4 then presents the results with careful pacing so the output looks and feels like native rendering.

At the core of DLSS 4 is a shift from convolutional neural networks to transformer models. These architectures excel at capturing spatial-temporal dependencies, improving ray-traced effect quality by 30-50% according to NVIDIA's benchmarks. The technology processes each AI-generated frame in just 1 ms on the RTX 5090, significantly faster than the 3.25 ms required by DLSS 3. For competitive gaming, the new Reflex Frame Warp feature reduces input latency by up to 75%, achieving 14 ms in THE FINALS and under 3 ms in VALORANT, again according to NVIDIA's own numbers.
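For a rough sense of the arithmetic, here is a minimal back-of-the-envelope sketch, not NVIDIA's implementation, using the article's figures of three generated frames per native frame and roughly 1 ms of generation time each. The function names are hypothetical; the sketch simply estimates the output frame rate and checks that the AI-generated frames fit inside one native frame interval.

```python
# Hypothetical sketch of Multi-Frame Generation arithmetic, based on the
# article's numbers: 3 AI-generated frames per native frame, ~1 ms each.

def mfg_output_rate(native_fps: float, generated_per_native: int = 3) -> float:
    """Output frame rate if every native frame is followed by N generated ones."""
    return native_fps * (1 + generated_per_native)

def generation_budget_ok(native_fps: float, gen_ms: float = 1.0,
                         generated_per_native: int = 3) -> bool:
    """Check that generating the extra frames fits inside one native frame interval."""
    native_interval_ms = 1000.0 / native_fps
    return generated_per_native * gen_ms < native_interval_ms

if __name__ == "__main__":
    for fps in (30, 60, 120):
        print(f"{fps} native fps -> {mfg_output_rate(fps):.0f} fps out, "
              f"budget ok: {generation_budget_ok(fps)}")
```

At 60 native fps this yields a 240 fps output, with the 3 ms of total generation time fitting comfortably inside the 16.7 ms native frame interval.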
DLSS 4's implementation leverages Blackwell-specific architecture capabilities, including FP8 tensor cores and fused CUDA kernels. The optimized pipeline incorporates vertical layer fusion and memory optimizations that keep computational overhead manageable even though the transformer models are twice as large as the previous CNN implementations; this efficiency enables real-time performance despite the substantially more complex AI processing. The unified AI pipeline reduces manual tuning requirements for ray-traced effects, allowing studios to implement advanced path tracing across diverse hardware configurations. The design also addresses long-standing gaming challenges, such as interpolating fast-moving UI elements and particle effects and reducing artifacts in high-motion scenes. NVIDIA's hardware flip metering and integration with the Blackwell display engine ensure precise pacing of newly generated frames for smooth, accurate, high-refresh-rate gaming.
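As a hedged illustration of what even frame pacing means here, the toy sketch below spaces presentation timestamps uniformly across one native frame interval. The function name and structure are assumptions made for illustration, not the actual flip-metering interface, which lives in the display engine hardware.

```python
# Illustrative frame-pacing sketch: evenly spaced flip timestamps for one
# native frame plus three generated frames (not NVIDIA's actual API).

from typing import List

def flip_timestamps(t_native_ms: float, native_interval_ms: float,
                    generated: int = 3) -> List[float]:
    """Evenly spaced presentation times across one native frame interval."""
    step = native_interval_ms / (generated + 1)
    return [t_native_ms + i * step for i in range(generated + 1)]

# Example: 60 fps native (16.67 ms interval) -> flips ~4.17 ms apart,
# i.e., a 240 Hz output cadence.
print(flip_timestamps(0.0, 1000.0 / 60))
```

The point of metering these flips in hardware rather than software is that uneven spacing between generated and native frames would read as judder, even at a high average frame rate.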
To ensure DLSS works as intended and that the neural networks produce quality results, NVIDIA has used a secret weapon:
a dedicated supercomputer that has been continuously improving DLSS for the past six years. The supercomputer's primary task involves analyzing failures in DLSS output, such as ghosting, flickering, or blurriness, across hundreds of games. When issues are identified, the system augments its training data sets with new examples of optimal graphics and with the challenging scenarios that DLSS needs to handle. Over time, the models learn what correctly rendered frames should look like and generate results much closer to what a game engine would produce, with steadily fewer artifacts.
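The sketch below is an assumption-laden illustration of that feedback loop: a stand-in artifact metric flags model outputs that diverge from a high-quality reference render and folds them back into the training set as hard examples. Both the metric and the function names are hypothetical; NVIDIA has not published its failure analysis at this level of detail.

```python
# Hypothetical sketch of the training-data feedback loop described above.
# Frames are represented as flat lists of pixel values for simplicity.

def artifact_score(output_frame, reference_frame):
    """Mean absolute pixel difference as a stand-in artifact metric
    (real failure analysis would be far more sophisticated)."""
    diffs = (abs(a - b) for a, b in zip(output_frame, reference_frame))
    return sum(diffs) / len(output_frame)

def augment_training_set(training_set, captures, threshold=0.05):
    """Add captures whose model output diverges from the reference render."""
    for output, reference in captures:
        if artifact_score(output, reference) > threshold:
            training_set.append((output, reference))
    return training_set

# Toy usage: the second capture diverges, so it becomes new training data.
dataset = []
captures = [([0.1, 0.2], [0.1, 0.2]), ([0.9, 0.1], [0.2, 0.2])]
print(len(augment_training_set(dataset, captures)))  # -> 1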
62 Comments on NVIDIA Details DLSS 4 Design: A Complete AI-Driven Rendering Technology
DLSS3 was 'better than native' and now they're showing how DLSS4-T is closer to reference images in select cases.
It's a funny message about the future.
IMHO, AI pipeline rendering is the definition of polishing a turd, which doesn't change the fact that it's still a turd.
UE5, TAA, and RT introduced a host of problems—atrocious performance, blurry visuals, ghosting, and more. Nvidia then stepped in, conveniently selling the "solution" to the very issues they had a hand in creating in the first place, wielding either a carrot or a stick depending on how you view their approach.
Unfortunately, all those companies only care about maximizing profits now. They're not pushing the limits anymore and are just relying on AI to do all the work (that they don't want to do because they're lazy). Look at the industry as a whole: there have been huge layoffs everywhere since 2021 and it's still going. Also, the people staying are doing the work of 2-3 jobs at once but barely make more money than before; it's like "modern slavery", and AI is only going to replace more and more people anyway.
Lovelace was a lot better than Ampere due to a much better process node, a lot more CUDA cores (at least for the 4090), much higher clock speeds, and a lot more L2 cache. But in terms of raw performance, clock for clock and core for core, Blackwell/Lovelace/Ampere seem to be pretty much on par.
I was expecting the RTX 50 series to have much better RT/PT performance, but no, I guess they're keeping that for the RTX 6090 along with even more GPU-generated frames... :mad: As much as I hate to say this, AMD should have gone with AI upscaling from the beginning, because now only RDNA 4 can use FSR 4 :'(
This stuff is unreal. Real-time ray-traced graphics at super high resolutions and super high frame rates is every bit the "holy grail" that Jensen Huang calls it, and then some. It's insanely advanced technology that will still take years to fully achieve.