NVIDIA GeForce Ampere Architecture, Board Design, Gaming Tech & Software

The new Ampere RT Core and Tensor Core

With Ampere, NVIDIA introduces its 2nd generation RT core, which aims to improve raytracing acceleration and enable new effects, such as raytraced motion blur. An RT core is a fixed-function hardware component that handles two of the most challenging tasks for SIMD programmable shaders: bounding volume hierarchy (BVH) traversal and intersection testing, i.e., calculating the exact point where a ray collides with a surface, so its next course can be charted. Typical raytracing workloads in a raster+raytracing hybrid rendering path involve repeated steps of BVH traversal followed by ray/bounding-box and ray/triangle intersection tests. This is a poor fit for conventional GPU shader hardware because the memory accesses involved are irregular and data-dependent. This kind of pointer chasing doesn't scale well on SIMD architectures (read: programmable shaders) and is better suited to special fixed-function hardware, like the MIMD RT cores.
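To make the "pointer chasing" point concrete, here is a minimal software sketch of BVH traversal with a ray/bounding-box test. It is an illustration only, not a description of how the RT core is implemented; the `Node` layout and the function names `ray_hits_box` and `traverse` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Node:
    """Hypothetical BVH node: an axis-aligned box plus children or a leaf payload."""
    lo: Vec3
    hi: Vec3
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    triangle_id: Optional[int] = None  # set only on leaf nodes

def ray_hits_box(origin, direction, lo, hi):
    """Slab test: True if the ray overlaps the box for some distance >= 0."""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < a or o > b:
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(root, origin, direction):
    """Walk the tree with an explicit stack.

    Returns (candidate triangle ids, number of nodes visited). Which node is
    fetched next depends on the previous box test, so every ray follows its
    own path through memory.
    """
    stack, candidates, visited = [root], [], 0
    while stack:
        node = stack.pop()
        visited += 1
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue  # the whole subtree is skipped
        if node.triangle_id is not None:
            candidates.append(node.triangle_id)  # still needs a ray/triangle test
        else:
            stack.extend(c for c in (node.left, node.right) if c is not None)
    return candidates, visited
```

Because each ray takes its own branch at every node, neighboring rays in a SIMD group quickly diverge, which is the scaling problem described above.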


Without naming names, NVIDIA pointed out that a minimalist approach to raytracing (possibly what AMD is up to with RDNA2) carries a performance penalty due to overreliance on SIMD stream processors. NVIDIA's RT cores offer a completely hardware-based BVH traversal stack, a purpose-built MIMD execution unit, and inherently lower latency from the hardware stack. The 2nd generation RT core being introduced with Ampere adds one more hardware component.


Ampere introduces a new logic block that interpolates triangle positions along a time scale, in coordination with the triangle intersection unit. NVIDIA tells us that this is useful in generating motion blur effects in real-time raytracing. Our take is that NVIDIA is, rather, implementing this as a performance optimization for raytracing. Very little is likely to change between two consecutive frames: the player moved or changed the camera, and objects in the world are positioned only ever so slightly differently. Once all the ray intersections for the current frame have been calculated, there is no need to recalculate every result from scratch for the following frame. We suspect NVIDIA paired a motion-estimation algorithm with RTX that remembers the last intersections as "good candidates" and checks them early in the process, which can produce a valid result early in the test and means many entries in the BVH don't have to be processed at all.
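The stated function of the new block, interpolating a triangle's position in time before intersecting it, can be sketched in software as follows. This is a hedged illustration that assumes simple linear interpolation between a start and an end position and uses the well-known Möller-Trumbore ray/triangle test; NVIDIA has not said the hardware works this way, and all function names here are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between two 3D points for t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: returns the hit distance along the ray, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def intersect_moving_triangle(origin, direction, tri_start, tri_end, time):
    """Place the triangle at the ray's timestamp, then run the static test.

    tri_start and tri_end hold the three vertices at shutter open (time 0)
    and shutter close (time 1); each ray carries its own time in [0, 1].
    """
    tri = [lerp(a, b, time) for a, b in zip(tri_start, tri_end)]
    return intersect_triangle(origin, direction, *tri)
```

Tracing many rays per pixel with different `time` values and averaging the shaded results is what produces the blur: a fast-moving triangle is hit by only some of the rays.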

3rd Generation Tensor Cores


The new 3rd generation tensor core is largely carried over from the A100 Tensor Core GPU NVIDIA introduced this spring, which is purpose-built for AI deep-learning work. To improve performance, Ampere tensor cores are designed to leverage sparsity in deep-learning neural nets. Sparsity here means that many weights in a trained network's dense matrices can be pruned to zero without significantly affecting the network's accuracy, kind of like how the goal in Jenga is to keep the column intact despite pulling out pieces from the middle. According to NVIDIA, exploiting this sparsity can up to double tensor core throughput for AI inference.
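As a rough illustration of how pruned weights translate into less work, the sketch below assumes the 2:4 structured-sparsity pattern NVIDIA documents for Ampere (two nonzero values kept in every group of four weights), which this article does not spell out. The magnitude-based pruning rule and the function names are hypothetical simplifications; real workflows retrain the network after pruning to recover accuracy.

```python
def prune_and_compress(row):
    """Keep the two largest-magnitude weights in each group of four.

    Returns (values, indices): the kept weights and their positions (0-3)
    inside their group. The row length must be a multiple of four.
    """
    assert len(row) % 4 == 0
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        by_magnitude = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)
        keep = sorted(by_magnitude[:2])  # restore original order within the group
        values.extend(group[j] for j in keep)
        indices.extend(keep)
    return values, indices

def sparse_dot(values, indices, x):
    """Dot product using only the kept weights: half the multiplications."""
    total = 0.0
    for k, (v, j) in enumerate(zip(values, indices)):
        group_start = (k // 2) * 4  # two kept values per group of four inputs
        total += v * x[group_start + j]
    return total

weights = [0.9, -0.1, 0.05, -0.7, 0.0, 0.3, -0.2, 0.01]
values, indices = prune_and_compress(weights)
```

The compressed row stores half as many weights plus a small index per weight, so the multiply-accumulate count is halved.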