Wednesday, March 1st 2017

On NVIDIA's Tile-Based Rendering

Looking back on NVIDIA's GDC presentation, perhaps the most interesting topic covered was the implementation of tile-based rendering on NVIDIA's Maxwell and later architectures. This is an adaptation of an approach typically found in mobile graphics, built around that market's need for power efficiency - and if you'll "member", "Maxwell" was NVIDIA's first graphics architecture publicly touted for its "mobile first" design.

This approach essentially divides the screen into tiles, and then rasterizes the frame on a per-tile basis. 16×16 and 32×32 pixels are the usual tile sizes, but both Maxwell and Pascal can dynamically assess the required tile size for each frame, changing it on the fly according to the complexity of the scene. This ensures that the data being worked on has a much smaller footprint than that of the full frame - small enough that NVIDIA can keep it in a much smaller pool of memory (essentially, the L2 cache), dynamically filling and flushing the cache until the full frame has been rendered. As a result, the GPU doesn't have to access larger, slower memory pools as often, which reduces the load on the VRAM subsystem (freeing memory bandwidth for other tasks) while simultaneously accelerating rendering. At the same time, a tile-based approach lends itself well to the nature of GPUs - these are easily parallelized operations, with the GPU able to tackle many independent tiles simultaneously, depending on the available resources.
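To make the idea concrete, here is a minimal software sketch of tile-based rasterization - bin triangles per tile, then finish each tile entirely in a small local buffer before flushing it out in one go. This is purely illustrative (the tile size, pixel formats, and helper names are my own assumptions), not NVIDIA's actual hardware algorithm:

```python
TILE = 16  # illustrative tile size; real GPUs pick sizes dynamically


def edge(a, b, p):
    # Signed-area test: positive when p lies to the left of edge a->b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri, x, y):
    # Pixel is covered when its centre is on the same side of all three edges.
    a, b, c = tri
    p = (x + 0.5, y + 0.5)
    w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def bin_triangles(triangles, width, height):
    # Binning pass: record, per tile, every triangle whose bounding box touches it.
    bins = {}
    for tri in triangles:
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        tx0 = max(int(min(xs)) // TILE, 0)
        tx1 = min(int(max(xs)) // TILE, (width - 1) // TILE)
        ty0 = max(int(min(ys)) // TILE, 0)
        ty1 = min(int(max(ys)) // TILE, (height - 1) // TILE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(tri)
    return bins


def render(triangles, width, height):
    framebuffer = {}
    for (tx, ty), tris in bin_triangles(triangles, width, height).items():
        tile_buf = {}  # stands in for the small on-chip (cache-resident) tile buffer
        for y in range(ty * TILE, min((ty + 1) * TILE, height)):
            for x in range(tx * TILE, min((tx + 1) * TILE, width)):
                if any(covers(tri, x, y) for tri in tris):
                    tile_buf[(x, y)] = 1
        framebuffer.update(tile_buf)  # one flush per finished tile
    return framebuffer
```

The key point is that all pixel traffic for a tile stays in `tile_buf` until the tile is complete; only the final result ever touches the big framebuffer, which is exactly the memory-traffic saving described above.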
Thanks to NVIDIA's public acknowledgement of its use of tile-based rendering starting with the Maxwell architecture, some of Maxwell's design decisions now make much more sense. Below is a screenshot taken from NVIDIA's "5 Things You Should Know About the New Maxwell GPU Architecture". Take a look at the L2 cache size: from Kepler to Maxwell, it increased 8x, from 256 KB to 2048 KB. We can now attribute this gigantic leap to the need for an L2 cache large enough to fit the tile-based resources required by the rasterizing process, which enabled the leap in memory performance and power efficiency that Maxwell achieved over its Kepler predecessor. Incidentally, NVIDIA's GP102 chip (which powers the GTX Titan X and the upcoming, recently announced GTX 1080 Ti) doubles that amount of L2 cache again, to a staggering 4096 KB. Whether or not Volta will continue scaling the L2 cache remains to be seen, but I've seen worse bets.
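A quick back-of-envelope calculation shows why those L2 sizes matter for tiling. Assuming RGBA8 colour (4 bytes/pixel) plus a 32-bit depth value per pixel - illustrative formats, not NVIDIA's actual on-chip layout, and ignoring that the L2 also holds other data:

```python
# Hypothetical per-pixel cost: 4 B colour (RGBA8) + 4 B depth (32-bit).
BYTES_PER_PIXEL = 4 + 4


def tile_bytes(side):
    """Footprint of one side x side tile under the assumed formats."""
    return side * side * BYTES_PER_PIXEL


for l2_kib, chip in ((256, "Kepler"), (2048, "Maxwell"), (4096, "GP102")):
    tiles = (l2_kib * 1024) // tile_bytes(32)
    print(f"{chip}: {l2_kib} KB of L2 holds ~{tiles} 32x32 tiles")
```

Under these assumptions a 32×32 tile costs 8 KB, so Kepler's 256 KB could hold only ~32 such tiles while Maxwell's 2048 KB holds ~256 and GP102's 4096 KB holds ~512 - plenty of room to keep many in-flight tiles resident on-chip.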
An interesting tangent: the Xbox 360 and Xbox One ESRAM chips (running on AMD-architectured GPUs, no less) can make for a substitute for the tile-based rasterization process that post-Maxwell NVIDIA GPUs employ.

Tile-based rendering seems to have been a key part of NVIDIA's secret sauce in achieving the impressive performance-per-watt ratings of its last two architectures, and it's expected that their approach to this rendering mode will only improve with time. Some differences in tile-based rendering between Maxwell and Pascal can already be seen, with the former dividing the scene into triangles, and the latter breaking a scene up into squares or vertical rectangles as needed - so NVIDIA has in fact put some measure of work into the rendering system between these two architectures.

Perhaps we have already seen some seeds of this tile-based approach in AMD's Vega architecture sneak peek, particularly in regards to its next-generation Pixel Engine: the render back-ends are now clients of the L2 cache, replacing previous architectures' non-coherent memory access, in which the pixel engine wrote directly to the memory controller. This could be AMD's way of tackling the same problem, with the improvements to the pixel engine through a new-generation draw-stream binning rasterizer supposedly helping to conserve clock cycles, while simultaneously improving on-die cache locality and reducing memory footprint.
David Kanter of Real World Tech has a pretty interesting YouTube video in which he goes into some depth on NVIDIA's tile-based approach, which you can check out if you're interested.


Source: NVIDIA Devblogs, Real World Tech

23 Comments on On NVIDIA's Tile-Based Rendering

#1
TheLostSwede
Welcome to 2001 Nvidia...
At least it's good to see that they're finally catching up.
#3
Nokiron
TheLostSwede said:
Welcome to 2001 Nvidia...
At least it's good to see that they're finally catching up.
If Nvidia is welcomed to 2001 with tile-based rasterization, where does this leave AMD? 1995?
#4
sutyi
Ferrum Master said:
Kyro again.
PowerVR lawsuit coming in 3... 2...
#5
londiste
sutyi said:
PowerVR lawsuit coming in 3... 2...
no it isn't.

the way it is done by nvidia (and amd) is different enough that the original set of patents are unlikely to cover any of this.

also, a lot of mobile gpus do tiled rendering and there has not really been a wave of lawsuits.
#6
Brusfantomet
londiste said:
no it isn't.

the way it is done by nvidia (and amd) is different enough that the original set of patents are unlikely to cover any of this.

also, a lot of mobile gpus do tiled rendering and there has not really been a wave of lawsuits.
Lots of mobile GPUs are based on the PowerVR design
#7
londiste
yup, powervr is significant in mobile space.
but not only their gpus do tiles, perhaps most notably arm's mali and qualcomm's adreno should both do tile-based rendering.
#8
TheLostSwede
Nokiron said:
If Nvidia is welcomed to 2001 with tile-based rasterization, where does this leave AMD? 1995?
PowerVR did tiled based rendering with their Kyro chips around 2001. AMD has as far as I'm aware been doing some form of tile based rendering for quite some time.
#9
Nokiron
TheLostSwede said:
PowerVR did tiled based rendering with their Kyro chips around 2001. AMD has as far as I'm aware been doing some form of tile based rendering for quite some time.
I don't think AMD has ever used it in a desktop-product. The Adreno-products did.
#10
Solidstate89
TheLostSwede said:
PowerVR did tiled based rendering with their Kyro chips around 2001. AMD has as far as I'm aware been doing some form of tile based rendering for quite some time.
You would be wrong. Which makes your nVidia comment all the more hilarious as they are the first manufacturer to implement this on GPU chips outside of the mobile environment.
#11
Steevo
Solidstate89 said:
You would be wrong. Which makes your nVidia comment all the more hilarious as they are the first manufacturer to implement this on GPU chips outside of the mobile environment.
Your special kid.

"An interesting tangent: the Xbox 360 and Xbox One ESRAM chips (running on AMD-architectured GPUs, no less) can make for a substitute for the tile-based rasterization process that post-Maxwell NVIDIA GPUs employ."
#12
ZoneDymo
So is this more proof these gpu makers are sitting on a bunch of tech they COULD put in their new gpu and send us light years ahead in tech but dont because feeding it to the public in piecemeal portions means more money?
#13
Nokiron
Steevo said:
Your special kid.

"An interesting tangent: the Xbox 360 and Xbox One ESRAM chips (running on AMD-architectured GPUs, no less) can make for a substitute for the tile-based rasterization process that post-Maxwell NVIDIA GPUs employ."
That's not proper tile-based rasterization though. And is it says, it's a substitute which is way slower.
#14
Solidstate89
Steevo said:
Your special kid.

"An interesting tangent: the Xbox 360 and Xbox One ESRAM chips (running on AMD-architectured GPUs, no less) can make for a substitute for the tile-based rasterization process that post-Maxwell NVIDIA GPUs employ."
You're**
#15
TheLostSwede
Solidstate89 said:
You would be wrong. Which makes your nVidia comment all the more hilarious as they are the first manufacturer to implement this on GPU chips outside of the mobile environment.
Right, I guess they did it for mobile, but never desktop. Interesting. Trident did it (not sure the chip ever went into mass production though), but I guess no-one remembers them any more...
It also looks like PowerVR did it all the way back in 1996 when they started doing GPUs.

So welcome to 1996 Nvidia...
#16
prtskg
Nokiron said:
That's not proper tile-based rasterization though. And is it says, it's a substitute which is way slower.
Where is the slower part written? May be I need to read the article again!
#17
Nokiron
prtskg said:
Where is the slower part written? May be I need to read the article again!
The ESRAM in the Xbox One is inherently extremely slow compared to the low level cache found in a desktop GPU. That should really speak for itself.

They didn't say the last part though, i did. Not a native speaker.
#19
efikkan
Tiled rendering is one of several techniques which helps improve the efficiency of Maxwell/Pascal.
Generally this gives two great benefits:
- Tiles are rendered completely, instead of the screen rendering each pixel partially several times. This saves the data from taking several round-trips between the GPU and memory, which saves a lot of memory bandwidth.
- Lower risk of data hazards (multiple sections needing the same texture), so less stalls, improving GPU efficiency.
- Being cache local, reducing stalls, again improving GPU efficiency.

BTW; I recommend watching the referenced Youtube video in the article, it's visuals are good so even the non-programmers among you should be able to get the idea.
#20
erixx
Then don't forget the good ole voxels! (Efficient Sparse Voxel Octrees) Novalogic was so futuristic with that.
#21
Steevo
Solidstate89 said:
You're**
Gotta love auto correct on mobile devices.

Nokiron said:
The ESRAM in the Xbox One is inherently extremely slow compared to the low level cache found in a desktop GPU. That should really speak for itself.

They didn't say the last part though, i did. Not a native speaker.
The PS4 is a regular chip with caches, almost identical to the XBox1, just GDDR and caches. They are relatively quick, mostly held back by CPU cores that were not zen.

It's been known for awhile, at least since August of last year.

https://www.extremetech.com/gaming/232771-targeted-testing-reveals-secrets-of-nvidia-maxwell-pascal-power-efficiency


It's possible the tile based rendering will explain some of the artifacts they produce when running certain effects.
#22
Super XP
Nokiron said:
If Nvidia is welcomed to 2001 with tile-based rasterization, where does this leave AMD? 1995?
It leaves AMD ahead of the game, in 2018,
#23
Kanan
erixx said:
Then don't forget the good ole voxels! (Efficient Sparse Voxel Octrees) Novalogic was so futuristic with that.
Don't forget the epic Voxel game "Outcast", it was completely processed by the CPU in a time where 3D graphics were the newest and greatest shit :D