NVIDIA GeForce RTX 3080 PCI-Express Scaling

I've been doing PCI-Express performance scaling articles for over ten years now; the first one was with the Radeon HD 5870 in September 2009. Until today, the conclusion has always been the same: there's no reason to worry about PCIe bandwidth unless you plan on running very high FPS or have a severely constrained interface configuration, think "x4" or "1.1". That conclusion still stands today, but a little asterisk has been added.

If we take a look at the average performance results, you'll see that we're not even close to saturating PCI-Express 3.0 bandwidth. The difference between 3.0 and 4.0 is 1%, which is close to the margin of error in these tests and definitely not something you'll notice in normal gameplay. It's also interesting that the gap stays constant whether you play at 1080p or 4K, which is kinda counterintuitive. Shouldn't PCIe bandwidth usage go up as you increase the resolution? It does, but not 1:1. Game engines don't send every rendered frame over the PCIe bus to the CPU for post-processing; those effects are handled in shaders nowadays. Anything that does get copied over the bus is either resolution-independent or a lower-resolution representation of the framebuffer. Generally, copying data over the PCI-Express bus is extremely expensive compared to the speeds of today's GPUs, which is why game developers do everything in their power to generate the whole frame on the GPU, and why GPU designers keep adding features to their hardware that address the same problem.
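To put the cost of frame copies in numbers, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical engine that copies one uncompressed RGBA8 frame (4 bytes per pixel) across the bus, uses the usable PCIe 3.0 x16 rate after 128b/130b encoding, and ignores packet overhead. The 760 GB/s figure is the RTX 3080's published memory bandwidth, included only for comparison.

```python
# Back-of-the-envelope cost of copying a full frame over PCIe.
# Assumption (hypothetical): uncompressed RGBA8, 4 bytes per pixel.
BYTES_PER_PIXEL = 4

# Usable PCIe 3.0 x16 bandwidth in bytes/s: 8 GT/s per lane,
# 128b/130b encoding, 8 bits per byte, 16 lanes (protocol overhead ignored).
PCIE3_X16_BYTES_PER_S = 8e9 * 128 / 130 / 8 * 16

# RTX 3080 local memory bandwidth (19 Gbps GDDR6X on a 320-bit bus), for comparison.
VRAM_BYTES_PER_S = 19e9 * 320 / 8

RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
}


def frame_bytes(width: int, height: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * BYTES_PER_PIXEL


def transfer_ms(num_bytes: float, bytes_per_s: float) -> float:
    """Time to move num_bytes at the given sustained rate, in milliseconds."""
    return num_bytes / bytes_per_s * 1000


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        size = frame_bytes(w, h)
        print(
            f"{name}: {size / 1e6:5.1f} MB/frame, "
            f"PCIe 3.0 x16 {transfer_ms(size, PCIE3_X16_BYTES_PER_S):.2f} ms, "
            f"VRAM {transfer_ms(size, VRAM_BYTES_PER_S):.3f} ms"
        )
```

Under these assumptions a single 4K copy takes roughly 2.1 ms over PCIe 3.0 x16, a sizable slice of a 16.7 ms (60 FPS) frame budget, versus a few hundredths of a millisecond at VRAM speed.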

Of course, some data must move over the bus. Changes in player position, viewing direction, new spawns, enemy positions, and so on all need to be sent from the CPU to the graphics card over the PCIe bus. These updates are resolution-independent, though; a player position is just X, Y, Z, a few bytes, and the same goes for other kinds of game-state information. The big transfers are texture and geometry streaming, but those don't depend on resolution either. They load as you traverse the map, at a rate set by your movement speed rather than the frame rate, since your movement speed doesn't change with the framerate. The small per-frame updates, on the other hand, do scale with the framerate: at higher FPS the world has to be updated more often, or you would experience serious input lag. This explains why the performance hit at lower PCIe bandwidths is larger at 1080p than at 1440p or 4K: framerate drives PCIe bandwidth usage.
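The distinction between per-frame updates and asset streaming can be made concrete with a toy model. Everything in it is an assumption for illustration: the entity count, the 64 KiB of other per-frame data, and the 200 MB/s streaming rate are made-up numbers, not measurements from any game. The point is structural: render resolution appears nowhere in the model, per-frame traffic is multiplied by FPS, and streaming traffic is not.

```python
import struct

# A position as three little-endian 32-bit floats (X, Y, Z): 12 bytes.
POSITION_BYTES = struct.calcsize("<3f")

# Hypothetical per-frame payload: positions for 5,000 entities plus
# 64 KiB of other per-frame constants (both numbers are made up).
PER_FRAME_BYTES = 5_000 * POSITION_BYTES + 64 * 1024

# Hypothetical texture/geometry streaming rate while moving through the map.
# It depends on movement speed, so it does not change with FPS.
STREAMING_BYTES_PER_S = 200e6


def bus_traffic(fps: float) -> tuple[float, float]:
    """Return (per-frame update traffic, streaming traffic) in bytes/s."""
    return PER_FRAME_BYTES * fps, STREAMING_BYTES_PER_S


if __name__ == "__main__":
    for fps in (60, 120, 240):
        updates, streaming = bus_traffic(fps)
        print(f"{fps:>3} FPS: updates {updates / 1e6:6.2f} MB/s, "
              f"streaming {streaming / 1e6:6.1f} MB/s")
```

Doubling the frame rate doubles the update traffic while the streaming rate stays put, which is the mechanism behind the larger hit at 1080p.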

Thanks to the immense graphics horsepower of the GeForce RTX 3080, this is the first time in this series of articles that we see a clear indication of a CPU bottleneck. It's kinda logical: if a game is bottlenecked by the CPU (i.e., the CPU cannot process the game's render loop as fast as the GPU renders frames), differences in PCIe bandwidth won't play a role either, because the limit remains the CPU's processing power. Borderlands 3 is a good example. At 1080p, there's virtually no difference between the various PCIe bandwidth settings we tested. At 1440p and 4K, however, some differences emerge, contrary to what we've been telling you for years. The reason is that, in this game, the CPU bottleneck gradually disappears at higher resolutions, which lets the next slowest component become the bottleneck, in this case the PCIe bus, especially at the super slow PCIe 1.1 x8 setting. That's the asterisk I mentioned at the start of this conclusion.
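The CPU-bottleneck effect can be illustrated with a deliberately simple pipeline model, a sketch rather than how any real engine schedules work. It assumes the CPU prepares the next frame while the GPU renders the current one, so the slower side sets the pace, and that bus transfers the GPU has to wait on are added to the GPU side. The millisecond figures are hypothetical, chosen only to mirror the pattern described for Borderlands 3.

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float, pcie_stall_ms: float) -> float:
    """Toy model: CPU and GPU work overlap, so the slower side sets the pace.
    Time the GPU spends waiting on PCIe transfers is added to the GPU side."""
    return max(cpu_ms, gpu_ms + pcie_stall_ms)


def fps(frame_ms: float) -> float:
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


# Hypothetical numbers: CPU cost per frame does not depend on resolution,
# GPU cost grows with pixel count, and a slow link stalls the GPU longer.
CPU_MS = 8.0
GPU_MS = {"1080p": 4.0, "4K": 14.0}
STALL_MS = {"fast link": 0.5, "slow link": 2.0}

if __name__ == "__main__":
    for res, gpu_ms in GPU_MS.items():
        for link, stall_ms in STALL_MS.items():
            t = frame_time_ms(CPU_MS, gpu_ms, stall_ms)
            print(f"{res:>5}, {link}: {fps(t):6.1f} FPS")
```

At 1080p both links land on the same CPU-bound 125 FPS because the extra stall hides behind the CPU's 8 ms. At 4K the GPU side is the limit, so the slow link's extra 1.5 ms shows up directly in the frame rate (about 69 vs. 62.5 FPS).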

As of this writing, there are no PC games that use either RTX-IO or DirectStorage, a feature that lets GPUs pull compressed data directly from SSDs and decompress it on the GPU, reducing the CPU overhead of storage I/O. Given that DirectStorage is a key feature of the Xbox Series X, there's a fair chance it won't end up as vaporware. We'll have to revisit RTX-IO in a future review. In theory, though, RTX-IO should start taxing the GPU's PCIe bus, by up to 7 GB/s in the case of M.2 NVMe Gen 4 SSDs, eating into bandwidth that is also used by the rest of the game. PCIe Gen 4 probably helps here, but we have no way of telling yet.
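For a rough sense of scale, the following sketch computes what share of the graphics card's link a 7 GB/s storage stream would occupy. It assumes the compressed data crosses the GPU link at the SSD's read rate, uses usable per-lane rates after 128b/130b encoding, and ignores protocol overhead and all other traffic; it is arithmetic, not a measurement.

```python
# Usable one-direction bandwidth per lane in GB/s after 128b/130b encoding.
PER_LANE_GBS = {
    3: 8 * 128 / 130 / 8,   # PCIe 3.0: 8 GT/s per lane
    4: 16 * 128 / 130 / 8,  # PCIe 4.0: 16 GT/s per lane
}

# Upper bound quoted for M.2 NVMe Gen 4 SSDs; assumed to cross the GPU link unchanged.
SSD_STREAM_GBS = 7.0


def link_gbs(gen: int, lanes: int) -> float:
    """Usable link bandwidth in GB/s, protocol overhead ignored."""
    return PER_LANE_GBS[gen] * lanes


def share_of_link(stream_gbs: float, gen: int, lanes: int) -> float:
    """Fraction of the link a sustained stream would occupy."""
    return stream_gbs / link_gbs(gen, lanes)


if __name__ == "__main__":
    for gen, lanes in [(3, 16), (4, 8), (4, 16)]:
        share = share_of_link(SSD_STREAM_GBS, gen, lanes)
        print(f"PCIe {gen}.0 x{lanes}: {link_gbs(gen, lanes):5.2f} GB/s, "
              f"storage stream uses {share:.0%}")
```

Under these assumptions the stream would take roughly 44% of a Gen 3 x16 link but about 22% of a Gen 4 x16 link, which is why Gen 4 could plausibly help.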

So if you're on a desktop platform with PCIe Gen 3 x16, relax. There's little to no performance lost at any resolution. PCIe generational scaling used to matter when multi-GPU was relevant and there was a real chance of running a graphics card at x8 bandwidth; that's no longer the case. If you're on a PCIe Gen 4 platform, you have the freedom to run the RTX 3080 at Gen 4 x8 with no performance lost (it offers the same bandwidth as Gen 3 x16) and drop a sexy NVMe RAID card into the second slot.
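The Gen 4 x8 versus Gen 3 x16 equivalence, and just how constrained the slowest configurations are, follows from per-lane transfer rates and line encoding. This minimal sketch computes usable one-direction bandwidth from each generation's nominal GT/s rate and encoding efficiency; it ignores packet and protocol overhead, so real throughput is somewhat lower.

```python
# Nominal transfer rate per lane (GT/s) and line-encoding efficiency per generation.
GENERATIONS = {
    "1.1": (2.5, 8 / 10),      # 8b/10b encoding
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbs(gen: str, lanes: int) -> float:
    """Usable one-direction bandwidth in GB/s, ignoring packet/protocol overhead."""
    rate_gt, efficiency = GENERATIONS[gen]
    return rate_gt * efficiency / 8 * lanes  # 8 bits per byte


if __name__ == "__main__":
    configs = [("4.0", 16), ("4.0", 8), ("3.0", 16), ("3.0", 8),
               ("3.0", 4), ("2.0", 16), ("1.1", 16), ("1.1", 8)]
    for gen, lanes in configs:
        print(f"PCIe {gen} x{lanes:<2}: {link_bandwidth_gbs(gen, lanes):6.2f} GB/s")
```

Gen 4 x8 and Gen 3 x16 both come out at about 15.75 GB/s, while the PCIe 1.1 x8 setting offers just 2 GB/s, roughly an eighth of that.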