Conclusion

Our last PCI-Express scaling article was close to 20 months ago, with the GeForce GTX 1080 "Pascal," which we admit isn't exactly the predecessor of the RTX 2080 Ti, but was the fastest graphics card you could buy then. The GTX 1080 did not saturate PCI-Express 3.0 x16 by a long shot, and we observed no changes in performance between gen 3.0 x16 and gen 3.0 x8 at any resolution.
We are happy to report that the RTX 2080 Ti is finally able to saturate PCIe gen 3.0 x8: going from gen 3.0 x8 to gen 3.0 x16 yields a small but measurable 2%–3% performance gain across resolutions. Granted, these are single-digit percentage differences that you won't notice in regular gameplay. Then again, graphics card makers expect you to pay premiums of around $100 for factory overclocks that deliver about as much extra performance out of the box, so the difference isn't nothing.
This should also mean that PCI-Express 2.0 x16, which offers roughly the same bandwidth as gen 3.0 x8, can impose a small platform bottleneck on the RTX 2080 Ti for people still clinging to platforms like "Sandy Bridge-E" or AMD FX. On top of that, the weaker CPU performance of those platforms can cause a bottleneck of its own in some games, especially at high frame rates.
The performance differences between PCIe bandwidth configurations are more pronounced at lower resolutions than at 4K, which isn't new; we saw this in every previous PCI-Express scaling test. The underlying reason is that frame rate, not resolution, is the primary driver of PCIe bandwidth. For a given scene, the amount of data moved across the bus each frame is fairly constant, regardless of resolution. The final rendered image never crosses the bus, except in render engines that do post-processing on the CPU, which have become much more common since we last looked at PCIe scaling. Even so, the reduction in FPS at higher resolutions still outweighs the increase in pixel data.
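To make the "frame rate drives bandwidth" argument concrete, here is a minimal Python sketch that assumes a fixed amount of data crosses the bus every frame, so average bus demand is simply bytes per frame times FPS. The 40 MB per frame and the frame rates are hypothetical values chosen for illustration, not measurements from this review, and the function name bus_demand_gb_s is our own.

```python
# Hypothetical numbers for illustration only: neither the per-frame traffic
# nor the frame rates below are measurements from this review.
SCENE_BYTES_PER_FRAME = 40e6  # assumed resolution-independent bus traffic per frame

HYPOTHETICAL_FPS = {"1080p": 150, "1440p": 110, "4K": 60}


def bus_demand_gb_s(fps: float, bytes_per_frame: float = SCENE_BYTES_PER_FRAME) -> float:
    """Average bus bandwidth (GB/s) needed to sustain a frame rate,
    assuming the same amount of data crosses the bus every frame."""
    return bytes_per_frame * fps / 1e9


if __name__ == "__main__":
    for res, fps in HYPOTHETICAL_FPS.items():
        print(f"{res}: {fps} FPS -> {bus_demand_gb_s(fps):.1f} GB/s")
```

With these assumed inputs, the script reports 6.0 GB/s at 1080p, 4.4 GB/s at 1440p and 2.4 GB/s at 4K: the same scene asks the most of the bus at the lowest resolution, simply because more frames per second are being fed.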
Some titles seemingly show the opposite: all bus configurations are bunched up against a wall at 1080p, and the differences grow at higher resolutions. These cases, like GTA V, are CPU-limited at lower resolutions; that is, the per-frame game logic running on the CPU can't go any faster and caps the frame rate, even though the GPU could render faster. At higher resolutions, the frame rate drops, which takes load off the CPU and moves the bottleneck to the GPU; only then can PCIe bandwidth become a bottleneck too.
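The CPU-limited case can be illustrated with a deliberately simplified frame-time model, sketched below in Python. It is our own assumption, not how the review measured anything: the CPU spends cpu_ms on game logic per frame, the GPU spends gpu_ms rendering plus bus_stall_ms waiting on the bus, and the slower side sets the frame time. All millisecond values are hypothetical.

```python
# Simplified, hypothetical frame-time model: the slower of the CPU side and
# the GPU side (render time plus time stalled on the bus) sets the frame time.

def fps(cpu_ms: float, gpu_ms: float, bus_stall_ms: float) -> float:
    frame_ms = max(cpu_ms, gpu_ms + bus_stall_ms)
    return 1000.0 / frame_ms


CPU_MS = 8.0                            # assumed game-logic cost, same at every resolution
GPU_MS = {"1080p": 5.0, "4K": 16.0}     # assumed render cost, grows with pixel count
BUS_STALL_MS = {"x16": 1.0, "x8": 2.0}  # assumed stall; a narrower link stalls longer

if __name__ == "__main__":
    for res, gpu_ms in GPU_MS.items():
        results = {link: round(fps(CPU_MS, gpu_ms, stall), 1) for link, stall in BUS_STALL_MS.items()}
        print(res, results)
```

With these inputs, both link widths sit at 125 FPS at 1080p because the CPU caps the frame rate, while at 4K the GPU becomes the bottleneck and the x8 stall shows up (about 58.8 versus 55.6 FPS). Lowering CPU_MS below the 1080p GPU time (for example to 3.0) reproduces the more common pattern, where the gap between x16 and x8 is largest at 1080p.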
Performance takes an even bigger hit when you lower the bandwidth to PCIe gen 3.0 x4 (comparable to gen 2.0 x8), though still not by the double-digit percentages we had expected. You lose 9% performance compared to gen 3.0 x16 at 1080p, 8% at 1440p, and, surprisingly, just 6% at 4K.
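To put the link configurations discussed here side by side, the Python sketch below computes theoretical one-direction bandwidth from each generation's signaling rate and line encoding (8b/10b for gen 1.0 and 2.0, 128b/130b for gen 3.0 and 4.0). It ignores packet and protocol overhead, so real-world throughput is lower; the function name link_bandwidth_gb_s is our own.

```python
# Per-lane signaling rate (GT/s) and line-code efficiency per PCIe generation.
# Gen 1.0/2.0 use 8b/10b encoding; gen 3.0/4.0 use 128b/130b.
PCIE_GENS = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
}


def link_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENS[gen]
    # GT/s * efficiency = usable Gbit/s per lane; divide by 8 for GB/s.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    configs = [("3.0", 16), ("3.0", 8), ("2.0", 16), ("3.0", 4), ("2.0", 8), ("2.0", 4)]
    for gen, lanes in configs:
        print(f"gen {gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

Run as a script, it prints about 3.94 GB/s for gen 3.0 x4 against 4.00 GB/s for gen 2.0 x8, and about 7.88 GB/s for gen 3.0 x8 against 8.00 GB/s for gen 2.0 x16, which is why those pairs are treated as comparable. Gen 2.0 x4 comes out at just 2.00 GB/s.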
Don't take our PCIe gen 3.0 x4 numbers as a green light to run your RTX 2080 Ti in the bottom-most PCIe x16 slot on your motherboard, which tends to be x4 electrically. That slot is most likely wired to the motherboard chipset instead of the CPU. Using it for a graphics card would saturate the chipset bus, the link between the chipset and the CPU. Other bandwidth-heavy components in your machine, such as network adapters and SATA SSDs, rely on that same bus, and all of them share the bandwidth of that x4 link.
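The shared-link problem can be sketched as a simple budget, shown below in Python. It assumes the chipset uplink behaves like a PCIe gen 3.0 x4 link (about 3.94 GB/s per direction before protocol overhead) and that every device behind the chipset transfers at once; the device demands, including the graphics card's burst, are hypothetical figures for illustration, and remaining_headroom is our own helper name.

```python
# Hypothetical sketch: every device behind the chipset shares one uplink to the CPU.
# The uplink is modeled as a PCIe gen 3.0 x4-class link, one direction,
# before protocol overhead. Device demands below are assumptions, not measurements.
UPLINK_GB_S = 8.0 * (128 / 130) * 4 / 8  # ~3.94 GB/s


def remaining_headroom(demands_gb_s: dict[str, float], uplink_gb_s: float = UPLINK_GB_S) -> float:
    """Uplink bandwidth left if all listed devices transfer simultaneously
    (a negative result means the link is oversubscribed)."""
    return uplink_gb_s - sum(demands_gb_s.values())


if __name__ == "__main__":
    demands = {
        "SATA SSD": 0.55,         # assumed, near the ~0.6 GB/s SATA 6 Gb/s ceiling
        "1 GbE network": 0.125,   # assumed full line-rate transfer
        "graphics card": 3.5,     # hypothetical burst from a GPU in an x4 chipset slot
    }
    print(f"uplink {UPLINK_GB_S:.2f} GB/s, headroom {remaining_headroom(demands):.2f} GB/s")
```

In this made-up scenario the headroom goes negative (about -0.24 GB/s), meaning the graphics card and the other chipset devices would be competing for the same x4 link rather than each getting its own bandwidth.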
We also decided to test PCIe gen 2.0 x4 purely for academic reasons, since we have tested bus widths as low as x1 in the past. Don't try this at home. Performance drops like a rock across resolutions, by up to 22% at 1080p.
What do these numbers mean for you? For starters, installing the RTX 2080 Ti in the topmost x16 slot of your motherboard while another device in the second slot, such as an M.2 PCIe SSD, takes half of its PCIe lanes will come with a performance penalty, even if it's small. This penalty didn't exist with older-generation GPUs, which were slower and didn't need as much bandwidth. Again, you're looking at roughly 3%, which may or may not be worth the convenience of running another component; that's your call.
For the first time since the introduction of PCIe gen 3.0 (circa 2011), 2-way SLI on a mainstream-desktop platform, such as Intel Z370 or AMD X470, could be slower than on an HEDT platform, such as Intel X299 or AMD X399. Mainstream-desktop platforms split one x16 link between two graphics cards, leaving each at x8, while HEDT platforms (not counting some cheaper Intel HEDT processors) provide uncompromising gen 3.0 x16 bandwidth for up to two graphics cards. Since gen 2.0 x16 and gen 2.0 x8 offer about the same bandwidth as gen 3.0 x8 and gen 3.0 x4, our numbers for those two configurations also show that PCI-Express gen 2.0 is finally outdated, so it's probably time you considered an upgrade for your 7-year-old "Sandy Bridge-E" rig.
By this time next year, we could see the first desktop platforms and GPUs implementing PCI-Express gen 4.0. If only "Turing" supported PCIe gen 4.0, you could have run it at gen 4.0 x8, which offers the same bandwidth as gen 3.0 x16, without worrying about any performance loss. That is exactly the promise of PCIe gen 4.0: not more bandwidth per device, but each device working happily with fewer lanes, so processor makers aren't required to add more lanes.
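The lane-saving argument can be checked with a short Python sketch: given a bandwidth target, find the narrowest standard link width that meets it on each generation. Per-lane figures come from the signaling rate and 128b/130b encoding (8 GT/s for gen 3.0, 16 GT/s for gen 4.0) and ignore protocol overhead; using gen 3.0 x16 as the target assumes the card's bandwidth needs stay where they are today, and lanes_needed is our own helper name.

```python
# Usable per-lane bandwidth in GB/s (one direction, line coding only):
# gen 3.0 = 8 GT/s with 128b/130b, gen 4.0 = 16 GT/s with 128b/130b.
PER_LANE_GB_S = {"3.0": 8.0 * 128 / 130 / 8, "4.0": 16.0 * 128 / 130 / 8}
VALID_WIDTHS = (1, 2, 4, 8, 16)


def lanes_needed(target_gb_s: float, gen: str) -> int:
    """Smallest standard link width on this generation that meets the target."""
    for width in VALID_WIDTHS:
        # Small tolerance so an exact match is not rejected by float rounding.
        if PER_LANE_GB_S[gen] * width >= target_gb_s - 1e-9:
            return width
    raise ValueError("target exceeds an x16 link for this generation")


if __name__ == "__main__":
    target = PER_LANE_GB_S["3.0"] * 16  # bandwidth of a gen 3.0 x16 slot
    for gen in PER_LANE_GB_S:
        print(f"gen {gen}: x{lanes_needed(target, gen)}")
```

It prints x16 for gen 3.0 and x8 for gen 4.0: the same bandwidth with half the lanes, which frees the other eight for another device.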