
Article: Just How Important is GPU Memory Bandwidth?

****
Holy crap I am tired and this is all probably totally wrong
You can see all my original data here:
https://www.dropbox.com/sh/v3vqnglktagj8tr/AADvMQeqR-nxETkn4PwJKlZBa?dl=0
****
Introduction

The main reason for writing this article was the recent “exclamations” about the GTX 960’s 128-bit wide memory interface. The GPU offers 112GB/s of memory bandwidth, and many believe this narrow interface will not provide enough memory bandwidth for games. The card is aimed primarily at the midrange crowd, who want to run modern titles (both AAA and independent) at a native resolution of 1080p.

Memory bandwidth usage is actually incredibly difficult to measure, but it’s the only way of settling once and for all what the real 1080p requirement for memory bandwidth is. Using GPU-Z, what we typically have available to us is “Memory Controller Load”. This is a percentage figure that does not accurately measure the total GB/s of bandwidth being used. The easiest way to explain it is that it acts much like the CPU utilisation percentage Task Manager shows. Another example would be GPU Load, where very different types of work can produce the same percentage figure but very different power usage readings, so one 97% load can be much more intensive than another. Something else that only NVIDIA cards allow measurements of is PCIe Bus usage. AMD has yet to expose such a measurement, and thanks to @W1zzard for throwing me a test build of GPU-Z, I could run some Bus usage benchmarks. I had a fair few expectations for the figures, but the results I got were a little less than expected.

Something I need to make clear before you read on: my memory bandwidth usage figures (GB/s) are not 100% accurate. They have been estimated and extrapolated using performance percentages from the benchmark figures I’ve got, and as such, most of this article relies largely on those estimations. Only a fool would consider them fact. NVIDIA themselves have said that Bus usage monitoring is wholly inaccurate, and most of us are aware that Memory Controller Load (%) cannot represent the exact bandwidth usage (GB/s) with total precision. All loads are different.
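To make that extrapolation concrete, here’s a minimal sketch of the calculation behind every GB/s figure in this article. The linear mapping from Memory Controller Load onto the 970’s theoretical peak is the big assumption, and the 45% reading is just an example value:

```python
# Rough sketch of the extrapolation used for the GB/s figures in this article.
# Assumption: Memory Controller Load (%) maps linearly onto the 970's
# theoretical peak bandwidth, which is almost certainly a simplification.

PEAK_BANDWIDTH_GBPS = 224.0  # GTX 970 theoretical peak

def estimated_bandwidth(memory_controller_load_pct: float) -> float:
    """Convert a GPU-Z Memory Controller Load reading (%) into GB/s."""
    return (memory_controller_load_pct / 100.0) * PEAK_BANDWIDTH_GBPS

print(estimated_bandwidth(45.0))  # a 45% reading would imply roughly 100.8 GB/s
```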

All of the following benchmarks were run 4 times for each game at each resolution for accuracy. Every preset was set to Very High, or High where Very High was unavailable. The only alteration to my video settings was turning off VSync and Motion Blur.

Choices of Games

I’ve chosen to run with 4 games which I felt represented a fair array of game types. For a CPU-oriented title, I’ve run with Insurgency. It is Source engine based and highly CPU intensive, and should cover most games with that sort of requirement. It has a reasonable VRAM requirement, but is overall quite light on general GPU usage, so it should stress the memory somewhat.

To represent independent games, while also bringing a high VRAM requirement, I’ve run with Starpoint Gemini II. This game has massive VRAM requirements and is quite GPU heavy.

I’ve chosen two other games for the AAA area: one fairly generalised game, and one that boasts massive 4GB VRAM requirements for general high-res play. Far Cry 4 felt like a good AAA representative, with a balance of general CPU and GPU demands and moderate VRAM requirements. Middle Earth: Shadow of Mordor was my other AAA choice, picked to slaughter my VRAM and hopefully put my GPU memory controller and VRAM to the test.
*****

1440p – Overall Correlations

I’ve started off with benchmarks run at 1440p to clearly identify what kind of GPU power is required for this resolution. I understand that the 112GB/s bandwidth we’re looking at is designed to cope with 1080p, but hopefully you’ll see just what this resolution demands.

First off, we’ll take a look at all four games, and the performance of the GPU Core(%), Memory Controller Load(%), and VRAM Usage(MB). (The following data has been sorted by “Largest to Smallest” PCIe Bus Usage).

[Four charts, one per game: Bus / Controller / VRAM data over the benchmark run]


What I expected to see was Memory Controller Load in direct correlation with VRAM usage. What we can clearly see here is that Memory Controller Load instead correlates almost exactly with GPU Load. VRAM usage seems to make little difference to the way either behaves except in edge cases.
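If you want to check that correlation yourself against the raw logs linked above, something along these lines will do it. This is just a sketch: the file name and column headers are assumptions that may not match a real GPU-Z export exactly:

```python
# Sketch: how strongly do two GPU-Z sensor columns track each other?
# The file name and column headers are assumptions; adjust to your own log.
import pandas as pd

log = pd.read_csv("gpu-z-sensor-log.csv")

gpu_load = log["GPU Load [%]"]
controller = log["Memory Controller Load [%]"]
vram = log["Memory Used [MB]"]

print("GPU Load vs Memory Controller Load:", gpu_load.corr(controller))
print("VRAM Used vs Memory Controller Load:", vram.corr(controller))
```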

Next up, we’ll look directly at the correlation between PCIe-Bus Usage(%) and VRAM usage(MB).

[Four charts, one per game: Bus / VRAM data over the benchmark run]


Besides the Insurgency graph, it appears that there is no direct correlation between the PCIe Bus and VRAM. I had to run these benchmarks multiple times, as I was a little confused that the PCIe Bus usage was always so low, or in some cases, idle.

Next, let’s look at the overall correlation between Memory Controller Load (%) and PCIe Bus usage (%).

[Four charts, one per game: Bus / Controller data over the benchmark run]


You can see there’s essentially no change in PCIe Bus usage overall. When the Memory Controller Load peaks, the PCIe Bus data shows no reaction to the change.

Finally let’s take a look at the individual Memory Bandwidth Usage (GB/s) figures overall. Note, these figures are not 100% accurate, and follow the 100% = 224GB/s rule.

[Four charts, one per game: estimated bandwidth usage, all samples]


We can see that in most cases the Memory Bandwidth usage (GB/s) is actually extremely erratic over the period. Shadow of Mordor showed the only real case where the usage was relatively consistent throughout the benchmark. You’ll also probably notice that it hits a rather high figure at peak load.

Let’s look at what these figures equate to overall. For this I’ve used the 95th percentile rule to remove freak results from both the low and high end of the scale. Note: these figures indicate bandwidth with Maxwell’s compression (~30%) already factored in.
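For clarity, the trimming works roughly like this (a sketch only; the sample values below are made up, not taken from my logs):

```python
# Sketch of the "95th percentile rule": drop the freak samples at both ends
# of the scale before reporting the average and peak bandwidth figures.
import numpy as np

def trimmed_stats(samples_gbps):
    samples = np.asarray(samples_gbps)
    low, high = np.percentile(samples, [5, 95])
    kept = samples[(samples >= low) & (samples <= high)]
    return kept.mean(), kept.max()  # (average, peak) after trimming

avg, peak = trimmed_stats([80.5, 95.2, 102.7, 110.3, 190.0, 99.8])  # made-up readings
print(f"average: {avg:.1f} GB/s, peak: {peak:.1f} GB/s")
```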

[Four charts, one per game: estimated bandwidth usage after the 95th percentile trim]


We can see most of these figures are relatively high, though none manage to reach the limit of my 970’s available 224GB/s at any point. The only exception is Starpoint Gemini II, which despite eating VRAM when available, didn’t appear to put much load on the Memory Controller. If we took the Memory Controller Load figure as a good representation of actual bandwidth usage, the 970 is never really in danger of being overwhelmed. We can clearly see, however, that the peak figures would be too much for a 960’s 112GB/s of available bandwidth. If we went by the average figures instead, the 960 could cope with a couple of the games, but it would still choke on the big titles during average gameplay. We can’t discount the peak figures though, so you’d certainly see issues at 1440p.

For the sake of estimation and sheer curiosity, here is what the estimated Memory Bandwidth Usage would look like without compression, assuming Maxwell’s compression is exactly 30% efficient.
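The arithmetic here is trivial, but it hinges entirely on how you read the ~30% figure. A sketch, assuming a 30% saving means the measured traffic is only 70% of what the raw data would have needed:

```python
# Sketch: undo an assumed 30% bandwidth saving from Maxwell's colour compression.
# If compression removes 30% of the traffic, the measured figure is 70% of the
# raw requirement, so we divide rather than multiply. The 30% itself is a guess.

COMPRESSION_SAVING = 0.30

def without_compression(measured_gbps: float) -> float:
    return measured_gbps / (1.0 - COMPRESSION_SAVING)

print(without_compression(112.0))  # a measured 112 GB/s would imply ~160 GB/s raw
```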

[Four charts, one per game: estimated bandwidth usage before compression]


The 970 would still cope, except in peak cases during Shadow of Mordor, where the required bandwidth exceeds the available 224GB/s. Obviously all these figures are mere estimates, so real-world results may vary.

*****

1080p – Overall Correlations

These are the main benchmarks we’ll be looking at for the 960’s 112GB/s bandwidth limit. The card is aimed at this resolution, so hopefully we’ll see the post-compression figures land in that area.

Let’s take a look at the overall figures for this, and look for similarities with the 1440p correlations (or lack thereof). The previous charts showed Memory Controller Load linked with GPU Load rather than VRAM Usage.

[Four charts, one per game: Bus / Controller / VRAM data at 1080p]


This surprised me a little bit. If you look relatively closely at the peaks and drops, all three measurements appear to correlate rather well at this resolution. The VRAM drops actually appear to coincide with the drops in Memory Controller Load as well as GPU Usage. Certainly an interesting turn of events.

Next let’s take a look at the PCIe Bus usage and VRAM. There were no direct correlations in the 1440p benchmarks.

[Four charts, one per game: Bus / VRAM data at 1080p]


This time things look a little more interesting, but unexplained. Far Cry 4 shows no real correlation at all. The rest of the games, however, seem to show a drop in PCIe Bus usage every time there’s a drop in VRAM usage, after which VRAM usage steadily rises until it drops again.

Next up is the Bus and Memory Controller figures.

[Four charts, one per game: Bus / Controller data at 1080p]


This time again, no real correlation. A similar result to the 1440p benchmark. No unexpected surprises there.

Here are the figures you’re more interested in, however. Let’s take a look at the overall Memory Controller Usage over the benchmarks. This should show us, approximately (and again, not with complete accuracy), how much bandwidth 1080p seems to scream for.

[Four charts, one per game: estimated bandwidth usage at 1080p, all samples]


This time Shadow of Mordor follows suit and starts to become a little more erratic along with the rest. We can see some interesting peaks in usage, as well as a general idea of what the average is overall. The plateau at the beginning of Far Cry 4 is particularly interesting.

Next, here are those overall figures in a more digestible form, so we can see exactly what they come to. Again, the 95th percentile rule has been used to remove the serious spikes, and these results are not 100% accurate.

[Four charts, one per game: estimated bandwidth usage at 1080p after the 95th percentile trim]


Shadow of Mordor slaughters all, even on the average figures. Far Cry 4 scrapes by in the averages, but again, its peak proves to be above the 112GB/s mark. The Source engine game and SPG2, however, prove to be completely viable.

Here’s what the results would look like without the estimated ~30% Maxwell compression.

[Four charts, one per game: estimated bandwidth usage at 1080p before compression]


Shadow of Mordor peaks within percentile points of the bandwidth available on a 770 (224GB/s), but all other games remain below the 200GB/s mark.

Conclusion

Something you have to bear in mind when looking at these figures (besides the fact they are most certainly not 100% accurate) is that it’s plausible memory bandwidth behaves similarly to VRAM. There are many occasions where people see VRAM usage in an average game hit a certain mark, let’s say 1800MB on a 2GB card. Other people, running the same settings but with a 4GB card, may see usage above and beyond 2GB, almost as though the game is using the available VRAM simply because it can. Is it possible that games utilise memory bandwidth in a similar fashion? Possibly, but we don’t really know. The same benchmark, run on a 770 (which shares the 970’s 224GB/s of bandwidth), might show higher usage due to the lack of compression, but by less than the 30% assumed here. Maybe the video card wouldn’t “stretch its legs” and would be more conservative with bandwidth usage if it had less available. It’d be an interesting benchmark to see.

If we treated these bandwidth figures as a reference (which you most certainly should not), we could then assume that the GTX 960’s 128-bit wide memory interface simply does not provide enough bandwidth to play AAA titles at Very High (or High where not available) and Ultra presets at 1080p. If we went by average figures, it would get by OK, but struggle at peak loads. Independent titles, along with Source engine games, it’d handle just fine. It may be the case that at 1080p, turning off a little eye candy would bring a game within the 112GB/s limit and remove that bottleneck in AAA titles.

The main issue is that more and more AAA titles may follow the example of games like Shadow of Mordor and require more and more VRAM and eat up more bandwidth. If things plateau at that sort of figure, perhaps the 112GB/s would cope. In the event AAA titles became more advanced in their fidelity, the 960 might find itself quickly outpaced by rivals offering a more sensible bandwidth ceiling.

Finally, I’ll leave you again with the same bold statement: the GB/s figures in these benchmarks are merely estimates, produced by a largely inaccurate way of extrapolating memory bandwidth usage. By no means should you base a purchase on these, as the percentage representation of memory bandwidth is open to extremely broad interpretation.

If anyone would be so kind as to run a benchmark of these games on a 770 and send the log over to me, I can more accurately show bandwidth usage BEFORE Maxwell compression. I’d also be delighted to see users’ benchmarks on GTX 960s to prove these estimates horribly wrong.
 
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
 
this needs to be a proper front page article.


in my personal experience, too low a memory bus can cripple a card for sure - i've been hit with models in the past that had double the ram but half the bandwidth and their performance was miserable.
 
Wow, just wow! You outdid yourself. That's quite a lot of work, and some interesting results.

It is just estimates, which you reiterate numerous times, of the 960's abilities. I tend to think NVIDIA's engineers knew what they were doing when they implemented a 128-bit bus, and thus that it probably will perform a bit better than your estimates at 1080p.

Probably it will only be able to do mostly High settings though, with Ultra out of the question, and Very High in a lot of games only with AA and tessellation turned down. For the vast majority of average gamers out there who just buy a mid-range card every year, I bet it will be good enough.
 
a couple of the images seem confusing - text is the same, but different results.

[Screenshot: the two charts in question]
 
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
If my results are correct (which they aren't), I think NVIDIA has put too much hope in Maxwell compression. There are events where Maxwell compression goes beyond 30%, but in contrast, there are occasions when it is less than 30%.
this needs to be a proper front page article.
Not my call, and it's not 100% accurate information, merely educated extrapolation. W1zzard could have done this himself quite easily, but he'd have to give up a few days to get it done (probably far prettier than I have done too)
It is just estimates, which you reiterate numerous times, of the 960's abilities. I tend to think NVIDIA's engineers knew what they were doing when they implemented a 128-bit bus, and thusly, that it probably will perform a bit better than your estimates at 1080p.
Yeah, I wanted to reiterate that, because they are not accurate. NVIDIA said the Bus monitoring was not accurate, and W1zzard explained how memory controller load is by proxy memory bandwidth usage, but not a 1:1 representation.
a couple of the images seem confusing - text is the same, but different results.

[Screenshot: the two charts in question]
Those are the results for each game in order. I forgot to title each graph with its game.
All graphs in order are
Far Cry 4
Insurgency
Shadow of Mordor
SPG2

Let me eat and I'll re-upload the images with titles wherever I've missed a game title.
 
yeah all it needs is the titles to make sense.
 
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
Huh? This shows the opposite of what I would expect.
While all manufacturers are going to move to 3D RAM, it seems that for midrange right now you don't need gobs of bandwidth yet.
At least on current NVIDIA cards, as they don't go above a 384-bit bus (GM2xx).
That said, I was thinking a 192-bit bus for the 960 would have been better; maybe the 960 Ti will be that.
 
amd:nutkick:nvidia

amd powers the game systems and proves the worth of an architecture years old, scaling from entry level to high end, and nvidia can't boast any real performance improvement on a brand new architecture outside of efficiency
 
Very nice article!!! It's good to have figures like this to at least help alleviate a lot of the theoreticals and "ifs" surrounding memory bandwidth, memory usage, etc. Though I am a bit shocked by some of the results, as I did not expect it to be so demanding at 1080p; I guess it's safe to say this is thanks to new games and the ever-changing realm of higher graphics and fidelity.

Nice article!
 
Wow, thanks for doing that! Lots of work!

I have a question about the protocol.... this is a 970, correct? And you are measuring memory controller load with the 970 running full tilt at 1440p and 1080p?

The first thing that occurs to me is that the 960 will run at slower framerates than the 970, and not because the bus is limiting it... all the specs are reduced. What you've shown is that the 970 would be memory bus limited if it was cut in half, but since the 960 will be running slower fps anyway, it might not have this issue. As a rough guess we could scale it by shaders and say we'd expect the 960 to run ~1024/1664 or 62% of the 970. I'd expect the memory bandwidth requirement to scale similarly.
 
Wow, thanks for doing that! Lots of work!

I have a question about the protocol.... this is a 970, correct? And you are measuring memory controller load with the 970 running full tilt at 1440p and 1080p?

The first thing that occurs to me is that the 960 will run at slower framerates than the 970, and not because the bus is limiting it... all the specs are reduced. What you've shown is that the 970 would be memory bus limited if it was cut in half, but since the 960 will be running slower fps anyway, it might not have this issue. As a rough guess we could scale it by shaders and say we'd expect the 960 to run ~1024/1664 or 62% of the 970. I'd expect the memory bandwidth requirement to scale similarly.

You are wholly correct. It's all done on a 970, and judging by the fact I discovered that memory controller load is directly correlated with GPU load, we can assume that the lower the maximum GPU load, the lower the memory bandwidth will be. That's a wild guess on my part, and in reality could be hugely wrong.
It's one of the many reasons I wanted to test a 770, as it shares a 970's 224GB/s bandwidth, but obviously has less horsepower for a backbone. It would not only show the true difference between Maxwell compression, but also the effect a lower powered GPU load has on bandwidth.
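As a pure back-of-the-envelope illustration of that guess (scaling by shader count alone, which ignores clocks, ROPs and everything else; the 110GB/s peak below is a hypothetical value, not one of my measured figures):

```python
# Back-of-the-envelope: scale a 970's estimated bandwidth demand down by shader
# count to guess what a 960 might actually ask of its 112 GB/s. Clocks, ROPs
# and architectural differences are ignored entirely.

GTX_970_SHADERS = 1664
GTX_960_SHADERS = 1024
GTX_960_BANDWIDTH_GBPS = 112.0

def scaled_demand(demand_on_970_gbps: float) -> float:
    return demand_on_970_gbps * (GTX_960_SHADERS / GTX_970_SHADERS)

peak_on_970 = 110.0  # hypothetical peak figure, purely for illustration
print(scaled_demand(peak_on_970), "GB/s needed vs", GTX_960_BANDWIDTH_GBPS, "GB/s available")
```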
 
Lovely write up though I didn't find the conclusion very..conclusive other than that too little bandwidth = problematic for performance.
Was that ever in question?

What I find more difficult to grasp is how important the speed of GPU memory is. Often I find little real-world gain from even significant overclocks, except in acute situations.
 
This was a lot of work i am sure. Thanks for bringing it up.

It's nice to see something 'concrete' (and I use this term loosely, as you do essentially) on the issue. Like newconroer, though, I find this 'proves' what people already knew (but could never put their finger on). I just wish we could have concrete numbers to base the data off of. It's a logical leap, and lord knows, without actual/factual data to start with, whether it extrapolates out to fact.

People just need to know that, regardless of the bandwidth, what the FPS says is what you will get. Another way to put it: I have the same 4 cars with different motors and they all run 12s in the 1/4 mile... one does it N/A, one boosted with a snail, another a screw, and the other a rotary. It doesn't matter how it gets there, just that it does. :)
 
so how does the compression work anyway? is it hardware limited to 30 percent or could they improve it with drivers?
 
I discovered that memory controller load is directly correlated with GPU load

That's a key finding right there. In that case you seem to have proved that the 960's 128-bit bus will be fine at 1080p, and nearly always at 1440p. That doesn't mean it is a great card or anything, but it means the 128-bit bus won't be what's slowing it down; the processor will be.

The big question I have, is can you say the same for the 2GB of vram? Would that scale with GPU load as well? And is there any way to tell how much vram is really needed (vs allocated) without testing identical cards with different amounts of vram?

You'll want to see this. Says the 960 sucks because of its 128bit bus, and at 4k it gets creamed by an R9 280. http://wccftech.com/nvidia-geforce-gtx-960-radeon-r9-280-4k-benchmarks/

[Chart: GTX 960 vs R9 280 4K benchmark results]



Their conclusion that it will also suffer at 1080p doesn't make sense to me.
 
What I expected to see was the Memory Controller Load to be in direct correlation with VRAM usage.

I would expect MCL to be in direct correlation with cache eviction rate regardless of vram usage.

Also, why is it surprising that MCL increases with GPU load for typical usage?
 
that is a ridiculous article. nothing at all is valid about it. no test setup listed. no multiple graphs at different settings and resolutions. not to mention 1 gpu is not enough for 4k and neither is 4gb depending on the game..
 
that is a ridiculous article.

Yep, shamefully weak. Even if the data is 100% real, conjuring an unrealistic situation where the 960 would suck just so you can knock it is... well, not very objective.

How many people will be 4k gaming with a 960 or R9 280? Who cares which one sucks a little less at that res? The proof will be what happens at 1080p.
 
so how does the compression work anyway? is it hardware limited to 30 percent or could they improve it with drivers?
Not all data is compressible by the same ratio, or at all in some cases. You can find out more info from the Maxwell white paper (PDF pages 10-11)
The salient points are:

[Screenshot: the salient points from the Maxwell white paper]
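As a very loose illustration of the general idea (not NVIDIA's actual hardware algorithm, just the delta-encoding concept): a tile of similar pixels can be stored as one anchor value plus small deltas, while a noisy tile has to pass through uncompressed.

```python
# Loose illustration of delta colour compression (concept only, not NVIDIA's
# real algorithm). A tile "compresses" here if every pixel's delta from the
# anchor fits in half the original bit width; otherwise it stays at 1:1.

def tile_compresses_2to1(tile, bits=8):
    anchor = tile[0]
    max_delta = (1 << (bits // 2 - 1)) - 1  # deltas must fit in half the bits
    return all(abs(p - anchor) <= max_delta for p in tile)

smooth_tile = [200, 201, 203, 202]  # e.g. a patch of sky: compressible
noisy_tile = [200, 13, 250, 90]     # high-frequency detail: incompressible
print(tile_compresses_2to1(smooth_tile), tile_compresses_2to1(noisy_tile))
```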


@RCoon
Thanks for the time and effort. Having done a few articles myself, I can appreciate how a concept quickly morphs into leviathan proportions that you possibly didn't originally imagine.

EDIT:
How many people will be 4k gaming with a 960 or R9 280?
Hey, you haven't lived (and nor will you) until you've played a FPS at 4K with a mainstream card.
[Chart: BF4 at 3840x2160]
 
that is a ridiculous article. nothing at all is valid about it. no test setup listed. no multiple graphs at different settings and resolutions. not to mention 1 gpu is not enough for 4k and neither is 4gb depending on the game..

did you not read the article linked to?

Here is the test setup used:

  • Intel Core i7-3960X
  • MSI X79A-GD65
  • AMD Radeon R9 280 (Stock/Reference)
  • Geforce GTX 960 (Stock/Reference)
  • Windows 8.1
  • Catalyst OMEGA Drivers
  • Nvidia 347.13 Drivers
The performance is given in percentages, with the GTX 960 as the base unit (100%) for the relative scale. Now here is the thing; let's face it, most of the people buying a GTX 960 or an R9 280 are not going to be gaming at 4K. So the extraordinary difference in performance here is meant to show you only one thing: that the bus width problem is very much real. While it's going to be nowhere near as pronounced at 1080p, it will remain a problem. The fact is no amount of software can overcome a lack of hardware.

great work rcoon!

if you ever get bored, or have a few nights of insomnia, i would love to know what kind of figures something like catzilla at high res uses, as i think it might be more in line with shadow of mordor than source.

but as others have said, the mc will work in tandem with the core more than with vram usage, as the vram is only really filled or emptied; past that, all the mc does is serve data from the vram to the core as it needs it (read: when it's under load).
 
yes i read all of it and it barely even passes as a test setup list (to be honest i was just so baffled by the whole article when i typed that)

look at the chart itself.. a 960 is relative to 100 percent performance at 4k :wtf:

if anything they proved that you will certainly hit a vram wall with only 2gb at 4k, which is a no-brainer, so they should have put the 960 against the 285

im not really a fan of shrinking bus width like this thus far but to just try and bash it in this way is just silly

@HumanSmoke thanks for sharing
 
Nevermind I think I'm disoriented watching the SOTUA
 