Article: Just How Important is GPU Memory Bandwidth?

RCoon

Gaming Moderator
Staff member
Joined
Apr 19, 2012
Messages
11,365 (5.51/day)
Likes
9,492
Location
Gypsyland, UK
System Name HP Omen 17
Processor i7 7700HQ
Memory 16GB 2400Mhz DDR4
Video Card(s) GTX 1060
Storage Samsung SM961 256GB + HGST 1TB
Display(s) 1080p IPS G-SYNC 75Hz
Audio Device(s) Bang & Olufsen
Power Supply 230W
Mouse Roccat Kone XTD+
Software Win 10 Pro
#1
****
Holy crap I am tired and this is all probably totally wrong
You can see all my original data here:
https://www.dropbox.com/sh/v3vqnglktagj8tr/AADvMQeqR-nxETkn4PwJKlZBa?dl=0
****
Introduction

The main reason for writing this article was the recent “exclamations” about the GTX 960’s 128-bit wide memory interface. The GPU offers 112GB/s of memory bandwidth, and many believe that this narrow interface will not provide enough bandwidth for games. The card is primarily aimed at the midrange crowd who want to run modern titles (both AAA and independent) at a native resolution of 1080p.
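As a rough sanity check on the headline figures, here is a minimal Python sketch of how peak bandwidth falls out of bus width. The 7 Gbps effective memory data rate is an assumption taken from the cards' public spec sheets rather than from anything measured here, and `peak_bandwidth_gbs` is just an illustrative name.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits / 8 gives the bytes moved per transfer across the bus;
    data_rate_gbps is the effective per-pin data rate in gigabits per second.
    """
    return bus_width_bits / 8 * data_rate_gbps


# Assumption: both cards use GDDR5 at an effective 7 Gbps per pin.
gtx_960 = peak_bandwidth_gbs(128, 7.0)
gtx_970 = peak_bandwidth_gbs(256, 7.0)

print(gtx_960, gtx_970)  # 112.0 224.0
```

Halving the bus width at the same memory speed halves the theoretical ceiling, which is where the 112GB/s versus 224GB/s comparison comes from.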

Memory bandwidth usage is actually incredibly difficult to measure, but measuring it is the only way of establishing once and for all what the real memory bandwidth requirement is at 1080p. Typically, what GPU-Z makes available to us is “Memory Controller Load”. This is a percentage figure that does not accurately measure the total bandwidth being used in GB/s. The easiest way to explain it is that it acts much like the percentage CPU utilisation Task Manager shows. Another example would be GPU Load, wherein various types of load can produce the same percentage figure but very different power usage readings, leading us to assume one 97% load can be much more intensive than another. Something else that only NVidia cards allow measurements of is PCIe Bus usage. AMD has yet to allow such a measurement. Thanks to @W1zzard throwing me a test build of GPU-Z, I could run some Bus usage benchmarks. I had a fair few expectations from the figures, but the results I got were a little less than expected.

Something I need to make clear before you read on: my memory bandwidth usage figures (GB/s) are not 100% accurate. They have been estimated and extrapolated using the performance percentages from the benchmark figures I’ve got, and most of this article relies on those estimations. Only a fool would consider them fact. NVidia has said themselves that Bus usage is wholly inaccurate, and most of us are aware that Memory Controller Load (%) cannot represent the exact bandwidth usage (GB/s) with total precision. All loads are different.

All of the following benchmarks were run 4 times for each game at each resolution for accuracy. Every preset is set to Very High, or to High where Very High is unavailable. The only graphical alterations to my video settings were turning off VSync and Motion Blur.

Choices of Games

I’ve chosen to run with 4 games which I felt represented a fair array of game types. For the CPU-orientated slot, I’ve run with Insurgency. This is Source engine based, highly CPU intensive, and should cover most games with that sort of requirement. It has a reasonable VRAM requirement, but is overall quite light on general GPU usage, so it should stress the memory somewhat.

To represent the independent games, while also holding a high VRAM requirement, I’ve run with Starpoint Gemini II. This game has massive VRAM requirements, and is quite a GPU heavy game.

I’ve chosen two other games for the AAA area: one very generalised game, and one that boasted massive 4GB VRAM requirements for general high res play. Far Cry 4 felt like a good representative for the AAA genre, with balanced demands on the CPU and GPU and moderate VRAM requirements. Middle Earth: Shadow of Mordor was my choice for the AAA genre to slaughter my VRAM and hopefully put my GPU memory controller and VRAM to the test.
*****

1440p – Overall Correlations

I’ve started off with benchmarks running on 1440p to clearly identify what kind of GPU power is required for this resolution. I understand that the 112GB/s bandwidth we’re aiming for is designed to cope with 1080p, but hopefully you’ll see just what you need.

First off, we’ll take a look at all four games, and the performance of the GPU Core(%), Memory Controller Load(%), and VRAM Usage(MB). (The following data has been sorted by “Largest to Smallest” PCIe Bus Usage).






What I expected to see was the Memory Controller Load to be in direct correlation with VRAM usage. What we can clearly see here is that Memory Controller Load is in absolute correlation with the GPU Load. VRAM usage seems to make little difference to the way either performs except in edge cases.
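For anyone who wants to put a number on "correlation" rather than eyeballing the graphs, a minimal sketch of a Pearson correlation check in Python follows. The sample values are invented purely for illustration and are not taken from my logs; `pearson` is a hand-rolled helper, not anything GPU-Z provides.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical sensor samples (NOT real benchmark data).
gpu_load = [97, 95, 60, 98, 70, 96]              # GPU Load (%)
mc_load = [48, 46, 25, 50, 31, 47]               # Memory Controller Load (%)
vram_mb = [2900, 2950, 3000, 3050, 3100, 3150]   # VRAM usage (MB)

print("GPU load vs MC load:", round(pearson(gpu_load, mc_load), 3))
print("VRAM vs MC load:    ", round(pearson(vram_mb, mc_load), 3))
```

A value near 1 means the two series rise and fall together, while a value near 0 means there is no linear relationship, which is the pattern described above for Memory Controller Load against GPU Load versus VRAM usage.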

Next up, we’ll look directly at the correlation between PCIe-Bus Usage(%) and VRAM usage(MB).






Besides the Insurgency graph, it appears that there is no direct correlation between the PCIe Bus and VRAM. I had to run these benchmarks multiple times, as I was a little confused that the PCIe Bus usage was always so low, or in some cases, idle.

Next let’s look at the overall correlation between Memory Controller Load (%) and the PCIe Bus usage (%)






You can see there’s literally no particular change in PCIe Bus usage overall. When the Memory Controller Load peaks, the data for the PCIe Bus shows no reaction to the change.

Finally let’s take a look at the individual Memory Bandwidth Usage (GB/s) figures overall. Note, these figures are not 100% accurate, and follow the 100% = 224GB/s rule.
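The 100% = 224GB/s rule is nothing more than a linear scale. A minimal Python sketch, assuming Memory Controller Load maps linearly onto the 970's peak bandwidth (which, as stated, is only an approximation); `load_to_gbs` is an illustrative name:

```python
PEAK_BANDWIDTH_GBS = 224.0  # GTX 970 peak memory bandwidth used as the 100% mark


def load_to_gbs(load_percent: float, peak_gbs: float = PEAK_BANDWIDTH_GBS) -> float:
    """Estimate bandwidth (GB/s) from a Memory Controller Load percentage.

    Assumes load scales linearly with bandwidth, which is not guaranteed.
    """
    return load_percent / 100.0 * peak_gbs


print(load_to_gbs(50))  # 112.0
```

Under this assumption, a sustained Memory Controller Load of 50% on the 970 already equals the entire 112GB/s a 960 has available.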






We can see in most cases the Memory Bandwidth usage (GB/s) is actually extremely erratic over the period. Shadow of Mordor showed the only real case where the usage was relatively persistent throughout the benchmark. You’ll also probably notice that it hits a rather high figure at peak load.

Let’s look at what these figures equate to overall. For this I’ve used the 95th percentile rule to remove freak results from both the low and high end of the scale. Note, these figures indicate bandwidth with Maxwell compression methods (~30%) in mind.
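To make the trimming step concrete, here is a minimal Python sketch of one reading of the percentile rule: sort the samples, drop the lowest 5% and the highest 5%, then take the average and peak of what remains. Whether this matches the exact spreadsheet steps is an assumption, and the sample values are invented for illustration.

```python
def trimmed_summary(samples_gbs, trim_percent=5):
    """Return (average, peak) after dropping the top and bottom trim_percent of samples."""
    ordered = sorted(samples_gbs)
    k = len(ordered) * trim_percent // 100  # number of samples to drop from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("not enough samples to trim")
    return sum(kept) / len(kept), kept[-1]


# Hypothetical bandwidth samples in GB/s; 5 and 210 stand in for freak results.
samples = [5, 90, 92, 95, 96, 98, 100, 101, 102, 103,
           104, 105, 106, 108, 110, 112, 114, 116, 118, 210]

average, peak = trimmed_summary(samples)
print(round(average, 1), peak)
```

With 20 samples, one value is dropped from each end, so the freak 5 and 210 readings no longer drag the average down or inflate the peak.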






We’ll see most of these figures are relatively high, though none manage to reach the limit of my 970’s 224GB/s bandwidth available at any time. The only exception is Starpoint Gemini II, which despite eating VRAM when available, didn’t appear to put much load on the Memory Controller. If we took the Memory Controller Load figure as a good representation of actual bandwidth usage, the 970 is never really in danger of being overwhelmed. We can clearly see however that the peak figures would be too much for a 960’s 112GB/s available bandwidth. If we ran by the average figures instead, the 960 could cope with a couple of the games, but it would still choke on the big titles during average gameplay. We can’t discount the peak figures though, so you’d certainly see issues at the 1440p resolution.

For the sake of estimation and sheer curiosity, here is what the estimated Memory Bandwidth Usage would be if Maxwell was exactly 30% efficient at compression, without the compression.
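There are two plausible ways to "undo" a flat 30% compression figure, and they give different answers, so a minimal Python sketch of both is shown here. Which one matches the charts is not something the text pins down; both function names are illustrative, and the flat 30% is itself only an estimate.

```python
COMPRESSION_SAVING = 0.30  # assumed flat saving from Maxwell compression


def uncompressed_by_division(measured_gbs, saving=COMPRESSION_SAVING):
    """Reading 1: compression removed `saving` of the original traffic,
    so measured = raw * (1 - saving) and raw = measured / (1 - saving)."""
    return measured_gbs / (1.0 - saving)


def uncompressed_by_markup(measured_gbs, saving=COMPRESSION_SAVING):
    """Reading 2: simply add `saving` on top of the measured figure."""
    return measured_gbs * (1.0 + saving)


measured = 100.0  # hypothetical measured figure in GB/s
print(round(uncompressed_by_division(measured), 1))
print(round(uncompressed_by_markup(measured), 1))
```

The division reading always gives the larger estimate, so any conclusion about whether a card "would still cope" is sensitive to which interpretation is used.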






The 970 would still cope, except in peak cases during Shadow of Mordor, where the required bandwidth exceeds that of the available 224GB/s. Obviously all these figures are mere estimates, so the actual cases may vary in real world examples.

*****

1080p – Overall Correlations

These are the main benchmarks we’ll be looking at for our 112GB/s bandwidth limit on the 960. The card is aimed at this resolution, so hopefully we’ll see some post-Maxwell compression figures dropping us in that area.

Let’s take a look at the overall figures for this, and look for similarities with the 1440p correlation (or lack thereof). The previous charts showed Memory Controller Load linked with GPU Load and not VRAM Usage.






This surprised me a little bit. If you look relatively closely at the peaks and drops, all three measurements appear to correlate rather well at this resolution. The VRAM drops actually appear to line up with the drops in Memory Controller Load as well as GPU Usage. Certainly an interesting turn of events.

Next let’s take a look at the PCIe Bus usage and VRAM. There were no direct correlations in the 1440p benchmarks.






This time things look a little more interesting, but unexplained. Far Cry 4 shows no real correlation at all. The rest of the games however seem to show a drop in PCIe Bus usage every time there’s a drop in VRAM usage, before the VRAM usage steadily rises before dropping again.

Next up is the Bus and Memory Controller figures.






This time again, no real correlation. A similar result to the 1440p benchmark. No unexpected surprises there.

Here are the figures you’re more interested in, however. Let’s take a look at the overall Memory Controller Usage over the benchmarks. This should show us approximately (again, inaccurately) how much bandwidth 1080p seems to scream for.






This time Shadow of Mordor follows suit and starts to become a little more erratic along with the rest. We can see some interesting peaks in usage, as well as a general idea of what the average is overall. The plateau at the beginning of Far Cry 4 is particularly interesting.

Next, here are those overall figures in a more pleasant representation, where the numbers themselves are easier to read. Again, I’ve used the 95th percentile rule for these results to remove the serious spikes, and these results are not 100% accurate.






Shadow of Mordor slaughters all, even in the average benchmark. Far Cry 4 just scrapes under the limit in the average figures, but again, the peak proves to be above the 112GB/s mark. The Source engine game and SPG2, however, prove to be completely viable at this bandwidth.

Here’s what the results would look like without the estimated ~30% Maxwell compression.






Shadow of Mordor peaks within a few percentage points of the available bandwidth on a 770 (224GB/s), but all other games remain below the 200GB/s mark.

Conclusion

Something you have to bear in mind when looking at these figures (besides the fact they are most certainly not 100% accurate) is that memory bandwidth may plausibly behave like VRAM. There are many occasions where people see VRAM usage in an average game hit a certain mark, let’s say 1800MB on a 2GB card. Other people, running the same settings but with a 4GB card, may see usage above and beyond 2GB, almost as though the game is using the available VRAM simply because it can. Is it possible that games utilise memory bandwidth in a similar fashion? Possibly, but we don’t really know. It could be that the same benchmark, when run on a 770, which shares identical bandwidth with the 970 (224GB/s), would show higher results due to the lack of compression, but with a difference smaller than the 30% assumption. Maybe the video card wouldn’t “stretch its legs” and would be more conservative with bandwidth usage if it had less available. It’d be an interesting benchmark to see.

If we treated these bandwidth figures as a reference (which you most certainly should not), we could then assume that the GTX 960’s 128bit wide memory interface simply does not provide enough bandwidth to play AAA titles at Very High (or High where not available) and Ultra Presets on 1080p. If we went by average figures, it would get by OK, but struggle at peak loads. In terms of Independent titles, along with Source engine games, it’d do just fine. It may be the case that at 1080p turning off a little eye candy would put the game within the 112GB/s limit and remove that bottleneck in AAA titles.

The main issue is that more and more AAA titles may follow the example of games like Shadow of Mordor and require more and more VRAM and eat up more bandwidth. If things plateau at that sort of figure, perhaps the 112GB/s would cope. In the event AAA titles became more advanced in their fidelity, the 960 might find itself quickly outpaced by rivals offering a more sensible bandwidth ceiling.

Finally, I’ll leave you again with the same bold statement, that the (GB/s) figures in these benchmarks are merely estimates of a largely inaccurate form of extrapolating memory bandwidth usage figures. By no means should you base a purchase on these, as the percentage representation of memory bandwidth is open to extremely broad interpretation.

If anyone would be so kind as to run a benchmark of these games on a 770 and send the log over to me, I can more accurately show bandwidth usage BEFORE Maxwell compression. I’d also be delighted to see users’ benchmarks on GTX 960s to prove these estimates horribly wrong.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
20,906 (6.24/day)
Likes
10,000
Location
IA, USA
System Name BY-2015
Processor Intel Core i7-6700K (4 x 4.00 GHz) w/ HT and Turbo on
Motherboard MSI Z170A GAMING M7
Cooling Scythe Kotetsu
Memory 2 x Kingston HyperX DDR4-2133 8 GiB
Video Card(s) PowerColor PCS+ 390 8 GiB DVI + HDMI
Storage Crucial MX300 275 GB, Seagate 6 TB 7200 RPM
Display(s) Samsung SyncMaster T240 24" LCD (1920x1200 HDMI) + Samsung SyncMaster 906BW 19" LCD (1440x900 DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay
Audio Device(s) Realtek Onboard, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse SteelSeries Sensei RAW
Keyboard Tesoro Excalibur
Software Windows 10 Pro 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
#2
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
 

Mussels

Moderprator
Staff member
Joined
Oct 6, 2004
Messages
46,100 (9.57/day)
Likes
13,528
Location
Australalalalalaia.
System Name Daddy Long Legs
Processor Ryzen R7 1700, 3.9GHz 1.375v
Motherboard MSI X370 Gaming PRO carbon
Cooling Fractal Celsius S24 (Silent fans, meh pump)
Memory 16GB 2133 generic @ 2800
Video Card(s) MSI GTX 1080 Gaming X (BIOS modded to Gaming Z - faster and solved black screen bugs!)
Storage 1TB Intel SSD Pro 6000p (60TB USB3 storage)
Display(s) Samsung 4K 40" HDTV (UA40KU6000WXXY) / 27" Qnix 2K 110Hz
Case Fractal Design R5. So much room, so quiet...
Audio Device(s) Pioneer VSX-519V + Yamaha YHT-270 / sennheiser HD595/518 + bob marley zion's
Power Supply Corsair HX 750i (Platinum, fan off til 300W)
Mouse Logitech G403 + KKmoon desk-sized mousepad
Keyboard Corsair K65 Rapidfire
Software Windows 10 pro x64 (all systems)
Benchmark Scores Laptops: i7-4510U + 840M 2GB (touchscreen) 275GB SSD + 16GB i7-2630QM + GT 540M + 8GB
#3
this needs to be a proper front page article.


in my personal experience, too low a memory bus can cripple a card for sure - i've been hit with models in the past that had double the ram but half the bandwidth and their performance was miserable.
 

rtwjunkie

PC Gaming Enthusiast
Joined
Jul 25, 2008
Messages
9,373 (2.73/day)
Likes
13,006
Location
Louisiana -Laissez les bons temps rouler!
Processor Core i7-3770k 3.5Ghz, O/C to 4.2Ghz fulltime @ 1.19v
Motherboard ASRock Fatal1ty Z68 Pro Gen3
Cooling All air: 2x140mm Fractal exhaust; 3x 140mm Cougar Intake; Enermax T40F CPU cooler
Memory 2x 8GB Mushkin Redline DDR-3 1866
Video Card(s) MSI GTX 980 Ti Gaming 6G LE
Storage 1x 250GB MX200 SSD; 2x 2TB WD Black; 1x4TB WD Black;1x 2TB WD Green (eSATA)
Display(s) HP 25VX 25" IPS @ 1920 x 1080
Case Fractal Design Define R4 Black w/Titanium front -windowed
Audio Device(s) Soundblaster Z
Power Supply Seasonic X-850
Mouse Logitech G500
Keyboard Logitech G610 Orion mechanical (Cherry Brown switches)
Software Windows 10 Pro 64-bit (Start10 & Fences 3.0 installed)
#4
Wow, just wow! You outdid yourself. That's quite a lot of work, and some interesting results.

It is just estimates, which you reiterate numerous times, of the 960's abilities. I tend to think NVIDIA's engineers knew what they were doing when they implemented a 128-bit bus, and thusly, that it probably will perform a bit better than your estimates at 1080p.

Probably it will only be able to do mostly High settings though, with Ultra out of the question, and Very High in a lot of games only with AA and tessellation turned down. For the vast majority of average gamers out there who just buy a mid-range card every year, I bet it will be good enough.
 

Mussels

#5
a couple of the images seem confusing - text is the same, but different results.

 

RCoon

#6
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
If my results are correct (which they aren't), I think NVidia has put too much hope in Maxwell compression. There are events at which Maxwell compression goes beyond 30%, but in contrast, there are occasions when it is less than 30%
this needs to be a proper front page article.
Not my call, and it's not 100% accurate information, merely educated extrapolation. W1zzard could have done this himself quite easily, but he'd have to give up a few days to get it done (probably far prettier than I have done too)
It is just estimates, which you reiterate numerous times, of the 960's abilities. I tend to think NVIDIA's engineers knew what they were doing when they implemented a 128-bit bus, and thusly, that it probably will perform a bit better than your estimates at 1080p.
Yeah I wanted to reiterate that, because they are not accurate. NVidia said the Bus monitoring was not accurate, and W1zzard explained how memory controller load was by proxy memory bandwidth usage, but not a 1:1 representation.
a couple of the images seem confusing - text is the same, but different results.

Those are the results for each game in order. I forgot to Title each graph to each game.
All graphs in order are
Far Cry 4
Insurgency
Shadow of Mordor
SPG2

Let me eat and I'll reup the images with titles in each case I've missed a game title.
 

Mussels

#7
yeah all it needs is the titles to make sense.
 
Joined
Dec 14, 2006
Messages
376 (0.09/day)
Likes
53
System Name Ed-PC
Processor Intel i5-3570k
Motherboard Asus P8Z77 V-Pro
Cooling CM 212 evo
Memory Crucial Ballistix Tactical Tracer DDR3 1600 8GB
Video Card(s) Nvidia MSI 660ti PE OC
Storage WD black 500gig
Case Corsair 500R
Audio Device(s) onboard
Power Supply Corsair 650TX V2
Software Win7 Pro 64bit
#8
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
Huh? This shows the opposite of what I would expect.
While all manufacturers are going to go to 3D RAM, it seems that for mid range right now you don't need gobs of BW yet.
At least on current Nvidia cards, as they don't go above a 384-bit bus (GM2xx).
That said, I was thinking a 192-bit bus for the 960 would have been better; maybe the 960 Ti will be that.
 

RCoon

#9
Joined
Jan 2, 2015
Messages
1,099 (1.02/day)
Likes
434
Processor FX6350@4.2ghz-i54670k@4ghz
Video Card(s) HD7850-R9290
#10
amd:nutkick:nvidia

amd powers the game systems and proves the worth of an architecture that is years old, scaling from entry level to high end, while nvidia can't boast any real performance improvement on a brand new architecture outside of efficiency
 

Tatty_One

Super Moderator
Staff member
Joined
Jan 18, 2006
Messages
19,747 (4.54/day)
Likes
6,014
Location
Worcestershire, UK
Processor Skylake Core i7 6700k @ 4.6gig
Motherboard MSI Z170A Tomahawk
Cooling Cooler Master Seidon 240V AIO/Viper140's
Memory 16GB Corsair Vengeance LPX 3000mhz CL14
Video Card(s) Sapphire 4gb R9 290X VaporX @1150mhz
Storage SkHynix SL308 120GB/CrucialM4/1TB WD Black
Display(s) LG 29inch 2560x1080 Curved Ultrawide IPS
Case Phanteks Enthoo Pro M Windowed - Gunmetal
Audio Device(s) Xifi Elite Pro 7.1/VideoLogic ZXR550's
Power Supply XFX Pro Black Edition 750W Gold modular
Keyboard CM Storm Octane Combo
Software Win 10 Home x64
#11
Joined
Apr 29, 2014
Messages
3,688 (2.79/day)
Likes
2,106
Location
Texas
System Name Alucard / The Reinforcer / Portable?
Processor i7 5930K @ 4.5ghz (24/7) / 2x Intel Xeon X5670 / Intel i7 3610QM
Motherboard MSI X99S Gaming 9 AC / Dell Dual Socket (R710) / MSI Stock Gaming Laptop
Cooling RX 360mm + 140mm Custom Loop in Push Pull Config. / Dell Stock / MSI Stock
Memory Corsair Vengeance DDR4 2666 16gb (4x4gb) CL 16 / 1333mhz DDR3 96gb 12 x 8gb / 12gb DDR3 3 x 4gb
Video Card(s) GTX Titan XP (2025mhz) / Asus GTX 950 (No Power Connector) / GTX 880m
Storage Samsung 840/850 512gb Raid 0, WD Velociraptor 600gb x 5 Raid 5 / 300gb 15k RPM x 8 / 2x 240gb Adata
Display(s) Acer XG270HU 1440p 144hz Freesync, Acer B286HK 4K UHD Monitor, 1 Hanns-G 27inch 1920x1080p Monitor
Case Corsair Obsidian 800D / Dell Poweredge R710 Rack Mount Case / MSI Gaming 17inch
Audio Device(s) Realtec ALC1150 (On board)
Power Supply Rosewill Lightning 1300Watt
Mouse Logitech G5
Keyboard Logitech G19S
Software Windows 10 Pro / Windows Server 2008 R2 / Windows 10 Pro
#12
Very nice article!!! It's good to have figures like this to at least help alleviate a lot of the theoreticals and "ifs" surrounding memory bandwidth, memory usage, etc. I am a bit shocked by some of the results though, as I did not expect it to be so demanding at 1080p. I guess it's safe to say this is thanks to new games and the ever-changing realm of higher graphics and fidelity.

Nice article!
 
Joined
Dec 29, 2014
Messages
717 (0.66/day)
Likes
198
#13
Wow, thanks for doing that! Lots of work!

I have a question about the protocol.... this is a 970, correct? And you are measuring memory controller load with the 970 running full tilt at 1440p and 1080p?

The first thing that occurs to me is that the 960 will run at slower framerates than the 970, and not because the bus is limiting it... all the specs are reduced. What you've shown is that the 970 would be memory bus limited if it was cut in half, but since the 960 will be running slower fps anyway, it might not have this issue. As a rough guess we could scale it by shaders and say we'd expect the 960 to run ~1024/1664 or 62% of the 970. I'd expect the memory bandwidth requirement to scale similarly.
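That back-of-envelope scaling can be written out as a small Python sketch. The shader counts are the ones quoted above; the 150GB/s input is a made-up example figure rather than a measurement, and linear scaling with shader count is only the rough guess already described.

```python
GTX_960_SHADERS = 1024
GTX_970_SHADERS = 1664
GTX_960_PEAK_GBS = 112.0


def scale_to_960(bandwidth_970_gbs: float) -> float:
    """Scale a bandwidth figure seen on a 970 by the 960/970 shader ratio.

    Assumes bandwidth demand falls linearly with shader count (a rough guess).
    """
    return bandwidth_970_gbs * GTX_960_SHADERS / GTX_970_SHADERS


ratio = GTX_960_SHADERS / GTX_970_SHADERS
print(round(ratio, 2))  # 0.62

example_970_gbs = 150.0  # hypothetical figure observed on a 970
scaled = scale_to_960(example_970_gbs)
print(round(scaled, 1), scaled <= GTX_960_PEAK_GBS)
```

If the guess holds, a figure that looks well over the 960's limit on a 970 could land under 112GB/s once the slower core is taken into account.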
 

RCoon

#14
Wow, thanks for doing that! Lots of work!

I have a question about the protocol.... this is a 970, correct? And you are measuring memory controller load with the 970 running full tilt at 1440p and 1080p?

The first thing that occurs to me is that the 960 will run at slower framerates than the 970, and not because the bus is limiting it... all the specs are reduced. What you've shown is that the 970 would be memory bus limited if it was cut in half, but since the 960 will be running slower fps anyway, it might not have this issue. As a rough guess we could scale it by shaders and say we'd expect the 960 to run ~1024/1664 or 62% of the 970. I'd expect the memory bandwidth requirement to scale similarly.
You are wholly correct. It's all done on a 970, and judging by the fact I discovered that memory controller load is directly correlated with GPU load, we can assume that the lower the maximum GPU load, the lower the memory bandwidth will be. That's a wild guess on my part, and in reality could be hugely wrong.
It's one of the many reasons I wanted to test a 770, as it shares a 970's 224GB/s bandwidth, but obviously has less horsepower behind it. It would not only show the true difference Maxwell compression makes, but also the effect a lower-powered GPU has on bandwidth usage.
 
Joined
Jun 20, 2007
Messages
3,833 (1.00/day)
Likes
594
System Name Medusa
Processor i7 2600k @4.8ghz
Motherboard Asus P8P67 Pro
Cooling CPU : Noctua NH-L12 GPU: EK FC 1080 via Magicool 360 III PRO > Photon 170 (D5)
Memory 8gb Corsair XMS DDR3 @1600mhz
Video Card(s) GTX 1080 FE
Storage Vertex 4 256 /Crucial C300 256/ Hitachi 2TB 2x
Display(s) Tempest X270OC @ 120hz / LG W3000h
Case Fractal Define S [Antec Skeleton hanging in hall of fame]
Audio Device(s) Asus Xonar Xense with AKG K612 cans on Monacor SA-100
Power Supply Seasonic X-850
Mouse Razer Naga 2014
Software Windows 10 Pro
Benchmark Scores FFXIV ARR Benchmark 1600p score 12,098[this means nothing any more!]
#15
Lovely write up though I didn't find the conclusion very..conclusive other than that too little bandwidth = problematic for performance.
Was that ever in question?

What I find more difficult to grasp is how important GPU memory speed is. Often I find little real world gain from even significant overclocks except in acute situations.
 
Joined
Dec 31, 2009
Messages
11,485 (3.96/day)
Likes
6,250
Location
Ohio
System Name Daily Driver
Processor 7900X 4.5GHz 10c/10t 1.15V.
Motherboard ASUS Prime X299 Deluxe
Cooling MCR320 + Kuplos Kryos NEXT CPU block
Memory GSkill Trident Z 4x8 GB DDR4 3600 MHz CL16
Video Card(s) EVGA GTX 1080 FTW3
Storage 512GB Patriot Hellfire, 512GB OCZ RD400, 640GB Caviar Black, 2TB Caviar Green
Display(s) Yamakasi 27" 2560x1440 IPS
Case Thermaltake P5
Power Supply EVGA 750W Supernova G2
Benchmark Scores Faster than most of you! Bet on it! :)
#16
This was a lot of work i am sure. Thanks for bringing it up.

It's nice to see something 'concrete' on the issue, and I use that term loosely, as you essentially do. Like newconroer, though, I find this 'proves' what people know already (but could never put their finger on). I just wish we could have concrete numbers to base the data off of. It's a logical leap, but lord knows, without actual/factual data to start with, whether it extrapolates out to fact.

People just need to know that, regardless of the bandwidth, what the FPS say is what you will get regardless. Another way to put it, I have the same 4 cars with different motors and they all run 12s 1/4 mile... one does it N/A, one boosted with a snail, the other a screw, and the other a rotary. It doesn't matter how it gets there, just that it does. :)
 
Joined
Jan 2, 2015
Messages
1,099 (1.02/day)
Likes
434
Processor FX6350@4.2ghz-i54670k@4ghz
Video Card(s) HD7850-R9290
#17
so how does the compression work anyway? is it hardware limited to 30 percent or could they improve it with drivers?
 
Joined
Dec 29, 2014
Messages
717 (0.66/day)
Likes
198
#18
I discovered that memory controller load is directly correlated with GPU load
That's a key finding right there. In that case you seem to have proved that the 960's 128bit bus will be fine at 1080p, and nearly always at 1440p. Doesn't mean it is a great card or anything, but that the 128bit bus won't be slowing it down, but rather the processor.

The big question I have, is can you say the same for the 2GB of vram? Would that scale with GPU load as well? And is there any way to tell how much vram is really needed (vs allocated) without testing identical cards with different amounts of vram?

You'll want to see this. Says the 960 sucks because of its 128bit bus, and at 4k it gets creamed by an R9 280. http://wccftech.com/nvidia-geforce-gtx-960-radeon-r9-280-4k-benchmarks/




Their conclusion that it will also suffer at 1080p doesn't make sense to me.
 
Joined
Feb 14, 2012
Messages
1,680 (0.79/day)
Likes
576
System Name msdos
Processor 8088
Motherboard mainboard
Cooling passive
Memory 640KB + 384KB extended
Video Card(s) EGA
Storage 5.25"
Display(s) 80x25
Case plastic
Audio Device(s) modchip
Power Supply 45 watts
Mouse serial
Keyboard yes
Software disk commander
Benchmark Scores still running
#19
What I expected to see was the Memory Controller Load to be in direct correlation with VRAM usage.
I would expect MCL to be in direct correlation with cache eviction rate regardless of vram usage.

Also, why is it surprising that MCL increases with GPU load for typical usage?
 
Joined
Jan 2, 2015
Messages
1,099 (1.02/day)
Likes
434
Processor FX6350@4.2ghz-i54670k@4ghz
Video Card(s) HD7850-R9290
#20
that is a ridiculous article. nothing at all is valid about it. no test setup listed. no multiple graphs at different settings and resolutions. not to mention 1 gpu is not enough for 4k and neither is 4gb depending on the game..
 
Joined
Dec 29, 2014
Messages
717 (0.66/day)
Likes
198
#21
that is a ridiculous article.
Yep, shamefully weak. Even if the data is 100% real, conjuring an unrealistic situation where the 960 would suck just so you can knock it is... well, not very objective.

How many people will be 4k gaming with a 960 or R9 280? Who cares which one sucks a little less at that res? The proof will be what happens at 1080p.
 
Joined
Sep 7, 2011
Messages
2,785 (1.22/day)
Likes
1,672
Location
New Zealand
System Name MoneySink
Processor 2600K @ 4.8
Motherboard P8Z77-V
Cooling AC NexXxos XT45 360, RayStorm, D5T+XSPC tank, Tygon R-3603, Bitspower
Memory 16GB Crucial Ballistix DDR3-1600C8
Video Card(s) GTX 780 SLI (EVGA SC ACX + Giga GHz Ed.)
Storage Kingston HyperX SSD (128) OS, WD RE4 (1TB), RE2 (1TB), Cav. Black (2 x 500GB), Red (4TB)
Display(s) Achieva Shimian QH270-IPSMS (2560x1440) S-IPS
Case NZXT Switch 810
Audio Device(s) onboard Realtek yawn edition
Power Supply Seasonic X-1050
Software Win8.1 Pro
Benchmark Scores 3.5 litres of Pale Ale in 18 minutes.
#22
so how does the compression work anyway? is it hardware limited to 30 percent or could they improve it with drivers?
Not all data is compressible by the same ratio, or at all in some cases. You can find out more info from the Maxwell white paper (PDF pages 10-11)
The salient points are:



@RCoon
Thanks for the time and effort. Having done a few articles myself, I can appreciate how a concept quickly morphs into leviathan proportions that you possibly didn't originally imagine.

EDIT:
How many people will be 4k gaming with a 960 or R9 280?
Hey, you haven't lived (and nor will you) until you've played a FPS at 4K with a mainstream card.
 
Joined
Jul 18, 2007
Messages
2,425 (0.64/day)
Likes
645
System Name panda
Processor 6700k
Motherboard sabretooth s
Cooling raystorm block<black ice stealth 240 rad<ek dcc 18w 140 xres
Memory 32gb ripjaw v
Video Card(s) 290x gamer<ntzx g10<antec 920
Storage 950 pro 250gb boot 850 evo pr0n
Display(s) QX2710LED@110hz lg 27ud68p
Case 540 Air
Audio Device(s) nope
Power Supply 750w superflower
Mouse g502
Keyboard shine 3 with grey, black and red caps
Software win 10
Benchmark Scores http://hwbot.org/user/marsey99/
#23
that is a ridiculous article. nothing at all is valid about it. no test setup listed. no multiple graphs at different settings and resolutions. not to mention 1 gpu is not enough for 4k and neither is 4gb depending on the game..
did you not read the article linked to?

Here is the test setup used:

  • Intel Core i7-3960X
  • MSI X79A-GD65
  • AMD Radeon R9 280 (Stock/Reference)
  • Geforce GTX 960 (Stock/Reference)
  • Windows 8.1
  • Catalyst OMEGA Drivers
  • Nvidia 347.13 Drivers
The performance is given in percentages, with the GTX 960 as the base unit (100%) for the relative scale. Now here is the thing; lets face it, most of the people buying a GTX 960 or an R9 280 are not going to be gaming at 4K. So the extraordinary difference in performance here is meant to show you only one thing: that the bus width problem is very much real. While its going to be nowhere near as defined on 1080p, it will remain a problem. The fact is no amount of software can overcome lack of hardware.
great work rcoon!

if you ever get bored, or have a few nights of insomnia, i would love to know what kind of figures something like catzilla at high res uses, as i think it might be more in line with lotr than source.

but as others have said, the mc will work in tandem with the core more than with the vram usage, as the vram is only really filled or emptied. past that, all the mc does is serve data from the vram to the core as it needs it (read: when it's under load).
 
Joined
Jan 2, 2015
Messages
1,099 (1.02/day)
Likes
434
Processor FX6350@4.2ghz-i54670k@4ghz
Video Card(s) HD7850-R9290
#24
yes i read all of it and it barely even passes as a test setup list (to be honest i was just so baffled by the whole article when i typed that)

look at the chart itself.. a 960 is relative to 100 percent performance at 4k :wtf:

if anything they proved that you will certainly hit a vram wall with only 2gb at 4k, which is a no-brainer, so they should have put the 960 against the 285

im not really a fan of shrinking bus width like this thus far but to just try and bash it in this way is just silly

@HumanSmoke thanks for sharing
 