
Article: Just How Important is GPU Memory Bandwidth?

RCoon
****
Holy crap, I am tired, and this is all probably totally wrong.
You can see all my original data here:
https://www.dropbox.com/sh/v3vqnglktagj8tr/AADvMQeqR-nxETkn4PwJKlZBa?dl=0
****
Introduction

The main reason for writing this article was the recent “exclamations” about the GTX 960’s 128-bit memory interface. The card offers 112GB/s of memory bandwidth, and many believe this narrow interface will not provide enough bandwidth for games. The card is aimed primarily at the midrange crowd who want to run modern titles (both AAA and independent) at a native resolution of 1080p.
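The 112GB/s figure is simple arithmetic: theoretical peak bandwidth is the bus width in bytes multiplied by the effective memory data rate. A minimal sketch, assuming reference 7 Gbps GDDR5 on the GTX 960, 970 and 770 (factory-overclocked boards will differ slightly):

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface (e.g. 128 or 256).
    data_rate_gbps: effective per-pin data rate in Gbit/s (7.0 for 7 Gbps GDDR5).
    """
    bytes_per_transfer = bus_width_bits / 8  # bits -> bytes across the whole bus
    return bytes_per_transfer * data_rate_gbps


# Reference specs (assumed stock memory clocks)
cards = {
    "GTX 960": (128, 7.0),
    "GTX 970": (256, 7.0),
    "GTX 770": (256, 7.0),
}

for name, (width, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.0f} GB/s")
```

At the same memory speed, halving the bus width halves the ceiling, which is why the 960 sits at 112GB/s while the 970 and 770 share 224GB/s.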

Memory bandwidth usage is actually incredibly difficult to measure, yet measuring it is the only way to establish, once and for all, what 1080p really requires. The usual tool, GPU-Z, gives us “Memory Controller Load”. This percentage figure does not accurately measure the total bandwidth in use (GB/s); the easiest way to explain it is that it behaves much like the CPU utilisation percentage in Task Manager. GPU Load is similar: different types of load can produce the same percentage while drawing very different amounts of power, which suggests one 97% load can be far more intensive than another. Something else that only NVidia cards allow us to measure is PCIe Bus usage; AMD has yet to allow such a measurement. Thanks to @W1zzard for throwing me a test build of GPU-Z, I could run some Bus usage benchmarks. I had a fair few expectations going in, but the results fell a little short of them.

Something I need to make clear before you read on: my memory bandwidth usage figures (GB/s) are not 100% accurate. They have been estimated and extrapolated from the performance percentages in my benchmark data, and most of this article relies on those estimates. Only a fool would treat them as fact. NVidia has said themselves that Bus usage readings are wholly inaccurate, and most of us are aware that Memory Controller Load (%) cannot represent the exact bandwidth usage (GB/s) with any precision. All loads are different.

All of the following benchmarks were run 4 times per game at each resolution for accuracy. Presets were set to Very High, or High where Very High is unavailable. The only other changes to the video settings were turning off VSync and Motion Blur.

Choices of Games

I’ve chosen 4 games which I felt represent a fair array of game types. For a CPU-oriented title, I’ve gone with Insurgency. It is Source engine based, highly CPU intensive, and should cover most games with that sort of requirement. It has a reasonable VRAM requirement but is quite light on general GPU usage overall, so it should stress the memory somewhat.

To represent independent games with a high VRAM requirement, I’ve gone with Starpoint Gemini II. This game has massive VRAM requirements and is quite GPU heavy.

I’ve chosen two other games for the AAA area: one very generalised game, and one that boasted a massive 4GB VRAM requirement for high-resolution play. Far Cry 4 felt like a good AAA representative, with a balanced load on the CPU and GPU and moderate VRAM requirements. Middle Earth: Shadow of Mordor was my AAA choice to slaughter my VRAM and hopefully put my GPU memory controller and VRAM to the test.
*****

1440p – Overall Correlations

I’ve started with benchmarks at 1440p to clearly identify what kind of GPU power this resolution requires. I understand that the 112GB/s bandwidth we’re interested in is designed to cope with 1080p, but hopefully this shows just what you’d need.

First off, we’ll take a look at all four games, and the performance of the GPU Core(%), Memory Controller Load(%), and VRAM Usage(MB). (The following data has been sorted by “Largest to Smallest” PCIe Bus Usage).






What I expected to see was Memory Controller Load correlating directly with VRAM usage. What we can clearly see instead is that Memory Controller Load tracks GPU Load very closely. VRAM usage seems to make little difference to either, except in edge cases.
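Eyeballing peaks and troughs is one way to spot a correlation; a Pearson coefficient puts a number on it. A minimal sketch, assuming the GPU Load, Memory Controller Load and VRAM columns have already been pulled out of the GPU-Z log as equally long lists. The sample values below are made up for illustration and are not taken from these benchmarks:

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series of log samples.

    Returns a value in [-1, 1]: near 1 means the two readings rise and fall
    together, near 0 means no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 samples)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Made-up samples for illustration only (not from these benchmarks)
gpu_load = [97.0, 95.0, 60.0, 98.0, 70.0]
mcl = [55.0, 54.0, 30.0, 57.0, 38.0]
vram_mb = [3400.0, 3380.0, 3420.0, 3390.0, 3410.0]

print(f"GPU Load vs MCL: {pearson_r(gpu_load, mcl):.2f}")
print(f"VRAM vs MCL:     {pearson_r(vram_mb, mcl):.2f}")
```

Pearson only captures linear relationships, so a value near zero does not rule out some other kind of link between the readings.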

Next up, we’ll look directly at the correlation between PCIe-Bus Usage(%) and VRAM usage(MB).






Besides the Insurgency graph, it appears that there is no direct correlation between the PCIe Bus and VRAM. I had to run these benchmarks multiple times, as I was a little confused that the PCIe Bus usage was always so low, or in some cases, idle.

Next let’s look at the overall correlation between Memory Controller Load (%) and the PCIe Bus usage (%)






You can see there’s no notable change in PCIe Bus usage overall. When the Memory Controller Load peaks, the PCIe Bus data shows no reaction.

Finally, let’s look at the estimated Memory Bandwidth Usage (GB/s) figures for each game. Note that these figures are not 100% accurate: they assume 100% Memory Controller Load equals the 970’s full 224GB/s.
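The conversion behind these GB/s figures is a straight linear scaling. A minimal sketch of that rule; the function name is hypothetical, and the result is only as good as the assumption that the percentage maps 1:1 onto bandwidth:

```python
def mcl_to_gbs(mcl_percent: float, peak_gbs: float = 224.0) -> float:
    """Estimate bandwidth in use (GB/s) from GPU-Z Memory Controller Load (%).

    Uses the linear '100% = peak bandwidth' assumption from this article
    (224GB/s on the GTX 970). The percentage is only a proxy, so the
    result is a rough estimate, not a measurement.
    """
    if not 0.0 <= mcl_percent <= 100.0:
        raise ValueError("Memory Controller Load must be between 0 and 100")
    return mcl_percent / 100.0 * peak_gbs


print(mcl_to_gbs(50.0))  # 112.0, i.e. half of the 970's 224GB/s
```

Under this rule, any sample above 50% Memory Controller Load on the 970 corresponds to more than the 960's entire 112GB/s budget.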






We can see in most cases the Memory Bandwidth usage (GB/s) is actually extremely erratic over the period. Shadow of Mordor showed the only real case where the usage was relatively persistent throughout the benchmark. You’ll also probably notice that it hits a rather high figure at peak load.

Let’s look at what these figures equate to overall. For this I’ve used the 95th percentile rule to remove freak results from both the low and high ends of the scale. Note that these figures already include the effect of Maxwell’s memory compression (~30%).
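The percentile cut isn't spelled out in detail. One reading, sketched below, keeps only samples between the 5th and 95th percentiles and reports the average and peak of what remains. The function name and the exact cut points are assumptions:

```python
import statistics
from typing import List, Sequence, Tuple


def trimmed_summary(samples_gbs: Sequence[float]) -> Tuple[float, float]:
    """Return (average, peak) GB/s after discarding spikes at both ends.

    Assumption: the '95th percentile rule' is interpreted as keeping only
    samples between the 5th and 95th percentiles of the run.
    """
    if len(samples_gbs) < 2:
        raise ValueError("need at least two samples")
    cuts = statistics.quantiles(samples_gbs, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    low, high = cuts[0], cuts[-1]
    kept: List[float] = [s for s in samples_gbs if low <= s <= high]
    return statistics.mean(kept), max(kept)


# Usage with synthetic data: 1..100 GB/s in steps of 1
avg, peak = trimmed_summary([float(x) for x in range(1, 101)])
print(avg, peak)  # 50.5 95.0
```

Trimming both ends means a single freak spike cannot set the reported peak, at the cost of also hiding short but genuine bursts.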






Most of these figures are relatively high, though none reaches the limit of my 970’s 224GB/s at any point. The exception to the high figures is Starpoint Gemini II, which, despite eating VRAM when available, didn’t appear to put much load on the Memory Controller. If we take Memory Controller Load as a good representation of actual bandwidth usage, the 970 is never really in danger of being overwhelmed. We can clearly see, however, that the peak figures would be too much for the 960’s 112GB/s. Going by the average figures instead, the 960 could cope with a couple of the games, but it would still choke on the big titles during average gameplay. We can’t discount the peak figures either, so you’d certainly see issues at 1440p.
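The reasoning above compares each game's average and peak against a card's budget. A minimal sketch of that comparison; the three verdict labels are hypothetical, and the input figures are made up for illustration rather than taken from the charts:

```python
from typing import Dict


def bandwidth_verdict(avg_gbs: float, peak_gbs: float, budget_gbs: float) -> str:
    """Rough classification of one game's estimated usage against a bandwidth budget."""
    if peak_gbs <= budget_gbs:
        return "fits"            # even the trimmed peak stays within budget
    if avg_gbs <= budget_gbs:
        return "peaks exceed"    # copes on average but may choke at peaks
    return "exceeds"             # even the average is over budget


BUDGETS: Dict[str, float] = {"GTX 960": 112.0, "GTX 970": 224.0}

# Hypothetical average/peak figures for illustration only
for card, budget in BUDGETS.items():
    print(card, bandwidth_verdict(avg_gbs=100.0, peak_gbs=150.0, budget_gbs=budget))
```

The middle category matches the 960 cases described above: fine during average gameplay, but short of bandwidth at peak load.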

For the sake of estimation and sheer curiosity, here is what the estimated Memory Bandwidth Usage would be without compression, assuming Maxwell’s compression saves exactly 30%.
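The exact method behind the uncompressed figures isn't stated. A minimal sketch, assuming a fixed 30% saving means compressed traffic is 70% of the uncompressed traffic. Simply multiplying by 1.3 instead would give a somewhat lower number (140 × 1.3 = 182 rather than 200):

```python
def uncompressed_estimate(compressed_gbs: float, saving: float = 0.30) -> float:
    """Estimate the bandwidth needed without compression.

    Assumes compression removes a fixed fraction `saving` of the traffic,
    i.e. compressed = uncompressed * (1 - saving). Real savings vary
    with what is being rendered.
    """
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in [0, 1)")
    return compressed_gbs / (1.0 - saving)


print(round(uncompressed_estimate(140.0), 1))  # 200.0
```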






The 970 would still cope, except at peaks in Shadow of Mordor, where the required bandwidth exceeds the available 224GB/s. Obviously all these figures are mere estimates, so real-world results may vary.

*****

1080p – Overall Correlations

These are the main benchmarks for judging the 960’s 112GB/s bandwidth limit. The card is aimed at this resolution, so hopefully the post-compression figures will fall within that limit.

Let’s take a look at the overall figures and look for similarities (or the lack of them) with the 1440p correlations. The previous charts showed Memory Controller Load linked with GPU Load and not VRAM Usage.






This surprised me a little. If you look closely at the peaks and drops, all three measurements appear to correlate rather well at this resolution. The VRAM drops actually coincide with the drops in Memory Controller Load and GPU Usage. Certainly an interesting turn of events.

Next let’s take a look at the PCIe Bus usage and VRAM. There were no direct correlations in the 1440p benchmarks.






This time things look a little more interesting, but unexplained. Far Cry 4 shows no real correlation at all. The other games, however, show a drop in PCIe Bus usage every time VRAM usage drops, after which VRAM usage steadily rises before dropping again.

Next up is the Bus and Memory Controller figures.






This time again, no real correlation. A similar result to the 1440p benchmark. No unexpected surprises there.

Here are the figures you’re more interested in, however: the overall Memory Controller Load across the benchmarks. This should show approximately (and, again, imprecisely) how much bandwidth 1080p seems to scream for.






This time Shadow of Mordor follows suit and starts to become a little more erratic along with the rest. We can see some interesting peaks in usage, as well as a general idea of what the average is overall. The plateau at the beginning of Far Cry 4 is particularly interesting.

Next, here are those overall figures in a more pleasant representation, showing exactly what the numbers are. Again, the 95th percentile rule has been used to remove the serious spikes, and these results are not 100% accurate.






Shadow of Mordor slaughters them all, even in the average figures. Far Cry 4 only just scrapes by on its average, but again, its peak is above the 112GB/s mark. The Source engine game and SPG2, however, prove completely viable.

Here’s what the results would look like without the estimated ~30% Maxwell compression.






Shadow of Mordor peaks within a few percentage points of the bandwidth available on a 770 (224GB/s), but all other games remain below the 200GB/s mark.

Conclusion

Something to bear in mind when looking at these figures (besides the fact they are most certainly not 100% accurate) is that memory bandwidth may behave similarly to VRAM. On many occasions people see VRAM usage in an average game hit a certain mark, say 1800MB on a 2GB card. Other people running the same settings on a 4GB card may see usage well beyond 2GB, almost as though the game uses the available VRAM simply because it can. Could games use memory bandwidth in a similar fashion? Possibly, but we don’t really know. The same benchmark run on a 770, which has the same bandwidth as the 970 (224GB/s), might show higher usage because it lacks Maxwell compression, yet the increase could turn out smaller than the assumed 30%. Maybe a video card with less bandwidth available wouldn’t “stretch its legs” and would be more conservative with bandwidth usage. It’d be an interesting benchmark to see.

If we treated these bandwidth figures as a reference (which you most certainly should not), we could then assume that the GTX 960’s 128bit wide memory interface simply does not provide enough bandwidth to play AAA titles at Very High (or High where not available) and Ultra Presets on 1080p. If we went by average figures, it would get by OK, but struggle at peak loads. In terms of Independent titles, along with Source engine games, it’d do just fine. It may be the case that at 1080p turning off a little eye candy would put the game within the 112GB/s limit and remove that bottleneck in AAA titles.

The main issue is that more and more AAA titles may follow the example of games like Shadow of Mordor and require more and more VRAM and eat up more bandwidth. If things plateau at that sort of figure, perhaps the 112GB/s would cope. In the event AAA titles became more advanced in their fidelity, the 960 might find itself quickly outpaced by rivals offering a more sensible bandwidth ceiling.

Finally, I’ll leave you again with the same bold statement, that the (GB/s) figures in these benchmarks are merely estimates of a largely inaccurate form of extrapolating memory bandwidth usage figures. By no means should you base a purchase on these, as the percentage representation of memory bandwidth is open to extremely broad interpretation.

If anyone would be so kind as to run a benchmark of these games on a 770 and send the log over to me, I can more accurately show bandwidth usage BEFORE Maxwell compression. I’d also be delighted to see users’ benchmarks on GTX 960s proving these estimates horribly wrong.
 

FordGT90Concept
Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable.
 

Mussels
this needs to be a proper front page article.


in my personal experience, too low a memory bus can cripple a card for sure - i've been hit with models in the past that had double the ram but half the bandwidth and their performance was miserable.
 

rtwjunkie
Wow, just wow! You outdid yourself. That's quite a lot of work, and some interesting results.

It is just estimates, which you reiterate numerous times, of the 960's abilities. I tend to think NVIDIA's engineers knew what they were doing when they implemented a 128-bit bus, and thusly, that it probably will perform a bit better than your estimates at 1080p.

Probably it will only be able to do mostly High settings though, with Ultra out of the question, and Very High in a lot of games only with AA and tessellation turned down. For the vast majority of average gamers out there who just buy a mid-range card every year, I bet it will be good enough.
 

Mussels
a couple of the images seem confusing - text is the same, but different results.

 
RCoon
FordGT90Concept said: "Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable."
If my results are correct (which they aren't), I think NVidia has put too much hope in Maxwell compression. There are occasions when Maxwell compression goes beyond 30%, but there are also occasions when it is less than 30%.
Mussels said: "this needs to be a proper front page article."
Not my call, and it's not 100% accurate information, merely educated extrapolation. W1zzard could have done this himself quite easily, but he'd have to give up a few days to get it done (and it would probably be far prettier than mine).
rtwjunkie said: "It is just estimates, which you reiterate numerous times, of the 960's abilities. I tend to think NVIDIA's engineers knew what they were doing when they implemented a 128-bit bus, and thusly, that it probably will perform a bit better than your estimates at 1080p."
Yeah, I wanted to reiterate that, because they are not accurate. NVidia said the Bus monitoring was not accurate, and W1zzard explained that memory controller load is a proxy for memory bandwidth usage, but not a 1:1 representation.
Mussels said: "a couple of the images seem confusing - text is the same, but different results."

Those are the results for each game in order. I forgot to title each graph with its game.
All graphs in order are
Far Cry 4
Insurgency
Shadow of Mordor
SPG2

Let me eat and I'll re-upload the images with titles wherever I've missed a game title.
 

Mussels
yeah all it needs is the titles to make sense.
 
Joined
Dec 14, 2006
FordGT90Concept said: "Score one for HBM? Maybe that's why AMD is biding its time waiting for HBM to become marketable."
Huh? This shows the opposite of what I would expect.
While all manufacturers are going to move to 3D RAM, it seems for mid range right now you don't need gobs of bandwidth yet,
at least on current Nvidia cards, as they don't go above a 384-bit bus (GM2xx).
That said, I was thinking a 192-bit bus for the 960 would have been better; maybe the 960 Ti will have that.
 
Joined
Jan 2, 2015
amd:nutkick:nvidia

amd powers the game systems and proves the worth of an architecture years old, scaling from entry level to high end, and nvidia can't boast any real performance improvement on a brand new architecture outside of efficiency
 
Joined
Apr 29, 2014
Very nice article!!! It's good to have figures like this to at least help alleviate a lot of the theoreticals and "Ifs" surrounding memory bandwidth, memory usage, etc. I am a bit shocked by some of the results, as I did not expect it to be so demanding at 1080p, though I guess it's safe to say this is thanks to new games and the ever-changing realm of higher graphics and fidelity.

Nice article!
 
Joined
Dec 29, 2014
Wow, thanks for doing that! Lots of work!

I have a question about the protocol.... this is a 970, correct? And you are measuring memory controller load with the 970 running full tilt at 1440p and 1080p?

The first thing that occurs to me is that the 960 will run at slower framerates than the 970, and not because the bus is limiting it... all the specs are reduced. What you've shown is that the 970 would be memory bus limited if it was cut in half, but since the 960 will be running slower fps anyway, it might not have this issue. As a rough guess we could scale it by shaders and say we'd expect the 960 to run ~1024/1664 or 62% of the 970. I'd expect the memory bandwidth requirement to scale similarly.
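The scaling idea above can be written out. A crude sketch, assuming bandwidth demand scales linearly with CUDA core count (1024 on the GTX 960, 1664 on the GTX 970) and ignoring clock speed differences between the two cards. The 160GB/s input is a hypothetical figure, not one of the measured results:

```python
def scale_by_shaders(bandwidth_gbs: float, target_cores: int, measured_cores: int) -> float:
    """Scale a bandwidth estimate from one GPU to another by shader count.

    Crude assumption: demand is proportional to shader throughput, so a card
    with fewer cores (rendering fewer frames) needs proportionally less.
    """
    if measured_cores <= 0 or target_cores <= 0:
        raise ValueError("core counts must be positive")
    return bandwidth_gbs * target_cores / measured_cores


ratio = 1024 / 1664  # GTX 960 cores / GTX 970 cores
print(f"{ratio:.0%}")  # 62%
print(f"{scale_by_shaders(160.0, 1024, 1664):.1f} GB/s")
```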
 
RCoon
You are wholly correct. It's all done on a 970, and judging by the fact I discovered that memory controller load is directly correlated with GPU load, we can assume that the lower the maximum GPU load, the lower the memory bandwidth will be. That's a wild guess on my part, and in reality could be hugely wrong.
It's one of the many reasons I wanted to test a 770, as it shares a 970's 224GB/s bandwidth, but obviously has less horsepower for a backbone. It would not only show the true difference between Maxwell compression, but also the effect a lower powered GPU load has on bandwidth.
 
Joined
Jun 20, 2007
Lovely write-up, though I didn't find the conclusion very... conclusive, other than that too little bandwidth = problematic for performance.
Was that ever in question?

What I find more difficult to grasp is how important the speed of GPU memory is. Often I find little real-world gain from even significant overclocks except in acute situations.
 
Joined
Dec 31, 2009
This was a lot of work, I am sure. Thanks for bringing it up.

It's nice to see something 'concrete' on the issue (and I use this term loosely, as you essentially do). I, like newconroer, though, find this 'proves' what people knew already (but could never put their finger on). I just wish we had concrete numbers to base the data on. It's a logical leap, and lord knows, without actual/factual data to start with, whether it extrapolates out to fact.

People just need to know that, regardless of the bandwidth, what the FPS say is what you will get. Another way to put it: I have 4 of the same car with different motors, and they all run 12s in the 1/4 mile... one does it N/A, one boosted with a snail, another with a screw, and the last is a rotary. It doesn't matter how it gets there, just that it does. :)
 
Joined
Jan 2, 2015
so how does the compression work anyway? is it hardware limited to 30 percent or could they improve it with drivers?
 
Joined
Dec 29, 2014
RCoon said: "I discovered that memory controller load is directly correlated with GPU load"

That's a key finding right there. In that case you seem to have proved that the 960's 128bit bus will be fine at 1080p, and nearly always at 1440p. Doesn't mean it is a great card or anything, but that the 128bit bus won't be slowing it down, but rather the processor.

The big question I have, is can you say the same for the 2GB of vram? Would that scale with GPU load as well? And is there any way to tell how much vram is really needed (vs allocated) without testing identical cards with different amounts of vram?

You'll want to see this. Says the 960 sucks because of its 128bit bus, and at 4k it gets creamed by an R9 280. http://wccftech.com/nvidia-geforce-gtx-960-radeon-r9-280-4k-benchmarks/




Their conclusion that it will also suffer at 1080p doesn't make sense to me.
 
Joined
Feb 14, 2012
RCoon said: "What I expected to see was the Memory Controller Load to be in direct correlation with VRAM usage."

I would expect MCL to be in direct correlation with cache eviction rate regardless of vram usage.

Also, why is it surprising that MCL increases with GPU load for typical usage?
 
Joined
Jan 2, 2015
that is a ridiculous article. nothing at all is valid about it. no test setup listed. no multiple graphs at different settings and resolutions. not to mention 1 gpu is not enough for 4k and neither is 4gb depending on the game..
 
Joined
Dec 29, 2014
Quote: "that is a ridiculous article."

Yep, shamefully weak. Even if the data is 100% real, conjuring an unrealistic situation where the 960 would suck just so you can knock it is... well, not very objective.

How many people will be 4k gaming with a 960 or R9 280? Who cares which one sucks a little less at that res? The proof will be what happens at 1080p.
 
Joined
Sep 7, 2011
Quote: "so how does the compression work anyway? is it hardware limited to 30 percent or could they improve it with drivers?"
Not all data is compressible by the same ratio, or at all in some cases. You can find out more info from the Maxwell white paper (PDF pages 10-11)
The salient points are:



@RCoon
Thanks for the time and effort. Having done a few articles myself, I can appreciate how a concept quickly morphs into leviathan proportions that you possibly didn't originally imagine.

EDIT:
Quote: "How many people will be 4k gaming with a 960 or R9 280?"
Hey, you haven't lived (and nor will you) until you've played a FPS at 4K with a mainstream card.
 
Joined
Jul 18, 2007
Quote: "that is a ridiculous article. nothing at all is valid about it. no test setup listed. no multiple graphs at different settings and resolutions. not to mention 1 gpu is not enough for 4k and neither is 4gb depending on the game.."

did you not read the article linked to?

Here is the test setup used:

  • Intel Core i7-3960X
  • MSI X79A-GD65
  • AMD Radeon R9 280 (Stock/Reference)
  • Geforce GTX 960 (Stock/Reference)
  • Windows 8.1
  • Catalyst OMEGA Drivers
  • Nvidia 347.13 Drivers
The performance is given in percentages, with the GTX 960 as the base unit (100%) for the relative scale. Now here is the thing; let's face it, most of the people buying a GTX 960 or an R9 280 are not going to be gaming at 4K. So the extraordinary difference in performance here is meant to show you only one thing: that the bus width problem is very much real. While it's going to be nowhere near as defined at 1080p, it will remain a problem. The fact is no amount of software can overcome lack of hardware.

great work rcoon!

if you ever get bored, or have a few nights of insomnia, i would love to know what kind of figures something like catzilla at high res uses, as i think it might be more in line with lotr than source.

but as others have said, the mc will work in tandem with the core more than with the vram usage, as the vram is only really filled or emptied; past that, all the mc does is serve data from the vram to the core as it needs it (read: when it's under load).
 
Joined
Jan 2, 2015
yes i read all of it, and it barely even passes as a test setup list (to be honest i was just so baffled by the whole article when i typed that)

look at the chart itself.. a 960 is set as the 100 percent performance baseline at 4k :wtf:

if anything they proved that you will certainly hit a vram wall with only 2gb at 4k, which is a no-brainer, so they should have put the 960 against the 285

i'm not really a fan of shrinking bus width like this thus far, but to just try and bash it in this way is just silly

@HumanSmoke thanks for sharing
 
Joined
Apr 30, 2012
Nevermind I think I'm disoriented watching the SOTUA
 