Tuesday, January 27th 2015

GTX 970 Memory Drama: Plot Thickens, NVIDIA has to Revise Specs

It looks like NVIDIA's first response to the GeForce GTX 970 memory allocation controversy clearly came from engineers who were pulled out of their weekend plans, and hence was too ambiguously technical (even for us). It was only on Monday that NVIDIA PR swung into action, offering a more user-friendly explanation of what the GTX 970 issue is, and how exactly the company carved up the GM204 when creating the card.

According to an Anandtech report, which cites that simpler explanation from NVIDIA, the company was not truthful about the specs of the GTX 970 at launch. For example, the non-public document NVIDIA gave out to reviewers (which gives them detailed tech specs) clearly listed the ROP count of the GTX 970 as 64, and reviewers used that figure in their reviews. TechPowerUp GPU-Z shows the ROP count as reported by the driver, but it has no way of telling how many of those "enabled" ROPs are actually "active." The media reviewing the card were hence led to believe that the GTX 970 was carved out by simply disabling three of the sixteen streaming multiprocessors (SMMs), the basic indivisible subunits of the GM204 chip, with no mention of other components, such as the ROP count and the L2 cache amount, being changed from the GTX 980 (a full-fledged implementation of this silicon).
NVIDIA explained to Anandtech that there was a communication gap between the engineers (the people who designed the GTX 970 ASIC) and the technical marketing team (the people who write the Reviewer's Guide document and draw the block diagram). The latter team was unaware that with "Maxwell," you could segment components previously thought indivisible, or that you could partially disable components.

It turns out that in addition to three SMM units being disabled (resulting in 1,664 CUDA cores), NVIDIA reduced the L2 cache (last-level cache) on this chip to 1.75 MB, down from 2 MB, and also cut down the ROPs. The ROP count is effectively 56, and not 64. The last 8 ROPs aren't "disabled"; they're active, but not used, because their connection to the crossbar is too slow (we'll get to that in a bit). The L2 cache is a key component of the "crossbar." Think of the crossbar as a town square for the GPU, where the various components of the GPU talk to each other by leaving and picking up data labeled with "from" and "to" addresses. The crossbar routes data between the four Graphics Processing Clusters (GPCs) and the eight memory controllers of 32-bit bus width each (which together make up the chip's 256-bit wide memory interface), and is cushioned by the L2 cache.

The L2 cache itself is segmented, and isn't a monolithic slab of SRAM. Each of the eight memory controllers on the GM204 is ideally tied to its own segment of the L2 cache, and tied to these segments are segments of ROPs. NVIDIA reduced the L2 cache amount by disabling one such segment; the memory controller attached to it is instead rerouted to the cache segment of a neighbouring memory controller, so that controller's access to the crossbar is slower. To make sure the interleaving of these memory controllers (which together add up to the total memory amount the driver can address) causes no issues, NVIDIA partitioned the 4 GB of memory into two segments. The first is 3.5 GB large, and is served by the memory controllers with access to their own segments of the L2; the second segment is 512 MB in size, and is tied to the memory controller that is rerouted.
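To visualize this, here is a minimal sketch (in Python) of the arrangement described above: eight 32-bit memory controllers, each backing 512 MB of memory, with one controller's own L2 segment disabled so that it borrows a neighbour's crossbar port. The structure and names are purely illustrative, not NVIDIA's actual hardware layout.

```python
# Simplified model of the GTX 970 memory arrangement described above.
# Purely illustrative; names and structure are not NVIDIA's actual design.
MEM_PER_CONTROLLER_GB = 0.5  # 8 controllers x 0.5 GB = 4 GB total

controllers = [
    {
        "id": i,
        "memory_gb": MEM_PER_CONTROLLER_GB,
        # The last controller's own L2 segment is disabled; it borrows a
        # neighbour's L2/crossbar port, so its accesses are slower.
        "own_l2_segment": i != 7,
    }
    for i in range(8)
]

fast_partition = sum(c["memory_gb"] for c in controllers if c["own_l2_segment"])
slow_partition = sum(c["memory_gb"] for c in controllers if not c["own_l2_segment"])

print(f"Fast partition: {fast_partition} GB")  # 3.5 GB
print(f"Slow partition: {slow_partition} GB")  # 0.5 GB
```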

The way this partitioning works is that the 3.5 GB partition can't be read while the 512 MB one is being read; at any given moment, the GPU is addressing either the 3.5 GB segment or the 512 MB one, never both. Only an app that's actively using the entire 4 GB of memory will therefore see a drop in performance, because the two segments can't be read at the same time.
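As a rough illustration of why touching both partitions hurts, here is a crude time-slicing model, a sketch rather than NVIDIA's actual arbitration scheme. The fast-partition figure uses the 196 GB/s derived in the next paragraph; the slow-partition figure is a hypothetical placeholder, since NVIDIA hasn't published one.

```python
# Crude time-slicing model: if the two partitions can only be read one at a
# time, effective bandwidth is total data moved divided by total time.
FAST_BW = 196.0  # GB/s, 3.5 GB partition (7 controllers, derived below)
SLOW_BW = 28.0   # GB/s, hypothetical upper bound for the 512 MB partition

def effective_bandwidth(fast_gb, slow_gb):
    total_time = fast_gb / FAST_BW + (slow_gb / SLOW_BW if slow_gb else 0.0)
    return (fast_gb + slow_gb) / total_time

print(effective_bandwidth(3.5, 0.0))  # 196.0 -- slow partition untouched
print(effective_bandwidth(3.5, 0.5))  # ~112  -- both partitions in use
```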

While it's technically correct that the GTX 970 has a 256-bit wide memory interface, and that given its 7.00 GHz (GDDR5-effective) memory clock this translates to 224 GB/s of bandwidth on paper, not all of that memory is uniformly fast. You have 3.5 GB of it with normal access to the crossbar (the town square of the GPU), and 512 MB of it with slower access. It can therefore be said with certainty that the 3.5 GB segment really has just 196 GB/s of memory bandwidth (7.00 GHz x 7 paths to the crossbar x 32-bit width per controller). We can't tell how fast the 512 MB second segment really is, nor can we say how that segment affects the performance of the memory controller whose crossbar port it's using, if the card is using its full 4 GB. But it's impossible for the second segment to make up the remaining 28 GB/s (of the 224 GB/s): that would require the eighth memory controller to run at full speed, and NVIDIA itself admits this segment runs slower. Therefore, NVIDIA's claim of 224 GB/s memory bandwidth for the GTX 970 at reference clocks is inaccurate.
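To make the arithmetic explicit, the following sketch simply works through the figures quoted above; "7.00 GHz GDDR5-effective" corresponds to 7 Gb/s per pin.

```python
# Working through the bandwidth numbers quoted above.
data_rate_gbps = 7.0      # GDDR5 "7.00 GHz effective" = 7 Gb/s per pin
bits_per_controller = 32  # each GM204 memory controller is 32 bits wide

full_bus_gbs = 8 * bits_per_controller * data_rate_gbps / 8  # 224.0 GB/s on paper
fast_seg_gbs = 7 * bits_per_controller * data_rate_gbps / 8  # 196.0 GB/s, 3.5 GB segment

print(full_bus_gbs, fast_seg_gbs)
# The remaining 28 GB/s would require the eighth controller to run at full
# speed, which it cannot, because it shares a neighbour's crossbar port.
```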

Why NVIDIA chose to reduce the cache size and ROP count may remain a mystery, but we can't imagine that the people designing the chip would not have sufficiently communicated this to the driver and technical marketing teams. The claim that technical marketing simply didn't get this the first time around seems like a hard sell. We're pretty sure NVIDIA engineers read reviews, and if they saw "64 ROPs" in a first-page spec table, they would have reported it up the food chain at NVIDIA. An explanation of this hardware change should have taken up an entire page of the technical documents the first time around, and NVIDIA could have saved itself a lot of explaining, much of it through the press.
Source: Anandtech

138 Comments on GTX 970 Memory Drama: Plot Thickens, NVIDIA has to Revise Specs

#126
TRWOV
You know, this will make Matthew's work harder. I don't know if the GPU database would be able to show all this info in an easy to understand format. Maybe change 256bit to "224+32bit"? Memory size to "3.5GB+0.5GB"? Maybe add notes? :confused: Actually even 224+32bit is wrong since nVidia said that the two partitions can't be accessed at the same time, so bus width is either 224bit or 32bit depending on the accessed partition. Maybe 224bit/32bit??? :confused:
#127
Parn
HumanSmoke: It also doesn't help that AMD promote the "underdog" persona - plenty of people root for the underdog, but a much larger percentage of people want to be allied with the dominant brand.
Totally agree with that. This underdog strategy is really hurting the profit margins for AMD.

When I said AMD was less competitive recently, it was more from a power efficiency and cost point of view. Performance-wise the 290X is only marginally worse than the 980, but because of the poor efficiency and higher manufacturing cost (~20% larger die size), the 290X has to be sold at a far smaller margin than the 980.

I fully expect the new 380X to be faster than the 980. However, what's more important for AMD is to find a way to bring the die size (cost) and power consumption down to a level that is closer to, or better than, the Maxwells'. If not, then eventually the new flagship will again be priced as a mid-range product once the 980 Ti is out, and that's not good for the profit margin.
#128
Warology
I bought two 970s in SLI and can't complain, they were working awesome, but I returned them since I was on day 13 of Best Buy's 14-day return limit and got a 980 along with a small cash refund. I just didn't like what people were saying about higher-res gaming, and the being-lied-to part kind of sucks. But still, the 970 is an awesome card, really no reason to feel bad for buying one or two of these; you people all got really good video cards, but yes, the lying part just gives it a bad taste all around. I just got lucky that I was able to return them with no questions asked through Best Buy, and that they had both cards in stock at their stores. I was having small issues with older games running SLI, which is normal, but for me SLI is overkill and one card just runs a bit smoother. But lol, use Best Buy to test SLI and different model cards, they take anything back! And yes, I overclocked the F#$ing shit out of the two SLI 970s before I returned them :) pretty good benchmarks and they took it like a champ.
#130
W1zzard
TRWOV: You know, this will make Matthew's work harder. I don't know if the GPU database would be able to show all this info in an easy to understand format. Maybe change 256bit to "224+32bit"? Memory size to "3.5GB+0.5GB"? Maybe add notes? :confused: Actually even 224+32bit is wrong since nVidia said that the two partitions can't be accessed at the same time, so bus width is either 224bit or 32bit depending on the accessed partition. Maybe 224bit/32bit??? :confused:
We have no plans to expand the gpudb in such a way (I do all the coding). There is a text field for comments which could be suited for that kind of info though.
#131
vega22
A suit has been filed in Cali.

Never saw that coming...
#133
64K
Xzibit: PCWorld - Nvidia hit with false advertising suit over GTX 970 performance

KitGuru - Nvidia slammed with class-action lawsuit over GeForce GTX 970 specifications

IncGamers - Nvidia faces lawsuit over GTX 970 specification claims


You can read the lawsuit at Scribd here.
Lawyers will suck up most of the money in legal fees, and this could be very expensive for Nvidia if the judge allows it to be a class-action suit on behalf of the entire USA. It would have been cheaper to give 970 owners the option of returning the card for a refund, or a partial rebate in exchange for promising not to sue, if this does turn into a class action.
#134
Xzibit
Three law firms aren't going after chump change, that's for sure. The one that was investigating isn't listed. They might jump on this one or file a second one in another state.
#136
TheoneandonlyMrK
HumanSmoke: Kitguru... should be Kitnoob

Basic failure in understanding the architecture


[Source]
Easily done apparently, NVIDIA's marketing department did worse.

On that note, are we to believe that a new, specially developed option (a linked IMC port, an optional post-bin) put into a chip's design specifically to raise yields and create more profit (or fewer scrapped parts, essentially) within a new-generation arch was not talked about in presentations to the company's own staff, way before release?

Very, very feckin' dubious. All of it reeks of either incompetence or fraud (via the see-no-evil pact). Either way, I would never buy anything on a whim from Nvidia again, and I was going to (that $69 Tegra 3 7'' pad was going to be mine); I'll look elsewhere first now, as will a few others.
#137
HumanSmoke
theoneandonlymrk: Easily done apparently, NVIDIA's marketing department did worse.
+1 wry observation ;)
Still setting the bar pretty low if an "enthusiast" site's knowledge base is deemed acceptable because it rivals that of a marketing/PR dept.
theoneandonlymrk: Very, very feckin' dubious. All of it reeks of either incompetence or fraud (via the see-no-evil pact).
:p There's actually a low-cost solution staring everyone in the face. Why not just give those affected an R9 290 or 290X? AMD's board partners (many of whom also sell Nvidia cards) have an AMD inventory backlog thanks to the channel stuffing Rory instituted. With AMD's designs not exactly flying off the shelves, why not cut a deal? :p
theoneandonlymrk: I would never buy anything on a whim from Nvidia again, and I was going to (that $69 Tegra 3 7'' pad was going to be mine); I'll look elsewhere first now, as will a few others.
Happy shopping. The only thing at this stage with a Tegra inside that interests me is an Audi.