Tuesday, January 27th 2015

GTX 970 Memory Drama: Plot Thickens, NVIDIA has to Revise Specs

It looks like NVIDIA's first response to the GeForce GTX 970 memory allocation controversy came from engineers who were pulled out of their weekend plans, and was hence too ambiguously technical (even for us). It was only on Monday that NVIDIA PR swung into action, offering a more user-friendly explanation of what the GTX 970 issue is, and how exactly the company carved up the GM204 when creating the card.

According to an Anandtech report, which cites that easier explanation from NVIDIA, the company was not truthful about the specs of the GTX 970 at launch. For example, the non-public document NVIDIA gives out to reviewers (which provides them detailed tech-specs) clearly listed the ROP count of the GTX 970 as 64. Reviewers used that count in their reviews. TechPowerUp GPU-Z shows the ROP count as reported by the driver, but it has no way of telling just how many of those "enabled" ROPs are "active." The media reviewing the card were hence led to believe that the GTX 970 was carved out by simply disabling three of the sixteen streaming multiprocessors (SMMs), the basic indivisible subunits of the GM204 chip, with no mention of other components, such as the ROP count and L2 cache amount, being changed from the GTX 980 (a full-fledged implementation of this silicon).
NVIDIA explained to Anandtech that there was a communication gap between the engineers (the people who designed the GTX 970 ASIC) and the technical marketing team (the people who write the Reviewer's Guide document and draw the block-diagram). The marketing team was unaware that with "Maxwell," you could segment components previously thought indivisible, or "partially disable" components.

It turns out that in addition to three SMM units being disabled (resulting in 1,664 CUDA cores), NVIDIA reduced the L2 cache (last-level cache) on this chip to 1.75 MB, down from 2 MB, and also disabled a few ROPs. The ROP count is effectively 56, and not 64. The last 8 ROPs aren't "disabled." They're active, but not used, because their connection to the crossbar is too slow (we'll get to that in a bit). The L2 cache is a key component of the "crossbar." Think of the crossbar as a town-square for the GPU, where the various components of the GPU talk to each other by leaving and picking up data labeled with "from" and "to" addresses. The crossbar routes data between the four Graphics Processing Clusters (GPCs) and the eight memory controllers of 64-bit bus width each (which together make up its 256-bit wide memory interface), and is cushioned by the L2 cache.

The L2 cache itself is segmented, rather than a monolithic slab of SRAM. Each of the eight memory controllers on the GM204 is ideally tied to its own segment of the L2 cache, and to a segment of ROPs. NVIDIA reduced the L2 cache amount by disabling one such segment; its memory controller is instead rerouted to the cache segment of a neighbouring memory controller, so that controller's access to the crossbar is slower. To make sure there are no issues with the interleaving of these memory controllers (which adds up to the total memory amount the driver can address), NVIDIA partitioned the 4 GB of memory into two segments. The first is 3.5 GB large, and is made up of memory controllers with access to their own segments of the L2; the second is 512 MB in size, and is tied to the rerouted memory controller.
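NVIDIA hasn't published the exact topology, but the rerouting described above can be sketched as a toy model. The slice size, the position of the disabled slice, and which neighbour it borrows are illustrative assumptions, not confirmed specifics:

```python
# Toy model (not NVIDIA's actual layout): eight 32-bit memory controllers,
# each normally paired with its own 256 KB slice of L2. On the GTX 970 one
# slice is fused off, and its controller borrows a neighbour's slice.
L2_SLICE_KB = 256            # 8 x 256 KB = 2 MB on the full GM204 (GTX 980)
MB_PER_CONTROLLER = 512      # 8 x 512 MB = 4 GB of GDDR5 in total
controllers = list(range(8))
disabled_l2_slice = 7        # which slice is disabled is an arbitrary choice here

l2_slice_for = {}
for mc in controllers:
    if mc == disabled_l2_slice:
        l2_slice_for[mc] = disabled_l2_slice - 1   # rerouted to the neighbour's slice
    else:
        l2_slice_for[mc] = mc                      # normal one-to-one pairing

active_slices = set(l2_slice_for.values())
print(len(active_slices) * L2_SLICE_KB / 1024)     # 1.75 (MB of L2 left)

# Controllers with their own slice back the fast partition; the rerouted one
# backs the slow partition.
fast_mb = sum(MB_PER_CONTROLLER for mc in controllers if mc != disabled_l2_slice)
print(fast_mb / 1024, (4096 - fast_mb) / 1024)     # 3.5 0.5 (GB fast / GB slow)
```

The model reproduces the article's figures: seven live L2 slices give 1.75 MB of cache, and the seven normally-wired controllers back a 3.5 GB partition, leaving 512 MB behind the rerouted one.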

The way this partitioning works, the 3.5 GB partition can't be read while the 512 MB one is being read: the GPU is addressing either the 3.5 GB segment or the 512 MB one at any given moment. Only an app that's actively using the entire 4 GB of memory will therefore see a drop in performance, because the two segments can't be read at the same time.

While it's technically correct that the GTX 970 has a 256-bit wide memory interface, which at its 7.00 GHz (GDDR5-effective) memory clock translates to 224 GB/s of bandwidth on paper, not all of that memory is uniformly fast. 3.5 GB of it has normal access to the crossbar (the town-square of the GPU), and 512 MB of it has slower access. The only figure that can be stated with certainty is that the 3.5 GB segment has 196 GB/s of memory bandwidth (7.00 GHz x 7 memory controllers x 32-bit width each). We can't tell how fast the 512 MB second segment really is, nor how its accesses affect the performance of the memory controller whose crossbar port it borrows when the card uses its full 4 GB. But it's impossible for the second segment to make up the remaining 28 GB/s (of the 224 GB/s), since NVIDIA itself admits this segment runs slower. Therefore, NVIDIA's claim of 224 GB/s of GTX 970 memory bandwidth at reference clocks is inaccurate.
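The bandwidth figures above are straightforward arithmetic, and can be sanity-checked in a few lines (a back-of-the-envelope sketch; the helper function name is ours):

```python
# GDDR5 at 7.00 GHz effective moves 7 Gb/s per pin.
GBPS_PER_PIN = 7.00

def bandwidth_gbs(bus_width_bits):
    """Peak memory bandwidth in GB/s for a given aggregate bus width."""
    return GBPS_PER_PIN * bus_width_bits / 8   # divide by 8: bits -> bytes

print(bandwidth_gbs(8 * 32))   # 224.0 -- the paper spec: all eight 32-bit controllers
print(bandwidth_gbs(7 * 32))   # 196.0 -- the seven controllers behind the 3.5 GB segment
```

The gap between the two, 28 GB/s, is the ceiling the 512 MB segment could contribute if it ran at full speed, which NVIDIA says it doesn't.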

Why NVIDIA chose to reduce cache size and ROP count will remain a mystery. We can't imagine that the people designing the chip will not have sufficiently communicated this to the driver and technical marketing teams. To claim that technical marketing didn't get this the first time around seems like a hard sell. We're pretty sure that NVIDIA engineers read reviews, and if they saw "64 ROPs" on a first-page table, they would have reported it up the food-chain at NVIDIA. An explanation of this hardware change should have taken up an entire page in the technical documents the first time around, and NVIDIA could have saved itself a lot of explaining, much of it through the press. Source: Anandtech

138 Comments on GTX 970 Memory Drama: Plot Thickens, NVIDIA has to Revise Specs

#1
RCoon
Gaming Moderator
All the benchmarks in all the reviews are still accurate of course, so everything about how it performs in games at various resolutions is still true.

But NVidia has basically lied about hardware specifications. I don't believe for a second this was all one big mistake of somebody not telling marketing that the card did not, in fact, have 64 ROPs and 224 GB/s of bandwidth. By all accounts it's pretty crappy business practice, and they should be punished accordingly.

That being said. I still like my 3.5GB 970 for the price I got it at.
#2
Selene
RCoon said:
All the benchmarks in all the reviews are still accurate of course, so everything about how it performs in games at various resolutions is still true.

But NVidia has basically lied about hardware specifications. I don't believe for a second this was all one big mistake of somebody not telling marketing that the card did not, in fact, have 64 ROPs and 224 GB/s of bandwidth. By all accounts it's pretty crappy business practice, and they should be punished accordingly.

That being said. I still like my 3.5GB 970 for the price I got it at.
This is no different from the dual-GPU cards: they physically have double the memory, but only half is usable. This changes nothing; the card does have 4 GB, and as you said, all the benchmarks are still the same. I don't agree with them doing this and not telling people, but if you got the card based on reviews and benchmarks, you got what you paid for.

The truth is, once the card gets to a rez where 4 GB would even be worth having, the GPU can't handle it, and it would make maybe 1-2 fps difference at best. It's been shown time and time again that a 256-bit bus really can only handle 2 GB.
#3
64K
Good points, especially this one:

btarunr said:
We're pretty sure that NVIDIA engineers read reviews, and if they saw "64 ROPs" on a first-page table, they would have reported it up the food-chain at NVIDIA.
Nvidia needs to do something to make this right with people who already bought a GTX 970 before the truth came out. I haven't run into any problems with my 970, but I would like a partial refund. Maybe a $50 Newegg gift card.

If this customer backlash gains traction it could result in a class-action lawsuit. I'm certain that, with a market cap of $11 billion, it would attract a top-gun law firm to handle it.
#4
xkche
because 4GB is more fun than 3.5GB
#5
v12dock
Class action lawsuit?
#6
64K
v12dock said:
Class action lawsuit?
Possibly. Here in the USA we sue each other for anything and everything. It would be far cheaper for Nvidia to issue a partial refund.
#7
DeathtoGnomes
Sounds like nvidia might have been planning future sales of "new cards" as a rebrand of this card, by unlocking more "stuff" later on. They just got caught doing it; some might call it cheating.
#9
pr0fessor
As if 4 GB is needed for today's games and 3.5 GB is not enough. Most customers won't notice any difference. Still, it leaves a dirty feeling about NVIDIA hardware. Quality is something else.
#10
ShurikN
"The way it's meant to be played"
#11
Ferrum Master
A false ad is a false ad. It is not as it should be; that's it, it is a lie.

It isn't a question of speed, but of common sense. nVidia played foul.

For example, AMD had the TLB bug on K10; they force-fixed it via a kernel patch for everyone, despite it causing BSODs in only a very few specialized tasks. Yet they played clean.

Intel also plays clean: errata documents are available on Intel's site, describing what each stepping corrected for each CPU, so kernels are patched and aware, and disable many broken features, mostly virtualization sub-features. All consumer semiconductor makers do this in their device data sheets; it's been that way since the 1950s.

This time it is more than fishy. They actually intended to make such an obscure design so they could save more on that one single memory chip, as it really gives a 2-5% max performance delta, as they say. Doing that just for marketing and for the sake of a round 4 GB number? (A noob user actually thinks that's the main criterion, OK, yes.) And spoofing the ROP count, just why? (A noob user doesn't even know what it is.) Gosh, this is low... I'm actually a bit disappointed to have owned nvidia cards in the past.

Although... everyone remembers the FX series bloop with the broken DX9c shaders; they acted a bit the same then. Was there a recall for those?

Well, I guess it will bring AMD more users, and they need the dough really badly to stay alive and maintain the competition, as the green camp is getting funny ideas and their marketing team is smoking way too much green stuff.
#12
bpgt64
pr0fessor said:
As if 4 GB is needed for today's games and 3.5 GB is not enough. Most customers won't notice any difference. Still, it leaves a dirty feeling about NVIDIA hardware. Quality is something else.
You are correct, it's likely not needed or required. However, it is still a lie, or at the very least a negligent mistake. The 970 represented the possibility of attaining a very playable experience at 4K@60 fps (with AA and such turned off) for $600 US, instead of investing $1k+ US. So I would assume people who bought two of these were planning on either that resolution or 1440p@120 fps (arguably just as demanding). They saw this as an opportunity to achieve that, where the RAM is relevant.

Ironically, I downgraded from a pair of Titans to a pair of 980s (side grade? iono). For a simple reason: Shadowplay works at 4K@60 fps. It does not with AMD, and it is not possible using Fraps, Dxtory, or the like (see my vraps: 4x 256 GB SSD RAID 0, LSI 9266-4i). I can record and upload 4K videos now that look good.
#13
GhostRyder
64K said:
Possibly. Here in the USA we sue each other for anything and everything. It would be far cheaper for Nvidia to issue a partial refund.
There is a high chance that will happen, actually; knowing how things have gone in the past, and how anyone is willing to do anything for some publicity and a buck, this would not surprise me at all. It's been too long since the launch of the cards for this explanation (the "whoops, there was a miscommunication" line) to fly at this point, and they are likely going to feel backlash from it.

pr0fessor said:
As if 4 GB is needed for today's games and 3.5 GB is not enough. Most customers won't notice any difference. Still, it leaves a dirty feeling about NVIDIA hardware. Quality is something else.
3.5 GB is plenty for most scenarios; however, some people bought this card for extreme resolutions, where the 4 GB would be helpful in the future. I know at least one person who bought 3 of them intending a 4K rig on a bit of a lower budget ($900 for 3 cards versus $1,100 for two 980's), and this is something that might have changed his mind and caused him to upgrade or look at the alternatives (actually, I have not heard from him yet, or whether he knows about it; I'll have to ask him at the next LAN party).

While I doubt many people here or anywhere were concerned with the ROP count, the L2 cache, or the other things, it is still not right to lie to your customers. Performance has not changed and the numbers seen before still stand; however, the 3.5 GB is the most concerning part to those running the extreme areas of gaming, and it still could have affected a small number of users' decisions (I am being conservative with that). Even if just 5% would have changed their minds based on this information, that is 5% of the people who purchased the card who feel ripped off in some way (random number, not an actual figure). I don't find the way they are handling this to be smart, nor the way it started out to begin with.
#14
Ferrum Master
bpgt64 said:
where the ram is relevant.
The problem is with reasonable buyers, who bought the card to be future-proof and thus took the VRAM amount into their reasoning. And games tend to eat more VRAM lately... if you play old ones (except Skyrim), then it's OK, but those who bought a 970 won't just play CS:GO. It would be a shame if, after 6 months, Witcher 3 and GTA5 bring this card to its knees and a new card is needed again... but hey... that was the plan :nutkick:
#15
MAXLD
It's not super hard to believe that a marketing mistake was made initially when giving info to reviewers and so on (even if it's reported around the web that those marketing dudes have deep knowledge of GPU tech).

What is very hard to believe is that after months passed, and hundreds of reviews and articles, nobody at nVidia noticed the errors in so many reviews until now. (If that were the case, it would prove they just don't give a damn what the reviews say, as long as they give a nice final grade and Pro/Cons appraisal.)

That said, it might mean that they did notice the info/marketing "mistake" but didn't say anything until now because the card was working as intended anyway, getting mega hype and big sales, and pointing out the info mistake would actually be a possible marketing "downgrade", since the card was doing so well and AMD has no new response until Q2. So they just kept quiet; since this was detected only under specific games+settings, and reviewers didn't even catch the info issue, they decided to shrug their shoulders... hoping that the user reports being made would be dismissed by others as inaccurate, or as not relevant enough to make a fuss about. A gamble that failed completely.
#16
the54thvoid
Makes me happy I opted out. I was looking at going 4K (still am), and SLI 970's looked like a good option.
I'd have bought the cards for the potential 4 GB of memory. If my experience could have been marred by this, in scenarios where 4K used over 3.5 GB, I would be angry.
But I read reviews, and SLI 970's seemed weaker than other options, so I stayed with my 780 Ti and SLI'd that instead.
This is very poor PR for NV. Kind of impossible to defend the lie. Great card, but it needs to be formally rebranded as 3.5 GB.
#17
matar
Two GTX 970 OEMs were on my shopping list for next month, mainly the OEM because I love the OEM cooler and it looks just like the 980. But now that I've read all this, I will wait for the GTX 970 Ti editions on 20 nm, or whatever they will be called.
#18
RCoon
Gaming Moderator
Just to let you guys know, retailers and AIB partners (Gigabyte, Asus, MSI) are not accepting returns over this problem at this time. I presume they will be in avid communication with NVidia first before we get a response on where to go from here.
#19
yogurt_21
So they're pretty much saying the 970 as paper-spec'ed would have been within a few percentage points of the 980. Seriously, if it's as fast as it is with 56 ROPs and less L2 than we thought, full memory specs and 64 ROPs would have further closed the 10-12% performance gap between the two cards. That puts pressure on 980 sales and further distances the 970 from the 960, which already had a massive gap between them to begin with.

False advertising aside, they had to neuter it. Next time, though, a little heads-up will save them a lot of PR crap.
#20
Beertintedgoggles
I still call BS on this... from the front-page news yesterday, Nvidia claimed that both the 980 and the 970 suffer slowdowns over 3.5 GB of VRAM usage. Today they are claiming that this "issue" was created by the way they disabled some components to create the 970 line. Something still doesn't add up here.

Edit: After checking the article from yesterday, the table it included showed that the performance hit from running <3.5GB versus >3.5GB was almost identical on both the 980 and the 970. If that is true, then someone is still lying.
#21
looniam
"Why NVIDIA chose to reduce cache size and ROP count will remain a mystery."

idk, it seemed the TR, PCper and esp. anand tech articles made it quite clear. though i do seem to have a talent at solving murder mysteries within the first chapter of the book.

" We can't imagine that the people designing the chip will not have sufficiently communicated this to the driver and technical marketing teams."

do you think they go out and have after-work drinks? i'd be surprised if they're in the same building, let alone on the same floor. in a perfect world all departments communicate well w/each other. however in the real world it is lacking.

"To claim that technical marketing didn't get this the first time around, seems like a hard-sell. We're pretty sure that NVIDIA engineers read reviews, and if they saw "64 ROPs" on a first-page table, they would have reported it up the food-chain at NVIDIA."

word on the street is the engineers were too busy watching kitty cat videos while eating cheetos.

"An explanation about this hardware change should have taken up an entire page in the technical documents the first time around, and NVIDIA could have saved itself a lot of explanation, much of it through the press."

yeah, and i am surprised that technology journalists who have reported for years didn't see the asymmetrical design either. hopefully they will learn from nvidia's mistake as well.


edit: oh yeah HI, i am new :)
#22
night.fox
so NVIDIA is in trouble? I wonder what the AMD camp thinks about this.

I mean, if nobody had found out about this "anomaly", NVIDIA would not have said anything. That's for sure. I mean, it's been months since the 970 came out and they only found out now?

And the response was a communication gap between NVIDIA departments? For sure they locked these parts on purpose, so when they unlock them, they'll call it a 970 Ti. Look at the 780 and 780 Ti. The 780 Ti was just an unlocked 780, am I right?
#23
ironwolf
Any word from the AMD camp over this? I'd be curious if they might try to pull some PR stuff using this. Or if they will just keep their traps shut for the time being. :laugh:
#24
RejZoR
Selene said:
This is no different than the dual GPU cards, they physically have double the memory but only half is usable. This changes nothing, the card does have 4gb, and as you said all the benchmarks are still the same. I dont agree with them doing this and not telling people but if you got the card based on reviews and benchmarks you got what you paid for.

The truth is once the card gets a to a rez where 4gb would even be worth having the GPU cant handle it and it would make maybe 1-2fps difference at best, its been show time and time again, 256bit bus really can only handle 2gb.
SLI/Crossfire actually has a legit explanation behind it. When you merge two or more cards, they still have to process the same frames, either alternating or via some other method, on all of them. But the fact is, you effectively only have as much memory as each individual card has.

In theory, they could merge the memory pool and share it through PCIe, but I don't think the OS really supports that, and GPUs aren't at the level where shaders could cooperate between GPUs seamlessly, in a way where you could just easily stack things up together.
#25
Ferrum Master
RejZoR said:
In theory, they could merge the memory pool and share it through PCIe
Nada, too much latency for high FPS rates and frame-time costs... we are already arguing about stutter when the same GPU accesses its own other memory partition via the crossbar; wouldn't it be a mess if a second GPU wanted to access unified pool data via PCIe and back to the second card?