Friday, September 25th 2020

RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice

Igor's Lab has posted an interesting investigative article in which he advances a possible reason for the recent crash-to-desktop problems plaguing RTX 3080 owners. For one, Igor mentions that launch timings were much tighter than usual, with NVIDIA's AIB partners having far less time than would be adequate to prepare and thoroughly test their designs. One reason this apparently happened is that NVIDIA released the compatible driver stack to AIB partners much later than usual; as a result, their actual testing and QA of produced RTX 3080 graphics cards was mostly limited to power-on and voltage-stability checks rather than actual gaming/graphics workload testing. That might have allowed some less-than-stellar chip samples to be employed on some of the companies' OC products, which, with their higher operating frequencies and consequent broadband frequency mixtures, hit the apparent 2 GHz frequency wall that produces the crash to desktop.

Another reason for this, according to Igor, is the actual "reference board" design, PG132, which is used as a reference "Base Design" for partners to architect their custom cards around. The thing here is that NVIDIA's BOM apparently left open choices in terms of power cleanup and regulation for the mounted capacitors. The Base Design features six mandatory capacitors for filtering high frequencies on the voltage rails (NVVDD and MSVDD). There are a number of capacitor choices that can be installed here, with varying levels of capability. POSCAPs (Conductive Polymer Tantalum Solid Capacitors) are generally worse than SP-CAPs (Conductive Polymer Aluminium Electrolytic Capacitors), which are in turn superseded in quality by MLCCs (Multilayer Ceramic Chip Capacitors, which have to be deployed in groups). Below is the circuitry arrangement employed beneath the BGA array where NVIDIA's GA102 chip is seated, which corresponds to the central area on the back of the PCB.
In the images below, you can see how NVIDIA and its AIBs designed this regulator circuitry (NVIDIA Founders Edition, MSI Gaming X, ZOTAC Trinity, and ASUS TUF Gaming OC, in order, from our reviews' high-resolution teardowns). NVIDIA's Founders Edition design uses a hybrid capacitor deployment, with four SP-CAPs and two MLCC groups of 10 individual capacitors each in the center. MSI uses a single MLCC group in the central arrangement, with five SP-CAPs handling the rest of the cleanup duties. ZOTAC went the cheapest way (which may be one of the reasons their cards are also among the cheapest), with a six-POSCAP design (which are worse than MLCCs, remember). ASUS, however, designed their TUF with six MLCC arrangements - no corners were cut in this area of the power circuitry.

It's likely that the crash-to-desktop problems are related to both these issues - and this would also explain why some cards stop crashing when underclocked by 50-100 MHz: at lower frequencies (which generally keep boost clocks below the 2 GHz mark) there is less broadband frequency mixture happening, which means POSCAP solutions can do their job - even if just barely.
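To illustrate why the capacitor choice matters at high frequencies, here is a quick back-of-the-envelope sketch using a simple series-RLC model of a real capacitor. All part values (capacitance, ESR, ESL) are illustrative assumptions, not measurements from any actual board; the point is only that a group of small ceramics in parallel divides its parasitic inductance and resistance, so its impedance at tens of MHz is far lower than a single large polymer capacitor's.

```python
import math

def cap_impedance(f, c, esr, esl):
    """Magnitude of a real capacitor's impedance using a series RLC model."""
    reactance = 2 * math.pi * f * esl - 1 / (2 * math.pi * f * c)
    return math.sqrt(esr**2 + reactance**2)

# Illustrative (not measured) part values:
poscap = dict(c=330e-6, esr=5e-3, esl=3e-9)    # one large polymer capacitor
mlcc   = dict(c=22e-6,  esr=2e-3, esl=0.5e-9)  # one small ceramic capacitor
n = 10                                          # MLCCs per group, as on the FE board

for f in (1e6, 10e6, 100e6):
    z_poscap = cap_impedance(f, **poscap)
    # n identical caps in parallel: capacitance scales up, ESR/ESL scale down
    z_group = cap_impedance(f, mlcc['c'] * n, mlcc['esr'] / n, mlcc['esl'] / n)
    print(f"{f/1e6:>5.0f} MHz  POSCAP: {z_poscap*1e3:8.2f} mOhm"
          f"   10x MLCC: {z_group*1e3:8.2f} mOhm")
```

With these assumed values, the MLCC group's impedance at 100 MHz comes out roughly two orders of magnitude below the single polymer cap's, which is consistent with the article's claim that the all-POSCAP filtering can only "just barely" cope once boost clocks push past 2 GHz.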
Source: Igor's Lab

297 Comments on RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice

#26
moproblems99
bug
Tight launch timing, my ass. What happened to "if you don't have the time to do the work, don't release"?
We lost that when testing/qa was laid off.
Posted on Reply
#27
mborghi
This affects ALL 3080 models from ALL brands? Or only specific ones?
Posted on Reply
#28
Assimilator
mborghi
This affects ALL 3080 models from ALL brands? Or only specific ones?
Nobody knows because this is a THEORY.
Posted on Reply
#29
xtreemchaos
It's a trend that's happening a lot now; it's a total lack of respect for the customer, but they will keep doing it as long as we let them. It's not only things like GPUs, it's games too: they release them before they're fit and leave them to modders to fix. Greed, laziness, and a couldn't-care-less attitude come to mind.
Posted on Reply
#30
mouacyk
Sorry Jensen -- about that upgrade you mentioned for your Pascal friends...
Posted on Reply
#31
Bubster
Nvidia have messed this product rollout up so much... how come those fake YouTube reviewers didn't spot these (crashes)?
Posted on Reply
#32
mechtech
Chrispy_
Yet more solid confirmation that Nvidia really rushed the whole 30-series launch.

It's uncharacteristic of Nvidia, so what do they know about RDNA2 that puts them in such a hurry to get this horse out of the gate before it's ready for prime time?
Probably nothing, and even if it were, would it matter? Nvidia has huge market share, and there have been some challenges with the RX 5000 series. Even when AMD had some real gems for dirt cheap (I'm thinking HD 4850/4870), people still bought Nvidia.

Maybe they did want to beat AMD to market, maybe they thought they were ready for prime time, who knows. Either way maybe the scalpers will be stuck with some excess stock ;)

Time will tell.
Posted on Reply
#33
moproblems99
Assimilator
Nobody knows because this is a THEORY.
Theory? I already have my pitchfork!?
Posted on Reply
#34
Axaion
Sooo.. the TUF/Strix seems safe then?

Great, they seemed to have the best cooling/noise ratio anyway
Posted on Reply
#35
theoneandonlymrk
Comedy, The way it's meant to be played.

No.

Consumers are getting played.

Damn Beta community tactics need to stop.
Posted on Reply
#36
B-Real
Animalpak
3000 series looked already too good to be true...
In what terms did it look good? The 3080's performance uplift over the 2080 was the only really positive thing about this series so far, and it didn't even reach the 1080's performance leap. Moreover, the efficiency gain of both the 3080 and the 3090 is close to crap: it is identical to the 2080/2080 Ti's. The 1080 and 1080 Ti's efficiency gain was 3x more than these. The 3080 consumes nearly 100 W more than the 2080. And the OC capability of both 3000-series cards is even worse than the usual AMD GPU's: around 2-3%. Even the 2080 and 2080 Ti were around 10%, and the 1080 was even more, at 13%. They advertised the 3090 as an 8K GAMING card. In reality, most games run at maybe 30 fps with 20 fps lows. And the 4K performance leap over the 3080 is close to 10%. WTF? The 3070 was advertised as "Faster than 2080 Ti". If we can believe Galax's communication, the 3070 will be unequivocally slower than the 2080 Ti.
Posted on Reply
#37
Solid State Soul ( SSS )
If this is true, how come these issues never came up in the review cycle, with many dozens of reviewers posting high-praise reviews?

Even trusted reviewers like Gamers Nexus and JayzTwoCents, who reveal issues your conventional reviewer won't do the extra work to uncover, never had an issue with them?

I think this is just a case of end users trying so hard to overclock their cards, pushing them to perform to whatever high standards they deem acceptable, then posting negative threads when their cards can't overclock high enough past default profiles.
Posted on Reply
#38
metalfiber
The only ones on the list I've seen that have the most problems are MSI, EVGA, and ZOTAC, with ZOTAC being the worst. I haven't seen the ASUS cards mentioned. As it says in the article, the ASUS TUF used six MLCCs, and they have the most custom components on a reference board and supposedly test them for 144 hours...

Posted on Reply
#39
xkm1948
Assimilator
Too many unknowns to tell. Igor's speculation is just that, speculation - but somehow his "possible" gets turned into "likely" by TPU's clickbait editors. Once again, shameful yellow journalism on par with WCCFTech.
The game of telephones and the desire to get click click click
Posted on Reply
#40
moproblems99
Solid State Soul ( SSS )
If this is true, how come these issues never came up in the review cycle, with many dozens of reviewers posting high-praise reviews?
Cherry picked?
Solid State Soul ( SSS )
I think this is just a case of end users trying so hard to overclock their cards, pushing them to perform to whatever high standards they deem acceptable, then posting negative threads when their cards can't overclock high enough past default profiles.
Supposedly they are just the factory profiles. But it is very likely some are from people overclocking.
Posted on Reply
#41
Haile Selassie
I don't believe the EE design is at fault here. After all, all qualification is done under the worst possible conditions. Moreover, the same issue is present on FE boards.
The MCUs are simply not binned well enough, or there is an issue with the boost algorithm. The same happened with Turing boards.
Posted on Reply
#42
Mirrormaster85
So, as an Electronics Engineer and PCB Designer, I feel I have to react here.
The point that Igor makes about improper power design causing instability is a very plausible one, especially with first production runs, where it indeed could be the case that they did not have the time/equipment/drivers etc. to do proper design verification.


However, concluding from this that POSCAP = bad and MLCC = good is way too harsh, and a conclusion you cannot make.


Both POSCAPs (or any other 'solid polymer caps') and MLCCs have their own characteristics and use cases.


Some (not all) are ('+' = pos, '-' = neg):
MLCC:
+ cheap
+ small
+ high voltage rating in small package
+ high current rating
+ high temperature rating
+ high capacitance in small package
+ good at high frequencies
- prone to cracking
- prone to piezo effect
- bad temperature characteristics
- DC bias (capacitance changes a lot under different voltages)


POSCAP:
- more expensive
- bigger
- lower voltage rating
+ high current rating
+ high temperature rating
- less good at high frequencies
+ mechanically very strong (no MLCC cracking)
+ not prone to piezo effect
+ very stable over temperature
+ no DC bias (capacitance very stable at different voltages)


As you can see, both have their strengths and weaknesses, and one is not strictly better or worse than the other. It all depends.
In this case, most of these 3080 and 3090 boards may use the same GPU (with its requirements) but they also have very different power circuits driving the chips on the cards.
Each power solution has its own characteristics and behavior and thus its own requirements in terms of capacitors used.
Thus, you cannot simply say: I want the card with only MLCCs because that is a good design.
It is far more likely they just could/would not have enough time and/or resources to properly verify their designs, and thus were not able to make proper adjustments to their initial component choices.
This will very likely work itself out in time. For now, just buy the card that you like and, if it fails, simply claim warranty. Let them fix the problem, and don't draw too many conclusions based on incomplete information and (educated) guesswork.
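The DC-bias point in the lists above is easy to miss, so here is a rough numeric sketch of it. The derating figure is an assumption chosen purely for illustration, not a value from any specific datasheet; class-2 ceramics can lose a large fraction of their rated capacitance at their working voltage, while polymer capacitors stay close to nominal.

```python
# Effective capacitance of a filter bank under DC bias.
# The retention figures are assumed for illustration, not from a datasheet.
rated_uF = 470.0              # nominal capacitance of the filter bank
mlcc_retention = 0.6          # assumed: class-2 MLCC keeps ~60% at the rail voltage
polymer_retention = 1.0       # polymer caps stay near their rated value

effective_mlcc_uF = rated_uF * mlcc_retention
effective_polymer_uF = rated_uF * polymer_retention

print(f"MLCC bank under bias:    {effective_mlcc_uF:.0f} uF")
print(f"Polymer bank under bias: {effective_polymer_uF:.0f} uF")
```

So a board "with only MLCCs" can end up with noticeably less bulk capacitance in operation than its BOM suggests, which is exactly why neither capacitor type is unconditionally the better choice.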
Posted on Reply
#43
mahirzukic2
Bubster
Nvidia have messed this product rollout up so much... how come those fake YouTube reviewers didn't spot these (crashes)?
I am pretty sure that they did, but they may have attributed it to the newness of the architecture and the beta drivers.
Posted on Reply
#44
Metroid
This is disgraceful no matter how you look at it; a $699 GPU should have the best of the best components on it. This same problem happened with the 8800 GT: 30% RMA and they did not want to accept it, what the hell.
Posted on Reply
#45
Dimi
B-Real
In what terms did it look good? The 3080's performance uplift over the 2080 was the only really positive thing about this series so far, and it didn't even reach the 1080's performance leap. Moreover, the efficiency gain of both the 3080 and the 3090 is close to crap: it is identical to the 2080/2080 Ti's. The 1080 and 1080 Ti's efficiency gain was 3x more than these. The 3080 consumes nearly 100 W more than the 2080. And the OC capability of both 3000-series cards is even worse than the usual AMD GPU's: around 2-3%. Even the 2080 and 2080 Ti were around 10%, and the 1080 was even more, at 13%. They advertised the 3090 as an 8K GAMING card. In reality, most games run at maybe 30 fps with 20 fps lows. And the 4K performance leap over the 3080 is close to 10%. WTF? The 3070 was advertised as "Faster than 2080 Ti". If we can believe Galax's communication, the 3070 will be unequivocally slower than the 2080 Ti.
Excuse me but what the hell are you talking about?

Yes it consumes more but the perf/watt is the highest of any card, AMD doesn't even get close.

Posted on Reply
#46
Vya Domus
Dimi
Yes it consumes more but the perf/watt is the highest of any card, AMD doesn't even get close.
If 8% means "not even close". Sure ...

You gotta lay off the Kool-Aid; Pascal was almost 40% better than Maxwell in terms of perf/watt. In comparison, Ampere's improvement in that area over Turing is absolutely pathetic.

Posted on Reply
#47
Cheeseball
Not a Potato
metalfiber
The only ones on the list I've seen that have the most problems are MSI, EVGA, and ZOTAC, with ZOTAC being the worst. I haven't seen the ASUS cards mentioned. As it says in the article, the ASUS TUF used six MLCCs, and they have the most custom components on a reference board and supposedly test them for 144 hours...


Yeah, ASUS probably did the work this time around. They even put a proper heatsink on the memory modules. It's sad because the TUF version of their RX 5700 XT didn't do so well compared to the other brands.
Posted on Reply
#48
blobster21
Metroid
This is disgraceful no matter how you look at it; a $699 GPU should have the best of the best components on it. This same problem happened with the 8800 GT: 30% RMA and they did not want to accept it, what the hell.
Vote with your wallet. Enough said.
Posted on Reply
#49
kiriakost
Igor mentions ..... Igor mentions .... Igor mentions .... and the TPU team gave the GOLDEN EDITOR'S CHOICE?
I dared to make a preliminary collection of RTX 3000 electrical weak points, and someone from the TPU staff blocked my access to the topic ..... reason: unproductive comments.

Here is another unproductive prediction .... massive product returns to base (there are many) = product recalls.
Bubster
Nvidia have messed this product roll up so much...how come those fake Yotube reviewers didn't spot these (crashes)
They did not demonstrate actual games, rather plain cards; someone even made a comparison of the RTX 3080 vs the GTX 1660 Super at 4K (he is an idiot).
Posted on Reply
#50
moproblems99
Dimi
Excuse me but what the hell are you talking about?

Yes it consumes more but the perf/watt is the highest of any card, AMD doesn't even get close.
8% isn't close?
Posted on Reply