Friday, September 25th 2020

RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice

Igor's Lab has posted an interesting investigative article in which Igor advances a possible reason for the recent crash-to-desktop problems reported by RTX 3080 owners. For one, Igor mentions that the launch timings were much tighter than usual, leaving NVIDIA's AIB partners far less time than they needed to prepare and thoroughly test their designs. One reason this apparently happened is that NVIDIA released the compatible driver stack to AIB partners much later than usual. As a result, their testing and QA for production RTX 3080 graphics cards was mostly limited to power-on and voltage stability testing rather than actual gaming/graphics workload testing. That might have allowed some less-than-stellar chip samples to be used on some of the companies' OC products, which, with higher operating frequencies and consequent broadband frequency mixtures, hit the apparent 2 GHz frequency wall that produces the crash to desktop.

Another reason, according to Igor, is the actual "reference board" PG132 design, which serves as the "Base Design" for partners to architect their custom cards around. Apparently NVIDIA's BOM left open choices for power cleanup and regulation in the mounted capacitors. The Base Design features six mandatory capacitors for filtering high frequencies on the voltage rails (NVVDD and MSVDD). There are a number of choices for the capacitors installed here, with varying levels of capability. POSCAPs (Conductive Polymer Tantalum Solid Capacitors) are generally worse than SP-CAPs (Conductive Polymer-Aluminium-Electrolytic-Capacitors), which are in turn outclassed by MLCCs (Multilayer Ceramic Chip Capacitors, which have to be deployed in groups). Shown below is the circuitry arrangement beneath the BGA array where NVIDIA's GA102 chip is seated, which corresponds to the central area on the back of the PCB.
In the images below, you can see how NVIDIA and its AIBs designed this regulator circuitry (NVIDIA Founders Edition, MSI Gaming X, ZOTAC Trinity, and ASUS TUF Gaming OC in order, from our reviews' high resolution teardowns). NVIDIA's Founders Edition design uses a hybrid capacitor deployment, with four SP-CAPs and two MLCC groups of 10 individual capacitors each in the center. MSI uses a single MLCC group in the central arrangement, with five SP-CAPs handling the rest of the cleanup duties. ZOTAC went the cheapest way (which may be one of the reasons their cards are also among the cheapest), with a six-POSCAP design (which are worse than MLCCs, remember). ASUS, however, designed their TUF with six MLCC arrangements; no savings were made in this area of the power circuitry.
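
To make the filtering argument concrete, here is a minimal sketch that models each capacitor as a simple series R-L-C (capacitance, ESR, ESL) and compares the impedance of one polymer capacitor with a group of ten MLCCs in parallel. The component values are illustrative assumptions, not figures from Igor's Lab or from any datasheet, and the model ignores PCB parasitics and the DC-bias derating of real MLCCs.

```python
import math

def cap_impedance(freq_hz, cap_f, esr_ohm, esl_h, count=1):
    """|Z| in ohms of `count` identical capacitors in parallel,
    each modeled as an ideal series R-L-C."""
    omega = 2 * math.pi * freq_hz
    # Net reactance: inductive term minus capacitive term.
    reactance = omega * esl_h - 1.0 / (omega * cap_f)
    single = math.hypot(esr_ohm, reactance)
    # Identical impedances in parallel simply divide by the count.
    return single / count

def self_resonant_freq(cap_f, esl_h):
    """Frequency in Hz where the capacitive and inductive reactances cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_h * cap_f))

# Hypothetical parts, chosen only to illustrate the trend.
POLYMER = dict(cap_f=330e-6, esr_ohm=6e-3, esl_h=1.5e-9)
MLCC = dict(cap_f=47e-6, esr_ohm=3e-3, esl_h=0.5e-9)

if __name__ == "__main__":
    for f in (1e5, 1e6, 1e7, 1e8):
        zp = cap_impedance(f, **POLYMER)
        zm = cap_impedance(f, count=10, **MLCC)
        print(f"{f:>12.0f} Hz  polymer: {zp * 1e3:9.3f} mOhm"
              f"   10x MLCC: {zm * 1e3:9.3f} mOhm")
```

Above the self-resonant frequency the ESL term dominates, so in this simplified model paralleling many small parts lowers the effective inductance, which is one reason MLCCs are deployed in groups.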

It's likely that the crash-to-desktop problems are related to both of these issues. This would also explain why some cards stop crashing when underclocked by 50-100 MHz: at lower frequencies (which will generally keep boost frequencies below the 2 GHz mark) there is less broadband frequency mixture happening, which means POSCAP solutions can do their job, even if just barely.
Source: Igor's Lab

297 Comments on RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice

#76
serelaw
roccale
It's beautiful :)
I'm ha haw-ing that the scalpers bought all the first cards and they are broken.
fuck them anyway.
Posted on Reply
#77
rtwjunkie
PC Gaming Enthusiast
serelaw
I'm ha haw-ing that the scalpers bought all the first cards and they are broken.
fuck them anyway.
I doubt many of them are offering refunds. As far as they are concerned they made money and moved on. So this doesn’t hurt them at all.
Posted on Reply
#78
Camm
[MEDIA=twitter]1309659834468298753[/MEDIA]
"The crashing with the RTX 3080 cards doesn’t appear to be down to the caps used, which is why we haven’t made a video yet, we don’t know the issue. What we do know is the FE and TUF Gaming models crash just as much as other models and they use MLCC’s."

Not over yet (although the lack of MLCC's probably doesn't help).
Posted on Reply
#79
Rashkae
On Reddit many people are reporting that the crashes stop once they use 2 dedicated power connectors instead of a splitter.
Posted on Reply
#80
Camm
Rashkae
On Reddit many people are reporting that the crashes stop once they use 2 dedicated power connectors instead of a splitter.
I wouldn't be surprised, but those in the tech press such as Igor & HWU have had the same issue and they aren't that dumb, lol.
Posted on Reply
#84
Turmania
Does this mean a recall is coming? If so not very good for Nvidia.
Posted on Reply
#85
xkm1948
Rashkae
True, so it seems to point more at the limits of Samsung 8nm



And Asus uses no cheap POSCAPs at all and there have been crash reports
POSCAP is more expensive than MLCC it seems. But sure, carry on with your pitch forks
Posted on Reply
#86
KarymidoN
Turmania
Does this mean a recall is coming? If so not very good for Nvidia.
no recall, the issue happens when the chip boosts on high clockspeeds, they will "FIX" it by limiting the clockspeeds in firmware or via driver update. the consumer will get less performance but who cares amiright?
Posted on Reply
#87
Rashkae
xkm1948
POSCAP is more expensive than MLCC it seems. But sure, carry on with your pitch forks
No. Other way around.
Posted on Reply
#89
dicktracy
Dis is why you don't want to be an early adopter.
Posted on Reply
#90
hwoarang5
avoid cheap out zotac, got it thanks...
Posted on Reply
#92
sergionography
With all the bad press and hiccups with the rtx3000 series, AMD has the red carpet rolling for it to make a grand entrance. It's impressive how lucky they have been in the past few years. Sure they did great work, but their competition also weren't at their best.
Posted on Reply
#93
dragontamer5788
xkm1948
POSCAP is more expensive than MLCC it seems. But sure, carry on with your pitch forks
Which MLCC? Which POSCAP?

* Here's a $0.30 POSCAP: 6.3V 150uF 3528 sizing

* Here's a POSCAP that's $1.64: 6.3V 150uF 2917 sizing

* Here's a $1.53 MLCC 6.3V 150uF 1210 size.

* Here's a $0.27 6.3V 150uF 1206 size MLCC.

Yeah, all are 150uF and 6.3V rated. I did that on purpose, because there's many, many, many other specifications on capacitors than just voltage, capacitance, chemistry, and size. (150uF is huge for the sizes we're talking about. I probably should have picked a smaller size... too late, I'm not looking all this stuff up again)

EDIT: Ah crap, I forgot to normalize for Metric-Size vs American-size. Whatever.

There are expensive MLCCs, there are cheap MLCCs, there are expensive POSCAPs, there are cheap POSCAPs. There are big MLCCs, there are small MLCCs, there are big POSCAPs, there are small POSCAPs, there are low ESR and then lower ESR caps, there are ESL-optimized caps, there are frequency optimized caps. There are multi-terminal caps, there are 2-terminal caps. There are sideways caps. There are stacked caps.

There are 755,004 MLCCs available for purchase from Digikey. There are 13,507 Tantalum-Polymer Capacitors for purchase from Digikey. There are KEMET POSCAPs, there are Panasonic POSCAPs, there are Samsung POSCAPs, there are Vishay POSCAPs. There are AVX MLCCs, there are KEMET MLCCs, there are Murata MLCCs.

But Vishay is more known for their resistors, not really their caps. Murata is known for their MLCCs. KEMET is known for... I forget. There's a reputation for each of these companies to keep track of too.

--------

I should note: high-speed digital circuits with high-power and high-frequencies with multilayer PCB boards was a subject I ran away screaming from in college. That's literally one of the hardest subjects I've ever seen in my life. Yes, choosing the wrong capacitor can cause issues, but other mistakes include having your PCB-traces the wrong length (too long, or too short), or come at various angles, or otherwise messing up your transmission line characteristics.



Do you see that? You literally just made a capacitor here because any copper running with some insulator in between them makes capacitance. Did your PCB trace make a turn? Congratulations, you now have parasitic inductance, and the signal may reflect off of the copper corner. Will that mess up your design? Maybe.
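
For a rough sense of scale, here is a sketch using the ideal parallel-plate formula C = ε0·εr·A/d for two overlapping copper areas on adjacent PCB layers. The relative permittivity of 4.4 is a typical FR-4 ballpark, and the geometry in the usage note is a made-up example. Both are assumptions for illustration, and fringing fields are ignored.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m

def plate_capacitance(area_m2, gap_m, rel_permittivity=4.4):
    """Ideal parallel-plate capacitance in farads (no fringing fields).

    rel_permittivity defaults to a typical FR-4 value; real laminates vary.
    """
    return EPSILON_0 * rel_permittivity * area_m2 / gap_m

if __name__ == "__main__":
    # Hypothetical geometry: 10 mm x 10 mm overlap, 0.1 mm of dielectric.
    c = plate_capacitance(area_m2=10e-3 * 10e-3, gap_m=0.1e-3)
    print(f"{c * 1e12:.1f} pF")
```

Tens of picofarads is tiny next to a 150uF decoupling cap, but at high enough frequencies it is not something a layout can ignore.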
Posted on Reply
#94
BigJonno
Looks like Asus Tuf Gaming RTX3080 also has a poscap version out there!



edit: This is the Tuf Gaming non OC version.
Posted on Reply
#95
WeisserWalFisch
Why should any customer purchase feature-stripped and unreliable hardware that runs at the thermal limits of its components and risks burning down your house?
Posted on Reply
#96
lexluthermiester
svan71
I'm starting to understand Apples decision to tell nvidia to get bent.
That's a very misinformed understanding considering we're about a decade removed from that decision..
Posted on Reply
#97
-The_Mask-
EVGA confirms the issue with the capacitors
Recently there has been some discussion about the EVGA GeForce RTX 3080 series.

During our mass production QC testing we discovered a full 6 POSCAPs solution cannot pass the real world applications testing. It took almost a week of R&D effort to find the cause and reduce the POSCAPs to 4 and add 20 MLCC caps prior to shipping production boards, this is why the EVGA GeForce RTX 3080 FTW3 series was delayed at launch. There were no 6 POSCAP production EVGA GeForce RTX 3080 FTW3 boards shipped.

But, due to the time crunch, some of the reviewers were sent a pre-production version with 6 POSCAP’s, we are working with those reviewers directly to replace their boards with production versions.
EVGA GeForce RTX 3080 XC3 series with 5 POSCAPs + 10 MLCC solution is matched with the XC3 spec without issues.

Also note that we have updated the product pictures at EVGA.com to reflect the production components that shipped to gamers and enthusiasts since day 1 of product launch.
Once you receive the card you can compare for yourself, EVGA stands behind its products!

Thanks
EVGA
forums.evga.com/m/tm.aspx?m=3095238&p=1
Posted on Reply
#98
Jism
WeisserWalFisch
Why should any customer purchase feature-stripped and unreliable hardware that runs at the thermal limits of its components and risks burning down your house?
Computer hardware is well protected against Overcurrent, Shorting out and all that. That's not the issue.

You've got an AIB here that skimps on parts that NVIDIA initially recommends. If the card is sold at the same price as the rest, then their intention is just to make more profit on each card sold.
Posted on Reply
#99
Shatun_Bear
The Ampere disaster continues.

There is a consequence when you rush release GPUs made on a poor process node that use up to 400W. We've never had cards in living memory that draw so much power, it's ridiculous.
Posted on Reply
#100
Chomiq

You don't need to watch full video, Buildzoid starts rambling 5 minutes in.
Posted on Reply