
RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice

I don't get the reception that I'm signalling for, but what about chain of excellence?
This is what TSMC posted a month before. Fluke or coincidence?
Nvidia is trying to reinvent the wheel, maybe...
 
Come on, we need more insightful comments here! (And I'm bored to death anyway, so keep them coming please :p )
 
Nvidia never admits being wrong and always blames the partners (TSMC, Apple, etc.), so here they will say that the fault is with the AIB and the fix will be based on downclocking...
It's not NVidia's fault. The AIB's are solely to blame for not following the recommendations and not doing proper testing. The reality is, people will need to do a little bit of downclocking to keep those cards stable. It's not the end of the world and likely will not even affect overall card performance to a noticeable degree.
 
It's not NVidia's fault. The AIB's are solely to blame for not following the recommendations and not doing proper testing.
Nvidia has to approve each partner board design. Also, AIB partners didn't even get the drivers until review samples were shipped out.
 
Nvidia has to approve each partner board design.
The design, yes. That doesn't mean it was tested by NVidia. That is the responsibility of the AIB's.

Also, AIB partners didn't even get the drivers until review samples were shipped out.
And that is still not NVidia's fault. The problem would not exist if the AIB's had followed the recommendations stated by NVidia. That is what recommendations are for.
 
It's not NVidia's fault. The AIB's are solely to blame for not following the recommendations and not doing proper testing. The reality is, people will need to do a little bit of downclocking to keep those cards stable. It's not the end of the world and likely will not even affect overall card performance to a noticeable degree.
We don't know yet what's happening exactly, but you are already sure Nvidia has no responsibility in this? That's very unbiased of you.
 
We don't know yet what's happening exactly, but you are already sure Nvidia has no responsibility in this?
So far, these problems are NOT happening with NVidia's own cards, nor the higher-tier cards from AIB's. It's just the lower tier offerings from AIB's. The responsibility rests with the AIBs. Please review;

That's very unbiased of you.
Bias has nothing to do with it. The info out there is showing the problem.
 
So far, these problems are NOT happening with NVidia's own cards, nor the higher-tier cards from AIB's. It's just the lower tier offerings from AIB's. The responsibility rests with the AIBs. Please review;
No company shouts more about their work with partners, devs and AIBs.
The reference spec design they passed to AIBs was different from their own reference card's.
And they compressed development and testing time to near zero.
And they allowed such design variation in their development reference kit instead of either recognizing that it needed specific voltage conditioning and informing AIB partners, or limiting those AIB designs.

It's not all on Nvidia but they share the blame.
 
So far, these problems are NOT happening with NVidia's own cards, nor the higher-tier cards from AIB's. It's just the lower tier offerings from AIB's. The responsibility rests with the AIBs.


Bias has nothing to do with it. The info out there is showing the problem.
That's not true, and Jays2c is fun and all, but his technical abilities aren't awesome. He might be onto something, but apparently, FE cards crash as well

Most of the time, in this type of situation, the responsibility is shared, but the chances that Nvidia gave very clear and correct specifications and the AIBs just blatantly disrespected them are close to 0.

Time will tell, but it looks like we were expecting another Pascal and we got another Fermi... They'll fix it soon, I imagine; if it's just a matter of dropping the frequency a tad, it should be easily fixable.
 
It's not NVidia's fault. The AIB's are solely to blame for not following the recommendations and not doing proper testing. The reality is, people will need to do a little bit of downclocking to keep those cards stable. It's not the end of the world and likely will not even affect overall card performance to a noticeable degree.

You GOT to be kidding, right?
This is exactly on Nvidia.:shadedshu:
 
So, as an Electronics Engineer and PCB Designer I feel I have to react here.
The point that Igor makes about improper power design causing instability is a very plausible one. Especially with first production runs where it indeed could be the case that they did not have the time/equipment/driver etc to do proper design verification.


However, concluding from this that a POSCAP = bad and MLCC = good is waaay too harsh and a conclusion you cannot make.


Both POSCAPs (or any other 'solid polymer caps') and MLCCs have their own characteristics and use cases.


Some (not all) are ('+' = pos, '-' = neg):
MLCC:
+ cheap
+ small
+ high voltage rating in small package
+ high current rating
+ high temperature rating
+ high capacitance in small package
+ good at high frequencies
- prone to cracking
- prone to piezo effect
- bad temperature characteristics
- DC bias (capacitance changes a lot under different voltages)


POSCAP:
- more expensive
- bigger
- lower voltage rating
+ high current rating
+ high temperature rating
- less good at high frequencies
+ mechanically very strong (no MLCC cracking)
+ not prone to piezo effect
+ very stable over temperature
+ no DC bias (capacitance very stable at different voltages)


As you can see, both have their strengths and weaknesses and one is not particularly better or worse than the other. It all depends.
In this case, most of these 3080 and 3090 boards may use the same GPU (with its requirements) but they also have very different power circuits driving the chips on the cards.
Each power solution has its own characteristics and behavior and thus its own requirements in terms of capacitors used.
Thus, you cannot simply say: I want the card with only MLCC's because that is a good design.
It is far more likely they simply did not have enough time and/or resources to properly verify their designs and thus were not able to make proper adjustments to their initial component choices.
This will very likely work itself out in time. For now, just buy the card that you like and if it fails, simply claim warranty. Let them fix the problem and don't draw too many conclusions based on incomplete information and (educated) guesswork.
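To make the tradeoff above concrete, here is a toy impedance model of a single capacitor treated as a series RLC. This is a sketch only; the component values below are made-up illustrations, not taken from any real 3080 board:

```python
import math

def cap_impedance(f_hz, c_farads, esr_ohms, esl_henries):
    """Impedance magnitude of a real capacitor modelled as series RLC:
    |Z| = |ESR + j(2*pi*f*ESL - 1/(2*pi*f*C))|."""
    w = 2 * math.pi * f_hz
    reactance = w * esl_henries - 1.0 / (w * c_farads)
    return math.hypot(esr_ohms, reactance)

# Hypothetical example parts as (C, ESR, ESL) -- illustrative values only
MLCC = (47e-6, 0.002, 0.5e-9)     # small cap, low ESL: strong at high frequency
POSCAP = (330e-6, 0.007, 2.0e-9)  # big bulk cap: strong at low frequency

for f in (1e4, 1e6, 1e8):  # 10 kHz, 1 MHz, 100 MHz
    print(f"{f:.0e} Hz: MLCC {cap_impedance(f, *MLCC):.4f} ohm, "
          f"POSCAP {cap_impedance(f, *POSCAP):.4f} ohm")
```

With these placeholder numbers the bulk polymer part wins at low frequency (more capacitance) while the MLCC wins at high frequency (lower ESL), which is exactly the "it all depends" point above.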
Amen and thank you!
Don't think I have to look for a more informative and unbiased opinion.
 
The one I want!
 
You should check his latest response on his Twitter.
Who's? What response are we talking about?

WTF is wrong with GPU manufacturing?
Nothing. They are making ever more complex and powerful cards to push the limits of performance in very tight time constraints. I'm not excusing these problems, only offering explanation. The industry needs to slow it down a little and focus on quality more.
 
I don't get it tbh. POSCAP is supposedly more expensive than MLCC (per that reddit post). So supposedly overbuilt cards aren't performing as intended or something? But damn, people are gonna run towards ASUS now. Both Strix and cheaper TUF use all-MLCC design.
 
I don't get it tbh. POSCAP is supposedly more expensive than MLCC (per that reddit post).
The Reddit post was wrong. The whole process of mounting the smaller components is a more expensive one. The components themselves are not all that expensive it's just getting them soldered on that presents the more involved process.
 
I don't get it tbh. POSCAP is supposedly more expensive than MLCC (per that reddit post). So supposedly overbuilt cards aren't performing as intended or something? But damn, people are gonna run towards ASUS now. Both Strix and cheaper TUF use all-MLCC design.

No, it's not THAT easy. If anything, those components will do the job within their respective operating ranges nicely. It's just that the GPU boost is too much to handle for them. Two contributors wrote it already:

concluding from this that a POSCAP = bad and MLCC = good is waaay too harsh and a conclusion you cannot make.


Both POSCAPs (or any other 'solid polymer caps') and MLCCs have their own characteristics and use cases.


Some (not all) are ('+' = pos, '-' = neg):
MLCC:
+ cheap
+ small
+ high voltage rating in small package
+ high current rating
+ high temperature rating
+ high capacitance in small package
+ good at high frequencies
- prone to cracking
- prone to piezo effect
- bad temperature characteristics
- DC bias (capacitance changes a lot under different voltages)


POSCAP:
- more expensive
- bigger
- lower voltage rating
+ high current rating
+ high temperature rating
- less good at high frequencies
+ mechanically very strong (no MLCC cracking)
+ not prone to piezo effect
+ very stable over temperature
+ no DC bias (capacitance very stable at different voltages)

Just replacing everything with MLCCs will NOT help the design to reach higher speeds and stability. Why? Because one needs to use different caps in tandem, as their frequency responses are different, as well as ESR, ESL and other factors.

Having everything MLCC like the glorified ASUS design means you have a single deep resonance notch, instead of two less prominent notches when using MLCC+POSCAP together. Using three kinds, a smaller POSCAP, a bigger POSCAP, and some MLCCs, gives an even better figure with 3 notches.
But again, with modern DC-DC controllers a lot of this can be tuned through PID control and converter slew-rate tweaks. This adjustability is one of the big reasons why enthusiast cards often use "digital" VRMs that allow tweaking such parameters almost on the fly. However, this is almost never exposed to the user, as wrong settings can easily make the power phases go brrrrrr with smoke. Don't ask me how I know...
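The bank-composition point can be sketched with a small parallel-impedance model. All part values here are hypothetical placeholders, not measured from any actual card, and the real notch positions depend on the actual parts and the VRM tuning:

```python
import math

def z_cap(f, c, esr, esl):
    """Complex impedance of one capacitor: ESR + j(wL - 1/(wC))."""
    w = 2 * math.pi * f
    return complex(esr, w * esl - 1.0 / (w * c))

def z_bank(f, caps):
    """Impedance magnitude of several capacitors wired in parallel."""
    return abs(1.0 / sum(1.0 / z_cap(f, *c) for c in caps))

# Hypothetical (C, ESR, ESL) values -- purely illustrative
MLCC = (47e-6, 0.002, 0.5e-9)
POSCAP = (330e-6, 0.007, 2.0e-9)

all_mlcc = [MLCC] * 6               # all-MLCC layout
mixed = [MLCC] * 4 + [POSCAP] * 2   # MLCC plus bulk polymer mix

for f in (1e4, 1e6, 1e8):  # 10 kHz, 1 MHz, 100 MHz
    print(f"{f:.0e} Hz: all-MLCC {z_bank(f, all_mlcc)*1e3:.2f} mOhm, "
          f"mixed {z_bank(f, mixed)*1e3:.2f} mOhm")
```

With these placeholder numbers the mixed bank shows lower impedance at low frequency (the bulk capacitance at work) and the all-MLCC bank wins at high frequency, illustrating why a designer blends capacitor types rather than maximizing one kind.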


Before picking on the poor 6 capacitors behind the die - why does nobody talk about the huge POSCAP capacitor bank behind the VRM on the FE card, eh? Custom AIB cards don't have that, just the usual array without much bulk capacitance. If I were designing a card, I'd look at the GPU's power demands and then add enough bulk capacitance first to ensure a good power impedance margin in the mid-frequency range, while worrying about capacitors for high-frequency decoupling later, as that is a relatively easier job to tweak.

After all, these wild theories are easy to test; you don't need any engineering education to prove this wrong or right. Take a "bad" crashing card with "bad POSCAPs" and test it to confirm the crashes... Then desolder the "bad POSCAPs", put a bunch of 47uF low-ESR MLCCs in their place, and test again to see if it's "fixed". Something tells me it would not be such a simple case and the card may still crash, heh. ;-)
 