For those of you reading this a second or third time: the story has changed somewhat from the version I posted last night. I'm sorry for the change, but after conversing with others on this forum I found it hard to keep asking the same questions now that I know some of the answers. I only meant to clean up the original, which was sloppily written, but the more I proofread it, the more I kept changing, and this version is what landed on the screen instead. Everything in it is true, just told from a different perspective, and at least this version has paragraphs in it, so between the two it has to be an improvement.
Again, sorry, but something has to step on the original message in order to get rid of it, so this is it.
Before I get into this issue, here are the specs:
* Asus Crosshair VIII Formula motherboard
* 2 NVMe SSDs connected to both M.2 slots on the motherboard (Samsung 980 Pros, if I remember correctly)
* Ryzen 9 5950X CPU
* 128GB G.Skill Trident Z Neo RAM
* 1600W EVGA power supply
* 2x Asus ROG Strix RTX 3090s, meant for SLI. They initially came to me unbridged, as separate cards; I placed the bridge onto them and have since tried to turn on SLI, but could never get that far
* Equipped with an EK custom water-cooling kit
* Case has three 3-fan radiators and one single-fan radiator, so four radiators in total, or ten fans' worth of radiator surface. Plenty of cooling power.
* The loop travels from the pump to the side-mounted 3-fan radiator, then through the front-mounted 3-fan radiator, from there through both video cards, then through the CPU water block, then through the rear single-fan radiator, and finally through the top-mounted 3-fan radiator before returning to the pump reservoir.
* All parts are brand new except the two video cards, which were purchased from a friend of a friend, but as I see it they can't have too many hours on the clock, being as new as they are. Or could they?
This is a customer of mine who brought in his custom-built rig because it was having trouble booting, and I've pretty much narrowed it down to the video cards. Here's where I am with the troubleshooting:
If I boot into Safe Mode, the machine will start. I log into Windows 10, open Device Manager > Display adapters, remove the Nvidia drivers, and reboot; this makes the Microsoft Basic Display Adapter the default. Running on that driver, the machine works all day long with no issues. Of course, this is like driving your car in limp-home mode: there's no using any of the features the 3090s have to offer, no different than running an ATI video card from 10 or 15 years ago (or has it been longer than that?). As soon as Windows automatically installs Nvidia's drivers, or I install them manually, the display flashes as normal, comes back on for a few seconds, and then I get a black screen. The whole system appears locked up, and I never get a BSOD. Nothing I can see in the event logs points to the error.
I've performed all the general troubleshooting techniques I know of:
1) I've tried a slew of different versions of the Nvidia drivers. The cards are both ROG Strix Gaming 3090s, so I also tried Asus's video drivers.
2) I've removed the SLI bridge and tried them as separate cards.
3) I've performed clean installs of both Windows 10 and Windows 11; in both cases nothing was installed except the Nvidia drivers, to eliminate any 3rd-party apps or drivers.
4) I've updated the motherboard's BIOS to the most current version.
5) The video cards' BIOS was updated to version 4. I can't remember what version they were on before the update, but I do know it was below version 3.
6) When version 4 did not correct the issue, I tried flashing back to version 3, but that was not allowed. I don't believe it would have made a difference anyway.
7) I've updated every driver in the system to the most current version.
8) I've enabled, then disabled, the Resizable BAR setting in the BIOS.
9) I've tried slowing the PCIe bus down to Gen 3; normally it was set for PCIe 4.0 at x8/x8.
10) Actually, the two video cards as well as the two NVMe drives were all set to Auto when I received the machine. I changed both the video cards and the SSDs from PCIe 4.0/Auto to PCIe 3.0 just out of curiosity, to see if it made any difference. It did not. I then set the SSDs back to Auto while leaving the video cards on PCIe 3.0, and finally set everything back to Auto, which places the video cards in PCIe 4.0 mode. I'm not sure which mode the SSDs negotiate on Auto, but I'm not interested in that anyway, as they seem to be working just fine.
11) I played around with several settings within the BIOS until I told myself: who am I kidding? Simply resetting everything to defaults should at least be enough to boot the machine with every piece of attached hardware functioning in its proper mode. It may not be optimal, but it should be enough for troubleshooting purposes.
Nothing so far has helped.
On my customer's first visit after dropping off the machine, and per his request, he did not want me to take apart his cooling system without him. I had already told him I've not been exposed to custom water cooling yet; as far as water cooling goes, an AIO in my personal machine is the extent of my experience. This being the first custom loop I've worked on, he felt better, and so did I, that he actually be present if we decided to tear it down in order to pull these two video cards out. On the other hand, this was my customer's very first PC build, so he had more experience than I with the plumbing, but when it came to troubleshooting he was at a loss. So I couldn't do much on this issue without his presence, and we had to schedule visits when he and I could both arrange the time to do this together. Just so everyone is on the same page and not wondering why I wasn't going ape shit on this thing within a few days: it has taken a few weeks to manage two visits with both of us present at the same time.

This is also when he explained to me, for the second time, how much trouble he had working with the thermal pads he had purchased, had extra of, or had to replace because the originals crushed on him. This is why he felt so strongly about being present: he knew just how hard the pads were for him to deal with, and knowing that I hadn't been exposed to that type of thing made him want to be there if we had to open the water blocks. I had no problem with that request either, as I felt I was going to be taught custom water cooling hands-on, kind of like a crash course, and that's all it should take for me to get the hang of it. So I wasn't in the least bit offended by his request, and actually felt a bit relieved that he asked for it to be that way.

After all, the guy seemed sharp, and somewhat knowledgeable for the little experience he did have. So if it's going to be that a kid half my age teaches me the workings of custom water cooling, I'm not too proud to turn that down. I'll take it any way I can get it. And he's going to pay me to boot, so why the heck object to that?

I know what thermal pads are, and I've worked with them in the past on laptops and sometimes on M.2 SSDs, but I've never had any of them "crush" on me, so I wasn't exactly sure what he meant, unless these pads are somehow of a different type. The way I saw it: hey, you can explain that to me better once we open up these water blocks and have a look inside, but we're not there yet; all we're doing on this visit is seeing how a single 3090 works alone in the machine. Just the same, I took note of it, and he began to mention that perhaps this is why these cards are acting this way.
The next step was to remove the cards. The customer came back for a second visit and brought his spare RTX 2080. We carefully drained the loop, removed both video cards from the system, and replaced them with the single RTX 2080, factory cooling fans in place, with the two water blocks jumpered out of the loop so the water cooler was only cooling the CPU. Then we fired up the system. It started with the Microsoft basic drivers, and before I could manually install Nvidia drivers from Nvidia's website, Windows had already downloaded and installed what it thought the current drivers should be. At that point I wasn't too interested in which version of the Nvidia drivers it chose; I was simply interested in whether the system would be stable with any Nvidia driver loaded. So I never checked the version, and frankly it didn't matter: the video card took the driver, loaded it, and maintained stability, and that is all we wanted to know from the 2080. So this proves the motherboard is good, right?
So that brought our focus to the two RTX 3090s. Before the swap, we weren't sure if the problem was coming from the motherboard (more likely) or the two video cards (less likely, or perhaps we were both wishfully thinking, since the warranty is still good on the motherboard and not on the video cards). Accepting that the motherboard is working is easy; no problem there at all. However, by accepting that truth, we also accept that what has failed is two independent video cards, both failing at the same exact time, with the exact same failure, showing the exact same symptoms, and that is harder for me to swallow. Or could it be just one of the two cards? With both installed, could one broken card bleed into the other, making it "act" broken as well? The fact that they first came to me not in SLI mode but operating independently as two separate cards has me thinking that if one was working and the other not, simply moving the display cable to the other card would have exposed the good card. But since either card causing the machine to lock up looks exactly the same, there's really no way to tell 100% for sure until we try each card alone in the machine. With me so far? Good.
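A side note on the elimination logic so far: laid out as a toy table of fault hypotheses against what we've actually observed, it shows why the second 3090 still has to be tested alone. This is just my own shorthand sketch, nothing rigorous, and the labels are made up for illustration:

```python
# Shorthand for the elimination reasoning: which fault hypotheses are still
# consistent with the tests run so far. Labels and predictions are
# illustrative only, not an exhaustive fault model.

# Each hypothesis maps to what it predicts for the two tests we've run:
# "2080_stable"  -> the spare RTX 2080 runs fine with an Nvidia driver
# "3090_1_locks" -> 3090 #1 alone locks up when an Nvidia driver loads
hypotheses = {
    "motherboard bad":  {"2080_stable": False, "3090_1_locks": True},
    "both 3090s bad":   {"2080_stable": True,  "3090_1_locks": True},
    "only 3090 #1 bad": {"2080_stable": True,  "3090_1_locks": True},
    "only 3090 #2 bad": {"2080_stable": True,  "3090_1_locks": False},
}

# What we actually saw on the second visit:
observed = {"2080_stable": True, "3090_1_locks": True}

# Keep only the hypotheses whose predictions match every observation.
surviving = [name for name, pred in hypotheses.items() if pred == observed]
print(surviving)  # ['both 3090s bad', 'only 3090 #1 bad']
```

Two hypotheses survive, and only running 3090 #2 alone can separate them, which is exactly where the story stands.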
Even though they were in the same machine, electronically speaking they could have been miles apart from each other, assuming nothing is wrong with the power supply. And since the one RTX 2080 came out without a scratch, we can safely assume the power supply is working properly, right? After all, it is a 1600-watt PSU. I almost got stuck on that just now, thinking, well, you are running two high-power cards... but no, 1600 watts is plenty of power with amps to spare. So that leads me to this question: what are the odds of both video cards failing at the same exact time, with the same exact failure and the same exact symptoms? I'm no mathematician, but I can surely guess those odds are awfully low. Just a hunch, but probably somewhere between getting hit by lightning and winning the lottery. Maybe not *that* low, but you get my point, right?

So we took one of the two 3090s out of the system, leaving a single 3090 in the PCIe-1 position, and tried the system with only that card. We were also pressed for time at this point: between how long it takes to drain and refill the loop, the calls I'd been receiving that day, and other distractions, time slipped past us and it was getting late. So we decided that if the first card showed signs of life, we would then do the same with the other card to see what happened. However, the single card performed exactly the same as both cards together had, so we weren't too excited to re-drain the loop, swap in the other card, and refill it to see if that one performed any better. The wind was knocked out of our sails when the first card, tested on its own, performed no better than the pair. I haven't given up yet, but I was kicking rocks as I saw my customer out the door. I told him, hey, we're getting closer to the answer; just one more visit and I think we'll have this thing licked. I just wasn't sure if I was convincing myself, much less my customer.
And that just aggravated me.
So that is where we are at this point, without having tried the second card to see how it behaves. I will be the first to accept the results if this second card proves to be bad also, but I have to go through with it, because otherwise I'll never be sure that both of these cards failed together at the same time. I have more to say, but this feels like a good stopping point, because I have shit to do today, and one of those things is work. So off to work I go, although I'm not really going anywhere, as I work from home; off to work I stay, I suppose, just flipping the closed sign around to open. Meanwhile, you gurus out there should make some bets on how this second card is going to behave. I really have a hard time accepting that they are both bad, but not so much that I'd put any real money on it. I'm not that stubborn.
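For what it's worth, the "what are the odds" hunch from earlier can be put into rough numbers. This is only a sketch: the per-card failure rate below is a made-up figure for illustration, not a real RTX 3090 statistic, and the whole calculation hinges on the two failures being independent.

```python
# Sketch of the "what are the odds?" reasoning.
# p_single is an assumed illustrative figure, NOT a measured failure rate.
p_single = 0.02  # assumed chance that one card fails in a given period

# If the two failures are truly independent, the joint probability
# of both failing in the same period is the product:
p_both_independent = p_single * p_single
print(f"one card fails: {p_single:.2%}")
print(f"both fail independently: {p_both_independent:.4%}")

# The catch: a single shared cause (PSU fault, driver bug, crushed thermal
# pads in both blocks) breaks the independence assumption, and then the
# chance of seeing both cards fail together can approach p_single itself.
```

In other words, the lottery-odds argument only holds if nothing the two cards share, power, drivers, or that water-cooling conversion, can take them both out at once.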