
Confusing Results from MODS (MATS) while Trying to Diagnose my GTX 1080

Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
I have an EVGA GTX 1080 SC, which will turn on and light up, but will only output a black screen. When I turn the computer on, the card will not output a signal for about 30 seconds, then output a black screen. From what I've looked up, this is indicative of a memory problem.

Regardless, I tried to diagnose the problem using MODS (MATS). However, after running the program twice, the results I got confused me. I got all read errors on every single memory module, with no write or unknown errors. I looked this up, and it doesn't seem like anyone's ever had this issue. I'm not sure what it is. Would anyone be able to help, or know what this is?

Thanks for your help! This has really been bugging me for the last while.
 

Attachments

  • report.txt
    103.5 KB · Views: 1,379
Joined
Nov 24, 2018
Messages
2,109 (1.07/day)
Location
south wales uk
System Name 1.FortySe7en VR rig 2. intel teliscope rig 3.MSI GP72MVR Leopard Pro .E-52699, Xeon play thing
Processor 1.3900x @stock 2. i7 7700k @5. 3. i7 7700hq
Motherboard 1.aorus x570 ultra 2. z270 Maximus IX Hero,4 MR9A PRO ATX X99
Cooling 1.Hard tube loop, cpu and gpu 2. Hard loop cpu and gpu 4 360 AIO
Memory 1.Gskill neo @3600 32gb 2.hyperxfury 32gb @3000 3. 16gb hyperx @2400 4 64GB 2133 in quad channel
Video Card(s) 1.GIGABYTE RTX 3080 WaterForce WB 2. Aorus RTX2080 3. 1060 3gb. 4 Arc 770LE 16 gb
Storage 1 M.2 500gb , 2 3tb HDs 2. 256gb ssd, 3tbHD 3. 256 m.2. 1tb ssd 4. 2gb ssd
Display(s) 1.LG 50" UHD , 2 MSI Optix MAG342C UWHD. 3.17" 120 hz display 4. Acer Preditor 144hz 32inch.z
Case 1. Thermaltake P5 2. Thermaltake P3 4. some cheapo case that should not be named.
Audio Device(s) 1 Onboard 2 Onboard 3 Onboard 4. onboard.
Power Supply 1.seasonic gx 850w 2. seasonic gx 750w. 4 RM850w
Mouse 1 ROG Gladius 2 Corsair m65 pro
Keyboard 1. ROG Strix Flare 2. Corsair F75 RBG 3. steelseries RBG
VR HMD rift and rift S and Quest 2.
Software 1. win11 pro 2. win11 pro 3, win11 home 4 win11 pro
Benchmark Scores 1.7821 cb20 ,cb15 3442 1c 204 cpu-z 1c 539 12c 8847 2. 1106 cb 3.cb 970
Have you tried installing drivers after cleaning the old ones out with NVCleanstall?
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
Thanks for the response!

The current PC I have has no integrated graphics, nor does it have a second PCIe slot, so I'd have a bit of trouble trying to get those drivers installed with a completely black screen. I ran MODS (MATS) without video by having the PC shut down after the tests were complete, so I'd know when they had finished. However, I'll see if I can get another PC to test it on.

Though, I am getting a black screen from the moment the PC is on, so I don't see BIOS or anything. This leads me to believe it's not a driver issue, but again, I'll make sure to get another PC to test that on.
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
No worries. I have an HD7970 that I used to put MODS (MATS) on a USB drive, then switched GPUs and ran MODS (MATS) off the USB.

I'm hoping that either I did something wrong with MODS (MATS), or it's some sort of GPU BIOS issue. It's odd, like you're saying, to see nothing but read errors.

-- -- -- --

Update:

I was able to get another PC to test it on.

The first PC (test RIG) has:
CPU: AMD Athlon X4 860K (no integrated graphics)
Motherboard: ASUS A78M-E (no extra PCIe slots)
RAM: 8GB DDR3 ADATA XPG (2x4GB)
PSU: 600W ATNG (a pretty cheap non-name-brand PSU)
Drive: Samsung 128GB SATA SSD (81.8GB free)
Other GPU: HD7970

My second PC has:
CPU: Intel i5-6600k
Motherboard: MSI Z170A Krait Gaming 3X
RAM: 16GB SiliconPower (2x8GB)
PSU: 850W Seagate (A few years old, good condition)
Drives: 256GB SiliconPower m.2 NVME (135GB free), 1TB Western Digital HDD (904GB free)
Other GPU: RX 480 4GB

I can boot the PC into safe mode.

OK mate, I thought that with running MATS you had the BIOS. It's a rare thing to have all your RAM show errors; that's what pointed me towards drivers.
This guy is a god :) with GPUs: "Guide - Diagnosing memory corruption on select Nvidia GPUs using MATS" (YouTube)

I took a look, though his MODS (MATS) errors seem to be showing write errors, not read errors. I can't seem to find anyone getting read errors like mine. Though I appreciate the help!

-- -- -- -- --

Update 2:
xtreemchaos, after double-watching the video, I see he actually *did* have all read errors, just like me. Additionally, like in his video, when I checked the graphics card with GPU-Z, I get the same "unknown BIOS" message (I'll post a picture), and am getting the same Error 43. I even have the same card (EVGA GTX 1080 SC).

To try and see if it was just a bad BIOS, I reflashed the BIOS using NVFlash. (I'll send a picture of that, too).

The similarities are striking... do you think it's the same problem? A bad BIOS chip? If so, where can I get a new BIOS chip that I can solder in? I looked on eBay, but the listings didn't give enough info for me to be sure which chip was the right one. It seems the 1050-1080 have the same chip? Are all BIOS chips the same for all manufacturers?

Thank you guys again for all your help! At least I'm making some progress.
 

Attachments

  • NVFlash.jpeg
    2 MB · Views: 1,037
  • GPUz.png
    396.7 KB · Views: 1,184

majaha

New Member
Joined
Jul 10, 2022
Messages
4 (0.01/day)
Hey Gatorfan, did you ever make any progress on this?
I ask because I have a faulty graphics card too that has similar symptoms when running MATS. One particular similarity that sticks out to me is these lines in report.txt:
Code:
   ADDRESS EXPECTED   ACTUAL  REREAD1  REREAD2 FAILBITS TPSBE  ROW COL                                                                                              BIT(s)
   ------- --------   ------  -------  ------- -------- -----  --- ---                                                                                              ------
000135fcbc 00000000 bad0aca2 bad0aca4 bad0aca3 bad0aca2 RD1f0 0000 000                          D033,D037,D039,D042,D043,D045,D047,D052,D054,D055,D057,D059,D060,D061,D063
000135fcb8 00000000 bad0aca5 bad0aca7 bad0aca6 bad0aca5 RD1f0 0000 000                     D032,D034,D037,D039,D042,D043,D045,D047,D052,D054,D055,D057,D059,D060,D061,D063
000135fcb4 00000000 bad0aca8 bad0acaa bad0aca9 bad0aca8 RD1f0 0000 000                          D035,D037,D039,D042,D043,D045,D047,D052,D054,D055,D057,D059,D060,D061,D063
000135fcb0 00000000 bad0acab bad0acad bad0acac bad0acab RD1f0 0000 000                D032,D033,D035,D037,D039,D042,D043,D045,D047,D052,D054,D055,D057,D059,D060,D061,D063
000135fcac 00000000 bad0acae bad0acb0 bad0acaf bad0acae RD1f0 0000 000

I'm getting those exact same "bad0acxx" read values, and that looks a lot like some kind of debug or error substitution value: "Bad ac" is in there in hex (short for bad access?), and the other numbers are just counting up, if you swap REREAD1 and REREAD2. The question is, what part of the system is giving those values, and what does it mean?
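
The counting pattern can be checked mechanically. Below is a minimal Python sketch that takes the four complete rows quoted above and tests three things: that the top 24 bits of every read are the constant 0xbad0ac, that the low byte goes up by one per read when the reads are ordered ACTUAL, REREAD2, REREAD1, and that FAILBITS is simply EXPECTED XOR ACTUAL. The function names are made up for illustration and are not part of MATS, and the +32 offset used to reproduce the BIT(s) column is an assumption inferred from these four rows only.

```python
# Rows copied from the report.txt excerpt above:
# (EXPECTED, ACTUAL, REREAD1, REREAD2, FAILBITS)
ROWS = [
    ("00000000", "bad0aca2", "bad0aca4", "bad0aca3", "bad0aca2"),
    ("00000000", "bad0aca5", "bad0aca7", "bad0aca6", "bad0aca5"),
    ("00000000", "bad0aca8", "bad0acaa", "bad0aca9", "bad0aca8"),
    ("00000000", "bad0acab", "bad0acad", "bad0acac", "bad0acab"),
]


def reads_in_counter_order(rows):
    """Flatten the reads as ACTUAL, REREAD2, REREAD1 (REREAD1/2 swapped)."""
    out = []
    for _expected, actual, reread1, reread2, _failbits in rows:
        out += [int(actual, 16), int(reread2, 16), int(reread1, 16)]
    return out


def is_substituted_counter(values, prefix=0xBAD0AC):
    """True if every value is (prefix << 8) | n, with n rising by 1 per read."""
    same_prefix = all(v >> 8 == prefix for v in values)
    counting = all(b - a == 1 for a, b in zip(values, values[1:]))
    return same_prefix and counting


def failbits_match(rows):
    """True if the FAILBITS column equals EXPECTED xor ACTUAL in every row."""
    return all((int(e, 16) ^ int(a, 16)) == int(f, 16) for e, a, _r1, _r2, f in rows)


def failing_bits(expected_hex, actual_hex, lane_offset=32):
    """Bit labels for EXPECTED xor ACTUAL.

    lane_offset=32 is an assumption that happens to match the BIT(s)
    column in the four rows above; it is not taken from any MATS docs.
    """
    diff = int(expected_hex, 16) ^ int(actual_hex, 16)
    return ["D%03d" % (bit + lane_offset) for bit in range(32) if (diff >> bit) & 1]


print(is_substituted_counter(reads_in_counter_order(ROWS)))
print(failbits_match(ROWS))
print(",".join(failing_bits("00000000", "bad0aca8")))
```

If the same checks pass on another card's report, that would be consistent with the values being substituted somewhere before the memory chips rather than read back from them, though that is still only a guess.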
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
Wow! So it DOES happen to other people!

But, I am still working on the problem. From the video above, it seems like a possible candidate for me could be the BIOS chip. I purchased two un-flashed BIOS chips from the UK and they're shipping in. I hope to solder them on, and reflash it on the board like it was a super-corrupted BIOS chip. After that, I'm not sure quite what I'll do... It'd probably be bad silicon at that point (a bad GPU die).

For you, I think you're about as far as I would be (if not farther!). The question is, as you said, what part of the system is giving those values... Is the GPU ever recognized by the system? Can you read the BIOS off of the chip with GPU-Z? (And what information does the program tell you, and not tell you?) I'll be honest, I'm not the most knowledgeable when it comes to fixing super-deep issues with GPUs, but I'm happy to help where I can with the journey!


Also, odd question, but does anyone know the specifications/model number for the surface-mount resistors/capacitors around a GTX 1080 (GP104) die? I need to order some replacements.
 

majaha

I've made some progress on my card, and I've figured some things out pertaining to the bad0acXX thing. On my card:
  • I only see 0xbad0acXX read errors in MATS on memory addresses larger than about 464MB, e.g. running
    Code:
    ./mats -b 467 -e 470
    gives bad0ac reads, but lower values don't.
    I found the crossover point just by trial and error.
  • MATS won't show other errors until I run a proper MODS test, e.g.
    Code:
    ./runmods gputest.js -test 118 -oqa
    which seems to activate the card or the drivers in some way, or stresses it enough that the errors begin to show up. After that, running
    Code:
    ./mats -b 0 -e 3
    gives lots of write errors on one particular memory bank and none on the others, like you would expect with a failing chip.
  • Running MATS with memory ranges that are 2MB or less (and not in the bad0ac range) e.g.
    Code:
    ./mats -b 0 -e 2
    always passes, presumably because then the test fits completely inside the GPU's 2MB memory cache and doesn't get read from or written to the memory banks. Just something to be aware of when testing.
That's as far as I've got with my card. Hopefully this gives you some hints or ideas with yours :)
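
The trial-and-error search for the crossover point could also be done as a bisection. The sketch below is hypothetical: `probe(mb)` stands in for "run MATS on a small window starting at `mb` MB and report whether bad0ac reads showed up", which you would have to do yourself (by hand, or by reading report.txt). It also assumes the behaviour flips only once, clean below some point and bad0ac above it. The demo uses a fake probe set to the roughly 464MB figure mentioned above, with 8192 as a placeholder upper bound.

```python
from typing import Callable


def find_crossover(probe: Callable[[int], bool], lo_mb: int, hi_mb: int) -> int:
    """Return the smallest offset in MB within (lo_mb, hi_mb] where probe() is True.

    Assumes probe(lo_mb) is False, probe(hi_mb) is True, and that the result
    flips only once in between. probe() is a hypothetical callback, not
    something MATS provides.
    """
    if probe(lo_mb) or not probe(hi_mb):
        raise ValueError("range does not bracket a crossover")
    while hi_mb - lo_mb > 1:
        mid = (lo_mb + hi_mb) // 2
        if probe(mid):
            hi_mb = mid  # bad0ac seen: crossover is at mid or below
        else:
            lo_mb = mid  # clean: crossover is above mid
    return hi_mb


def fake_probe(mb: int) -> bool:
    """Stand-in for a real MATS run; pretends bad0ac starts at 464MB."""
    return mb >= 464


# Start at 3MB because windows of 2MB or less always pass (see above).
print(find_crossover(fake_probe, 3, 8192))
```

With a real card each probe is a full MATS run, so the point of bisecting is just to cut the number of runs from dozens to a little over a dozen.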
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
That's excellent information! I'll be testing my own card with that info to see if it gives me anything new. I appreciate you updating me! Hopefully this can help someone else in the future, too.

After retesting, it unfortunately seems I'm getting the same results. My original tests were at 20MB, so not too near 464MB. However, I changed the latest test to be 2MB, just out of curiosity to see if I'd get any change, and I unfortunately did not. I'll attach the report just to show.

I saw, however, that the report at the bottom says, "If you are getting failure for first MB of FB then try option -no_scan_out". Is this at all useful?
Where would I put this command? (before or after "$LOCATION/$PKGNAME/mats" -e 2 ?)

Thanks again
 

Attachments

  • report.txt
    103 KB · Views: 294

majaha

Are you familiar with Linux at all? If I were you, I'd try to test the card interactively by plugging your monitor into the motherboard.

Then you can follow the advice here, under the heading "Using MATS with a card that has no output" https://repair.wiki/w/Nvidia_Memory_Testing_Guide

Play around with running mats with different values for -b (the beginning point in MB) and -e (the end point in MB). Adding "-c 1" tests only 1% of the memory, making the test much quicker. Also try running the mods gputest.js beforehand to see if that changes things (I found I could cancel it with Ctrl-C pretty soon after it had started and that would make a difference to subsequent runs of mats).
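
To make the "play around with -b and -e" step a bit less ad hoc, a small helper can print a list of command lines to run by hand. This is only a sketch: it uses the three flags mentioned above (-b, -e, -c) and nothing else, the total size and window size are placeholder values you would pick for your own card, and it prints the commands rather than running them.

```python
def mats_windows(total_mb, window_mb, coverage_pct=None, binary="./mats"):
    """Yield one command string per window, covering 0..total_mb.

    Only -b (begin, MB), -e (end, MB) and -c (coverage, %) are used, as
    described in the post above. total_mb and window_mb are placeholders.
    """
    if total_mb <= 0 or window_mb <= 0:
        raise ValueError("sizes must be positive")
    for begin in range(0, total_mb, window_mb):
        end = min(begin + window_mb, total_mb)
        cmd = f"{binary} -b {begin} -e {end}"
        if coverage_pct is not None:
            cmd += f" -c {coverage_pct}"
        yield cmd


# Example: four 2048MB windows at 1% coverage (placeholder sizes).
for line in mats_windows(total_mb=8192, window_mb=2048, coverage_pct=1):
    print(line)
```

Keep the windows larger than 2MB, given the cache observation earlier in the thread.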
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
I am not tremendously familiar with Linux, but here's what I got:

I played around with MATS and MODS, and with the ./mods gputest.js -skip_rm_state_init -mfg command, I would only get "GpuDevMgr not initialized. Error Code = 000000000818 (Mods detected an assertion failure)". It recognized the card, as in it showed the device ID as a GP104, but that's as far as it would get.

With the MATS commands, I tried all sorts of tests. I tried low-memory tests (./mats -n 1 -e 2), I tried playing around with random sections of the memory (-b 3999 -e 4000), I tried larger memory ranges, and testing one percent of larger ranges (-c). These, similarly, seemed to recognize the card, as it showed it was testing a GP104, but every test gave me the same Error Code = 00000001 failure, with bad0acX errors, as seen in the logs.
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
I did a very basic test of some resistances - but I could certainly do a more in-depth check. I'll update you on what I get!

At the very least, I'm learning a lot :)
 

OneMoar

There is Always Moar
Joined
Apr 9, 2010
Messages
8,744 (1.71/day)
Location
Rochester area
System Name RPC MK2.5
Processor Ryzen 5800x
Motherboard Gigabyte Aorus Pro V2
Cooling Enermax ETX-T50RGB
Memory CL16 BL2K16G36C16U4RL 3600 1:1 micron e-die
Video Card(s) GIGABYTE RTX 3070 Ti GAMING OC
Storage ADATA SX8200PRO NVME 512GB, Intel 545s 500GBSSD, ADATA SU800 SSD, 3TB Spinner
Display(s) LG Ultra Gear 32 1440p 165hz Dell 1440p 75hz
Case Phanteks P300 /w 300A front panel conversion
Audio Device(s) onboard
Power Supply SeaSonic Focus+ Platinum 750W
Mouse Kone burst Pro
Keyboard EVGA Z15
Software Windows 11 +startisallback
Dead card is dead unless you have surface-mount and BGA experience.

As Bones would say: "He's dead, Jim."
 
Joined
Jun 27, 2022
Messages
9 (0.01/day)
Video Card(s) EVGA GTX 1080 SC ACX 3.0
Well, I hope to try and investigate everything before I have to lay it to rest. At the very least, I'll know a bit more about troubleshooting GPUs. Regardless, I fear you're probably right, but maybe I'll be lucky. I appreciate the input!
 

OneMoar

There is Always Moar
Component failure, such as memory or the graphics core, is extremely rare. More than likely it's power related. You can check Buildzoid's videos on YouTube and see if he has a video of your card; it might give you an idea of where to start probing for power.
 