
BSOD whea_uncorrectable_error

kuuuuujo
New Member | Joined Aug 5, 2021 | Messages: 5 (0.00/day)
I built a PC a year ago, and everything worked up to this point. Now I've started getting BSODs, always WHEA_UNCORRECTABLE_ERROR with no additional information. It never crashes when it's idle, only when I'm doing something. The biggest issue is that the error screen gets stuck at 0% while saving the memory dump and never finishes. I tried tips from dozens of posts, but I just can't get it to dump: after a restart there is never a MEMORY.dmp, and the Minidump folder is empty. My dump settings are correct; I tried every dump type and increased the paging file to be bigger than my RAM.

I noticed it always crashes somewhere during or after the SSD test in UserBenchmark, so I use that to debug the problem. I monitor temperatures, but I don't see anything out of the ordinary when it crashes.

The Corsair MP600 SSD could be the cause: when I built the PC, I removed its original heatsink and used the one from the motherboard instead. The temps aren't that high, though; the SSD is around 45-50 °C when it crashes, and all temperatures are in the 30-65 °C range, so the PC doesn't seem to be overheating much. But when I temporarily attached a fan to blow heat away from the SSD, it crashed a bit later than usual, so there is a difference.

What could be the issue here?

So far I:
  • updated BIOS
  • updated drivers
  • updated Windows
  • updated SSD firmware
  • reinstalled the original heatsink with a new thermal pad
  • checked SSD health with multiple applications; none reported issues
  • removed Sonic Radar 3, as I heard it causes issues
  • ran sfc /scannow
EDIT: System specs:
Asus ROG STRIX X570-I
Ryzen 7 3700X
RTX 2080 Ti
Corsair MP600 2TB M.2
Corsair Vengeance 2x32GB 3000MHz
Corsair SF750 750W
 
Folks are going to ask for your system specs - so you might want to post what they are.
 
Are the SMART values fine for your SSD?
Run Memtest (even 5 minutes is enough to see if something is badly wrong).
If that's all fine, try a fix for something that causes issues on a lot of Ryzen platforms:

go to the DRAM settings and disable Power Down Mode,
and change PSU Idle Control from Auto to Typical Current Idle (just for good measure; probably not the problem, but it doesn't hurt either).
 
Seeing as you suspect the drive, do you have a spare? Even some old 128GB SSD? I thought WHEA errors were more commonly related to Windows updates, the CPU, or memory configuration.
 
Looks like a faulty processor or motherboard. Even if it reports a RAM error, the RAM itself often isn't the cause: a faulty IMC (integrated memory controller) can cause RAM-related errors.
I'm golden with a Ryzen 7 3700X here with an MSI B450 Tomahawk motherboard.

Looks like you should clear the CMOS and then re-enter the boot order, boot config, date and time.

I have Corsair Vengeance LPX 2x8GB 3000 MHz with XMP and FCLK at 1:1, and still no RAM-related errors.
 
Do you have any overclocks applied (RAM, CPU, GPU, etc.)?
 
Run Memtest for two hours and see if it spits out any errors.
My money's on the RAM here.

P.S. I had the same memory modules, and after a while they couldn't hold their XMP speeds.
I had to run them at stock frequency to keep the system stable.
 
What's the ID of the event? 18, 19?
 
Is "CRC Error Count" (or similarly worded) above 0 in SMART? Any WHEA logger events in Event Viewer?
 
Any WHEA logger events in Event Viewer?
Event Viewer is the first place to look when you get a BSOD. You can't miss the big red X.
 
Thank you everyone for the suggestions, here's where I'm standing now:
  • SMART values are fine on SSD
  • I ran MemTest86 for 1 hour, up to Test 13, 0 errors
  • Switched DRAM Timing Control / Power Down Enable to disabled
  • Changed the PSU Idle Control from Auto to Typical Current Idle
  • Never did any OC
Those steps didn't solve the issue.

I connected a second SSD, installed a fresh copy of Windows 10 on it, updated the drivers and ran UserBenchmark again. This time the system survives every run. I guess that since Windows is no longer on the potential culprit disk, it doesn't BSOD, so now I can see the errors UserBenchmark throws:

Code:
ERROR: G: Drive bench execution failed
ERROR: t[0:0] error during write: A device which does not exist was specified. (433)
ERROR: There has been an error during threads execution

G is the MP600 SSD. I went to Event Viewer, and these are the events that are logged while the benchmark runs:

Code:
Information: Volume G: (\Device\HarddiskVolume4) is healthy. No action is needed.
Warning: Reset to device, \Device\RaidPort2, was issued.
Error: The driver detected a controller error on \Device\RaidPort2.
Warning: An error was detected on device \Device\Harddisk1\DR1 during paging operation.
Error: A fatal hardware error has occured. A record describing the condition is contained in the data section of this event.

After those events it's just a never-ending log of warnings for that device. Since there's no BSOD, there's still no memory dump, only an XML file of the event data, which I don't understand at all, and I'm not sure it's even useful.
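That XML may be more useful than it looks: for the WHEA-Logger event, it should contain the same hex payload the Details tab shows as RawData. Below is a minimal Python sketch for pulling it out. It assumes the standard Windows event XML namespace and that the payload sits in a `<Data Name="RawData">` element (check your own export); `whea_raw_data` is a hypothetical helper name, not part of any Windows tool.

```python
import xml.etree.ElementTree as ET

# Default namespace of Windows event XML (as shown in Event Viewer's XML View).
EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"ev": EVENT_NS}


def whea_raw_data(xml_text):
    """Return (event_id, raw_hex) pairs for WHEA-Logger events found in xml_text."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() also checks the root element itself, so this handles both a single
    # <Event> and an export that wraps several events in an outer element.
    for event in root.iter(f"{{{EVENT_NS}}}Event"):
        provider = event.find("ev:System/ev:Provider", NS)
        if provider is None or provider.get("Name") != "Microsoft-Windows-WHEA-Logger":
            continue  # skip disk/storport warnings and other providers
        event_id = int(event.findtext("ev:System/ev:EventID", default="-1", namespaces=NS))
        # Assumption: the hex payload is stored in <Data Name="RawData">.
        raw = event.find("ev:EventData/ev:Data[@Name='RawData']", NS)
        if raw is not None and raw.text:
            results.append((event_id, raw.text.strip()))
    return results
```

Each returned hex string is what you would otherwise paste into a hex-to-string converter.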

I'm biased, because I want it to be the SSD: that's less troublesome than a motherboard. Could it be the motherboard? The BIOS? Could connecting the new drive have caused issues that I'm now associating with the earlier BSODs? Or does this prove it's the SSD?
 

If you go into the "Details" tab of at least this error:

Code:
Error: A fatal hardware error has occured. A record describing the condition is contained in the data section of this event.

You should be able to see the RawData, a long string of letters and numbers. If you paste that into a hex-to-string converter, you may see your SSD listed in there somewhere. For example, for the event shown here (https://docs.microsoft.com/en-us/an...rdware-error-occured-whea-logger-event-i.html), pasting the RawData into the converter gives you

Code:
CPERÿÿÿÿ�������Î��3�
<`ÁƒR§H‡ÑÙF}we����������������|!Wf^ûD€3›tÊÎß[ø3�p.ˆN™,o&ÚóÛzâuF†É§Ö�����������������������È�����������������������������������������������������������������STORPORT�¤�������� 0û§àÓ    [±ß9´s�t�o�r�a�h�c�i�����������������INTEL   �SSDSC2KW010X6����¤���¤���}àP���������� ������������������d�������������������2���ÿÿÿÿ������������ �������������������������2�������e���d���������������4���2���2�����������”��

In the middle of that you can see "INTEL SSDSC2KW010X6" is causing the fault.
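The same hex-to-string step can be done offline. Here is a sketch in Python. It converts the RawData hex to bytes and, if the data starts with the `CPER` signature (Common Platform Error Record, defined in the UEFI specification), reads the section count and error severity from the record header. The field offsets and severity codes below follow that spec's header layout and are worth double-checking against it. It then lists printable ASCII runs the way the Unix `strings` tool does. `decode_whea_raw` is a hypothetical name.

```python
import struct

# Error Severity values from the CPER record header (UEFI spec, Appendix N).
SEVERITY = {0: "Recoverable", 1: "Fatal", 2: "Corrected", 3: "Informational"}


def decode_whea_raw(raw_hex, min_len=4):
    """Decode a RawData hex string into (cper_header_or_None, printable_strings)."""
    data = bytes.fromhex(raw_hex)  # spaces between hex pairs are ignored
    header = None
    # CPER header: "CPER" signature, 2-byte revision, then 0xFFFFFFFF at offset 6.
    if len(data) >= 16 and data[:4] == b"CPER" and data[6:10] == b"\xff" * 4:
        # Little-endian section count (offset 10) and error severity (offset 12).
        section_count, severity = struct.unpack_from("<HI", data, 10)
        header = {"sections": section_count, "severity": SEVERITY.get(severity, severity)}
    # Collect runs of printable ASCII, similar to the Unix `strings` tool.
    strings, run = [], bytearray()
    for byte in data:
        if 0x20 <= byte < 0x7F:
            run.append(byte)
            continue
        if len(run) >= min_len:
            strings.append(run.decode("ascii"))
        run = bytearray()
    if len(run) >= min_len:  # flush a run that reaches the end of the data
        strings.append(run.decode("ascii"))
    return header, strings
```

In the converter output above, the driver name (storahci) is stored as UTF-16, with a null byte after each character, so an ASCII-only scan like this one will not pick it up; model strings such as the Intel drive's are plain ASCII and will.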

The reason I know this: I had a similar issue, occasionally getting that error plus a rare blue screen (not as readily as you're getting them), and it turned out to be some kind of incompatibility between an older AMD chipset (~2010) and my Samsung 860 EVO SSD. The fix was to turn off Native Command Queuing (NCQ). You're on much newer hardware, so that doesn't seem as likely (and NCQ is a SATA feature, while the MP600 is an NVMe drive).

That's why I asked about the CRC error count in SMART: with NCQ on, the count would go up whenever I ran something like CrystalDiskMark, even though the status still showed as "Good". It hasn't increased since I disabled NCQ months ago.

[Attachment: SMART.png]
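To repeat that before/after comparison without eyeballing a GUI, the counters can be read from smartmontools. Below is a sketch in Python, assuming the JSON that `smartctl -j -a <device>` prints in smartmontools 7.x. Key names such as `ata_smart_attributes` and `nvme_smart_health_information_log` come from that output and should be checked against your version; `error_counters` is a hypothetical helper. The CRC counter is a SATA attribute (ID 199); an NVMe drive like the MP600 reports media errors and error-log entries instead.

```python
import json


def error_counters(smart):
    """Pull interface/media error counters from `smartctl -j -a` output.

    `smart` may be the raw JSON text or an already-parsed dict. The key names
    follow smartmontools 7.x JSON output and are an assumption to verify.
    """
    info = json.loads(smart) if isinstance(smart, str) else smart
    counters = {}
    # SATA drives: attribute 199 is the interface (UDMA) CRC error count.
    for attr in info.get("ata_smart_attributes", {}).get("table", []):
        if attr.get("id") == 199:
            counters["udma_crc_errors"] = attr["raw"]["value"]
    # NVMe drives have no attribute table; they report a health log instead.
    nvme = info.get("nvme_smart_health_information_log")
    if nvme is not None:
        counters["media_errors"] = nvme.get("media_errors")
        counters["error_log_entries"] = nvme.get("num_err_log_entries")
    return counters
```

Capture the output once before and once after a benchmark run and compare the two dicts; a counter that climbs during the run would suggest a problem with the drive or its connection.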


Those other events definitely seem relevant to the issue you're having; hopefully someone here has experience with them.
 
What's the ID of the event? 18, 19?
The fatal hardware error has event ID 1, if that's what you mean. I see that in the Windows event list; the BSOD itself never showed any other numbers or codes.

If you go into the "Details" tab of at least this error:

Code:
Error: A fatal hardware error has occured. A record describing the condition is contained in the data section of this event.
I get this:

[Attachment: Screenshot 2021-08-06 at 17.41.16.png]


CrystalDiskInfo doesn't show any S.M.A.R.T. data at all.

[Attachment: Screenshot 2021-08-06 at 17.41.33.png]


When Windows was on that drive, I got a BSOD every time I did something hardware-intensive (like a benchmark). With Windows on a different SSD I don't get a BSOD, but after the error appears in Event Viewer the MP600 drops out: it's no longer detected by any software.
 
To solve this problem:
PRECISION BOOST OVERDRIVE [Enhanced Mode 3]

AMD CBS\
CORE PERFORMANCE BOOST [Auto]
Global C-State Control [Disabled]

AMD Overclocking\
ECO Mode [Disabled]

Precision Boost Overdrive [Advanced]
PBO Limits [Motherboard]
Precision Boost Overdrive Scalar [Auto]
Curve Optimizer [Disabled]
Max CPU Boost Clock Override [100MHz] (200MHz works too: a slight increase of 50-80 points in Cinebench R20 multi-core, but higher power draw)
Platform Thermal Throttle Limit [Manual]
Platform Thermal Throttle Limit 255
 
To solve this problem:

I have the latest BIOS and I don't have all those options you mentioned.

PRECISION BOOST OVERDRIVE [Enhanced Mode 3]: I don't have Enhanced Modes anywhere in the settings
CORE PERFORMANCE BOOST [Auto]: was already set by default
Global C-State Control [Disabled]: changed this
ECO Mode [Disabled]: was already set by default
Precision Boost Overdrive [Advanced]: I only have Disabled/Enabled/Manual, so I set it to Manual
PBO Limits [Motherboard]: setting PBO to Manual lets me change the PPT, TDC and EDC limits, but there's no Motherboard option anywhere
Precision Boost Overdrive Scalar [Auto]: was set by default
Curve Optimizer [Disabled]: I don't have that option
Max CPU Boost Clock Override [100MHz]: changed this
Platform Thermal Throttle Limit [Manual]: changed this
Platform Thermal Throttle Limit 255: changed this

Unfortunately, the changes I was able to make didn't fix the problem.
 
Run the RAM at stock settings without XMP and check stability again. As some have already posted, most likely the RAM timings aren't stable at the voltages applied (RAM, SoC, etc.).
 
As I understand it, D.O.C.P. is the ASUS equivalent of XMP, and I never turned it on. The RAM runs at its default settings. I never overclocked anything. MemTest86 didn't show any errors in the first hour; I didn't run it longer.
 
Nice! Then your M.2 disk should be the root of the problem.
 
Yeah, it seems like the SSD is the root cause after all. I thought it might be the RAM, as a lot of BSODs are memory-related, but sometimes it's tricky to find the real cause.
 
I thought it might be the RAM as a lot of BSODs are memory related
Only if the stop codes of all the BSODs that occurred pointed to RAM corruption.
 
@RJARRRPCGP In fact, a lot of BSODs ARE memory-related.
I didn't say all, but at least half of them are indeed memory-related.
That's my experience.
 