
Getting Cache hierarchy error

Shoburo (New Member, joined Oct 14, 2024, 8 messages)
Hello, I need some help locating a hardware error or a faulty piece of hardware.
My hardware:

Mobo: ASUS ROG Strix B550-A Gaming
CPU: AMD Ryzen 9 5900X
CPU Cooler: ASUS ROG Strix LC II 360
RAM: Goodram IRDM Pro 2x16 GB 3600 MHz (running at 3200 MHz)
GPU: ASUS TUF Gaming RX 6900 XT
PSU: Cooler Master MWE 850W V2
Storage: two PCIe SSDs and one HDD
OS: Windows 11 Pro 64-bit (10.0, Build 22631) version: 23H2
Sometimes my PC restarts, and usually I don't get a normal event log entry, only a note that the system shutdown was unexpected. But once I got this:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 23

The details view of this entry contains further information.​
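
For anyone collecting several of these events, the fields in the message above are plain `Key: value` lines, so they are easy to tally. Below is a minimal Python sketch, assuming the event text has been copied out of Event Viewer as plain text. `parse_whea_event` and `tally` are hypothetical helper names, not part of any Windows tool. If the same APIC ID keeps coming back, that may point at one core, though that is only a hint, not proof.

```python
import re
from collections import Counter

# Event text as quoted in the post (copied from Event Viewer as plain text).
EVENT_TEXT = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 23
"""

# Matches lines of the form "Key: value" and trims surrounding whitespace.
FIELD = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")

def parse_whea_event(text):
    """Collect the 'Key: value' lines of one event message into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

def tally(events):
    """Count how often each (error type, APIC ID) pair occurs."""
    return Counter(
        (e.get("Error Type"), e.get("Processor APIC ID")) for e in events
    )

event = parse_whea_event(EVENT_TEXT)
print(event["Error Type"], "on APIC ID", event["Processor APIC ID"])
print(tally([event]))
```

With only one logged event, as here, the tally says little. It becomes more useful once a few more events have been captured.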
From what I read here on this site, this is an instability error, so I reset my BIOS to defaults and updated it to the latest version (although I haven't had a restart since then).

Only some games are affected, e.g. Hunt: Showdown, Civ VI and Baldur's Gate 3. (I guess it only happens in CPU-heavy multiplayer games; I never had any restarts with single-player games, even CPU-heavy ones.)
Multiplayer games that don't cause crashes: Space Marine 2, Monster Hunter: World, Risk of Rain 2, WH40k: Darktide, Sons of the Forest.

I tried some stress tests to see which hardware was causing it (CPU, GPU and RAM tests), but nothing went wrong during testing, and even the temperatures were good.
So the only clue was the fatal hardware error with the Cache Hierarchy Error.

Maybe (I'm not sure) it got damaged from undervolting? I wanted to do it because I checked the temperature in summer, and my CPU was at 80+ °C when gaming, which I thought was too much. I looked up a YouTube tutorial and saw a suggestion to set the VDDCR CPU override to manual 1.025 and the core ratio to x40.
It ran like that for a few months without any issue. Then someday (maybe after a BIOS firmware update, I don't remember exactly) it just started to happen, first once a month, then later at random, once or more per gaming session. Some of my friends said I shouldn't have undervolted it because that was the problem, but I got the error with the default settings as well.

Thanks for the help in advance,
Greg
 
Maybe (I'm not sure) it got damaged from undervolting?
In theory, a static undervolt could damage your CPU, but that would probably take longer than just a few months. You most likely just corrupted your Windows installation and file systems.

I wanted to do it because I checked the temperature in summer, and my CPU was at 80+ °C when gaming, which I thought was too much.
For a Ryzen 9 5900X, AMD specifies the maximum operating temperature (TJmax) as 90°C. That means you can run your CPU 24/7/365 at that temperature according to them. If you feel that you want a lower temperature, you can limit the maximum temperature in the PBO settings, which is usually the preferred way to do that.

I looked up a YouTube tutorial and saw a suggestion to set the VDDCR CPU override to manual 1.025 and the core ratio to x40.
Yeah, using the static overclocking feature, which is what that sort of undervolt actually is, isn't the greatest idea, since it also disables certain CPU safety features. For undervolting any Zen 3 to Zen 5 CPU, you should stick to the Curve Optimizer function of the PBO system, which doesn't disable the safety features. It can make your system unstable, but your CPU shouldn't outright die or degrade from it.

Some of my friends said I shouldn't have undervolted it because that was the problem, but I got the error with the default settings as well.
Try to clear CMOS, load defaults as needed, keep CPU and memory at stock, and run stability tests and the games that crashed in the past to see what your CPU does. If the problems persist, check your file system integrity, re-install drivers, test under a different user account, and if nothing helps, consider a fresh Windows installation.
 
I looked at PBO, but the options weren't available the way they suggested. I also forgot to say that the idle temperature was 60 °C, which is why I started looking into it.
P.S.: I ran the file integrity check after the first crash, and it said I had some corrupted files, so that didn't help. Anyway, I still need to test the PC when I have time, to check whether the latest BIOS update fixed it. I'll post an update on Sunday at the latest if no crash happens before then.
 
It's been a year... it's been over a year (!)... and seeing those words still gives me panic attacks!

This one is always a rabbit hole so I don't envy you.

What this means, as the event log states, is that an uncorrectable hardware error, a "machine check condition", occurred and was identified by the CPU, and the CPU's response was to initiate a system restart.

What caused this could be a plethora of things, and there is no magical answer (in my case, it was actually a faulty graphics card). But it could be a faulty CPU, a faulty motherboard, unstable RAM, an unstable overclock/undervolt, and so on.

And undervolting your CPU shouldn't be able to damage it, as far as I know? It can make it unstable... but it shouldn't be able to damage it. Overvolting could obviously damage it, however.

I would set the CPU/RAM back to default, so no custom clock speeds/voltages on the CPU side, and test the RAM with profile speeds disabled for now. If that stops the restarts, enable the profile speeds on the RAM (but leave CPU alone) and test again.
 
Update on the situation: it crashed twice with everything on default, so the next step is reinstalling Windows. If that does nothing, then what? Should I change my mobo, CPU and RAM combination?
Was it the same error?
 
So after reinstalling Windows, it still crashed under heavy gaming load. No detailed error log; it only says the shutdown was unexpected.
Edit: Now there is some flickering on my screen and some weird anomalies or glitches (I don't know what they're called, but sometimes there are black flickers that cover all or part of the screen for a millisecond or so). [This did happen before driver installation.]
And in the game where it crashes really frequently (Hunt: Showdown), there are a lot of graphical glitches and flickering, but I thought that was because the game is poorly coded. So if it happens at idle, maybe my GPU is bad?
 
It wouldn't hurt to swap out the GPU and see if that changes things. I think there was another user here on the forums whose GPU ended up causing these issues. However, there might be several different causes for the Cache Hierarchy error, including simply having a bad chip, which you would replace under warranty.

If you go through the usual suspects below without any improvement I would consider RMA.
  • Fresh OS Install
  • Updated UEFI/BIOS
  • Reset UEFI/BIOS to Defaults
  • Check Voltages
  • Updated Drivers
  • GPU SWAP
  • PSU SWAP
 
Is there a way to pinpoint the faulty hardware?
 
I would start swapping out parts, but not everyone has the parts or time to do such things. If I go back to your original description ("Sometimes my PC restarts and usually don't get a normal event log only that the system shutdown was unexpected.") and ignore the cache error for the moment, then, given that this mostly happens with some games, testing with different drivers and swapping out the GPU makes sense to me as things to try.

Some thoughts. I have no idea if this is related to your problem, but unlike multicore stress tests, gaming workloads with these newer high-end GPUs might cause power transients that make the system shut down. I'm not sure if this would work, but a possible test is to power limit your GPU to a few different settings and see if the system still shuts down in the games you are having issues with. (This assumes power transients respect power limits; I don't know if they do or not.)

 
I already tried it with a 90% power cap, but it was the same. I even logged it in AMD's software: the highest power usage was 274 W, and it wasn't a spike but a stable range of roughly 254-274 W.

Check Voltages
Sometimes my CPU voltage reaches between 1.5 and 1.6 V on default settings (without actually reaching 1.6, I guess). Later I'll check it with logging when I have time. I didn't check the GPU's voltage properly, though, so next time I'll watch that as well.

I have logged some RE4 gaming, but I don't see anything bad. I also tried to log the game where I get a crash, but the log didn't save, so I can't show that.

Besides this, after it crashed and restarted, the PC booted up to the login screen, but my GPU gave no signal until I manually shut the PC down with the power button.

I tried to look for error logs or anything else where the system would show particular anomalies while running.

So, given that Windows loads with no signal, I think the faulty part may be the GPU. I'll try to contact some friends who could lend me their GPUs for testing.

P.S.: About power transient spikes, I don't know. Why did it start after 2 years and not right after I built the PC? If my PSU were too weak, it wouldn't have been able to run my computer from the first time I ran the things that now cause the crashes, e.g. Hunt: Showdown. I had been playing it for years before this PC, and no problem occurred until recently, and Hunt wasn't even the first game to cause it. (Anyway, sorry for referencing the game a lot, but it's the only game where it crashes nonstop.)
 


Could even be your RAM.
 
The CPU shouldn’t go beyond 1.5 V under any circumstances. I hardly ever see 1.5 V on my 5900X. It usually spikes up to 1.47-1.48 V (for 4.95-5.0 GHz).

What software are you using for CPU voltage monitoring, and did you change anything in the BIOS for Vcore, including PBO?

GPU transients are so brief that no software can catch them no matter how fast data polling is.

HWiNFO64 (sensors-only mode) is your tool for monitoring the system.
For CPU voltage, look at the “CPU Core Voltage (SVI2 TFN)” sensor.
Right below that sensor is “CPU SoC Voltage (SVI2 TFN)”.
Check that too.

Important:
In the main HWiNFO settings, “Snapshot CPU Polling” should be enabled for Ryzen-based systems.
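
If the sensor readings are logged to a CSV file, the peak value can be pulled out afterwards instead of watching the window during a game. Here is a minimal Python sketch of that idea. The column header (sensor name plus a unit in brackets) and the sample rows are assumptions made up for illustration, so check the header line of your own log file. `column_max` is a hypothetical helper name.

```python
import csv
import io

def column_max(csv_file, column):
    """Return (max_value, data_row_number) for a numeric column.

    Rows where the column is missing or not a number are skipped.
    Returns None if no usable value was found.
    """
    best = None
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue
        if best is None or value > best[0]:
            best = (value, row_number)
    return best

# Made-up sample rows; a real log would be opened with open(path, newline="").
SAMPLE = """Time,CPU Core Voltage (SVI2 TFN) [V],CPU SoC Voltage (SVI2 TFN) [V]
20:00:01,1.412,1.087
20:00:02,1.485,1.094
20:00:03,1.231,1.081
"""

VCORE = "CPU Core Voltage (SVI2 TFN) [V]"  # assumed column header
peak = column_max(io.StringIO(SAMPLE), VCORE)
print("peak Vcore and row:", peak)
```

The row number makes it possible to look up what the other sensors were doing at the moment of the peak.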
 
My Ryzen 9 5900X has not been higher than 142W PPT, often a bit less than that, TMK. So 200-something W at default, is sus.

Note that an unstable VRAM OC (not core) on your video card may cause this error. Were you OC'ing "MCLK" on your RX 6900XT?
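
For reference, the 142 W figure follows from the stock AM4 rule that the package power limit (PPT) is 1.35 times the rated TDP, and the 5900X is rated at 105 W. A small Python sketch of that arithmetic, with `stock_ppt` and `exceeds_stock_ppt` as hypothetical helper names:

```python
TDP_W = 105        # AMD's rated TDP for the Ryzen 9 5900X
PPT_RATIO = 1.35   # stock AM4 package power limit is 1.35 x TDP

def stock_ppt(tdp_w, ratio=PPT_RATIO):
    """Stock package power limit (PPT) in watts for a given TDP."""
    return tdp_w * ratio

def exceeds_stock_ppt(reading_w, tdp_w=TDP_W):
    """True if a sustained power reading is above the stock PPT."""
    return reading_w > stock_ppt(tdp_w)

print(round(stock_ppt(TDP_W)))   # stock PPT in watts, rounded
print(exceeds_stock_ppt(274))    # is a 274 W reading above it?
```

A sustained reading well above that limit on stock settings is more likely to come from a different sensor than from the CPU package.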
 
I routinely see 1.535v on my 5900X, but it is boosting pretty hard when I run it.

I run 260/170/200 -25 AC +200
 
Everything has been on default since August to see whether undervolting the CPU was the cause, and I never overclocked the GPU in any form. So the PC crashes on completely default settings, and there was one thing that fixed it: changing the GPU. (That was only temporary, because the card wasn't mine, so I had to switch back to the 6900 XT. I swapped in a GTX 1080, if that's relevant.)
For monitoring the CPU voltage, I checked it in AMD's Adrenalin software.
And I'm sure it's the GPU, because I tested with another one and it didn't happen.
I checked the log file again, and the CPU's max voltage was 1.485, so maybe the AMD software was rounding it.
My Ryzen 9 5900X has not been higher than 142W PPT, often a bit less than that, TMK. So 200-something W at default, is sus.

Note that an unstable VRAM OC (not core) on your video card may cause this error. Were you OC'ing "MCLK" on your RX 6900XT?
The 274 W is not for the CPU but for the GPU. Sorry for not specifying that.
 
So apparently, after the last crash, I noticed a switch on the GPU: P-mode to Q-mode. I tried Q-mode, and it blew the fuse for my room and the lower part of the house, and tripped the safety switch in the electrical box. After I got the electricity back, the PC didn't get any power... so my brand-new PSU got fried by it. It was a Seasonic Vertex GX-1200, and I'd got it 1 month ago. I took my PC to a repair shop today to check which components got damaged besides the PSU. I guess the solution for the crashes was to completely break my PC :/
 