
Random reboots and Cache Hierarchy Error

I'm getting the following in event viewer after a black screen reboot:

A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

This has happened twice in 2 weeks now. Every error reports the same APIC ID except for one: on 20/8/2023 the entries show APIC ID 1 and 0. The latest entries, on 31/8/2023, both show APIC ID 0.

I'm using the following hardware:

Win11Pro build 2262.2134
B450 Gaming Pro Carbon AC motherboard
5800X3D Processor using Kombo Strike 3 (been solid for months)
Dark Rock Pro 4 CPU cooler
2x 8GB DIMMs of 3200MHz G.Skill RAM (F4-3200C16-8GTZB)
6950XT GPU reference model
Corsair 750W Gold rated PSU (one of the newer models)

It's been rock solid for months. Only in the last 2 weeks have I had these random reboots (I'm not gaming when they happen).

What I've done recently:

- I've updated the chipset driver to the latest version and, today, the BIOS to the latest (beta) version.
- A quick 30-minute OCCT CPU test passed.

Has anyone had this before, or can somebody shed some light on what might be causing this?
 
  • Kombo Strike 3 is the equivalent of a -30 Curve Optimizer offset. -30 is the maximum for the 5800X3D; it is aggressive and not guaranteed to be stable on every CPU sample.
  • A Cache Hierarchy Error usually indicates that cores are unstable and not getting the Vcore they need. The alternative is physically defective cores, which is rarer and doesn't sound like your case.
  • Random reboots happen when that instability is severe enough, often at idle. That's the point in the V-F curve where the unstable -30 offset becomes a problem for the affected core.
  • Cache Hierarchy Errors have nothing to do with RAM, Fabric or SOC-related things, or the chipset.
Run a quick corecycler test with default settings and adjust your CO (Kombo Strike) settings accordingly. https://github.com/sp00n/corecycler

If you want more granularity than Kombo Strike offers, upgrade to an AGESA 1208 or later BIOS that has Curve Optimizer controls reinstated for the 5800X3D.

The Event Viewer entry for a Cache Hierarchy Error may include an APIC ID that tells you which core errored out (iirc it counts SMT threads, so every 2 IDs map to one core, starting from 0). But don't take it at its word; run some tests. Sometimes the APIC ID is nonsensical, and sometimes it reveals nothing at all, but in your case I'd check Core 0 (the first one in the list) first.
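
As a rough illustration of the APIC ID counting described above, here is a minimal Python sketch. It assumes two SMT threads per core and APIC IDs numbered consecutively from 0, which is the recollection given in this post rather than something confirmed. The function name `apic_id_to_core` is made up for the example.

```python
def apic_id_to_core(apic_id: int, threads_per_core: int = 2) -> int:
    """Map a WHEA 'Processor APIC ID' to a zero-based core index.

    Assumption (not confirmed): APIC IDs count SMT threads consecutively
    from 0, so each core owns `threads_per_core` adjacent IDs.
    """
    if apic_id < 0:
        raise ValueError("APIC ID must be non-negative")
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    # Integer division groups adjacent thread IDs onto the same core.
    return apic_id // threads_per_core


if __name__ == "__main__":
    # The two APIC IDs reported in the original post.
    for apic_id in (0, 1):
        print(f"APIC ID {apic_id} -> Core {apic_id_to_core(apic_id)}")
```

Under those assumptions, APIC IDs 0 and 1 both land on Core 0. Treat the result as a hint about where to start testing, not as proof of which core is at fault.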
 
Did you solve it? I have the same problem. It has only happened 3-4 times in the past couple of months, but it all started after adding some RAM.
Could it be the RAM (the second kit is kind of bad)? It was stable with the same CO before I added more RAM and updated the BIOS (tested with corecycler etc.; I might have to test it again). I see you're saying RAM has nothing to do with it.

5700X - Processor APIC ID: 0 and Processor APIC ID: 8. (Is that the first and fifth core?)
 
If you suspect it's RAM, then revert CO back to 0 and then run HCI/TM5.

CO is not guaranteed, and CO settings are also not guaranteed to stay stable after a new AGESA update. 1207 in particular significantly changed the V-F curve for some CPUs, so much so that it takes an extra -0.07V to -0.1V offset just to get close to what it was doing in 1206. Knowing that, CO settings will obviously need to change as well.
 
I think I went from 1207 to 120a. I never had this problem before that, but around that time I also added an extra RAM kit that is a bit messy. I'll just go to 120b and do all the tests again... my RAM passed TM5, but with some loose timings.
Edit: The GPU curve is also different. Could that have something to do with it? I just don't want to re-test everything ^_^
 
GPU shouldn't have anything to do with it.

RAM shouldn't have anything to do with it either... if you are 100% sure it's Cache Hierarchy. If there are WHEAs showing up as either Bus/Interconnect or Unknown Source, then RAM can't be ruled out. If you have turned a dual rank setup into a quad rank setup by adding 2 sticks, the VSOC and VDDG voltages might not be enough.
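
The rule of thumb above can be written down as a small triage helper. This is only a sketch of the reasoning in this thread: the matched strings follow the wording used here ("Cache Hierarchy", "Bus/Interconnect", "Unknown Source") and are assumed to appear in the Event Viewer "Error Type" field, and the function name and return labels are hypothetical.

```python
def triage_whea(error_type: str) -> str:
    """Suggest where to look first for a given WHEA error type.

    Returns one of three illustrative labels:
      "core"             - suspect core voltage / Curve Optimizer
      "memory-or-fabric" - RAM, Fabric or SOC voltages can't be ruled out
      "unclassified"     - not covered by the rule of thumb in this thread
    """
    kind = error_type.strip().lower()
    if "cache hierarchy" in kind:
        return "core"
    if "bus/interconnect" in kind or "unknown source" in kind:
        return "memory-or-fabric"
    return "unclassified"


if __name__ == "__main__":
    for sample in ("Cache Hierarchy Error", "Bus/Interconnect Error"):
        print(f"{sample}: {triage_whea(sample)}")
```

The labels only say which area to test first; they are not a diagnosis.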
 
I haven't changed anything yet, but I don't have any WHEAs or any other issues; I just get these random reboots. They have happened only a few times, mostly under mixed load, like having a game open while doing some work, but mostly I feel like I get the reboot when I'm using the Android Emulator (I had it open every time this happened). I have 4x8 GB in the system. The first kit could do 14-18-18-38 @ 1867 GDM off (or something like that); the second kit can't do better than 16-22-22-44 GDM on (even when using only this kit). Maybe some timings could still be lowered, but I gave up; this was TM5 error free. (I need to replace them with a single kit.)
The system was 100% stable before (tested overnight 2 or 3 times with TM5, corecycler etc.). I messed up with this RAM kit: I sold an old computer that had a kit matching the one in this system, so I had to use this kit instead (I was using it on an Intel system). I added this RAM kit, updated the BIOS (1207 to 120a), and started using a new curve on the GPU (everything else is the same as before), and then I started to get these random reboots. They happen very rarely and in random situations, so it's hard to see where the problem is (and I don't want to revert everything to stock atm). Looking at the CPU curve, I see NOW that the cores in trouble are exactly the cores that have the higher values (0 and 5; core 5 is the best core). I think I'll have to test overnight again; maybe I have to raise the curve values even more on Core0/Core5.
Edit: I had to change ProcODT to 48 to make the RAM stable; it defaults to 32 or something.

Attachments: 1700147123941.png, 1700147395336.png
 
Okay, probably not RAM or Fabric then; that's just dual rank unless your sticks are super old (i.e. 2016).

I think your CO curve is just too aggressive. Corecycler is okay at load testing, but idle reboots are not a load problem, and this is not a problem that corecycler can uncover, because it does not test that section of the V-F curve. All you can really do is relax the cores that WHEA caught and proceed by trial and error.
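
To make the "relax the flagged cores" step concrete, here is a minimal Python sketch. It assumes per-core Curve Optimizer offsets are negative counts and that "relaxing" means moving them toward 0. The function name, the step size and the example offsets are all hypothetical, not recommended values.

```python
from typing import Dict, Iterable


def relax_flagged_cores(
    offsets: Dict[int, int], flagged: Iterable[int], step: int = 2
) -> Dict[int, int]:
    """Return a copy of `offsets` with each flagged core moved toward 0.

    Assumption: offsets are negative CO counts (e.g. -30). A relaxed
    offset never crosses 0, and non-negative offsets are left alone.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    relaxed = dict(offsets)
    for core in flagged:
        if core not in relaxed:
            raise KeyError(f"core {core} has no offset entry")
        current = relaxed[core]
        if current < 0:
            relaxed[core] = min(0, current + step)
    return relaxed


if __name__ == "__main__":
    # Hypothetical starting point: every core at -30, WHEA flagged core 0.
    start = {core: -30 for core in range(8)}
    print(relax_flagged_cores(start, flagged=[0]))
```

After each adjustment the system still has to be used normally for a while, since the reboots discussed here are rare and a load test alone may not reproduce them.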
 
I went a step down on cores 0 and 5; I'll go from here. It doesn't happen at idle, just under some load, usually when using the Android Studio emulator. It's a bit weird that it wasn't happening before with the same settings. Yeah, the RAM is pretty old. I had 4 sticks of those that could do 14-18-18-38 @ 1867 GDM off, but when I sold my Intel system I sold it by mistake with 2 of them, and I got stuck with this other kit.
 