
Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm

btarunr

Intel has identified the root cause of the stability issues observed with certain high-end 13th and 14th Gen Core "Raptor Lake" processor models, which were causing games and other compute-intensive applications to crash randomly. When the issues were first identified, Intel recommended a workaround that reduced core voltages and restricted the boost headroom of these processors, at the cost of performance. The company has apparently now found the root cause, as Igor's Lab learned from confidential documents.

The documents say that Intel isolated the problem to a faulty value in the microcode's end of the eTVB (enhanced thermal velocity boost) algorithm. "Root cause is an incorrect value in a microcode algorithm associated with the eTVB feature. Implication: Increased frequency and corresponding voltage at high temperature may reduce processor reliability. Observed: Found internally," the document says, mentioning "Raptor Lake-S" (13th Gen) and "Raptor Lake Refresh-S" (14th Gen) as the affected products.



The company goes on to elaborate on the issue in its Failure Analysis (FA) document:
Failure Analysis (FA) of 13th and 14th Generation K SKU processors indicates a shift in minimum operating voltage on affected processors resulting from cumulative exposure to elevated core voltages. Intel analysis has determined a confirmed contributing factor for this issue is elevated voltage input to the processor due to previous BIOS settings which allow the processor to operate at turbo frequencies and voltages even while the processor is at a high temperature. Previous generations of Intel K SKU processors were less sensitive to these type of settings due to lower default operating voltage and frequency.

Identifying the root cause of the problem isn't the only good news; Intel also has a new microcode ready for 13th Gen and 14th Gen Core processors (version: 0x125), for motherboard manufacturers and PC OEMs to encapsulate into UEFI firmware updates. This new microcode corrects the issue, which should restore the stability of these processors at their normal performance. Be on the lookout for UEFI firmware (BIOS) updates from your motherboard vendor or prebuilt OEM.
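For readers who want to confirm which microcode revision their system is running after a firmware update, here is a minimal sketch. It assumes a Linux system, where `/proc/cpuinfo` reports a `microcode` field per logical CPU. The function names are hypothetical, and comparing against 0x125 numerically assumes revisions only increase for the same CPU model, so treat the result as a hint rather than a definitive check.

```python
import re

# Revision named in the article; comparing numerically is an assumption.
FIXED_REVISION = 0x125


def parse_microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in
    /proc/cpuinfo-style text. Hypothetical helper for illustration."""
    pattern = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {int(match.group(1), 16) for match in pattern.finditer(cpuinfo_text)}


def has_fixed_microcode(cpuinfo_text, minimum=FIXED_REVISION):
    """True only if at least one revision was found and all of them
    are at or above the minimum."""
    revisions = parse_microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= minimum


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        print("Could not read /proc/cpuinfo (not a Linux system?)")
    else:
        found = sorted(hex(rev) for rev in parse_microcode_revisions(text))
        print("Microcode revisions reported:", found or "none")
        print("At or above 0x125:", has_fixed_microcode(text))
```

On Windows, the running revision has to be read some other way, for example from the BIOS setup screen or a hardware information utility.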

 
Sounds like good news.
 
Previous generations of Intel K SKU processors were less sensitive to these type of settings due to lower default operating voltage and frequency.
Terminal Velocity Boost.
 
TPU left out a part that a few other sites included: the problem will cause processor degradation depending on how long the CPU was exposed to it. No one knows if it will fall under warranty or not.
 
Selecting XMP profile raises CPU OC flag and voids warranty. But other than that
Restore stability at their nominal performance, sure. Lower tolerances and you get same performance. Make it make sense.
 
So a microcode issue from the vendor was the cause of the stability woes. At least the power of the internet and social media put pressure on Intel, and it worked to find a solution…

But the key point is that Raptor Lake was factory-pushed further to the limit (of stability) than previous Lakes.
 
Intel states, "While this issue is potentially contributing to instability, it is not the root cause."

And Intel states that it is "still investigating" to find the root cause.
 
Will Intel honor warranty for 13th gen owners whose CPU got degraded because of this?
 
TPU left out a part that a few other sites included: the problem will cause processor degradation depending on how long the CPU was exposed to it. No one knows if it will fall under warranty or not.

If it's a manufacturing fault it should, or a class-action lawsuit from anyone with a damaged processor will be incoming.

The bottom line is Intel pushed these chips too hard due to the fundamental design not matching AMD chips.
 
In an older article:

It looks like Intel owes a lot of apologies. Oops. :slap:
 
While it's Intel's fault, it certainly wasn't intentional to write faulty microcode. That's just an innocent **** up by someone that went unnoticed.
 
This is the value in the microcode that was creating the problem

"Win_Benchmarks_At_Any_Cost=YES"

And this is the fix

"Win_Benchmarks_At_Any_Cost=NO"
 
While it's Intel's fault, it certainly wasn't intentional to write faulty microcode. That's just an innocent **** up by someone that went unnoticed.
...and was blamed on motherboard makers before they finally noticed it.
 
If they weren't doing dubious, questionable stuff with regard to MB defaults, the finger wouldn't have been pointed in their direction in the first place. In any case, it was a problem regardless. It's an issue for both AMD and Intel, and MB makers shouldn't be doing that type of thing with defaults. Nearly everyone with even a shred of sense and integrity agrees with that. It's a bad judgement call by MB makers to eke out more performance to win benchmarks at any cost, as john_ satirically puts it about the microcode. MB makers pushing new BIOSes with proper defaults that weren't dodgy AF probably saved a number of chips from potentially dying from degradation.
 
Interesting interpretation in the news article. My interpretation is that running the BIOS at spec isn't a workaround but a fix, and that this microcode fix allows the out-of-spec configuration to run stable again (or at least makes it more likely). I also disagree with the article that the old performance levels are "normal" for the product. Not only were they out of spec, but they were using a faulty microcode that enabled TVB too frequently. I would like to see new reviews on this new microcode. :)

I expect the push for better BIOS defaults will now suddenly vanish, as Intel will want this out of the news ASAP, and the board vendors will want to continue pushing BIOSes that default to out of spec. The solution for both of those is to let the BIOS situation drop.

...and was blamed on motherboard makers before they finally noticed it.
They are still not innocent; running at spec does stabilise the chips. It just looks like there are two triggers for the problem.
 
This is the value in the microcode that was creating the problem

"Win_Benchmarks_At_Any_Cost=YES"

And this is the fix

"Win_Benchmarks_At_Any_Cost=NO"

Nice non-biased comment there, well done.
 
Let's get into conspiracy theory mode. I find it quite convenient to release a fix that degrades RPL performance now, months before the release of Arrow Lake. Now I understand the +/- 10% margin of error in Intel's slides.

Pat is a criminal genius. lol
 