Tuesday, July 23rd 2024

Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon
Long-term reliability issues continue to plague Intel's 13th Gen and 14th Gen Core desktop processors based on the "Raptor Lake" microarchitecture, with users complaining that their processors have become unstable under heavy processing workloads, such as games. This includes chips that have had minor levels of performance tuning or overclocking. Intel had earlier isolated many of these stability issues to faulty CPU core frequency boosting algorithms, which it addressed through updates to the processor microcode that it got motherboard and prebuilt manufacturers to distribute as UEFI firmware updates. The company has now come out with new findings on what could be causing these issues.
Sources:
Intel Community, Intel (Reddit)
In a statement Intel posted on its website on Monday (22/07), the company said that it has been investigating the processors returned to it by users under warranty claims (which it has been replacing under the terms of its warranty). It has found that faulty processor microcode has been causing the processors to operate under excessive core voltages, leading to their structural degradation over time.
"We have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor."
Modern processor power management runs on an intricate clockwork of collaboration between software, firmware, and hardware, with the software constantly telling the hardware what levels of performance it wants, and the hardware managing its power and thermal budgets by rapidly altering the power and clock speeds of the various components, such as CPU cores, caches, fabric, and other on-die components. A faulty collaboration between any of the three key components could break this clockwork, as has happened in this case.
Intel is releasing yet another microcode update to its 13th- and 14th Gen Core processors, which will address not just the faulty boosting algorithm issue the company unearthed in June, but also the faulty voltage management the company discovered now. This new microcode should be released some time around mid-August to partners (motherboard manufacturers and PC OEMs), who will then need to validate it on their machines, before passing it along to end-users as UEFI firmware updates.
"Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation. Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance," the company stated.
It's important to note that the microcode update won't fix the issues on processors that are already experiencing instability; it will only prevent them on chips that aren't yet affected. The instability is caused by irreversible physical degradation of the chip. These chips will, of course, be covered under warranty.
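For readers on Linux who want to see which microcode revision their processor is currently running, x86 kernels report it per logical CPU in the `microcode` field of `/proc/cpuinfo`. The sketch below simply reads that field. The helper name `microcode_revisions` is made up for illustration, and the snippet makes no claim about which revision carries Intel's fix, since the statement does not name one.

```python
import re
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the distinct microcode revisions found in /proc/cpuinfo-style text."""
    # x86 Linux kernels print one "microcode : 0x..." line per logical CPU.
    pattern = re.compile(r"^microcode\s*:\s*(\S+)", re.MULTILINE)
    return set(pattern.findall(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        revisions = microcode_revisions(cpuinfo.read_text())
        print(sorted(revisions) or "no microcode field found")
    else:
        print("/proc/cpuinfo is not available on this system")
```

Comparing the reported revision before and after a UEFI firmware update is one way to confirm that a new microcode has actually been applied.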
Meanwhile, an interesting issue has come to light, which is that some of Intel's processors built on the Intel 7 node are experiencing chemical oxidation of the die as they age. Intel responded to this, stating that it had discovered the oxidation manufacturing issue in 2023, and addressed it. The company also stated that die oxidation is not related to the stability issues it is embattled with.
"We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue," the company stated.
If you feel your chip might be affected, you can file for an RMA.
387 Comments on Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon
Ugly to disgusting. This needs a proper watchdog / regulatory investigation at this point.
They reduced the clocks by 500 MHz and that made it last 6 to 8 months before failing.
This is the source...
Anyway, as I'm leaving on vacation for two weeks this Saturday, I decided that this week would be the best time to request an RMA for it, especially in light of the latest revelations. I was hoping to get a replacement by the time I got back from vacation.
So, on Tuesday I made the RMA request, telling them that my CPU is unstable even with the latest BIOSes and the Intel defaults.
- 8 hours later, they were asking for the serial number, as I had initially given them an incorrect one.
- 4 hours later the RMA was approved, and I received an estimation of 5-7 business days to receive the replacement, from the time I submit the faulty unit.
- 1 hour later I received the DHL return label. The collection was scheduled for Thursday, 2 days later, but I decided not to wait for that.
- 3 hours later I handed over the faulty CPU at a local DHL service point.
- 15 hours later the CPU started its travel towards Intel
- 22 hours later the CPU arrived at Intel
- 7 hours later they initiated shipping for the replacement CPU
- 2 hours later the CPU was on its way to me
- 3 hours later I received an email confirming they received the CPU and they need to validate it which would take 1 to 3 business days. Not sure what that email was about, since they already sent the replacement, probably just an automated message.
- 16 hours later I had a brand new 13900KF CPU on my desk.
Overall, the process took just a bit over 3 days, which I find almost unbelievable. I didn't get any pushback from Intel, and apparently when you send them a CPU that they know is very likely to be affected they don't actually do any validation, except that you actually sent them the CPU, and almost immediately send you a replacement.
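As a quick check of the "just a bit over 3 days" figure, the intervals listed above can be added up. This is only a minimal sketch: the step labels are paraphrased from the post and the variable names are illustrative.

```python
# Hours elapsed between consecutive steps, as listed in the post above.
steps = [
    ("Intel asks for the correct serial number", 8),
    ("RMA approved", 4),
    ("DHL return label received", 1),
    ("faulty CPU handed over at DHL", 3),
    ("CPU starts travelling to Intel", 15),
    ("CPU arrives at Intel", 22),
    ("replacement shipping initiated", 7),
    ("replacement on its way", 2),
    ("confirmation email received", 3),
    ("replacement delivered", 16),
]

total_hours = sum(hours for _, hours in steps)
days, remainder = divmod(total_hours, 24)
print(f"{total_hours} hours = {days} days and {remainder} hours")
```

That comes to 81 hours, or 3 days and 9 hours, counted from the initial RMA request, which matches the poster's estimate.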
So, while they might have been reluctant to accept RMAs in the past, it seems that right now they are actually doing the right thing regarding RMAs, and they are trying to make things as painless as possible for their customers.
While I can't totally forgive them for their past behavior, I have to give them a 10/10 for my recent experience.
The new CPU uses just 1.279 V in the BIOS, compared to the defective CPU, which used 1.447 V. I left the BIOS on the Intel defaults with no XMP, and so far I don't have any stability issues and it reaches the maximum frequencies properly. But I won't do more extensive testing or stress it more until the microcode updates are released.
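To put the two BIOS readings side by side, the gap can be worked out directly. This is a rough sketch using only the two figures quoted in the post. They come from two individual chips, so the result shows the difference this poster saw and not typical behavior.

```python
# Voltages reported in the post above, in volts.
old_cpu_v = 1.447  # defective CPU
new_cpu_v = 1.279  # replacement CPU

delta_v = old_cpu_v - new_cpu_v
percent_higher = delta_v / new_cpu_v * 100
print(f"difference: {delta_v:.3f} V ({percent_higher:.1f}% higher than the replacement)")
```

The defective chip was requesting 0.168 V more, roughly 13% above the replacement.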
It even explicitly states that in the quote from verge you posted.
In my BIOS I have independent controls for core and ring voltage, which is an indication that they are different, but Intel's own documentation says so as well.
edc.intel.com/content/www/us/en/design/products/platforms/details/raptor-lake-s/13th-generation-core-processors-datasheet-volume-1-of-2/002/ring-interconnect/
I will quote it here as well
Then let's take into account that the ring bus as we know it today is still the same ring bus that was used back when Intel was still pumping out quad-core CPUs. As Intel began to add more cores to their processors with the HEDT line, they found that adding more cores to the ring bus introduced too much stress and latency, leading to reduced performance. That is why Intel introduced a mesh-style bus on the X-series chips that were part of their HEDT lineup.
Fast-forward to today and Intel is expecting the very same ring bus to handle far more cores than it ever was designed to handle.
On Raptor Lake the cache clock got a massive boost, so logically the voltage will have gone up as well to power that.