Tuesday, January 12th 2016

Intel Core Skylake Processors Freeze Under Certain Workloads, Company Issues Fix

Intel's 6th generation Core "Skylake" architecture faces a major bug. Responding to a support question on its Communities page about how certain multi-threaded stress tests like Prime95 can cause the system to freeze, Intel confirmed that an issue affects all 6th generation Core products. Under certain highly specific workloads, such as Prime95, "Skylake" chips hang or cause "unpredictable behavior."

Intel stated that it identified the issue and has released a fix. The company is working with its motherboard partners to get the fix across to users through a system BIOS update:
Intel has identified an issue that potentially affects the 6th Gen Intel Core family of products. This issue only occurs under certain complex workload conditions, like those that may be encountered when running applications like Prime95. In those cases, the processor may hang or cause unpredictable system behavior. Intel has identified and released a fix and is working with external business partners to get the fix deployed through BIOS.

34 Comments on Intel Core Skylake Processors Freeze Under Certain Workloads, Company Issues Fix

#1
Tsukiyomi91
Only affects those who are stress-testing to ensure OCed speeds are stable. Other than that, regular, day-to-day use shouldn't have such problems.
#2
R-T-B
How much you want to bet this microcode "fix" breaks BCLK OCing?

Just a thought... I for one have never had Prime95 freeze, and I always stress test with it.

EDIT: Though it does appear to be reproducible doing a highly specific test, interesting. Still, I am leaving the above as food for thought...
#3
Tsukiyomi91
As per the thread quoted by btarunr, it only occurs when running a certain type of benchmarking test. Not sure which one though...
#4
R-T-B
Tsukiyomi91 said:
As per the thread quoted by btarunr, it only occurs when running a certain type of benchmarking test. Not sure which one though...
Yeah, I was able to freeze my system fairly quickly using the Prime95 instructions, so it's real.

Here's the other thing though: It STILL would not surprise me if they break the BCLK OCing that's become popular while they fix this...
#5
FordGT90Concept
"I go fast!1!11!1!"
That's a really, really specific number to cause a freeze (like winning the lottery to find that). I'll grab the BIOS update ASAP just to be safe.

Edit: No BIOS updates since December. I'm sure that doesn't contain a fix...
#6
TheGuruStud
Tsukiyomi91 said:
Only affects those who are stress-testing to ensure OCed speeds are stable. Other than that, regular, day-to-day use shouldn't have such problems.
That didn't stop people from crucifying AMD and that bug was so rare it was insane.
#7
P4-630
Hmm.. Just bought an i5-6500. I only do 3DMark benching, so I don't think my processor will run into these freezing problems.
If it does, I'll just update the BIOS then.
#8
95Viper
Quote from PCWorld article "How to test your PC for the Skylake bug" by Gordon Mah Ung

If you use the current 28.7 version, you will need to create a text file in the folder using Notepad. You can do this by right-mouse-clicking in the Prime95 folder and selecting New > Text Document. Give the text document the title “local.txt.” Once the file is created, double-click it to open with Notepad and type the line CpuSupportsFMA3=0. Save the file in the same folder.


You’ll need to create a text file to tell Prime95 to use AVX to hit the bug

You have to do this because, according to the bug finders, by default the newer version of Prime95 will use AVX2 and the error appears to only occur with AVX.
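
For anyone who would rather script that step than use Notepad, here is a minimal Python sketch. Only the file name local.txt and the line CpuSupportsFMA3=0 come from the instructions above; the function name and the folder path in the usage note are hypothetical.

```python
from pathlib import Path

# The override line quoted in the instructions above.
SETTING = "CpuSupportsFMA3=0"


def add_fma3_override(prime95_dir):
    """Ensure local.txt in the given Prime95 folder contains the override line.

    Existing lines are kept, and the setting is appended only if it is not
    already present, so calling this twice does not duplicate the line.
    """
    target = Path(prime95_dir) / "local.txt"
    lines = target.read_text().splitlines() if target.exists() else []
    if SETTING not in lines:
        lines.append(SETTING)
    target.write_text("\n".join(lines) + "\n")
    return target
```

Usage would look like add_fma3_override(r"C:\Prime95"), where the path is only a placeholder for wherever Prime95 was unpacked.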

Start Prime95 by double-clicking Prime95.exe. Dismiss the dialog by clicking Just Stress Testing.


Just skip and select: “Just Stress Testing.”

A dialog box will appear to Run a Torture Test. Select Custom and change the Min FFT size (in K) to 768, and change Max FFT size (in K) to 768. Select Run FFTs in-place and also set the run time to 120 minutes or longer. Clicking OK will start the Torture Test.


Using these settings in Prime95, people have reported being able to hang up their new Skylake CPUs.

Now just wait and see if it locks up. Most of the problems seem to occur with the top-end desktop Core i7-6700K, but Intel seems to be implying it could occur on other CPUs.

Before you run this test, you should be aware that Prime95 puts a heavy load on CPUs. Systems that are marginal on cooling or overclocked may crash on their own, so it’s probably best to run this test on a PC with stock settings to make sure it isn’t just an unstable overclock.
#12
truth teller
so they are sure that this major hardware cockup can be fixed in microcode? for all cases and not just this fft avx subroutine implementation? or can we expect multiple "fixes" from here on out for other "might freeze" scenarios?

if this can really be fixed with microcode only, microsoft probably already has a new version of mcupdate_genuineintel.dll available in windows update, *nix users might have to wait for bios updates (if they ever come)
#13
RejZoR
Now I feel much better for going with the older Haswell-E instead. I don't want "unpredictable behavior" under stress. Mostly because encoding H.264 video on all cores gives me pretty similar stress conditions as torture stability testing tools. So, the situation is pretty real.
#14
cadaveca
My name is Dave
RejZoR said:
Now I feel much better for going with the older Haswell-E instead. I don't want "unpredictable behavior" under stress. Mostly because encoding H.264 video on all cores gives me pretty similar stress conditions as torture stability testing tools. So, the situation is pretty real.
I've been using Skylake for longer than you have been using Haswell-E, and I have yet to see an FMA3 freeze. I also don't tend to mod software just to be able to break stuff. Do keep in mind that's what you are doing. AMD supports FMA4, Intel FMA3, but either way, since the two do not support the same instructions, the chances of a programmer actually using this stuff is actually kind of rare, especially given that you need to mod P95 to get the freeze to happen.

As to this affecting BCLK OC on non-K CPUs, we'll need to see these BIOSes roll out first, and then the "fixed" BIOS will get the same mod that the current ones do. If it breaks BCLK OC, then you should be able to simply flash back to the appropriate BIOS, and then to avoid the problem, don't mod software, m'kay? :P
#15
truth teller
cadaveca said:
AMD supports FMA4, Intel FMA3
...
the chances of a programmer actually using this stuff is actually kind of rare
im not so sure about the chances of a programmer using avx instructions common to both camps being so rare

cadaveca said:
mod software just to be able to break stuff
cadaveca said:
don't mod software
changing config settings is now considered modding? the guys changed a text file flag...

besides this looks to me a lot like a marriage of the fdiv and f00f bugs (one only triggered on some situations, the other promptly halted execution until reset), and while people dont run prime95 fft mode with avx acceleration all the time, i still wouldnt want to use a cpu that might trip on some avx usage scenario
#16
cadaveca
My name is Dave
truth teller said:
im not so sure about the chances of a programmer using avx instructions common to both camps being so rare



changing config settings is now considered modding? the guys changed a text file flag...

besides this looks to me a lot like a marriage of the fdiv and f00f bugs (one only triggered on some situations, the other promptly halted execution until reset), and while people dont run prime95 fft mode with avx acceleration all the time, i still wouldnt want to use a cpu that might trip on some avx usage scenario
I mean, I've got like 10+ months of running Skylake (before retail launch), and have yet to see any problem, so yeah, it's rare now, and only really triggered by changing the default configuration of the software, which, yeah, I call modding. Most users will download and run the software without any such changes... until now. Most users won't even have any idea that changing options this way is possible. You do have to create a file that didn't exist prior and add it to the program, so yeah, modding.

However, you are right, it is possible that in the future there will be some software that causes this bug to trigger. Yet by then, these BIOS updates will probably be out.
#17
newtekie1
Semi-Retired Folder
FordGT90Concept said:
No BIOS updates since December. I'm sure that doesn't contain a fix...
I'm guessing we won't see updated BIOSes for a few weeks at least. Unless you have an eVGA board, then you won't see an updated BIOS at all...
#18
R-T-B
newtekie1 said:
I'm guessing we won't see updated BIOSes for a few weeks at least. Unless you have an eVGA board, then you won't see an updated BIOS at all...
is eVGA really that bad with bios updates?
#19
EarthDog
Much ado about nothing... run P95 and modify the length? You MAY have an issue. Otherwise, nothing to really worry about.
#20
mcraygsx
This is exactly the reason enthusiast (Extreme Edition) CPUs should not be released first. Let the mainstream users beta test the new CPUs and re-release them once they are perfected (aka Devil's Canyon, Kaby Lake, etc.)
#21
newtekie1
Semi-Retired Folder
R-T-B said:
is eVGA really that bad with bios updates?
Still waiting on a promised BIOS update for my P55 FTW 200 motherboard to fix the broken front side USB ports...

So, I'd say yes.

cadaveca said:
However, you are right, it is possible that in the future there will be some software that causes this bug to trigger. Yet by then, these BIOS updates will probably be out.
From what I understand, they are actually disabling FMA3, forcing Prime95 to use the older AVX. That is what is causing the bug, and AFAIK, AMD supports AVX in their processors too (from the FX line on). So it could be something that crops up if you have a program that uses the older AVX. Unless disabling FMA3 causes Prime95 to switch to AVX2, which AMD doesn't support. In which case you're right, none but a very select few Intel-only programs will show an issue.
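
To see which of these instruction sets a given chip actually reports, one option on Linux is to read the flags line of /proc/cpuinfo, where FMA3 shows up as fma and AMD's FMA4 as fma4. The sketch below is only an illustration under that assumption; the function name is hypothetical and it says nothing about which code path Prime95 picks.

```python
def simd_support(cpuinfo_text):
    """Report which of avx, avx2, fma (FMA3) and fma4 appear in the CPU flags.

    cpuinfo_text is the text of /proc/cpuinfo (Linux only). Only the first
    "flags" line is inspected, on the assumption that all cores match.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            break
    return {name: name in flags for name in ("avx", "avx2", "fma", "fma4")}
```

On a Linux box it could be called as simd_support(open("/proc/cpuinfo").read()).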
#22
xorbe
truth teller said:
so they are sure that this major hardware cockup can be fixed in microcode? for all cases and not just this fft avx subroutine implementation? or can we expect multiple "fixes" from here on out for other "might freeze" scenarios?
It only seems to happen with hyper-threading enabled if I understood correctly -- so that points to the instruction being right, but some sort of resource sharing being wrong, perhaps dispatch rules.
#23
HisDivineOrder
This would be a great time to also "fix" that whole "motherboards that allow non-K CPUs to overclock" problem.

Y'know? Because stability. Or something.
#24
R-T-B
I did a bit of homework on this issue.

I'm a bit of a bios modder in my free time.

It appears this issue got fixed with Skylake Microcode update 56 (that's hexadecimal 56), possibly even earlier. I didn't test every version, but patching my BIOS with that microcode fixed this issue.

Here's the kicker: Microcode update 56 came out 10-24-2015. That's right, almost 3 months ago.

Yep, Intel has had this fixed for a while... Not sure if @btarunr wants to note that or not. ;)

Meanwhile, nearly all Skylake boards I played with have old revisions down in the 30-4A range. The Gigabyte board I use was particularly low. No idea why the board vendors do this, but they really dropped the ball here.
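
For readers who want to check what revision their own system is running, here is a rough Python sketch. It assumes a Linux system, where /proc/cpuinfo lists a microcode field in hex for each logical CPU, and it assumes that a higher revision number means a newer update. The 0x56 threshold is simply the figure from this post, not something verified independently, and the function names are hypothetical.

```python
# Revision reported in this post as containing the fix (unverified assumption).
FIXED_REVISION = 0x56


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # Values look like "0x56"; int() with base 16 accepts the prefix.
            revisions.add(int(value.strip(), 16))
    return revisions


def has_reported_fix(cpuinfo_text):
    """True only if every listed CPU is at or above FIXED_REVISION."""
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= FIXED_REVISION
```

A board still sitting on something like 0x4a would return False here, which matches the range described above.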

So blame the board vendors... for not actually using the code Intel releases.

I'm skilled enough to patch the microcode on most bioses to the right version if anyone wants to contact me and accept the "if it's bricked it's not my fault" policy etc etc.

Here's a relevant screenshot. Before applying Intel's proper microcode, this would've crashed long ago:

#25
cadaveca
My name is Dave
R-T-B said:
I did a bit of homework on this issue.

I'm a bit of a bios modder in my free time.

It appears this issue got fixed with Skylake Microcode update 56 (that's hexadecimal 56), possibly even earlier. I didn't test every version, but patching my BIOS with that microcode fixed this issue.

Here's the kicker: Microcode update 56 came out 10-24-2015. That's right, almost 3 months ago.

Yep, Intel has had this fixed for a while... Not sure if @btarunr wants to note that or not. ;)

Meanwhile, nearly all skylake boards I played with have old revisions down in the 30-4A range. My gigabyte board I use was particularly low. No idea why the board vendors do this, but they really dropped the ball here.

So blame the board vendors... for not actually using the code intel releases.

I'm skilled enough to patch the microcode on most bioses to the right version if anyone wants to contact me and accept the "if it's bricked it's not my fault" policy etc etc.

Here's a relevant screenshot. Before applying intels proper microcode, this would've crashed long ago:


If you can combine the non-K OC bits with this MC update... We needs to have a chat. I'll be PM'ing you about this soon.