
AMD Ryzen Machine Crashes to a Sequence of FMA3 Instructions

The Haswell-specific benchmark runs an industry-standard FMA3 instruction sequence, which takes down Ryzen (which supports FMA3) but not Skylake (which also supports FMA3).
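
For readers unfamiliar with FMA: a fused multiply-add computes a*b+c with a single rounding at the end, instead of rounding once after the multiply and again after the add. The sketch below is not the benchmark's code and does not exercise the FMA3 hardware path at all; it only emulates the single-rounding result with exact rational arithmetic so the difference is visible. The function names and input values are made up for illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact a*b+c, rounded only once.

    Fraction holds the exact value of each double, so the only rounding
    happens in the final conversion back to float.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative inputs: the exact product is 1 - 2**-60, which a separate
# multiply rounds to 1.0, so the small term is lost before the add.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

With these inputs the unfused version returns 0.0, while the fused version keeps the tiny residual of -2**-60.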

This is news because an unprivileged application can take down a machine, which makes it a security hole. Would a company like Barclays put its live client database on a "Naples" machine now?

What would be more interesting is to hear about it crashing on consumer stress tests which use that instruction set....
 
Sounds patchable.
FMA4 for Zen has long been disabled in binutils, and probably in MS Visual Studio too.
Also, it is confirmed that without SMT the benchmark runs fine, so the problem is not FMA but, once again, SMT conflicts.
...khm-khm... OpenMP... khm-khm....
 
Someone running Naples will likely have their own application coded to run on the Ryzen server; they won't just copy/paste the aforementioned code into their application and crash (test) a server.
Nope, today it's all about virtualization and the cloud. In that case a single Naples server in the server farm hosts multiple VMs that different businesses use for various public online services. It's enough for just one of these businesses to allow its users to store something executable on the host, and after one malicious (or unlucky) user, bam, all VMs on the node are down.

Amazon Cloud Node -> N x Business -> N x M x EndUsers
 
No, my point is that the disgruntled IT guy Barclays just fired could crash a "Naples" powered server with just this "little known program."

I can crash our Intel servers easily with some code.
Nothing new here, move along...
 
We support it!!!

Oh..But don't use it lol
 
If these benchmarks are tailored to such a specific level that they differentiate even between SERIES within the SAME VENDOR, why the hell is this news?

"This is important because a simple application, running at user privileges (i.e. lacking special super-user/admin privileges), has the ability to crash the machine."
 
It's funny how people seem to be missing the point in this article. Anyway, I hope AMD is able to fix this.
Agreed, the whole notion of it being exploitable at least seems to be missing from their thoughts.
 
I've been lurking here for years but I feel the need to say something.

Guys, this IS a big deal. As others have noted, an unknown instruction is supposed to raise an "Undefined Opcode" exception, something that predates even 16-bit protected mode. On CPUs which offer 'User' and 'Kernel' mode (i.e. everything since the mid 80s) the exception is handled by the operating system, which usually just kills off the process. The whole idea of User mode is that no User mode program can screw with the system without 'permission' from the OS.

This is similar to the Cyrix coma bug or the Pentium F00F bug. However, I agree that this can probably be fixed in microcode.
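
A rough sketch of the expected behaviour described above, assuming a POSIX system: a child process that hits an illegal-instruction fault is killed by the OS, and the parent (standing in for the rest of the machine) carries on. To keep it portable, the child does not execute a real invalid opcode; it sends itself SIGILL, which is the signal Linux delivers for an undefined opcode. The helper name is hypothetical.

```python
import signal
import subprocess
import sys

# Child program: simulate an illegal-instruction fault by raising SIGILL
# on itself (an assumption standing in for a real undefined opcode).
CHILD_SOURCE = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"


def run_faulting_child() -> int:
    """Run the child in its own process and return its exit status.

    On POSIX, subprocess reports death-by-signal as a negative number
    equal to minus the signal number.
    """
    completed = subprocess.run([sys.executable, "-c", CHILD_SOURCE])
    return completed.returncode


status = run_faulting_child()
if status < 0:
    print("child killed by", signal.Signals(-status).name)
else:
    print("child exited with status", status)

# The parent reaching this line is the point: the fault stayed contained.
print("parent still running")
```

The reported bug is notable precisely because it does not end this way: the whole machine goes down instead of just the offending process.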
 
Really guys?

Let's look at the Intel 7700K errata list.

"
Revision 001, August 2016
• Initial release

Revision 002, November 2016
• Errata
  - Added errata KBL068-078
  - Updated erratum KBL062
  - Fixed erratum KBL063

Revision 003, January 2017
• Added SKUs Y/U w/iHDCP2.2, S/H-Processor lines
• Added Table 2, S/H-Processor Lines Component Identification
• Identification Information
  - Added Table 4, Y-Processor Line With iHDCP2.2
  - Added Table 6, U-Processor Line With iHDCP2.2
  - Added Figure 3, S-Processor Line LGA Top-Side Markings
  - Added Table 7, S-Processor Line
  - Added Figure 4, H-Processor Line BGA Top-Side Markings
  - Added Table 8, H-Processor Line
• Errata
  - Updated Table 13, Errata Summary Table
  - Added errata KBL079-083

Revision 004, February 2017
• Identification Information
  - Updated Table 4, Y-Processor Line With iHDCP2.2
• Errata
  - Updated Table 13, Errata Summary Table. Added J-1 stepping
  - Updated KBL080
  - Added errata KBL084-091
"

All processors have flaws, and they get fixed in a future stepping, or even in the current stepping with a microcode update.

Big deal if left unpatched or unfixed? Yep. Will it be fixed? Yep.
 
The whole idea of User mode is that no User mode program can screw with the system without 'permission' from the OS.
Regardless of your suggestive nickname, I assume you've never played pranks on your co-workers with NtRaiseHardError, or dumb overflow vulnerabilities.
There are dozens of ways you can hang, BSOD, mess up your machine from userspace.

TO EVERYONE:
It's not even clear whether the bug pertains to FMA instructions at all. That was only assumed because the benchmark BSODed on the FMA3 256-bit stage, and only with SMT enabled.
The cause could be anything: a Windows bug, a libgomp bug, SMT on Zen itself, or some other unknown factor.
Let's not jump to any conclusions before even knowing what the problem is.
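
To check what a given Linux box actually reports before drawing conclusions, something like the following may help. It assumes an x86 Linux system where /proc/cpuinfo has a "flags" line ("fma" means FMA3, "fma4" means FMA4) and where the kernel exposes /sys/devices/system/cpu/smt/active (newer kernels only). The function names are invented for this sketch, and it only reads what the OS reports; it does not try to trigger the crash.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def describe_fma_support(cpuinfo_text: str) -> dict:
    """Report whether the FMA3 and FMA4 flags are advertised."""
    flags = cpu_flags(cpuinfo_text)
    return {"fma3": "fma" in flags, "fma4": "fma4" in flags}


def smt_active(path: str = "/sys/devices/system/cpu/smt/active"):
    """Return True/False if the kernel reports SMT state, else None."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None  # file missing: older kernel or non-Linux system


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(describe_fma_support(cpuinfo.read_text()))
    print("SMT active:", smt_active())
else:
    print("/proc/cpuinfo not found; this sketch assumes Linux")
```

Comparing runs with SMT on and off on the same machine would at least narrow down which of the suspected factors matters.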
 
Regardless of your suggestive nickname, I assume you've never played pranks on your co-workers with NtRaiseHardError, or dumb overflow vulnerabilities.
There are dozens of ways you can hang, BSOD, mess up your machine from userspace.

That is why I put the word "permission" in quote marks. :) I consider those methods to be software bugs; the CPU itself isn't to blame (minus errata problems of course).

BTW I just now read the HWbot post. For some reason I thought it was a reset like a triple fault. The Coma and F00F bugs were a better analogy than I realized.

I actually have written a simple operating system, though I wouldn't recommend designing as you go like I did.
 
@darkangel0504 awesome pic :)

(random, I know, but he/she has their profile private)
 
Would a company like Barclays put its live client database on a "Naples" machine now?

Are there any Naples servers running now with a "live" client database? When there are, it will be a problem; for now these enthusiast CPUs just shut down ("crash") the system. Not a great option, but better than the data being compromised. I'm sure this will be fixed, especially when "Naples" server equipment actually goes live.
 
No CPU is perfect, as those who designed and produced them aren't either... neither is the universe, and nobody can understand or patch it...
 
They slide on diesel tho :)


..........absolute Genius!!!!!!!!! A hemi powered Tesla that runs on diesel must be created!!!!!!!!!!!!!!!
 
Really guys?

Let's look at the Intel 7700K errata list.

All processors have flaws, and they get fixed in a future stepping, or even in the current stepping with a microcode update.

Big deal if left unpatched or unfixed? Yep. Will it be fixed? Yep.

What's your point? What are you trying to say? TPU is simply reporting the news. Is this serious if left unfixed? Yes. Should TPU just stop reporting stuff? No.
 
I'm pretty amazed by the comments...

It seems most people really don't understand how this problem works - looking at all the comments saying that you can crash any system with some code (and the Tesla on diesel stuff as well...)

And because many of you have already said that this can PROBABLY be fixed by microcode, it's almost natural to ask a question: what if it can't be fixed? :) Any bets?

Either way, IMO this is another sign that there's something deeply wrong with the Ryzen architecture (most likely the SMT implementation). It's all very worrying. :/
 
As predicted by many ...
This issue will be fixed in a new AGESA [AMD Generic Encapsulated Software Architecture] microcode update.
 
What's your point? What are you trying to say? TPU is simply reporting the news. Is this serious if left unfixed? Yes. Should TPU just stop reporting stuff? No.

Plenty of damage control going on at the moment.
 
I'm pretty amazed by the comments...

It seems most people really don't understand how this problem works - looking at all the comments saying that you can crash any system with some code (and the Tesla on diesel stuff as well...)

And because many of you have already said that this can PROBABLY be fixed by microcode, it's almost natural to ask a question: what if it can't be fixed? :) Any bets?

Either way, IMO this is another sign that there's something deeply wrong with the Ryzen architecture (most likely the SMT implementation). It's all very worrying. :/

So Prime and RealBench for days, and then games, all of which use SMT, didn't crash once. This program, which they admit does not currently support Zen, crashed. So what is so deeeeeply wrong with Zen? Sounds like you are more interested in exaggerating the problem. Your comment was fine until the last sentence, where you made it a major flaw. This will likely be fixed with a microcode update, if anything.
 