
Hardware Bug in AMD CPU Family

IlluminAce

New Member
Joined
Aug 6, 2011
Messages
46 (0.02/day)
Likes
40
Location
UK
System Name Ace2
Processor Intel i7 2600
Motherboard ASRock Extreme4 Gen3
Cooling Zalman CNPS10x Extreme
Memory Corsair Vengeance LP 16GB (4x4)
Video Card(s) Asus HD 6970 DirectCUII
Storage 4x Samsung 1TB 7.2krpm
Display(s) 1x 24" 16:10, 1x 20" 16:10, 3x 19" 5:4
Case Fractal Design R3
Audio Device(s) TBD
Power Supply Corsair HX850W
Software Debian dom0 (on Xen hypervisor)
#1
A range of AMD CPUs can, under rare conditions, update the stack pointer incorrectly. Ouch!

Read all about it.

This has been confirmed on one of the Phenom II X4 CPUs and even on a high-end Opteron, and AMD's confirmation indicates it probably affects a wide range of their CPUs. The problem only comes to light in a very specific case. As best I can ascertain, it's when you're in a rather specific section of the stack (possibly requiring stack randomization) and a very particular arrangement of instructions comes up: a sequence of pops (so, in practice, deep recursion) plus some NOPs.

It was found by the man behind DragonFly BSD, a pretty nifty fork of FreeBSD whose kernel has undergone extensive rewriting. So this guy knows his stuff, and he put the requisite time in (over the course of a year) to track this blighter of a bug down. Kudos to Matthew Dillon for his efforts.

Before everybody dashes out to buy Intels: AFAIK the bug has only shown up (or at least only been noticed) on DragonFly BSD, in a particular routine in the GCC build used there, and only very irregularly at that. So you should be safe :cool:

*segfault*
 

erocker

Senior Moderator
Staff member
Joined
Jul 19, 2006
Messages
42,380 (10.17/day)
Likes
18,023
Processor Intel i7 8700k
Motherboard Gigabyte z370 AORUS Gaming 7
Cooling Water
Memory 16gb G.Skill 4000 MHz DDR4
Video Card(s) Evga GTX 1080
Storage 3 x Samsung Evo 850 500GB, 1 x 250GB, 2 x 2TB HDD
Display(s) Nixeus EDG27
Case Thermaltake X5
Power Supply Corsair HX1000i
Mouse Zowie EC1-B
Software Windows 10
#2

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
14,551 (3.97/day)
Likes
8,059
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K at stock (hits 5 gees+ easily)
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (4 x 4GB Corsair Vengeance DDR3 PC3-12800 C9 1600MHz)
Video Card(s) Zotac GTX 1080 AMP! Extreme Edition
Storage Samsung 850 Pro 256GB | WD Green 4TB
Display(s) BenQ XL2720Z | Asus VG278HE (both 27", 144Hz, 3D Vision 2, 1080p)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair HX 850W v1
Software Windows 10 Pro 64-bit
#3
The articles all came out yesterday, on the same day, so it's not old news.

I agree it's not the kind of bug to make your PC crash and burn, however. Still, it's a bug and will be fixed.
 
Joined
Mar 10, 2010
Messages
4,984 (1.76/day)
Likes
1,551
Location
Manchester uk
System Name Quad GT evo V
Processor FX8350 @ 4.8ghz1.525c NB2.64ghz Ht2.84ghz
Motherboard Gigabyte 990X Gaming
Cooling 360EK extreme 360Tt rad all push/pull, cpu,NB/Vrm blocks all EK
Memory Corsair vengeance 32Gb @1333 cas9
Video Card(s) Rx vega 64 waterblockedEK + Rx580 waterblockedEK
Storage samsung 840(250), WD 1Tb+2Tb +3Tbgrn 1tb hybrid
Display(s) Samsung uea28"850R 4k freesync, samsung 40" 1080p
Case Custom(modded) thermaltake Kandalf
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup
Power Supply corsair 1000Rmx
Mouse CM optane
Keyboard CM optane
Software Win 10 Pro
Benchmark Scores 15.69K best overall sandra so far
#4
Quote (qubit, #3): "I agree it's not the kind of bug to make your PC crash and burn, however. Still, it's a bug and will be fixed."
Both Intel and AMD keep updated lists of known faults for their CPUs; it's been posted on here before, I'm sure. Each CPU seems to have a massive list of known bugs and errors, but they chunder away. Odd, isn't it :rolleyes:
 

qubit

#5
Quote (#4): "Both Intel and AMD keep updated lists of known faults for their CPUs; it's been posted on here before, I'm sure. Each CPU seems to have a massive list of known bugs and errors, but they chunder away. Odd, isn't it :rolleyes:"
Yeah, that goes with what I said. Basically, CPUs are pretty bug-free in all the usual operations they perform, which leaves the errors in more obscure code sequences that don't get used very often. That, plus the workarounds developers use for known errata, keeps systems running OK. Occasionally a bad one like the Phenom TLB bug crops up and puts a real kink in a CPU.
 

IlluminAce

#6
Quite right, the errata lists are surprisingly extensive (or unsurprisingly, if you consider the complexity). However, this fault was previously unreported in the errata, and it manifests as a segfault given the right conditions. Moreover, it's almost impossible to track down. It's far from inconceivable that this issue is behind a variety of unexplained segfaults on production systems. It certainly was in Matthew's case, and his usage was relatively lightweight, if slightly specific.

On a home system, such an issue could be put down to unstable hardware (too high an OC or temps, for example, or dodgy RAM) or to an OS or userland software bug. We all know of such crashes occurring; who knows, the odd one may have had just this root cause. Ultimately, though, it's not likely to cause us any major headaches.

As for whether it's a big deal, I'd have to disagree with you, erocker. Just because something happens irregularly and under specific workloads doesn't make it unimportant, especially when an entire family of CPUs is affected. With Opterons, we're talking about the backbone of many a production app/DB server and grid computation node. For the former, a single segfault can be completely catastrophic; for the latter, occasional errors would often go largely uninvestigated, or simply be put down to unstable hardware. If it only affected one particular model, or was a fault in a keyboard for example, then fair enough; but (probably a large subset of) an entire CPU family is another matter completely. If you have a datacentre of Opterons and do experience occasional segfaults you haven't managed to track down... you now have an interesting decision to make :)

Whilst we shouldn't jump to conclusions, AMD's final errata statement will make for interesting reading for many infra teams and sysadmins, I'm sure.
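For anyone facing that "interesting decision" across a fleet, a cheap first pass is simply inventorying which hosts are AMD and which CPU family they report. Below is a minimal Python sketch, assuming Linux hosts (it parses `/proc/cpuinfo`, where x86 Linux exposes `vendor_id` and a decimal `cpu family` field). The set of families to flag is an assumption: 16 is Family 10h, the family the Phenom II belongs to, but the thread doesn't say exactly which families (or which Opteron model) are affected, so update it once AMD's errata statement is out. `FAMILIES_OF_INTEREST`, `parse_cpuinfo` and `needs_review` are hypothetical names for this illustration.

```python
from pathlib import Path

# ASSUMPTION: Family 10h (16 decimal) is the Phenom II family; the exact list of
# affected families is not given here and should come from AMD's errata statement.
FAMILIES_OF_INTEREST = {16}


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo-style text into one dict per logical CPU."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one processor's block.
            if current:
                cpus.append(current)
                current = {}
            continue
        # Split on the first colon only ("model name" values may contain colons).
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def needs_review(text: str) -> bool:
    """True if any CPU is an AMD part whose family is in FAMILIES_OF_INTEREST."""
    for cpu in parse_cpuinfo(text):
        if cpu.get("vendor_id") != "AuthenticAMD":
            continue
        family = cpu.get("cpu family", "")
        if family.isdigit() and int(family) in FAMILIES_OF_INTEREST:
            return True
    return False


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")  # Linux only; skipped silently elsewhere
    if info.exists():
        print("review" if needs_review(info.read_text()) else "ok")
```

This only narrows down candidate hosts; it says nothing about whether any particular segfault was actually caused by the bug.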
 

trickson

OH, I have such a headache
Joined
Dec 5, 2004
Messages
6,486 (1.36/day)
Likes
927
Location
Planet Earth.
Processor Q9650
Motherboard Gigabyte.
Cooling air.
Memory 4gb kingston
Video Card(s) hd 5870
Software win7 64 bit
#7
This is not really a big deal at all. Intel has bugs, AMD has bugs. Maybe they should hire a good exterminator for their fabs.
 

erocker

#8
I said "doesn't seem to be a big deal". I've run AMD for years. I still have a Socket 754 system that's been running 24/7 for about 6 years now, with no bugs to report. People can make of this bug whatever they want. ;)
 

qubit

#9
There's an update to the story now over at tng, parts of which are exclusive. ;)
 

IlluminAce

#10
Quote (erocker, #8): "I said 'doesn't seem to be a big deal'. I've run AMD for years. I still have a Socket 754 system that's been running 24/7 for about 6 years now, with no bugs to report. People can make of this bug whatever they want. ;)"
Quite. We end users are not likely to suffer much as a result of this, unless you do a lot of compiling on DragonFly BSD ;) (but, seriously, I do like its tenets and the work that's gone into it; I might give it a spin soon). Perhaps on the odd occasion we 24/7'ers might hit this bug without realising, but that's nothing too serious from our perspective. Thankfully it corrupts the stack pointer rather than, say, EAX: a bad stack pointer tends to end in a segfault, whereas if it silently corrupted my computations, I'd be a lot more concerned.

But as you say, it's what you make of it, and for those of us doing serious computing - large organisations with big datacentres - this sort of rare, intermittent fault can cause pretty nasty real-world problems. Thankfully I write code for grids rather than support them!