Hardware Bug in AMD CPU Family

IlluminAce

New Member
Joined
Aug 6, 2011
Messages
46 (0.01/day)
Location
UK
System Name Ace2
Processor Intel i7 2600
Motherboard ASRock Extreme4 Gen3
Cooling Zalman CNPS10x Extreme
Memory Corsair Vengeance LP 16GB (4x4)
Video Card(s) Asus HD 6970 DirectCUII
Storage 4x Samsung 1TB 7.2krpm
Display(s) 1x 24" 16:10, 1x 20" 16:10, 3x 19" 5:4
Case Fractal Design R3
Audio Device(s) TBD
Power Supply Corsair HX850W
Software Debian dom0 (on Xen hypervisor)
AMD's CPU range has a problem updating its stack pointer. Ouch!

Read all about it.

This has been confirmed on one of the Phenom II X4 range's CPUs and even on a high-end Opteron. AMD's confirmation indicates it probably affects a nice range of their CPUs. The problem comes to light only in a very specific case. As best I can ascertain, it happens when you're in a rather specific section of the stack (possibly requiring stack randomization) and a particular arrangement of assembly instructions comes up, involving a sequence of pops (so really in deep recursion) and some NOPs.

It was found by the man behind DragonFly BSD - a pretty nifty fork of FreeBSD which has undergone extensive kernel rewriting - so this guy knows his stuff, and he put in the requisite time (over the course of a year) to track this blighter of a bug down. Kudos to Matthew Dillon for his efforts.

Before everybody dashes out to buy Intels: AFAIK the bug has only shown up (or been noticed, at least) on DragonFly BSD, in a particular function called by the GCC implementation in use there, and only very irregularly at that. So you should be safe :cool:

*segfault*
 
Joined
Jul 19, 2006
Messages
43,587 (6.71/day)
Processor AMD Ryzen 7 7800X3D
Motherboard ASUS TUF x670e
Cooling EK AIO 360. Phantek T30 fans.
Memory 32GB G.Skill 6000Mhz
Video Card(s) Asus RTX 4090
Storage WD m.2
Display(s) LG C2 Evo OLED 42"
Case Lian Li PC 011 Dynamic Evo
Audio Device(s) Topping E70 DAC, SMSL SP200 Headphone Amp.
Power Supply FSP Hydro Ti PRO 1000W
Mouse Razer Basilisk V3 Pro
Keyboard Tester84
Software Windows 11

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.98/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K @ 4GHz
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (2 x 8GB Corsair Vengeance Black DDR3 PC3-12800 C9 1600MHz)
Video Card(s) MSI RTX 2080 SUPER Gaming X Trio
Storage Samsung 850 Pro 256GB | WD Black 4TB | WD Blue 6TB
Display(s) ASUS ROG Strix XG27UQR (4K, 144Hz, G-SYNC compatible) | Asus MG28UQ (4K, 60Hz, FreeSync compatible)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair AX1600i
Mouse Microsoft Intellimouse Pro - Black Shadow
Keyboard Yes
Software Windows 10 Pro 64-bit
The articles came out the same day, yesterday, so it's not old news.

I agree it's not the kind of bug to make your PC crash and burn, however. Still, it's a bug and will be fixed.
 
Joined
Mar 10, 2010
Messages
11,878 (2.30/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Quote: I agree it's not the kind of bug to make your PC crash and burn, however. Still, it's a bug and will be fixed.

Both Intel and AMD keep updated lists of known faults for their CPUs. It's been posted on here before, I'm sure. Each CPU seems to have a massive list of known bugs and errors, but they chunder away. Odd, isn't it? :rolleyes:
 

qubit

Quote: Both Intel and AMD keep updated lists of known faults for their CPUs. It's been posted on here before, I'm sure. Each CPU seems to have a massive list of known bugs and errors, but they chunder away. Odd, isn't it? :rolleyes:

Yeah, that goes with what I said. Basically, the CPUs are pretty bug-free in all the usual operations they do, which leaves the errors in more obscure code sequences that don't get used very often. That, and the workarounds that developers use for known errata, ensures that systems keep running OK. Occasionally, bad errors like the Phenom TLB bug crop up, which put a kink in a CPU.
 

IlluminAce

Quite right, the errata lists are surprisingly extensive (or unsurprisingly, if you consider the complexity). However, this fault was previously unreported in the errata, and will show up as a segfault given the right conditions. Moreover, it's almost impossible to track down. It's far from inconceivable that this issue could be behind a variety of unexplained segfaults on production systems. It certainly was in Matthew's case, and his usage was relatively lightweight, if slightly specific.

Such an issue exhibiting on a home system could be put down to unstable hardware - too high OC/temps for example, or dodgy RAM, or an OS or userland software bug. We all know of them occurring; who knows, the odd one may have had just such a root cause. Ultimately it's not likely to cause us any major headaches.

As for whether it's a big deal, I'd have to disagree, erocker. Just because something happens irregularly and under specific workloads doesn't make it unimportant, especially when an entire family of CPUs is affected. With Opterons, we're talking about the backbone of many a prod app/DB server and grid computation node. In the case of the former, a single segfault can be completely catastrophic; in the latter, occasional errors would often go largely uninvestigated, or be put down to unstable hardware. If it only affected one particular model, or was a fault in a keyboard for example, then fair enough; but (probably a large subset of) an entire CPU family is another matter completely. If you have a datacentre of Opterons and do experience occasional segfaults which you haven't managed to track down... you now have an interesting decision to make :)

Whilst we shouldn't jump to conclusions, AMD's final errata statement will make for interesting reading for many infra teams and sysadmins, I'm sure.
 

trickson

OH, I have such a headache
Joined
Dec 5, 2004
Messages
7,595 (1.07/day)
Location
Planet Earth.
System Name Ryzen TUF.
Processor AMD Ryzen7 3700X
Motherboard Asus TUF X570 Gaming Plus
Cooling Noctua
Memory Gskill RipJaws 3466MHz
Video Card(s) Asus TUF 1650 Super Clocked.
Storage CB 1T M.2 Drive.
Display(s) 73" Soney 4K.
Case Antech LanAir Pro.
Audio Device(s) Denon AVR-S750H
Power Supply Corsair TX750
Mouse Optical
Keyboard K120 Logitech
Software Windows 10 64 bit Home OEM
This is not really a big deal at all. Intel has bugs, AMD has bugs. Maybe they should hire a good exterminator for their fab plants.
 
Joined
Jul 19, 2006
Messages
43,587 (6.71/day)
I said "doesn't seem to be a big deal". I've run AMD for years. I still have an s754 system that's been running 24/7 for about 6 years now. No bugs to report. People can make this bug out to be whatever they want it to be or mean to them. ;)
 

qubit

There's an update to the story now over at tng, part of which is exclusive. ;)
 

IlluminAce

Quote: I said "doesn't seem to be a big deal". I've run AMD for years. I still have an s754 system that's been running 24/7 for about 6 years now. No bugs to report. People can make this bug out to be whatever they want it to be or mean to them. ;)

Quite, we end users are not likely to suffer much as a result of this - unless you do much compilation on DragonFly BSD ;) (but, seriously, I do like its tenets and the work that's gone into it. I might give it a spin soon). Perhaps on the odd occasion we 24/7'ers might encounter this bug without realising, but that's nothing too serious from our perspective. Thankfully it corrupts the sp rather than, say, eax - if it occasionally and silently corrupted my computations, I'd be a lot more concerned.

But as you say, it's what you make of it, and for those of us doing serious computing - large organisations with big datacentres - this sort of rare, intermittent problem can present pretty nasty real-world problems. Thankfully I deal with programming on grids as opposed to supporting them!
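
To illustrate the sp-versus-eax point, here is a rough toy model in Python. It is only a sketch under loud assumptions: the "stack machine", the function name call_and_return, the Fault exception and the address 0x1000 are all made up for illustration, and none of it reflects real x86 behaviour or the details of AMD's erratum. It just shows why a mis-updated stack pointer tends to fail loudly (the return lands on garbage, i.e. a segfault), while a corrupted result register would hand back a wrong answer without anyone noticing.

```python
# Toy model only: a tiny stack machine, not real x86 and not AMD's actual erratum.

class Fault(Exception):
    """Stands in for a segfault: the 'CPU' tried to return to a bogus address."""


RETURN_ADDRESS = 0x1000  # hypothetical address the caller expects to resume at


def call_and_return(corrupt_sp=False, corrupt_result=False):
    """Run one fake function call; optionally inject one of two faults."""
    # Prologue: the return address goes on the stack, then two saved registers.
    stack = [RETURN_ADDRESS, 7, 9]
    sp = len(stack)          # sp points one slot past the top of the stack

    result = 6 * 7           # the "computation", held in an eax-like register
    if corrupt_result:
        result ^= 1          # flip one bit: nothing notices, the answer is just wrong

    # Epilogue: two pops to restore the saved registers, then a return.
    sp -= 1                  # pop saved register (value 9, unused here)
    sp -= 1                  # pop saved register (value 7, unused here)
    if corrupt_sp:
        sp += 1              # stack pointer mis-updated by one slot

    sp -= 1
    target = stack[sp]       # "ret" pops whatever sp now points at
    if target != RETURN_ADDRESS:
        raise Fault(f"returned to {target:#x} instead of {RETURN_ADDRESS:#x}")
    return result


if __name__ == "__main__":
    print(call_and_return())                     # healthy run
    print(call_and_return(corrupt_result=True))  # silently wrong
    try:
        call_and_return(corrupt_sp=True)
    except Fault as exc:
        print("fault:", exc)
```

In this toy, the bad stack pointer makes the return pick up a saved register value as its target, which is caught immediately, whereas the flipped result bit sails through unchecked.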
 