
My research into AMD's Linux "Performance Marginality" issue:

Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches
Software Windows 11 Enterprise (legit), Gentoo Linux x64
I've been doing some behind-the-scenes research into AMD's so-called Linux "Performance Marginality." When I began, I had big plans to write an independent test program to prove the crash can also happen in Windows. Unfortunately, I never quite got there, and it appears I may even have been off on my expected results. The crash is triggered by ASLR, and Windows generally doesn't use it. JavaScript might, but find me any webpage that spawns a 16-thread JavaScript process that isn't mining coins malware-style and I'll be genuinely shocked.

What did come of this is a document detailing my results with the RMA. If nothing else, there is heavy evidence indicating there is no new stepping, just improved binning to mitigate the issue for those who complain. It's circumstantial evidence at this point, but given AMD has repeatedly declined to comment when asked how they fix this, I am very very suspicious at this point they aren't simply gluing threadripper grade dies to Ryzen CPUs on request, and standard Ryzen grade CPUs simply don't have a fully functional ASLR function under load (at least, at the binning level they chose).

I'm putting the document I typed up below, including evidence, in hopes you guys can do more research and maybe find enough to make this case a bit more than circumstantial. As it is, I'm out of time and energy to pursue this further, but it certainly seems suspect.

BEGIN PM (Originally sent to W1zzard and company, advised to share with community):

As a user of Gentoo Linux, I have been hit hard by the so-called Ryzen “Performance Marginality.” This manifests itself as an event in which several build jobs running concurrently will crash a random process on the system, usually (but not necessarily) one of the running build jobs. The problem is well documented, and AMD is offering RMAs to affected users. The thing is, that makes it sound like not everyone is affected. Truth be told, after a lot of online research, it is my opinion that anyone with a processor older than build week 25 is affected. Since anything newer than build week 20 has not made it into retail yet (at least, if user reports can be believed), this means nearly all Ryzen processors on the market at present time are affected by this issue.
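For anyone who wants to watch for the symptom described above, here is a minimal sketch in Python. It assumes the failure shows up as a worker process exiting abnormally while many jobs run at once. The `stress` and `run_once` names and the placeholder workload are mine, purely for illustration; a real test would substitute an actual compile job, and a clean run proves nothing about whether a given CPU is affected.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run one job and return its exit code.

    On POSIX a negative code means the process was killed by a signal
    (for example -11 for SIGSEGV).
    """
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


def stress(cmd, jobs=16, rounds=4):
    """Run `jobs` copies of `cmd` concurrently, `rounds` times over.

    Returns the list of every non-zero exit code seen.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for _ in range(rounds):
            codes = list(pool.map(run_once, [cmd] * jobs))
            failures.extend(code for code in codes if code != 0)
    return failures


if __name__ == "__main__":
    # Placeholder workload (an assumption): swap in a real build command.
    job = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    bad = stress(job, jobs=8, rounds=2)
    print(f"{len(bad)} abnormal exits: {bad}")
```

Note this only catches crashes in the jobs it launched. The reports say a random process on the system can be the one that dies, so the kernel log would still need checking separately.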

This is a big deal, and not just on Linux. Why?

The issue vanishes in Linux for nearly all users when they turn off Kernel ASLR (Address Space Layout Randomization). This is a critical security feature that is not presently used much in Windows (and frankly, may never be) but is already being used inside web browsers by VMs such as JavaScript engines. I’d be very interested in how a loaded Ryzen VM performs with JavaScript long term, for example. I’m sure this issue can manifest itself elsewhere if ASLR is truly being corrupted under load.
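To check which kind of randomization a given Linux box is actually running with, something like the sketch below may help. On Linux, user-space ASLR is controlled by the `/proc/sys/kernel/randomize_va_space` sysctl, while kernel ASLR (KASLR) is turned off with the `nokaslr` boot parameter. Which of the two the workaround above refers to is an assumption on my part, so the sketch reports both. The helper names are hypothetical.

```python
from pathlib import Path

# Meanings of /proc/sys/kernel/randomize_va_space per the kernel sysctl docs.
ASLR_MODES = {
    "0": "disabled",
    "1": "partial (mmap base, stack and VDSO randomized)",
    "2": "full (heap randomized as well)",
}


def describe_aslr(raw):
    """Translate the raw sysctl value into a human-readable description."""
    value = raw.strip()
    return ASLR_MODES.get(value, f"unknown value {value!r}")


def kaslr_disabled(cmdline):
    """Return True if the kernel command line carries the nokaslr parameter."""
    return "nokaslr" in cmdline.split()


if __name__ == "__main__":
    sysctl = Path("/proc/sys/kernel/randomize_va_space")
    cmdline = Path("/proc/cmdline")
    if sysctl.exists():
        print("user-space ASLR:", describe_aslr(sysctl.read_text()))
    if cmdline.exists():
        print("nokaslr on kernel command line:", kaslr_disabled(cmdline.read_text()))
```

This only reads the current settings. It does not change anything, and turning ASLR off for testing weakens security, so it should be treated as a diagnostic step only.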

What else is newsworthy here? Well, the issue does not appear to be fixed. By that I mean, there is no new stepping. By all accounts, the most likely “fix” AMD is employing is to simply bin the processors better (that means picking better-performing dies). This also explains why Threadripper and EPYC are “unaffected.” They are ALREADY binned higher.

To test this theory, I submitted my processor for an RMA. All users are reportedly getting “fresh from the presses” Ryzens manufactured not too long ago. Personally, my theory is that they are being pulled straight from the assembly-line binning process and used for RMAs. The fact that my CPU took nearly 2 weeks to “prepare” but got to me almost overnight only supports this theory. Anyhow, my CPU was made in Week 33. You can see it compared with my old Week 9 Ryzen below:





Note, in the images above, the older CPU container has a plastic shield that is much more “shiny” for some reason. It obscures the laser markings a bit but they should still be legible. I think it is just a packaging difference.

The new CPU has been opened on the bottom (no sticker), as prior reports indicated. It was also shipped rather pathetically. Unfortunately, I forgot to photograph this fact in my excitement, but I can certify there was no bottom “security” sticker and online reports support this. Have a look at the poor packaging anyways for kicks:



The CPU, as predicted, is much higher binned or otherwise a “golden” chip. It does 4.1 GHz on all cores at 1.425 V, where my old Ryzen needed 1.475 V to reach 4.0 GHz on all cores. It also lets the IMC fly up to 3600 MHz, where before 3200 MHz was a struggle. Here are some relevant comparison shots.

A basic overview of my old Ryzen. It lacks the memory/voltage tabs, but this is all I could ever push out of it, and my “daily driver” clocks were lower. The IMC was at 3200 MHz with 4 single-rank Samsung B-die DIMMs. The clock was 4 GHz at 1.475 V.




My new Ryzen. It clocks higher with fewer volts, so it is obviously better binned or otherwise golden. The IMC goes outrageously high at 3600 MHz. Same memory/DIMMs as above.



Oh, and yes, the issue is fixed.

What does this all mean?

I think AMD is binning run of the mill Ryzen CPUs so low that ASLR is effectively broken as soon as things get "hot" under load. I don't have direct confirmation of this yet, but a lot of circumstantial evidence, mostly found via myself and this thread here:

https://community.amd.com/thread/215773

It's a long read, but the evidence is there if you look. I'd recommend the later posts, from within the last two months, as they cover the RMA process and reports of binning/testing going on prior to chip arrival.
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
After editing / typing all that, please let me remind you I'd like to keep this thread an information/research thread, no fanboyism allowed.

I'd like to start the discussion by asking if anyone knows a good JavaScript "stress test" of sorts that one could run alongside, say, Prime95. If my theory is right, it should eventually crash, or something equally strange will happen.

Right now I have JetStream 1.1 but I have no idea how to loop it long term.

http://browserbench.org/JetStream/
 
Joined
Sep 10, 2016
Messages
805 (0.29/day)
Location
Riverwood, Skyrim
System Name Storm Wrought | Blackwood (HTPC)
Processor AMD Ryzen 9 5900x @stock | i7 2600k
Motherboard Gigabyte X570 Aorus Pro WIFI m-ITX | Some POS gigabyte board
Cooling Deepcool AK620, BQ shadow wings 3 High Spd, stock 180mm |BQ Shadow rock LP + 4x120mm Noctua redux
Memory G.Skill Ripjaws V 2x32GB 4000MHz | 2x4GB 2000MHz @1866
Video Card(s) Powercolor RX 6800XT Red Dragon | PNY a2000 6GB
Storage SX8200 Pro 1TB, 1TB KC3000, 850EVO 500GB, 2+8TB Seagate, LG Blu-ray | 120GB Sandisk SSD, 4TB WD red
Display(s) Samsung UJ590UDE 32" UHD monitor | LG CS 55" OLED
Case Silverstone TJ08B-E | Custom built wooden case (Aus native timbers)
Audio Device(s) Onboard, Sennheiser HD 599 cans / Logitech z163's | Edifier S2000 MKIII via toslink
Power Supply Corsair HX 750 | Corsair SF 450
Mouse Microsoft Pro Intellimouse| Some logitech one
Keyboard GMMK w/ Zelio V2 62g (78g for spacebar) tactile switches & Glorious black keycaps| Some logitech one
VR HMD HTC Vive
Software Win 10 Edu | Ubuntu 22.04
Benchmark Scores Look in the various benchmark threads
Thanks for the information @R-T-B, it was an interesting read and I'm actually considering RMA'ing the Ryzen CPU in my brother's rig as a result.
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
Thanks for the information @R-T-B, it was an interesting read and I'm actually considering RMA'ing the Ryzen CPU in my brother's rig as a result.

If you do, be aware they make you go through a little song and dance routine of making sure your voltage/cooling settings are adequate and have you test a fairly crazy set of voltages. I personally (being this was before I resigned) just got fed up with it, posted my settings and voltages and flashed my press credentials, which got the process escalated immediately and had them overnight me a CPU (lulz). I'm told it "normally" takes a good few months, sadly.

EDIT:

Example:

Thank you for submitting your RMA. I’m sorry to hear that you’re experiencing stability issues with your system. Please be assured that I am here to help find a resolution to your problem


Before approving your RMA, I would like to firstly perform some troubleshooting and focus on your system’s hardware configuration.


Please provide the details of the following hardware components in your system:

• Make and model of motherboard?

• Motherboard BIOS version?

• Make and model of RAM?

• Make and model of the power supply unit?


Please could you let me know the current settings you have for the CPU VCORE, SOC, and RAM? It would be very helpful if you could provide with pictures of your BIOS screens with these settings.


In addition, through troubleshooting with other customers we have found that the layout of the components inside the system case have caused sub-optimal cooling of the CPU causing a variety of issues.


I would like to better understand your system cooling to rule out any thermal issues. Please could you provide a picture of the whole interior of your system showing the CPU cooler?


Also, could you let me know the reported CPU temperature during heavy load or when the errors occur?


Thanks for contacting AMD
 
Joined
Sep 10, 2016
Messages
805 (0.29/day)
Location
Riverwood, Skyrim
If you do, be aware they make you go through a little song and dance routine of making sure your voltage/cooling settings are adequate and have you test a fairly crazy set of voltages. I personally (being this was before I resigned) just got fed up with it, posted my settings and voltages and flashed my press credentials, which got the process escalated immediately and had them overnight me a CPU (lulz). I'm told it "normally" takes a good few months, sadly.

EDIT:

Example:
Thanks for the heads up on that; it is a massive song and dance routine to go through.
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
Thanks for the heads up on that; it is a massive song and dance routine to go through.

What finally got me was when the rep asked if I "had a cooler attached." o_O

I was like... you mean which cooler? No, just like do you, at all? I was like, no, not doing this anymore... summon supervisor! :laugh:
 
Joined
Nov 4, 2005
Messages
11,642 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
A few users out of thousands experience this, and suddenly everyone has a problem, even when they experience none.
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
A few users out of thousands experience this, and suddenly everyone has a problem, even when they experience none.

If you'd read this well-researched thread, this is basically due to the lack of usage of ASLR outside of Linux. It's similar to how no one "experienced" the old Prime95 AVX bug, despite everyone having it, without (wait for it...) running Prime95.

This isn't a fanboy thread and I'd like to keep it free of that, thanks. The current best outcome would be to develop a Windows tool to prove you are affected, and I have come seeking help for that.
 

INSTG8R

Vanguard Beta Tester
Joined
Nov 26, 2004
Messages
7,955 (1.13/day)
Location
Canuck in Norway
System Name Hellbox 5.1(same case new guts)
Processor Ryzen 7 5800X3D
Motherboard MSI X570S MAG Torpedo Max
Cooling TT Kandalf L.C.S.(Water/Air)EK Velocity CPU Block/Noctua EK Quantum DDC Pump/Res
Memory 2x16GB Gskill Trident Neo Z 3600 CL16
Video Card(s) Powercolor Hellhound 7900XTX
Storage 970 Evo Plus 500GB 2xSamsung 850 Evo 500GB RAID 0 1TB WD Blue Corsair MP600 Core 2TB
Display(s) Alienware QD-OLED 34” 3440x1440 144hz 10Bit VESA HDR 400
Case TT Kandalf L.C.S.
Audio Device(s) Soundblaster ZX/Logitech Z906 5.1
Power Supply Seasonic TX~’850 Platinum
Mouse G502 Hero
Keyboard G19s
VR HMD Oculus Quest 2
Software Win 10 Pro x64
A few users out of thousands experience this, and suddenly everyone has a problem, even when they experience none.
Well, that’s his point: you “can” create the problem, easily in Linux, just not as easily in Windows. It might not be an issue today, but who knows, next year some ASLR functionality appears in Windows and only then do you realize you’re on a bad CPU.
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
Well, that’s his point: you “can” create the problem, easily in Linux, just not as easily in Windows. It might not be an issue today, but who knows, next year some ASLR functionality appears in Windows and only then do you realize you’re on a bad CPU.

Pretty much.

I'm also slightly alarmed that their "fix" seems to be simply to throw better binned silicon at people who complain, and not to change the binning process globally. Unless maybe they have? I don't know; week 25+ CPUs have not hit the market yet.
 
Joined
Feb 9, 2009
Messages
1,618 (0.29/day)
wait, how did you conclude it's a 'heat' issue or that different bins should result in different failure rates/times? i dont remember heat being mentioned on phoronix & its user comments

if week 25+, not to mention threadripper/epyc are 'permanently fixed', doesnt that mean it's more to do with physical microscopic manufacturing defects?

for some reason i never thought of this aspect of virtualization, is ASLR of a client actually randomized on the non-ASLR host's memory (at least within the preallocated chunk of the VM process)?

i want to know more about the ram limits, we really need to confirm if different cpus result in different memory support even after all the agesa updates

guess it's a good thing i've still been waiting & waiting due to the ram+nand+gpu price inflations before building...
 

Ahhzz

Moderator
Staff member
Joined
Feb 27, 2008
Messages
8,701 (1.48/day)
System Name OrangeHaze / Silence
Processor i7-13700KF / i5-10400 /
Motherboard ROG STRIX Z690-E / MSI Z490 A-Pro Motherboard
Cooling Corsair H75 / TT ToughAir 510
Memory 64Gb GSkill Trident Z5 / 32GB Team Dark Za 3600
Video Card(s) Palit GeForce RTX 2070 / Sapphire R9 290 Vapor-X 4Gb
Storage Hynix Plat P41 2Tb\Samsung MZVL21 1Tb / Samsung 980 Pro 1Tb
Display(s) 22" Dell Wide/24" Asus
Case Lian Li PC-101 ATX custom mod / Antec Lanboy Air Black & Blue
Audio Device(s) SB Audigy 7.1
Power Supply Corsair Enthusiast TX750
Mouse Logitech G502 Lightspeed Wireless / Logitech G502 Proteus Spectrum
Keyboard K68 RGB — CHERRY® MX Red
Software Win10 Pro \ RIP:Win 7 Ult 64 bit
If you'd read this well researched thread, this is basically due to the lack of usage of ASLR outside of linux. It's similar to how no one "experienced" the old Prime95 avx bug despite everyone having it without wait for it... running Prime95.

This isn't a fanboy thread and I'd like to keep it free of that, thanks. The current best outcome would be to develop a windows tool to prove you are affected, and I have come seeking help for that.
not sure to whom you're replying, but I'd say with your tone, there's a reason for that... *hint hint*
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
wait, how did you conclude it's a 'heat' issue or that different bins should result in different failure rates/times? i dont remember heat being mentioned on phoronix & its user comments

I don't know it's heat for certain (actually, since I wrote that, I more suspect it's load related). Frankly, all we really know 100% is that for some reason the RMA'd chips are binned better. Why is anyone's guess, but if we're going to conjecture, I would assume poor binning is causing the issue.

not sure to whom you're replying, but I'd say with your tone, there's a reason for that... *hint hint*

I was replying to the quoted party.

Reason for what? Your comment is confusing. I'm not attempting any sort of tone, though maybe the old PM I copied and pasted to support these claims has one. I really didn't check... my bad there. I'm all about sorting out what makes this issue tick and how AMD is handling it, nothing more.

For the record, AMD support deserves a gold star for how they treated me, though telling them I was a press member probably helped with that...
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.97/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
ASLR is effectively broken as soon as things get "hot" under load
I wonder if running more volts through the IMC would result in ASLR becoming more stable. It's entirely possible that ASLR is doing something in a particular way that makes the CPU unstable, which doesn't sound too different from another Linux issue with the OCaml compiler, where certain conditions could make the machine unstable. A lot like AVX, there are a number of things happening within a given CPU cycle, and transistors that are more leaky are going to have more trouble switching at such high frequencies. If you're right and they're giving out better binned CPUs to get around it, it's entirely possible that a little more voltage in the right place might have the same effect, but at the cost of more heat.
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
I wonder if running more volts through the IMC would result in ASLR becoming more stable. It's entirely possible that ASLR is doing something in a particular way that makes the CPU unstable, which doesn't sound too different from another Linux issue with the OCaml compiler, where certain conditions could make the machine unstable. A lot like AVX, there are a number of things happening within a given CPU cycle, and transistors that are more leaky are going to have more trouble switching at such high frequencies. If you're right and they're giving out better binned CPUs to get around it, it's entirely possible that a little more voltage in the right place might have the same effect, but at the cost of more heat.


Pre-RMA, I nearly fixed the issue by upping SOC voltage to 1.2v (later it came back with a vengeance though), so you might be onto something.
 
Joined
Feb 9, 2009
Messages
1,618 (0.29/day)
Pre-RMA, I nearly fixed the issue by upping SOC voltage to 1.2v (later it came back with a vengeance though), so you might be onto something.
it's not adding up, how can week25 or ALL threadrippers/epycs not have the issue? binning isnt exact, there are still variances, how would some small difference in target voltage or temperature or stable clock result in a very specific calculation error being permanently fixed?

the only logical way to test the bin hypothesis is by (running the errata scripts people made while) underclocking/overvolting/watercooling/timing loosening old ryzen cpus & overclocking/undervolting/overheating/timing tightening new ryzen cpus
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
it's not adding up, how can week25 or ALL threadrippers/epycs not have the issue?

Threadripper/EPYC have always been top 5% binned.

My current theory is that the reason all the rma'd cpus are "hot off the presses" is that they are essentially made to order with higher binned dies. Of course I could be wrong, but my build number was very close to when my RMA was approved.

We'll only really know when week 25+ cpus make it to market. It will be interesting to see if all of them are higher binned as well. All I know is RMA requests, for whatever reason, seem to be higher binned. It could be that AMD is just doing that for "added insurance" against a re-rma.

Oddly, however, contrary to my hypothesis, I can't seem to make my new Ryzen segfault by lowering SOC voltage very low (I tried 0.8v). I may be completely off on this after all. I will fully admit a lot of this is my "best guess" for what is going on.
 
Joined
Nov 18, 2010
Messages
7,096 (1.46/day)
Location
Rīga, Latvia
System Name HELLSTAR
Processor AMD RYZEN 9 5950X
Motherboard ASUS Strix X570-E
Cooling 2x 360 + 280 rads. 3x Gentle Typhoons, 3x Phanteks T30, 2x TT T140 . EK-Quantum Momentum Monoblock.
Memory 4x8GB G.SKILL Trident Z RGB F4-4133C19D-16GTZR 14-16-12-30-44
Video Card(s) Sapphire Pulse RX 7900XTX + under waterblock.
Storage Optane 900P[W11] + WD BLACK SN850X 4TB + 750 EVO 500GB + 1TB 980PRO[FEDORA]
Display(s) Philips PHL BDM3270 + Acer XV242Y
Case Lian Li O11 Dynamic EVO
Audio Device(s) Sound Blaster ZxR
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer BlackWidow V3 - Yellow Switch
Software FEDORA 39 / Windows 11 insider
The thing with bins is not only with desktop parts.

Mobile does it and always did. You can buy two phones of the same model, but the difference between the worst and best voltage bin is HUGE, heat and battery life wise. The community often makes graphs of their samples, and the charts look pretty logical. There is also the cheating with NAND speeds and similar things... like screens with useless gorilla his a** or not... there are batches...

It is a lottery IMHO.
 
Joined
Feb 9, 2009
Messages
1,618 (0.29/day)
Threadripper/EPYC have always been top 5% binned.

My current theory is that the reason all the rma'd cpus are "hot off the presses" is that they are essentially made to order with higher binned dies. Of course I could be wrong, but my build number was very close to when my RMA was approved.

We'll only really know when week 25+ cpus make it to market. It will be interesting to see if all of them are higher binned as well. All I know is RMA requests, for whatever reason, seem to be higher binned. It could be that AMD is just doing that for "added insurance" against a re-rma.

Oddly, however, contrary to my hypothesis, I can't seem to make my new Ryzen segfault by lowering SOC voltage very low (I tried 0.8v). I may be completely off on this after all. I will fully admit a lot of this is my "best guess" for what is going on.
how are they going to give old stock during rma? the old stock has been shipped to stores, there is no reason for them to keep some for rma since they are constantly manufacturing new ones, take some new ones as needed to fill rmas

i thought some week25 did hit the market, but dont remember

if TR/E is 5%, that's no guarantee, amd would have to be sure that something like top 30% are fine, but this goes against the official statement that week25+ is fine (unless they make a more convoluted binning process, but TR/E got released... around week25 didnt they, what's the oldest known week for one?)

was this issue confirmed on the fewer core models or only the 8cores?
 
Joined
Aug 20, 2007
Messages
20,674 (3.41/day)
Was there ever an official statement from AMD that week 25+ are OK? I was under the impression that was just a Phoronix claim/guesstimate.
 
Joined
Jul 25, 2006
Messages
11,955 (1.85/day)
Location
Nebraska, USA
System Name Brightworks Systems BWS-6 E-IV
Processor Intel Core i5-6600 @ 3.9GHz
Motherboard Gigabyte GA-Z170-HD3 Rev 1.0
Cooling Quality case, 2 x Fractal Design 140mm fans, stock CPU HSF
Memory 32GB (4 x 8GB) DDR4 3000 Corsair Vengeance
Video Card(s) EVGA GEForce GTX 1050Ti 4Gb GDDR5
Storage Samsung 850 Pro 256GB SSD, Samsung 860 Evo 500GB SSD
Display(s) Samsung S24E650BW LED x 2
Case Fractal Design Define R4
Power Supply EVGA Supernova 550W G2 Gold
Mouse Logitech M190
Keyboard Microsoft Wireless Comfort 5050
Software W10 Pro 64-bit
I am very very suspicious at this point they aren't simply gluing threadripper grade dies to Ryzen CPUs on request
Please explain what you mean by this. Was that a tongue-in-cheek comment? Or do you really mean they delidded the processor and put the lid back on over a different die?

I only ask because I wonder if one of those Frankenstein processors escaped AMD and somehow got released into the retail distribution channel? That might explain why a poster I was helping on another site received a "brand new" :rolleyes: ??? Ryzen 1600 from Overclockers in the UK where the lid clearly had been removed and replaced as a "blue substance" (I am assuming TIM) was oozing out from all around the edges of the lid. The box was sealed with an ESD precaution label. Customer Support at Overclockers seemed shocked and puzzled and even paid for return shipping, suggesting ("guessing") it was a "warehouse/packing error" at AMD because it should have really been brand new.

Still waiting on the OP to see what the replacement processor looks like but it appears, at least, that Overclockers is stepping up and taking care of their customer. :)
 
Joined
Aug 20, 2007
Please explain what you mean by this. Was that a tongue in cheek comment? Or do you really mean they delidded and replaced the lid on a different processor die?

Well, I don't actually think they are delidding and replacing dies. I think they simply build these RMA'd CPUs to order with better-binned parts. But I could be wrong. The whole thing is an information vacuum, which is half the issue.

I did ask AMD directly what was going on, but the previously quite talkative person helping me with my RMA went silent on that. (Not unexpected, mind you; he's probably not qualified to comment there.)

As for the rest of your comment, it sounds very much like what I got. Have him check his heatspreader label. I bet it's a week 25 or newer CPU. That would be an RMA return at this point. They do look otherwise new, so maybe it went something like this:

Overclockers.co.uk gets a returned CPU and RMAs it. -> Gets replacement CPU, looks new, puts it on the shelf -> Customer gets replacement CPU, notices missing sticker and thermal paste, complains -> Overclockers support is clueless, as they don't handle RMAs.

EDIT: Scratch all that. You mean the lid had actually been removed? Like the processor heatspreader? If so, no, that's not at all what mine was like.
 
Joined
Jul 25, 2006
EDIT: Scratch all that. You mean the lid had actually been removed? Like the processor heatspreader?
That's exactly what I mean. It appeared the lid had been removed and an excessive amount of TIM applied, which then squished out when the lid was replaced. And the retail box was still sealed, so it does appear Overclockers did not do anything funny here, as they too thought they were selling a "new" CPU.

Note this was (or was supposed to be) a new retail boxed CPU. Not an OEM. So I guess this was something totally different from your scenarios. Sorry for the OT sidetrack.
 
Joined
Aug 20, 2007
That's exactly what I mean. It appeared the lid had been removed and an excessive amount of TIM applied, which then squished out when the lid was replaced. And the retail box was still sealed, so it does appear Overclockers did not do anything funny here, as they too thought they were selling a "new" CPU.

Note this was (or was supposed to be) a new retail boxed CPU. Not an OEM. So I guess this was something totally different from your scenarios. Sorry for the OT sidetrack.

No apology necessary. Makes me wonder what went on there, but you're correct, it's likely unrelated.
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
I do hope we will not see a large proportion of Ryzen owners RMA their CPUs just to get a higher-binned processor.
 