
My research into AMD's Linux "Performance Marginality" issue:

Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
Processor Intel i7 8700k @ 4.8 GHz All-Core + Uncore & AVX Offset @ 0
Motherboard ASRock Z370 Taichi
Cooling Noctua NH-U14S + A whole lotta Sunon and Corsair Maglev blower fans...
Memory G.SKILL TridentZ Series 32GB (4 x 8GB) DDR4-3200 @ 14-14-14-34-2T
Video Card(s) NVIDIA Titan XP Star Wars Collectors Edition (Galactic Empire)
Storage HGST UltraStar 7K6000 3.5" HDD 2TB 7200 RPM (w/128MBs of Cache)
Display(s) BenQ BL3200PT (a 1440p VA Panel with decent latency)
Case Thermaltake Core X31
Audio Device(s) Onboard Toslink to Schiit Modi Multibit to Asgard 2 Amp to AKG K7XX Ruby Red Massdrop Headphones
Power Supply Seasonic PRIME 750W 80Plus Titanium
Mouse ROCCAT Kone EMP
Keyboard WASD CODE 104-Key w/ Cherry MX Green Keyswitches, Doubleshot Vortex PBT White Keycaps, Blue legends
Software Windows 10 Enterprise (From former workplace, yay no telemetry)
Benchmark Scores FSExt/TS: FSExt 14625:https://www.3dmark.com/fs/15253894 TS 10496:https://www.3dmark.com/spy/3557134
#1
I've been doing some behind-the-scenes research into AMD's so-called Linux "Performance Marginality." When I began, I had big plans to write an independent test program to prove the crash can also happen in Windows. Unfortunately, I never quite got there, and it appears I may even have been off on my expected results. The crash is triggered by ASLR, and Windows doesn't use this, generally. JavaScript might, but find me any webpage that spawns a 16-thread JavaScript process that isn't mining coins malware-style and I'll be genuinely shocked.

What did come of this is a document where I detailed my results with the RMA. If nothing else, there is heavy evidence that there is no new stepping, just improved binning to mitigate the issue for those who complain. It's circumstantial evidence at this point, but given that AMD has repeatedly declined to comment when asked how they fix this, I strongly suspect they are simply gluing Threadripper-grade dies onto Ryzen CPUs on request, and that standard Ryzen-grade CPUs simply don't have fully functional ASLR under load (at least, at the binning level they chose).

I'm putting the document I typed up below, including evidence, in hopes you guys can do more research and maybe find enough to make this case a bit more than circumstantial. As it is, I'm out of time and energy to pursue this further, but it certainly seems suspect.

BEGIN PM (Originally sent to W1zzard and company, advised to share with community):

As a user of Gentoo Linux, I have been hit hard by the so-called Ryzen “Performance Marginality.” This manifests itself as an event in which several build jobs running concurrently will crash a random process on the system, usually (but not necessarily) one of the running build jobs. The problem is well documented, and AMD is offering RMAs to affected users. The thing is, that makes it sound like not everyone is affected. Truth be told, after a lot of online research, it is my opinion that anyone with a processor older than build week 25 is affected. Since anything newer than build week 20 has not made it into retail yet (at least, if user reports can be believed), this means nearly all Ryzen processors on the market at present time are affected by this issue.
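The failure pattern described above (many concurrent build jobs, one of which dies at random) can be hunted for with a small harness. The following is only a minimal sketch, not the script the community used: the names `run_job` and `stress` are made up for illustration, and the workload is whatever build command you pass in. It flags jobs that exit non-zero as well as jobs killed outright by a signal, because a compiler driver often reports a crashed child as an ordinary error exit.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one job; return None on success, else a short failure description."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if result.returncode < 0:
        # On POSIX, a negative return code means "terminated by signal -returncode".
        return "killed by " + signal.Signals(-result.returncode).name
    if result.returncode > 0:
        return f"exit code {result.returncode}"
    return None


def stress(cmd, parallel_jobs, rounds):
    """Run `cmd` as `parallel_jobs` concurrent copies, repeated `rounds` times.

    Returns a list of (round_index, failure_description) for every failed job.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=parallel_jobs) as pool:
        for round_index in range(rounds):
            for failure in pool.map(run_job, [cmd] * parallel_jobs):
                if failure is not None:
                    failures.append((round_index, failure))
    return failures
```

With a build command of your choosing as `cmd`, something like `stress(cmd, parallel_jobs=16, rounds=100)` keeps all threads busy; on a healthy system the returned list should stay empty as long as the build itself is known to succeed.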

This is a big deal, and not just on Linux. Why?

The issue vanishes in Linux for nearly all users when they turn off Kernel ASLR (Address Space Layout Randomization). This is a critical security feature that is not presently used much in Windows (and frankly, may never be) but is already being used inside web browsers, in language VMs such as JavaScript engines. I’d be very interested in how JavaScript holds up long term on a loaded Ryzen, for example. I’m sure this issue can manifest itself elsewhere if ASLR is truly being corrupted under load.
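For anyone wanting to compare notes, it helps to record which ASLR settings were active during a test. The post does not say which knob was changed, so this sketch reads both standard Linux ones: the `randomize_va_space` sysctl (userspace ASLR) and the `nokaslr` boot parameter (kernel ASLR). The function names are invented for illustration. Note that the absence of `nokaslr` does not by itself prove the kernel was built with KASLR support.

```python
from pathlib import Path

# Meanings of /proc/sys/kernel/randomize_va_space per the Linux kernel sysctl docs.
VA_SPACE_MODES = {
    0: "userspace ASLR disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "full randomization (mode 1 plus heap/brk)",
}


def describe_va_space(raw):
    """Translate the raw sysctl text into a human-readable description."""
    return VA_SPACE_MODES.get(int(raw.strip()), "unknown mode")


def kaslr_disabled(cmdline):
    """True if the kernel command line contains the `nokaslr` parameter."""
    return "nokaslr" in cmdline.split()


def report(proc_root="/proc"):
    """Read both settings from procfs and return them as a small dict."""
    root = Path(proc_root)
    va_space = (root / "sys/kernel/randomize_va_space").read_text()
    cmdline = (root / "cmdline").read_text()
    return {
        "userspace_aslr": describe_va_space(va_space),
        "kaslr_disabled_at_boot": kaslr_disabled(cmdline),
    }
```

Calling `report()` on the test machine before each stress run gives a record that can be posted alongside the results.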

What else is newsworthy here? Well, the issue does not appear to be fixed. By that I mean, there is no new stepping. It appears by all accounts that the most likely “fix” for this issue AMD is employing is to simply bin the processor better (that means picking better-performing dies from the silicon). This also explains why Threadripper and EPYC are “unaffected.” They are ALREADY binned higher.

To test this theory, I submitted my processor for an RMA. All users are reportedly getting “fresh from the presses” Ryzens manufactured not too long ago. Personally, my theory is that they are being pulled straight from the assembly-line binning process and used for RMAs. The fact that my CPU took nearly 2 weeks to “prepare” but got to me almost overnight only supports this theory. Anyhow, my CPU was made in Week 33. You can see it compared with my old Week 9 Ryzen below:





Note, in the images above, the older CPU container has a plastic shield that is much more “shiny” for some reason. It obscures the laser markings a bit but they should still be legible. I think it is just a packaging difference.

The new CPU has been opened on the bottom (no sticker), as prior reports indicated. It was also shipped rather pathetically. Unfortunately, I forgot to photograph this fact in my excitement, but I can certify there was no bottom “security” sticker and online reports support this. Have a look at the poor packaging anyways for kicks:



The CPU, as predicted, is much higher binned or otherwise a “golden” chip. It does 4.1 GHz on all cores at 1.425 V, where my old Ryzen took 1.475 V to attain 4.0 GHz on all cores. It also lets the IMC fly up to 3600 MHz where before, 3200 MHz was a struggle. Here are some relevant comparison shots.

A basic overview of my old Ryzen. It lacks the memory/voltage tabs, but this is all I could ever push out of it, and my “daily driver” clocks were lower. IMC was at 3200 MHz with 4 single-rank Samsung B-die DIMMs. Clock was 4 GHz at 1.475 V.




My new Ryzen. Clocks higher, with less volts. Obviously better binned or otherwise golden. IMC goes outrageously high at 3600 MHz. Same memory/DIMMS as above.



Oh, and yes, the issue is fixed.

What does this all mean?

I think AMD is binning run of the mill Ryzen CPUs so low that ASLR is effectively broken as soon as things get "hot" under load. I don't have direct confirmation of this yet, but a lot of circumstantial evidence, mostly found via myself and this thread here:

https://community.amd.com/thread/215773

It's a long read, but the evidence is there if you look. I'd recommend the later posts (from within the last 2 months), as they cover the RMA process and reports of binning/testing going on prior to chip arrival.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#2
After editing / typing all that, please let me remind you I'd like to keep this thread an information/research thread, no fanboyism allowed.

I'd like to start the discussion by asking if anyone knows a good JavaScript "stress test" of sorts that one could run alongside, say, Prime95. If my theory is right, it should eventually crash, or something equally strange will happen.

Right now I have JetStream 1.1 but I have no idea how to loop it long term.

http://browserbench.org/JetStream/
 
Joined
Sep 10, 2016
Messages
414 (0.61/day)
Likes
349
Location
Riverwood, Skyrim
System Name I haven't decided yet
Processor Intel i5 6500
Motherboard ASRock H170M-ITX/AC
Cooling Stock cooler
Memory G.Skill Aegis 1x16GB 2133MHz
Video Card(s) Sapphire RX480 Nitro+ 4GB
Storage Samsung 850EVO 500GB, 2TB Seagate Barracuda
Display(s) 32' Sony TV
Case Cooler Master Elite 130
Audio Device(s) Onboard, HD 599 cans
Power Supply Antec High Current Gamer HCG-520M (520W)
Mouse Rapoo (can't remember the model number)
Keyboard Rapoo v56
Benchmark Scores Look in the various benchmark threads
#3
Thanks for the information @R-T-B, it was an interesting read and I'm actually considering RMA'ing the Ryzen CPU in my brother's rig as a result.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#4
Thanks for the information @R-T-B, it was an interesting read and I'm actually considering RMA'ing the Ryzen CPU in my brother's rig as a result.
If you do, be aware they make you go through a little song and dance routine of making sure your voltage/cooling settings are adequate and have you test a fairly crazy set of voltages. I personally (being this was before I resigned) just got fed up with it, posted my settings and voltages and flashed my press credentials, which got the process escalated immediately and had them overnight me a CPU (lulz). I'm told it "normally" takes a good few months, sadly.

EDIT:

Example:

Thank you for submitting your RMA. I’m sorry to hear that you’re experiencing stability issues with your system. Please be assured that I am here to help find a resolution to your problem


Before approving your RMA, I would like to firstly perform some troubleshooting and focus on your system’s hardware configuration.


Please provide the details of the following hardware components in your system:

• Make and model of motherboard?

• Motherboard BIOS version?

• Make and model of RAM?

• Make and model of the power supply unit?


Please could you let me know the current settings you have for the CPU VCORE, SOC, and RAM? It would be very helpful if you could provide with pictures of your BIOS screens with these settings.


In addition, through troubleshooting with other customers we have found that the layout of the components inside the system case have caused sub-optimal cooling of the CPU causing a variety of issues.


I would like to better understand your system cooling to rule out any thermal issues. Please could you provide a picture of the whole interior of your system showing the CPU cooler?


Also, could you let me know the reported CPU temperature during heavy load or when the errors occur?


Thanks for contacting AMD
 
Joined
Sep 10, 2016
Messages
414 (0.61/day)
Likes
349
Location
Riverwood, Skyrim
System Name I haven't decided yet
#5
If you do, be aware they make you go through a little song and dance routine of making sure your voltage/cooling settings are adequate and have you test a fairly crazy set of voltages. I personally (being this was before I resigned) just got fed up with it, posted my settings and voltages and flashed my press credentials, which got the process escalated immediately and had them overnight me a CPU (lulz). I'm told it "normally" takes a good few months, sadly.

EDIT:

Example:
Thanks for the heads up on that; it is a massive song-and-dance routine to go through.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#6
Thanks for the heads up on that; it is a massive song-and-dance routine to go through.
What finally got me was when the rep asked if I "had a cooler attached." o_O

I was like... you mean which cooler? No, just like do you, at all? I was like, no, not doing this anymore... summon supervisor! :laugh:
 
Joined
Nov 4, 2005
Messages
10,047 (2.17/day)
Likes
2,416
System Name MoFo 2
Processor AMD PhenomII 1100T @ 4.2Ghz
Motherboard Asus Crosshair IV
Cooling Swiftec 655 pump, Apogee GT,, MCR360mm Rad, 1/2 loop.
Memory 8GB DDR3-2133 @ 1900 8.9.9.24 1T
Video Card(s) HD7970 1250/1750
Storage Agility 3 SSD 6TB RAID 0 on RAID Card
Display(s) 46" 1080P Toshiba LCD
Case Rosewill R6A34-BK modded (thanks to MKmods)
Audio Device(s) ATI HDMI
Power Supply 750W PC Power & Cooling modded (thanks to MKmods)
Software A lot.
Benchmark Scores Its fast. Enough.
#7
A few users out of thousands experience this, and suddenly everyone has a problem, even when they experience none.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#8
A few users out of thousands experience this, and suddenly everyone has a problem, even when they experience none.
If you'd read this well-researched thread, this is basically due to the lack of usage of ASLR outside of Linux. It's similar to how no one "experienced" the old Prime95 AVX bug, despite everyone having it, without (wait for it...) running Prime95.

This isn't a fanboy thread and I'd like to keep it free of that, thanks. The current best outcome would be to develop a Windows tool to prove you are affected, and I have come seeking help for that.
 
Joined
Nov 26, 2004
Messages
4,699 (0.94/day)
Likes
1,790
Location
Canuck in Norway
System Name Hellbox 3.0(same case new guts)
Processor i7 4790K
Motherboard Asus Z97 Sabertooth Mark 1
Cooling TT Kandalf L.C.S.(Water/Air)AC Cuplex Kryos CPU Block/Noctua
Memory 2x8GB Corsair Vengance Pro 2400
Video Card(s) Sapphire Nitro+ Vega 64
Storage WD Caviar Black SATA 3 1TB x2 RAID 0 2xSamsung 850 Evo 500GB RAID 0 1TB WD Blue
Display(s) ASUS MG279Q 1440 IPS 144Hz FreeSync
Case TT Kandalf L.C.S.
Audio Device(s) Soundblaster ZX/Logitech Z906 5.1
Power Supply Seasonic X-1050W 80+ Gold
Mouse G502 Proteus Spectrum
Keyboard G19s
Software Win 10 Pro x64
#9
A few users out of thousands experience this, and suddenly everyone has a problem, even when they experience none.
Well, that’s his point: you “can” create the problem, and easily, in Linux, just not as easily in Windows. It might not be an issue today, but next year, who knows, some ASLR functionality appears in Windows and you only then realize you’re on a bad CPU.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#10
Well, that’s his point: you “can” create the problem, and easily, in Linux, just not as easily in Windows. It might not be an issue today, but next year, who knows, some ASLR functionality appears in Windows and you only then realize you’re on a bad CPU.
Pretty much.

I'm also slightly alarmed that their "fix" seems to be simply to throw better binned silicon at people who complain, and not to change the binning process globally. Unless maybe they have? I don't know, week 25+ CPUs have not hit the market yet.
 
Joined
Feb 9, 2009
Messages
1,554 (0.45/day)
Likes
410
Location
Toronto
Processor i7-2670QM / Q9550 3.6ghz
Motherboard laptop / Asus P5Q-E
Cooling laptop / Cooler Master Hyper 212
Memory 2x4gb ddr3sd / 2x2gb ddr2
Video Card(s) 570m / MSI 660 Gaming OC
Storage ST9750420AS / ST1000DM003
Display(s) BenQ FP241VW / BenQ GW2265HM
Case MSI gx780 / Corsair 500r
Audio Device(s) onboard
Power Supply laptop / Corsair 750tx
Mouse Steelseries Kinzu V2 / Logitech M120
Keyboard Logitech Deluxe 250 / Logitech K120
Software Windows 7
#11
wait, how did you conclude it's a 'heat' issue or that different bins should result in different failure rates/times? i dont remember heat being mentioned on phoronix & its user comments

if week 25+, not to mention threadripper/epyc are 'permanently fixed', doesnt that mean it's more to do with physical microscopic manufacturing defects?

for some reason i never thought of this aspect of virtualization, is ASLR of a client actually randomized on the non-ASLR host's memory (at least within the preallocated chunk of the VM process)?

i want to know more about the ram limits, we really need to confirm if different cpus result in different memory support even after all the agesa updates

guess it's a good thing i've still been waiting & waiting due to the ram+nand+gpu price inflations before building...
 
Joined
Feb 27, 2008
Messages
4,537 (1.20/day)
Likes
3,936
System Name Ironic
Processor Intel 2500k 4.4Ghz
Motherboard ASROCK|Z68 PROFESSIONAL Gen 3
Cooling Corsair H60
Memory 32GB GSkill Ripjaw X 1866
Video Card(s) Sapphire R9 290 Vapor-X 4Gb
Storage Western Digital Caviar Black 2TB SATA 3 (6G/s)
Display(s) 22" Dell Wide/ 22" Acer wide/24" Asus
Case Antec Lanboy Air Black & Blue
Audio Device(s) SB Audigy 7.1
Power Supply Corsair Enthusiast TX750
Mouse Logitech G9x, custom frame
Keyboard Roccat Ryos MK
Software Win 7 Ult 64 bit (with a side of XP64)
#12
If you'd read this well-researched thread, this is basically due to the lack of usage of ASLR outside of Linux. It's similar to how no one "experienced" the old Prime95 AVX bug, despite everyone having it, without (wait for it...) running Prime95.

This isn't a fanboy thread and I'd like to keep it free of that, thanks. The current best outcome would be to develop a Windows tool to prove you are affected, and I have come seeking help for that.
not sure to whom you're replying, but I'd say with your tone, there's a reason for that... *hint hint*
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#13
wait, how did you conclude it's a 'heat' issue or that different bins should result in different failure rates/times? i dont remember heat being mentioned on phoronix & its user comments
I don't know it's heat for certain (actually, since I wrote that, I suspect more that it's load related). Frankly, all we really know 100% is that for some reason the RMA'd chips are binned better. Why is anyone's guess, but if we're going to conjecture, I would assume poor binning is what causes the issue.

not sure to whom you're replying, but I'd say with your tone, there's a reason for that... *hint hint*
I was replying to the quoted party.

Reason for what? Your comment is confusing. I'm not attempting any sort of tone, though maybe the old PM I copied and pasted to support these claims has one, I really didn't check... my bad there. I'm all about sorting out what makes this issue tick and how AMD is handling it, nothing more.

For the record, AMD support deserves a gold star for how they treated me, though telling them I was a press member probably helped with that...
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
10,729 (4.54/day)
Likes
5,847
Location
Concord, NH
System Name Kratos
Processor Intel Core i7 3930k @ 4.5Ghz
Motherboard ASUS P9X79 Deluxe
Cooling Corsair H100i V2
Memory G.Skill DDR3-2133, 16gb (4x4gb) @ 9-11-10-28-108-1T 1.65v
Video Card(s) MSI AMD Radeon R9 390 GAMING 8GB @ PCI-E 3.0
Storage 2x120Gb SATA3 Corsair Force GT Raid-0, 4x1Tb RAID-5, 1x500GB
Display(s) 1x LG 27UD69P (4k), 2x Dell S2340M (1080p)
Case Antec 1200
Audio Device(s) Onboard Realtek® ALC898 8-Channel High Definition Audio
Power Supply Seasonic 1000-watt 80 PLUS Platinum
Mouse Logitech G602
Keyboard Rosewill RK-9100
Software Ubuntu 18.04
Benchmark Scores Benchmarks aren't everything.
#14
ASLR is effectively broken as soon as things get "hot" under load
I wonder if running more volts through the IMC would result in ASLR becoming more stable. It's entirely possible that ASLR is doing something in a particular way that makes the CPU unstable, which doesn't sound too different from another Linux issue with the OCaml compiler, where certain conditions could make the machine unstable. A lot like AVX, there are a number of things happening within a given CPU cycle, and transistors that are more leaky are going to have more trouble switching at such high frequencies. If you're right and they're giving out better binned CPUs to get around it, it's entirely possible that a little more voltage in the right place might have the same effect, but result in more heat.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#15
I wonder if running more volts through the IMC would result in ASLR becoming more stable. It's entirely possible that ASLR is doing something in a particular way that makes the CPU unstable, which doesn't sound too different from another Linux issue with the OCaml compiler, where certain conditions could make the machine unstable. A lot like AVX, there are a number of things happening within a given CPU cycle, and transistors that are more leaky are going to have more trouble switching at such high frequencies. If you're right and they're giving out better binned CPUs to get around it, it's entirely possible that a little more voltage in the right place might have the same effect, but result in more heat.

Pre-RMA, I nearly fixed the issue by upping SOC voltage to 1.2 V (later it came back with a vengeance though), so you might be onto something.
 
Joined
Feb 9, 2009
Messages
1,554 (0.45/day)
Likes
410
Location
Toronto
#16
Pre-RMA, I nearly fixed the issue by upping SOC voltage to 1.2 V (later it came back with a vengeance though), so you might be onto something.
it's not adding up, how can week25 or ALL threadrippers/epycs not have the issue? binning isnt exact, there are still variances, how would some small difference in target voltage or temperature or stable clock result in a very specific calculation error being permanently fixed?

the only logical way to test the bin hypothesis is by (running the errata scripts people made while) underclocking/overvolting/watercooling/timing loosening old ryzen cpus & overclocking/undervolting/overheating/timing tightening new ryzen cpus
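One way to keep such a comparison honest is to log every stress run with its CPU and settings and then compare crash rates per configuration. The sketch below is just an illustration of that bookkeeping: the `Run` fields and the `crashes_per_hour` helper are hypothetical names, and no real measurements are included.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    cpu: str       # label of your choosing, e.g. "old" or "new"
    setting: str   # e.g. "stock", "underclocked", "overvolted"
    hours: float   # how long the stress script ran
    crashes: int   # segfaults observed during that time


def crashes_per_hour(runs):
    """Group runs by (cpu, setting) and return crashes per hour for each group."""
    hours = defaultdict(float)
    crashes = defaultdict(int)
    for run in runs:
        key = (run.cpu, run.setting)
        hours[key] += run.hours
        crashes[key] += run.crashes
    # Skip groups with no recorded run time to avoid dividing by zero.
    return {key: crashes[key] / hours[key] for key in hours if hours[key] > 0}
```

If binning were the whole story, the rate for an old chip should fall as it is underclocked or overvolted, and the rate for a new chip should rise from zero as it is pushed the other way.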
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
#17
it's not adding up, how can week25 or ALL threadrippers/epycs not have the issue?
Threadripper/EPYC have always been top 5% binned.

My current theory is that the reason all the rma'd cpus are "hot off the presses" is that they are essentially made to order with higher binned dies. Of course I could be wrong, but my build number was very close to when my RMA was approved.

We'll only really know when week 25+ cpus make it to market. It will be interesting to see if all of them are higher binned as well. All I know is RMA requests, for whatever reason, seem to be higher binned. It could be that AMD is just doing that for "added insurance" against a re-rma.

Oddly, however, contrary to my hypothesis, I can't seem to make my new Ryzen segfault by lowering SOC voltage to a very low level (I tried 0.8 V). I may be completely off on this after all. I will fully admit a lot of this is my "best guess" for what is going on.
 
Joined
Nov 18, 2010
Messages
3,998 (1.43/day)
Likes
2,317
Location
Rīga, Latvia
System Name HELLSTAR
Processor Intel 5960X @ 4.4GHz
Motherboard Gigabyte GA-X99-UD3
Cooling Custom Loop. 360+240 rads.
Memory 4x8GB Corsair Vengeance LPX 2966MHz 16-17-17-35
Video Card(s) ASUS 1080 Ti FE + water block
Storage Optane 900P + Samsung 950Pro 256GB NVMe + 750 EVO 500GB
Display(s) Philips PHL BDM3270
Case Phanteks Enthoo Evolv ATX Tempered Glass
Audio Device(s) Sound Blaster ZxR
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer Deathstalker
Software Windows 10 insider
#18
The thing with bins is not only with desktop parts.

Mobile does it and always has. You can buy two phones of the same model, but the difference between the worst and best voltage bin is HUGE, heat- and battery-life-wise. The community often makes graphs of their samples, and the charts look pretty logical. Also the cheating with NAND speeds and similar things... like screens with useless gorilla his a** or not... there are batches...

It is a lottery IMHO.
 
Joined
Feb 9, 2009
Messages
1,554 (0.45/day)
Likes
410
Location
Toronto
#19
Threadripper/EPYC have always been top 5% binned.

My current theory is that the reason all the RMA'd CPUs are "hot off the presses" is that they are essentially made to order with higher-binned dies. Of course I could be wrong, but my build number was very close to when my RMA was approved.

We'll only really know when week 25+ CPUs make it to market. It will be interesting to see if all of them are higher binned as well. All I know is that RMA replacements, for whatever reason, seem to be higher binned. It could be that AMD is just doing that as "added insurance" against a re-RMA.

Oddly, however, contrary to my hypothesis, I can't seem to make my new Ryzen segfault by dropping the SoC voltage very low (I tried 0.8 V). I may be completely off on this after all. I will fully admit a lot of this is my "best guess" for what is going on.
How are they going to give out old stock during an RMA? The old stock has been shipped to stores, and there is no reason for them to keep some back for RMAs since they are constantly manufacturing new ones. They just take new ones as needed to fill RMAs.

I thought some week 25 chips did hit the market, but I don't remember.

If TR/EPYC is the top 5%, that's no guarantee. AMD would have to be sure that something like the top 30% are fine, but this goes against the official statement that week 25+ is fine (unless they use a more convoluted binning process, but TR/EPYC got released... around week 25, didn't they? What's the oldest known week for one?)

Was this issue confirmed on the lower core count models, or only the 8-cores?
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
Processor Intel i7 8700k @ 4.8 GHz All-Core + Uncore & AVX Offset @ 0
Motherboard ASRock Z370 Taichi
Cooling Noctua NH-U14S + A whole lotta Sunon and Corsair Maglev blower fans...
Memory G.SKILL TridentZ Series 32GB (4 x 8GB) DDR4-3200 @ 14-14-14-34-2T
Video Card(s) NVIDIA Titan XP Star Wars Collectors Edition (Galactic Empire)
Storage HGST UltraStar 7K6000 3.5" HDD 2TB 7200 RPM (w/128MBs of Cache)
Display(s) BenQ BL3200PT (a 1440p VA Panel with decent latency)
Case Thermaltake Core X31
Audio Device(s) Onboard Toslink to Schiit Modi Multibit to Asgard 2 Amp to AKG K7XX Ruby Red Massdrop Headphones
Power Supply Seasonic PRIME 750W 80Plus Titanium
Mouse ROCCAT Kone EMP
Keyboard WASD CODE 104-Key w/ Cherry MX Green Keyswitches, Doubleshot Vortex PBT White Keycaps, Blue legends
Software Windows 10 Enterprise (From former workplace, yay no telemetry)
Benchmark Scores FSExt/TS: FSExt 14625:https://www.3dmark.com/fs/15253894 TS 10496:https://www.3dmark.com/spy/3557134
#20
Was there ever an official statement from AMD that week 25+ CPUs are OK? I was under the impression that was just a Phoronix claim/guesstimate.
 
Joined
Jul 25, 2006
Messages
4,152 (0.95/day)
Likes
2,749
Location
Nebraska, USA
System Name Brightworks Systems BWS-6 E-IV
Processor Intel Core i5-6600 @ 3.9GHz
Motherboard Gigabyte GA-Z170-HD3 Rev 1.0
Cooling Quality case, 2 x Fractal Design 140mm fans, stock CPU HSF
Memory 16GB (2 x 8GB) DDR4 3000 Corsair Vengeance
Video Card(s) EVGA GEForce GTX 1050Ti 4Gb GDDR5
Storage Samsung 850 Pro 256GB SSD, Samsung 860 Evo 500GB SSD
Display(s) Samsung S24E650BW LED x 2
Case Fractal Design Define R4
Power Supply EVGA Supernova 550W G2 Gold
Mouse Microsoft Wireless 5000
Keyboard Microsoft Wireless Comfort 5050
Software W10 Pro 64-bit
#21
I am very very suspicious at this point they aren't simply gluing threadripper grade dies to Ryzen CPUs on request
Please explain what you mean by this. Was that a tongue in cheek comment? Or do you really mean they delidded and replaced the lid on a different processor die?

I only ask because I wonder if one of those Frankenstein processors escaped AMD and somehow got released into the retail distribution channel? That might explain why a poster I was helping on another site received a "brand new" :rolleyes: ??? Ryzen 1600 from Overclockers in the UK where the lid clearly had been removed and replaced as a "blue substance" (I am assuming TIM) was oozing out from all around the edges of the lid. The box was sealed with an ESD precaution label. Customer Support at Overclockers seemed shocked and puzzled and even paid for return shipping, suggesting ("guessing") it was a "warehouse/packing error" at AMD because it should have really been brand new.

Still waiting on the OP to see what the replacement processor looks like but it appears, at least, that Overclockers is stepping up and taking care of their customer. :)
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
Processor Intel i7 8700k @ 4.8 GHz All-Core + Uncore & AVX Offset @ 0
Motherboard ASRock Z370 Taichi
Cooling Noctua NH-U14S + A whole lotta Sunon and Corsair Maglev blower fans...
Memory G.SKILL TridentZ Series 32GB (4 x 8GB) DDR4-3200 @ 14-14-14-34-2T
Video Card(s) NVIDIA Titan XP Star Wars Collectors Edition (Galactic Empire)
Storage HGST UltraStar 7K6000 3.5" HDD 2TB 7200 RPM (w/128MBs of Cache)
Display(s) BenQ BL3200PT (a 1440p VA Panel with decent latency)
Case Thermaltake Core X31
Audio Device(s) Onboard Toslink to Schiit Modi Multibit to Asgard 2 Amp to AKG K7XX Ruby Red Massdrop Headphones
Power Supply Seasonic PRIME 750W 80Plus Titanium
Mouse ROCCAT Kone EMP
Keyboard WASD CODE 104-Key w/ Cherry MX Green Keyswitches, Doubleshot Vortex PBT White Keycaps, Blue legends
Software Windows 10 Enterprise (From former workplace, yay no telemetry)
Benchmark Scores FSExt/TS: FSExt 14625:https://www.3dmark.com/fs/15253894 TS 10496:https://www.3dmark.com/spy/3557134
#22
Please explain what you mean by this. Was that a tongue in cheek comment? Or do you really mean they delidded and replaced the lid on a different processor die?
Well, I don't actually think they are delidding and replacing dies. I think they simply build these RMA replacement CPUs to order with better-binned parts. But I could be wrong. The whole thing is an information vacuum, which is half the issue.

I did ask AMD directly what was going on, but the previously quite talkative person helping me with my RMA went silent on that. (Not unexpected, mind you; he's probably not qualified to comment there.)

As for the rest of your comment, it sounds very much like what I got. Have him check his heatspreader label. I bet it's a week 25 or newer CPU. That would be an RMA-return at this point. They do look otherwise new, so maybe it went something like this:

Overclockers.co.uk gets a returned CPU and RMAs it -> gets a replacement CPU that looks new and puts it on the shelf -> customer gets the replacement CPU, notices the missing sticker and the thermal paste, and complains -> Overclockers support is clueless, as they don't handle RMAs.

EDIT: Scratch all that. You mean the lid had actually been removed? Like the processor heatspreader? If so, no, that's not at all what mine was like.
 
Joined
Jul 25, 2006
Messages
4,152 (0.95/day)
Likes
2,749
Location
Nebraska, USA
System Name Brightworks Systems BWS-6 E-IV
Processor Intel Core i5-6600 @ 3.9GHz
Motherboard Gigabyte GA-Z170-HD3 Rev 1.0
Cooling Quality case, 2 x Fractal Design 140mm fans, stock CPU HSF
Memory 16GB (2 x 8GB) DDR4 3000 Corsair Vengeance
Video Card(s) EVGA GEForce GTX 1050Ti 4Gb GDDR5
Storage Samsung 850 Pro 256GB SSD, Samsung 860 Evo 500GB SSD
Display(s) Samsung S24E650BW LED x 2
Case Fractal Design Define R4
Power Supply EVGA Supernova 550W G2 Gold
Mouse Microsoft Wireless 5000
Keyboard Microsoft Wireless Comfort 5050
Software W10 Pro 64-bit
#23
EDIT: Scratch all that. You mean the lid had actually been removed? Like the processor heatspreader?
That's exactly what I mean. It appeared the lid was removed and an excessive amount of TIM was applied that then squished out when the lid was replaced. And the retail box was still sealed, so it does appear Overclockers did not do anything funny here, as they too thought they were selling a "new" CPU.

Note this was (or was supposed to be) a new retail boxed CPU. Not an OEM. So I guess this was something totally different from your scenarios. Sorry for the OT sidetrack.
 
Joined
Aug 20, 2007
Messages
9,125 (2.29/day)
Likes
8,215
System Name Pioneer
Processor Intel i7 8700k @ 4.8 GHz All-Core + Uncore & AVX Offset @ 0
Motherboard ASRock Z370 Taichi
Cooling Noctua NH-U14S + A whole lotta Sunon and Corsair Maglev blower fans...
Memory G.SKILL TridentZ Series 32GB (4 x 8GB) DDR4-3200 @ 14-14-14-34-2T
Video Card(s) NVIDIA Titan XP Star Wars Collectors Edition (Galactic Empire)
Storage HGST UltraStar 7K6000 3.5" HDD 2TB 7200 RPM (w/128MBs of Cache)
Display(s) BenQ BL3200PT (a 1440p VA Panel with decent latency)
Case Thermaltake Core X31
Audio Device(s) Onboard Toslink to Schiit Modi Multibit to Asgard 2 Amp to AKG K7XX Ruby Red Massdrop Headphones
Power Supply Seasonic PRIME 750W 80Plus Titanium
Mouse ROCCAT Kone EMP
Keyboard WASD CODE 104-Key w/ Cherry MX Green Keyswitches, Doubleshot Vortex PBT White Keycaps, Blue legends
Software Windows 10 Enterprise (From former workplace, yay no telemetry)
Benchmark Scores FSExt/TS: FSExt 14625:https://www.3dmark.com/fs/15253894 TS 10496:https://www.3dmark.com/spy/3557134
#24
That's exactly what I mean. It appeared the lid was removed and an excessive amount of TIM was applied that then squished out when the lid was replaced. And the retail box was still sealed, so it does appear Overclockers did not do anything funny here, as they too thought they were selling a "new" CPU.

Note this was (or was supposed to be) a new retail boxed CPU. Not an OEM. So I guess this was something totally different from your scenarios. Sorry for the OT sidetrack.
No apology necessary. Makes me wonder what went on there, but you're correct, it's likely unrelated.
 
Joined
Mar 18, 2008
Messages
3,325 (0.88/day)
Likes
2,334
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) Sapphire R9 Fury X
Storage Samsung 960 Pro 1TB, Crucial MX200 500GB
Display(s) Acer K272HUL, HTC Vive
Case Fractal Design R5
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
Software Windows 10 Professional/Linux Mint
#25
I do hope we will not see a large proportion of Ryzen owners RMA their stuff just to get a higher-binned processor.
 