
Exquisitely sensitive stability testing - the Linux kernel!

graysky

Joined
Jun 11, 2007
Messages
192 (0.03/day)
Processor i7-3770K @ 45x100
Motherboard P8Z77-V Pro
Cooling Noctua NH-D14
Memory G.SKILL Ripjaws X Series (2 x 8GB) DDR3 1600/F3-1600C9D-16GXM
Video Card(s) Onboard HD4000
Storage Vertex 4 128 GB + other HDDs
Case P183
Power Supply Seasonic SS-560KM
TL;DR Summary
The Linux kernel is a powerful tool to detect instabilities in your overclock settings with both greater accuracy and sensitivity than either Prime95 or IBT/LinX.

More Details
The Linux kernel supplies users with a dead simple method for detecting hardware instabilities, like those caused by an 'unstable' overclock. There is nothing special to install, as this functionality seems to be built natively into the kernel itself. To use it, simply run a standard stress test such as Prime95 or Linpack and watch the output from dmesg. If the system is unstable due to insufficient voltage, excessive heat, or the like, it will report:

Code:
[Hardware Error]: Machine check events logged
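If you would rather let the computer do the looking, a few lines of Python can scan a saved snapshot for that message. This is a minimal sketch, not a finished tool: the script name `find_mce.py` and the function `find_mce_lines` are made up for illustration, and it only assumes the snapshot was saved with something like `dmesg > dmesg.txt` (on some distros `dmesg` needs sudo). It matches on the text "Machine check events logged" rather than the whole line, so a timestamp or a prefix that newer kernels may add in front of it does not matter.

```python
import sys

# Text the kernel prints once it has logged machine check events (see above).
MCE_MARKER = "Machine check events logged"


def find_mce_lines(dmesg_text: str) -> list[str]:
    """Return every line of a dmesg snapshot that reports logged machine checks."""
    return [line for line in dmesg_text.splitlines() if MCE_MARKER in line]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: dmesg > dmesg.txt && python3 find_mce.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as snapshot:
        hits = find_mce_lines(snapshot.read())
    for hit in hits:
        print(hit)
    print(f"{len(hits)} machine check report(s) found")
```

An empty result only means nothing was logged up to the moment the snapshot was taken, so re-run it after the stress test finishes.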

I have seen the kernel throw these errors during a Prime95 run before Prime95 gave an error in the math. Further, I have seen these errors appear when Linpack did not detect that the settings were unstable, as evidenced by the residual number not changing during the run when the error occurred.

How to Stress Test Under Linux
Probably the most newb-friendly flavor of Linux is Ubuntu. Users can run it live off a CD or USB stick without installing it to their systems. Further, it is preconfigured to boot into a GUI with network and hardware autodetected. Download an image from http://www.ubuntu.com. I recommend the 64-bit version, as 32-bit Linux suffers from the same 4 GB memory limitation that 32-bit Windows does.

Note: don't feel like Ubuntu is your only option. There are many other Linux distributions out there from which to choose.

Download the ISO, burn it to a disc or write it to a USB stick, and boot from it. Ubuntu prompts users to either "Try Ubuntu" or "Install Ubuntu." Just hit the "Try Ubuntu" button and you will be dropped into the live Linux environment.

Here are a few suggestions for stress testing:
1) mprime ---> Linux version of Prime95. Help to download and run mprime.
2) linpack ---> back end of both LinX and IBT. Help to download and run linpack.

First, run mprime using your favorite torture test (small FFTs, for example). Next, to see the output from the kernel, you need to print the contents of the kernel ring buffer. You can do this in one of two ways:

1) Open a terminal and type dmesg to see a snapshot.
2) Perhaps more useful is to be informed when something happens rather than typing dmesg over and over again! You can do this with the following command:
Code:
sudo cat /proc/kmsg

It looks like nothing is happening, but the command has effectively opened a connection to the ring buffer and will print new messages as they arrive. To test it, plug in a USB thumb drive.

Example on my box:
Code:
<5>[13393.025582] scsi 10:0:0:0: Direct-Access     Kingston DataTraveler 112 1.00 PQ: 0 ANSI: 2
<5>[13393.026103] sd 10:0:0:0: [sdc] 7831552 512-byte logical blocks: (4.00 GB/3.73 GiB)
<5>[13393.026449] sd 10:0:0:0: [sdc] Write Protect is off
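Each line read from /proc/kmsg carries a syslog priority in angle brackets and, when kernel timestamps are enabled as on my box, the seconds since boot in square brackets; the `<5>` above is the "notice" level. If you want to post-process this output, here is a minimal parser sketch that assumes exactly that `<N>[seconds] message` layout. The names `parse_kmsg_line` and `KmsgRecord` are hypothetical, not part of any kernel interface.

```python
import re
from typing import NamedTuple, Optional

# "<5>[13393.025582] scsi 10:0:0:0: ..." -> priority, seconds since boot, text
KMSG_LINE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s?(.*)$")

# Syslog severity names, indexed by level (0 = most severe).
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


class KmsgRecord(NamedTuple):
    level: str      # severity name, e.g. "notice"
    seconds: float  # time since boot, as printed by the kernel
    message: str    # the rest of the line


def parse_kmsg_line(line: str) -> Optional[KmsgRecord]:
    """Split one /proc/kmsg line into its parts; return None if it does not match."""
    m = KMSG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    priority = int(m.group(1))
    # The low three bits of a syslog priority hold the severity level.
    return KmsgRecord(LEVELS[priority & 7], float(m.group(2)), m.group(3))
```

Lines without a timestamp (kernels built without printk timestamps) simply come back as None under this assumption.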

Anyway, you will want to watch for that message I posted above:
Code:
[Hardware Error]: Machine check events logged
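Rather than staring at the terminal for hours, you can let a script watch for that line. This is a rough sketch under a few assumptions: your distro ships the util-linux `dmesg` with `--follow` (`-w`), which streams new kernel messages much like `cat /proc/kmsg` does, and you run it with enough privileges to read the kernel log (e.g. via sudo). The proc(5) man page advises that only one process should read /proc/kmsg, so going through `dmesg` also avoids competing with a running system logger. The file name `mce_watch.py` and the function names are hypothetical.

```python
import subprocess
import sys
import time

# Kernel message emitted once machine check events have been logged.
MCE_MARKER = "Machine check events logged"


def is_mce_line(line: str) -> bool:
    """Return True if a kernel log line reports logged machine check events."""
    return MCE_MARKER in line


def watch_kernel_log() -> None:
    """Stream kernel messages with `dmesg --follow` and flag machine checks."""
    # Assumes util-linux dmesg; it prints the existing buffer first, then waits.
    proc = subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None  # guaranteed because stdout=PIPE
    try:
        for line in proc.stdout:
            if is_mce_line(line):
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} *** MACHINE CHECK: {line.rstrip()}", flush=True)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the watch
    finally:
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    # Only start the endless watch when asked, e.g. `sudo python3 mce_watch.py --watch`
    if "--watch" in sys.argv[1:]:
        watch_kernel_log()
    else:
        print("usage: mce_watch.py --watch")
```

Leave it running in a second terminal while mprime or Linpack runs. Because `dmesg` replays the existing buffer first, a machine check logged earlier in the same boot will be flagged too.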
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
27,032 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
this is the equivalent to a bluescreen under Windows
 
graysky
this is the equivalent to a bluescreen under Windows

No, that would be a kernel panic. This is a warning that happens before the hard lock, so users can hopefully stop whatever is causing it.
 

OneMoar

There is Always Moar
Joined
Apr 9, 2010
Messages
8,744 (1.71/day)
Location
Rochester area
System Name RPC MK2.5
Processor Ryzen 5800x
Motherboard Gigabyte Aorus Pro V2
Cooling Enermax ETX-T50RGB
Memory CL16 BL2K16G36C16U4RL 3600 1:1 micron e-die
Video Card(s) GIGABYTE RTX 3070 Ti GAMING OC
Storage ADATA SX8200PRO NVME 512GB, Intel 545s 500GBSSD, ADATA SU800 SSD, 3TB Spinner
Display(s) LG Ultra Gear 32 1440p 165hz Dell 1440p 75hz
Case Phanteks P300 /w 300A front panel conversion
Audio Device(s) onboard
Power Supply SeaSonic Focus+ Platinum 750W
Mouse Kone burst Pro
Keyboard EVGA Z15
Software Windows 11 +startisallback
It's not advisable to argue with a Wizzard; he might turn you into a toad.
and no one runs stress tests under Linux
unless Linux is their main OS... and who runs Linux as their main OS... no one /trolling. The Linux kernel is not a TOOL for anything, it's a kernel... I suggest you go read up on the subject. Windows has a kernel too, and it's far more sensitive to OC faults than Linux.
http://en.wikipedia.org/wiki/Kernel_(computing)
 
graysky
and no one runs stress tests under Linux
unless Linux is their main OS... and who runs Linux as their main OS...

I do on both counts.

I suggest you go read up on the subject. Windows has a kernel too, and it's far more sensitive to OC faults than Linux.

I disagree with your statement: if it [the Windows kernel] were far more sensitive to OC faults, you wouldn't need to rely on the stress tester itself to inform you of a fault. It would tell you before a crash happens, as the method I posted does.

graysky said:
I have seen the kernel throw these errors during a Prime95 run before Prime95 gave an error in the math. Further, I have seen these errors appear when Linpack did not detect that the settings were unstable, as evidenced by the residual number not changing during the run when the error occurred.

The whole point of my post was to give you a tool to "detect instabilities in your overclock settings with both greater accuracy and sensitivity than either Prime95 or IBT/LinX." Guess I should have extended the statement to encompass the Windows kernel as well.
 

OneMoar
I do on both counts.



I disagree with your statement: if it [the Windows kernel] were far more sensitive to OC faults, you wouldn't need to rely on the stress tester itself to inform you of a fault. It would tell you before a crash happens, as the method I posted does.

The whole point of my post was to give you a tool to "detect instabilities in your overclock settings with both greater accuracy and sensitivity than either Prime95 or IBT/LinX." Guess I should have extended the statement to encompass the Windows kernel as well.

Problem is, you are wrong.
Linux IS more resistant to crashing due to a hardware fault.
We don't care about WHY it crashed or IF it crashed; we don't run our systems for daily use until we know they are stable. I've booted way too many machines with faulty RAM into Linux where Windows wouldn't boot. I'd rather have it hard lock.
 
Joined
Jun 11, 2007
Messages
192 (0.03/day)
Processor i7-3770K @ 45x100
Motherboard P8Z77-V Pro
Cooling Noctua NH-D14
Memory G.SKILL Ripjaws X Series (2 x 8GB) DDR3 1600/F3-1600C9D-16GXM
Video Card(s) Onboard HD4000
Storage Vertex 4 128 GB + other HDDs
Case P183
Power Supply Seasonic SS-560KM
I disagree with you and will end with that. I think we can both agree upon the fact that having multiple tools in a toolbox is a nice thing.
 

OneMoar
As far as whole-system stability goes, there is no replacement for a gaming load/whole-system stress. I have had CPUs pass every stability test in the box and still throw a tantrum when gaming.
And you don't need a full Linux distro to run the Linpack binary, and there is NO difference between the Win32 binary and the *nix one; if you don't believe me, look at the source. Recommending someone boot into Linux to stress test a Windows install is borderline just plain old stupid.
This whole thread smells like a ploy to get people to try Linux, or maybe that's the beer talking.
 

W1zzard
I think we can both agree upon the fact that having multiple tools in a toolbox is a nice thing.

It most certainly is. I didn't mean to suggest that your method is bad. I just wanted to point out the relation between machine checks and bluescreens.
 