New build, new troubles, LinPack unstable at stock - I need some help.
Hi, a buddy of mine got a new AMD build we put together last night and the computer is giving us trouble. This is my first AMD build in around 7-8 years, so I need some advice on a few things.
The specs are:
CPU: Phenom II X3 710 @ Stock with the Zalman CNPS-7500Alcu cooler (MX-2 paste).
Motherboard: Gigabyte GA-MA770T-UD3P
Memory: 2x1GB OCZ DDR3 1600MHz CL7-7-7-18 (1.9v sticks, ugh), currently running at 1066MHz on the default 1.62v, unganged, with 9-9-9-30 timings.
HDDs: 1 Western Digital 80GB HDD (Ubuntu install), 2 Western Digital 160GB HDDs (Vista Business 64-bit on a partition on one drive).
PSU: FSP Epsilon 500W
GPU: Powercolor HD2600XT 256MB GDDR4
Some random case.
All BIOS settings on auto/default, ACC is off.
Vista installed just fine and the computer boots just fine, but the system won't remain stable under Linpack, producing errors around the 3rd or 4th loop. It did pass around 20 minutes of Prime95 (very short, I know; we'll let it run longer today when I get off work). I am sure as heck hoping there's nothing faulty with it, because this is a budget build using lots of spares, so troubleshooting will be annoying as heck (no additional mobo or CPU on hand, and my i7 rig is at my parents' place, meaning that for at least a few days we've got no additional DDR3, either).
I tried giving the DIMMs more juice (but did not touch VDIMM VTT), to no avail. Temperatures reported by HWMonitor are around 30c idle and 46c under Linpack load, so assuming it is reading temps accurately there should be no overheating issues. The motherboard components also stay below 56c under load, except for one sensor on the motherboard which seems to be stuck reporting 79c regardless of load, so I assume this is a faulty sensor.
My questions are:
1) Any general tips or ideas? (Other than the usual troubleshooting routine of leaving one HDD, one stick of RAM, etc. That will be attempted once I get off work today; we just barely got it running at 2am last night and I was at my desk at work at 8am.)
2) Is there any good, accurate temp reporting software for AMD builds? I have no experience with this, so I have no idea whether HWMonitor is reliable with AMD rigs.
3) Is there any stress testing software for Linux? Since we have two OSes installed, we could make sure it isn't some OS-related issue, although that is unlikely.
4) Any general ideas as to what might be causing it, aside from some hardware fault? Maybe some of the default settings aren't good enough; if so, what would be good to try tweaking to get it running?
Thanks a bunch to anyone who'll help - This is rather annoying.
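On the Linux side of questions 2 and 3, one possible way to cross-check HWMonitor's numbers is to read the kernel's hwmon sysfs files directly. This is only a minimal Python sketch, assuming the Ubuntu install has a driver loaded that exposes sensors under /sys/class/hwmon (whether this board's sensor chip is supported is not confirmed here). The function name read_hwmon_temps is made up for illustration.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"chip/tempN": degrees Celsius} for every readable hwmon temp input."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            # Each hwmon directory normally carries a "name" file with the driver name.
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        try:
            # Values are reported in millidegrees Celsius.
            with open(path) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor present but unreadable; skip it
        label = os.path.basename(path)[: -len("_input")]
        temps[f"{chip}/{label}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f}c")
```

Running it once at idle and once under load on each OS would show whether the odd 79c reading appears on Linux as well, which would point away from a Windows software quirk.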
Just a guess here, but try OCCTPT3: Http://www.ocbase.com/perestroika_en/index.php
I'm just taking a shot in the dark here, but some motherboards have their own internal voltage / FSB regulation and throttling systems that act beneath the BIOS's influence and are driven by readings on the thermal sensors.
If you really DO have a faulty thermal sensor - it may be causing the board to behave abnormally after an amount of sustained load.
OCCTPT has a number of stress tests that (in case you have never used it before) also generate graphs of all of your frequencies / voltages / temperatures, depending on what you are testing. You may be able to figure out what's happening at the point you receive errors by watching what happens (if anything) on these graphs just before the errors appear.
I'm thinking that the mobo may be reaching a certain amount of load and, based on a false reading from an (assumed) faulty sensor, starting to throttle some component or other to try and control the heat. That might be the cause of your instability, and you may see a sudden change in a voltage or frequency just before you receive an error.
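If you end up with the readings as a plain list of numbers rather than a graph, the same "sudden change just before the error" check can be done in a few lines. This is just a rough Python sketch. The log values and the 0.05 V threshold are made up, and find_sudden_changes is a hypothetical helper, not anything OCCT provides.

```python
def find_sudden_changes(samples, threshold):
    """Return indices i where |samples[i] - samples[i-1]| exceeds threshold."""
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]


# Made-up CPU core voltage log (volts), one sample per second.
vcore_log = [1.32, 1.32, 1.31, 1.32, 1.20, 1.20, 1.21]
print(find_sudden_changes(vcore_log, threshold=0.05))  # [4]
```

An index that lines up with the moment the stress test reports its first error would support the throttling theory. No jumps at all would suggest looking elsewhere.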
Touch the north bridge while running it. Is it insanely hot? I had that motherboard and it was also unstable. Are you sure the sensor, which I assume is for the north bridge and reads 79c, is actually stuck? Mine read about that high, and when the system was completely cooled down it did read a little lower, but then the temp climbed right back up within a minute or so. It could be pinned to 79c because of thermal throttling.
I do not really know which sensor is the one that is stuck, since HWM just reads it as <something> sensor number 3. I am not near the system at the moment (at work), but I will be later today. I am pretty sure it is stuck, since every other temperature is quite low during load and varies with the load, while this one reports 79c no matter what.
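The reasoning there (every other sensor moves with load, this one doesn't) can be written down as a quick check. A minimal Python sketch with made-up sample values: the sensor names and the 1 degree tolerance are assumptions, and flat_sensors is a hypothetical helper.

```python
def flat_sensors(readings, tolerance=1.0):
    """Return names of sensors whose readings span no more than `tolerance` degrees."""
    return sorted(
        name
        for name, values in readings.items()
        if values and max(values) - min(values) <= tolerance
    )


# Made-up samples (deg C): two taken at idle, two under load.
samples = {
    "CPU": [30, 31, 44, 46],
    "board sensor 1": [38, 39, 52, 55],
    "board sensor 3": [79, 79, 79, 79],
}
print(flat_sensors(samples))  # ['board sensor 3']
```

A flat reading only says the sensor did not move over the samples taken. That fits a stuck sensor, but it also fits a part that is genuinely hot and barely affected by CPU load, so it is a hint rather than proof.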
I sure hope this isn't a bad board. With no parts to verify it is the motherboard, I'll end up paying the comp shop to run their tests and quite frankly, we don't have money to spare for that, and I don't trust them a whole damned lot, either.
That's the problem with buying parts and building it yourself. For other rigs, I have spares to verify things myself, especially my LGA775 builds, but the AM3 is a new toy for me and I don't have any way to check things by hardware swapping.
erocker, did you RMA the board and get a replacement that worked, or did you get another board? Is there an issue with those boards in general?
Thank you both guys, I'll try all this later today. Hopefully I get it working, or get an indication as to what is screwed up.
I hate when things like this happen on brand new systems. :banghead:
Update: Alright. Via telephone testing continues. I told my friend to try something unorthodox.
We're now 99% certain that this is a heat issue and something overheats. The 79c sensor is not stuck, but we still have no idea what it is. This was determined like this: He took a huge fan (Larger than the case and extremely powerful) and opened the case, then directed the fan into the case. Temps went down on everything, and we managed to get the 79c sensor down to 77c, and we passed Linpack.
It must be something with no heatsink (hence the small temp variance even with the massive airflow) overheating and causing an instability when the case isn't ventilated with the equivalent of a frakkin' hurricane.
Any ideas what it might be on this particular board ?
Are there any sensor readings in the BIOS that are 79C?
Perhaps it is the VRMs ?
The GB page says: "Please apply sufficient cooling to CPU VRM zone when ACC function is enabled". Now we do not have it on, but could this indicate that the cooling is somewhat lacking as standard ?
If you look at the pic, the area above the CPU has a bunch of chips which have two holes for a heatsink to be mounted on top of them, but the board does not come with any heatsink for the area.
I might try some ramsinks there tonight.
That area is the VRMs, and it certainly can't hurt to put some heatsinks on. Is the CPU fan spinning fast enough? Sometimes fan controllers turn them down too much.
You need some VRM cooling, urgently!
I have ramsinks on mine plus a 120mm fan, and they get to like 80c easily even with that.
The northbridge can overheat easily too... but it could also be a BIOS issue. Mine reads 87-90 too, but these values were only shown after a BIOS update, and the NB is only hand-warm after load.
Maybe the 120mm fan you aimed at the NB cooled the VRMs too?
I'd head over to the TweakTown forums (a lot of Gigabyte stuff there) and do some research.
£10 says the ICs in the red box-out are what are getting shit hot. Find a decent copper HS that will fit over them (Enzotech MST-81 maybe?) and I bet those temps go down to around 50c. If it's not the ICs in the box-out, maybe the NB and/or SB heatsinks aren't seated properly or have very little thermal paste. It's worth taking those off too, cleaning the thermal gunk off, and putting something decent on, like AS Ceramique.