
Strange RAID issues.

Discussion in 'Storage' started by Aquinus, Apr 27, 2012.

  1. Aquinus

    Hi everyone,

    Lately I've been having issues with my RAID; it started when I moved the array to my LGA2011 rig. The controller keeps kicking drives out. At first it kicked out my WD Green, so I removed it, bought a WD Black to replace it, popped it in, and got a quick and simple rebuild. A couple of days later one of my Hitachi drives got kicked out of the array. Skeptical, I rebuilt, and it worked fine for the next couple of days.

    So I took the WD Green to work, plugged it into one of our servers with an eSATA dock, and ran SMART on it: 26,000 hours of active time and not a single error or SMART attribute out of whack. I brought it back home, and when the Hitachi "failed" again I took it out and swapped drives (I tried changing ports in between as well, but it didn't seem to make a difference). Then I brought the Hitachi to work and checked its SMART log too: 3 reallocated sectors was the worst I could see, and 23,000 hours of run time.
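    For anyone who wants to repeat that check, here's a minimal Python sketch of it, assuming smartmontools 7.0+ (for the --json flag) is installed and using /dev/sdb as a placeholder for the dock's device path:

    import json
    import subprocess

    # smartctl -A prints the SMART attribute table; --json makes it parseable.
    # /dev/sdb is a placeholder device path, not from the thread.
    out = subprocess.run(
        ["smartctl", "-A", "--json", "/dev/sdb"],
        capture_output=True, text=True, check=True,
    ).stdout

    for attr in json.loads(out)["ata_smart_attributes"]["table"]:
        # IDs 5 and 9 are Reallocated_Sector_Ct and Power_On_Hours,
        # the two figures quoted above.
        if attr["id"] in (5, 9):
            print(attr["name"], attr["raw"]["value"])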

    With all of this said: Intel Rapid Storage Technology enterprise will report "SATA array disk removed.", then "Unknown disk on Controller 0, Port Unknown Detected.", then "SATA Disk on Controller 0, Port 3 Detected.", then go right back to "SATA array disk removed." It repeats this cycle over and over until you restart.

    If I try to re-add the disk to the RAID right now, it will error out. If I restart and use either the RAID BIOS or the Utility in Windows it will re-add it fine and start rebuilding.

    Has anyone else noticed this behavior? This never happened to me on the nVidia RAID, and I'm wondering if anyone else has something to add. I might also note that I have two Corsair Force GT 120GB drives in RAID-0 on the SATA 6Gbps ports and haven't had a single issue with them, so I'm not inclined to blame the driver (it still could be, since the RAID-5 array has parity to calculate and data to checksum while the RAID-0 doesn't). I also replaced all of the SATA cables.

    Right now the drives in the RAID (in this order, from port 3 to port 5) are:
    Hitachi 1
    WD Black
    WD Green

    with Hitachi 2 on my desk in an anti-static bag. The WD Green and Hitachi 2 both exhibit the same behavior, but the other two drives appear fine, even though Hitachi 1 is from the same batch as Hitachi 2 and should have almost exactly the same active time. The drives are getting old, but that isn't reason enough for me to get rid of them and buy another new 1TB drive.
     
  2. newtekie1

    I say try new cables, particularly cables that don't have the locks on the connectors. I've had issues before with boards that use the 90° SATA connectors and those locking-style cables: when the SATA ports on the board don't have the little notches cut out for the lock, the cables wiggle loose. From what I can see of the P9X79, the SATA 3Gbps ports don't have the notches for the locks, so my guess would be that the cables are wiggling loose.

    Also, in the end I'd avoid using the WD Green drive if possible; even the Black might give you issues. Since WD killed the TLER option on their desktop drives, they have had issues when used in RAID: the controller can inaccurately mark the drive as bad. It doesn't happen often, but it can happen. I doubt this is the issue you are having here, since the Hitachi drive is acting up too, but it's something to be aware of.
     
    Last edited: Apr 27, 2012
  3. Aquinus

    The ones before didn't have the latches; changing the cables didn't appear to have any discernible effect.
     
  4. newtekie1

    Well, there goes my idea. :(
     
  5. Aquinus

    Once I get a little extra money, I think I might get another case, bring my Phenom II 940 back to life, and put my RAID back on there. Maybe run Ubuntu Server and just use a Samba share for files. I just don't have a whole lot of room for it, though. :confused:
     
  6. Duekay

    I tend to use the same make and model in my server. I use four of the Samsung green 2TB drives in RAID 10, but then again that's on a HighPoint card; all my SSD arrays are on the chipset, though, and they are rock solid.

    I think it might be a driver issue as well. You could try uninstalling RST; if I remember correctly you don't need RST to make a chipset array, you can just do it in the BIOS? Not sure on that, though.
     
  7. Aquinus

    If I had money, they would all be WD Caviar Blacks. Ideally I'd use matching drives too, but I've been waiting for drive prices to settle a bit more.

    I did. It didn't help; granted, the driver is as old as the release date of X79. :twitch:

    Not mine. The Intel server boards we have at work let you switch between the RST BIOS and an LSI BIOS, but that could just be because they have LSI hardware RAID cards in them, not because the ability is built into the board. Let me tell you, though: LSI makes damn good RAID cards. Put a BBU on one of their cards and turn write caching on and you're living the good life. It's too bad you have to dish out over 300 USD for one (and that's just the 4-port model!), and then give up 8 PCI-E lanes for it as well. A 5-disk RAID-5 of 500GB WD Blacks will turn out over 400MB/s with it, which isn't too shabby.
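    As a rough sanity check on that figure (with an assumed, not measured, per-drive rate): large sequential reads on an N-drive RAID-5 deliver roughly (N-1) times a single drive's throughput, since one block per stripe is parity.

    # Back-of-the-envelope RAID-5 sequential read ceiling.
    # ~100 MB/s for a 500GB-class 7200rpm drive is an assumption.
    drives = 5
    per_drive_mb_s = 100
    print((drives - 1) * per_drive_mb_s)  # ~400 MB/s, matching the figure above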
     
  8. Steevo

    It's the head parking and spin-down. Disable ASPM in the BIOS and see if that helps. The chipset drops to a low-power state and the drives go to low power; when the controller wakes, it expects to find the drives ready to go, and when they don't respond before the timeout passes, it marks them failed.

    Also disable drive sleep and PCI Express Link State Power Management in the Windows power control panel and see if that fixes it.
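    For the Windows half of that, here's a sketch of the same two settings applied from an elevated prompt via powercfg (the BIOS ASPM switch still has to be flipped by hand):

    import subprocess

    for cmd in [
        # Disk spin-down timer on AC power: 0 = never spin down.
        ["powercfg", "/change", "disk-timeout-ac", "0"],
        # PCI Express -> Link State Power Management -> Off, active scheme.
        ["powercfg", "/setacvalueindex", "scheme_current",
         "sub_pciexpress", "aspm", "0"],
        ["powercfg", "/setactive", "scheme_current"],
    ]:
        subprocess.run(cmd, check=True)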
     
  9. Aquinus

    I was thinking something like that might be happening. I'm going to go look through the BIOS.

    So I checked the BIOS, and you would be surprised at how little there is with regard to power saving on an X79 board. :p

    All the Windows settings are as you described already. :(

    Edit: Found a newer driver and plopped that on; we'll see if that helps.
     
    Last edited: Apr 28, 2012
  10. nleksan

    Just wanted to ask for your further opinion on LSI hardware RAID controller cards. I am considering using one for a home build to add an extra 4-8 SATA 6Gbps ports (4x SSDs, 4x WD RE4 1TB HDDs, and 4x 2TB WD Caviar Blacks). The Z77/X79 chipsets don't have enough ports for all of the drives I need, and I don't want to spend a week trying to get some shitty $30 software RAID card to work only to find out I'm getting a max of 200MB/s in RAID 10 over an x4/x8 link... Thanks!
     
  11. Aquinus

    Wow, 8 disks? Not a whole lot of servers even have HDD setups that large. You're looking at more than 500 USD; I think this is what you're looking for, as you described. Also keep in mind this is the price without the BBU.

    LSI MegaRAID Internal SAS 9265-8i 6Gb/s Dual Core ...

    I would also be careful about what exactly you need. Rotational hard drives won't really benefit from SATA 6Gbps; they don't go fast enough for it to matter, and the SATA 6Gbps ports on the Z77 PCH should be able to handle the SSDs in RAID (it has a few 6Gbps ports on the PCH, right?). I don't know how well the controller works on Windows, but it worked amazingly well on Ubuntu Linux 10.04.3 LTS.
     
  12. DanTheBanjoman

    Sounds like TLER. Which makes the issue not strange at all. It's a feature.
     
  13. Aquinus

    I'm not sure why that would cause it to just start happening with my drives now, and with only two of them for that matter, two of which are from the same batch while only one acts up. I'm not convinced that it is TLER. I've updated the drivers (which were kind of annoying to find on Intel's site, I might add; they have multiple pages for RSTe drivers), and so far my RAID has been healthy for a couple of days. I'll give it a week, since it always drops a drive anywhere between 12 hours and 4 days after completing a rebuild.
     
  14. DanTheBanjoman

    Because it works like that by design. Until a few generations ago basically any drive supported TLER, so there were no issues. In the latest generations, however, TLER is only supported on enterprise drives. Basically what happens is that the drive takes too long to recover from an error and the RAID controller decides to drop it. TLER cuts off the error recovery after 7 seconds or something along those lines. Some RAID controllers don't have these issues, and software arrays tend to be immune as well (e.g. mdadm).
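    For reference, TLER is WD's name for SCT Error Recovery Control, and on drives that still accept the command smartctl can read and cap it. A sketch, with /dev/sdb as a placeholder (the two values are in units of 100 ms, so 70 is the 7 seconds mentioned above, and most drives forget the setting on power cycle):

    import subprocess

    # Show the current SCT ERC setting, then cap read/write error recovery
    # at 7 s each. /dev/sdb is a placeholder device path.
    subprocess.run(["smartctl", "-l", "scterc", "/dev/sdb"], check=True)
    subprocess.run(["smartctl", "-l", "scterc,70,70", "/dev/sdb"], check=True)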

    I had this issue as well until recently. I ran an Axus SATA-to-SCSI disk cabinet with 6 drives, and it ran fine. I also had an 8-bay eSATA DAS with a RR622 running 8 green drives, which caused huge problems: large writes would hang the controller, and while resetting the bus fixed it, moving large amounts of data to the array was near impossible.

    I ended up finding a 16-bay SATA-to-SCSI cabinet (also Axus) for free, which now replaces both devices. The firmware of these cabinets is immune to TLER issues.

    Most NAS devices use mdadm, which is also immune. I couldn't find a decent solution for Windows, though. Using all disks in passthrough and then creating a software array would have been fine.
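    A sketch of that passthrough-plus-software-array approach on Linux, with hypothetical device names; the controller exposes each disk as-is and md provides the redundancy:

    import subprocess

    # Hypothetical passthrough disks, not the drives from this thread.
    disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
    # Error handling and timeouts live in the kernel's md/SCSI layers here,
    # so a slow-recovering consumer drive gets retried instead of dropped.
    subprocess.run(
        ["mdadm", "--create", "/dev/md0", "--level=10",
         "--raid-devices=%d" % len(disks)] + disks,
        check=True,
    )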
     
  15. Aquinus

    This never happens in write-intensive situations in my case; it's always when the disks are spun up and idle. Like I said, I'm waiting on the drivers, because a lot of the issues I've had with Intel hardware in the past have been resolved through some form of software, be it a driver or an application. We shall see, though.
     
  16. DanTheBanjoman

    Write-intensive operations are the most likely to cause TLER issues, so you might be in luck if it doesn't happen in those situations.

    Dropping out while idle sounds odd, though. Have you tried turning off any power-saving features, like spinning the drives down after x minutes?
     
  17. Aquinus

    All power saving for the hard drives and PCI-E is disabled. I'm going to stress it a bit later to see how stable it really is, but I suspect it was the drivers, considering the ones I got are pretty new and the ones I found before were the same ones from release day. Earlier I was going to take screenshots of the BIOS options, only to find that the P9X79 Deluxe has practically nothing in terms of power saving, but I guess that really isn't the goal of an SB-E build, is it? :p

    It did it again. Maybe the RSTe RAID just doesn't like these drives. Maybe I'll buy a new case for my old Phenom II 940, turn that into a server, and pop the drives in there.
     
    Last edited: Apr 29, 2012
  18. nleksan

    I am really into HD video and uncompressed audio editing and recording, so 1TB can fill up fast. The SATA 6Gbps is because, in RAID 0 or 10, the throughput of even mechanical drives should be high enough to take advantage of it, yes? Or am I misunderstanding the concept?
     
  19. Steevo

    You are misunderstanding: each drive gets full SATA 3 speed on its own port.

    The controller's uplink is the next saturation point, and it isn't an issue in this case either.

    Few if any mechanical hard drives alone can saturate the connections provided on your board. Even most SSDs would have trouble saturating any single connection, or a set of connections in a RAID array.

    If you are working with media files on mechanical drives alone, you are doing it wrong. You want huge, almost obscene amounts of RAM, then SSDs with the Windows cache left on them, and then mechanical hard drives to cover the need for massive storage.
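    Putting rough, assumed 2012-era numbers on that:

    # A single mechanical drive can't even fill a SATA 3Gbps link, so a
    # 6Gbps port buys it nothing. Figures are typical, not measured.
    hdd_mb_s   = 130   # fast 7200rpm sequential throughput
    sata2_mb_s = 300   # SATA 3Gbps payload rate after 8b/10b encoding

    print(hdd_mb_s < sata2_mb_s)  # True: the platter, not the link, limits
    # RAID 0/10 striping multiplies per-drive speed across ports, so the
    # aggregate grows without any one link getting close to its ceiling.
    print(4 * hdd_mb_s)           # ~520 MB/s across four ports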
     
  20. slyfox2151

    SATA HDDs barely reach 150MB/s (SATA 1.5Gbps speeds). As long as there is enough bandwidth for the controller to run all SATA ports at full speed, upgrading the controller to SATA 2/3 won't help.
     
  21. happy

    I thought you can't RAID Caviar Blacks?
     
  22. Aquinus

    Tell that to the 20+ Caviar Blacks in our servers at work. Our LSI and 3ware RAID controllers don't appear to have any issues with them, granted we're not using any onboard Intel RAID solutions on our servers, which would be much closer to what I'm trying to do. I seriously think I'm going to resurrect my Phenom II, turn it into a server, and rely on the nVidia fake-RAID, since it worked really well with these drives.
     
    Last edited: Apr 30, 2012
  23. theeldest

    How old are the Blacks?

    Western Digital only made the change somewhat recently. I have a bunch of Caviar Blues in RAID that work great because they have TLER enabled. WD made the change to push people doing RAID toward the RE series.
     
  24. Aquinus

    Some are four years old, some are two; nothing newer than 2010.

    If this is a recent change, my four-year-old drives shouldn't even be having an issue with it.
     
  25. theeldest

    OK, that's stranger. I'm running 4x WD 6400AAKS (Caviar Blue) drives in RAID 10 on the Intel chipset (Z68).

    These drives are only about three years old. If they don't have problems, I doubt it's specific to the drives; it's most likely a driver problem.


    Sorry, really not sure what else you could try.
     
