
Is there a Linux/Ubuntu god among us?

phill

Hey guys, I'm hoping there's still some hope left and that someone in this amazing forum can help me fix a RAID 10 array :)

On Sunday morning, while I was going through the backup of my Synology box and acting a bit too quickly, I lost access to it. When I try accessing it, it errors but still shows the shared folders I'd made on the system. Like so....

Windows error for Synology.png


I did manage to get this to work by installing Ubuntu and running a few commands I had picked up from the Synology help links -

Screenshot from 2023-07-16 13-47-01.png


Like so..
But for some reason, whilst I was trying to copy across some other folders, I believe two of the drives dropped out and then even more fun array stuff started happening... i.e. it stopped working :(

Screenshot from 2023-07-16 21-39-01.png


When it stopped working, I stupidly (as I was slightly panicked) removed the drive data cables rather than the power and tested each drive, just to make sure they worked, which they did..

I've been trying to find a command or something to help with repairing the array. I've even tried swapping the drives around so that the array matches the first picture, but I haven't had any success with that. It seems that when I try to be clever and test one drive at a time, Ubuntu puts the same label on it, so whenever there's just one array drive in the system it always comes up as sdb3.... If there was a way of reading the information so I could find a serial number for each drive letter, I'd gladly go through and try to put it back how it was, but it seems that would be too easy and, well, I hardly know Ubuntu at all to do something like that.
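A minimal sketch of how to map each sdX letter back to a physical drive's serial number (assumes lsblk is available on the live Ubuntu session and the smartmontools package for the second command; /dev/sdb is just an example device):

Code:
# list every disk with its size, model and serial number
lsblk -d -o NAME,SIZE,MODEL,SERIAL

# or query a single drive directly (needs smartmontools)
sudo smartctl -i /dev/sdb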

I've tried re-enabling the array but I'm not sure what to really do and I don't wish to run the risk of damaging things even further. I have been able to gain information from each of the drives in the array -

RAID 10 Array.png


I would dearly love to get the remaining data from the array. There's quite a bit on there, but most of it, I believe, has now been backed up, so I don't think I'd be losing masses; but I could lose a lot more of my time regaining the data and having to re-sort the few thousand photos I was very luckily able to get back before it went silly, after I had it copying nicely from Ubuntu the first time. I'm not 100% sure what happened, but I believe I might have slightly tapped a cable or something that caused the issue.

Is there anyone out there who can save me from this mess?? Is there anything else you might require? I've got the drives installed on a separate machine now, with the latest Ubuntu install, 22.05 I believe it is?? I can't wait to hear from you :) Massive thank yous in advance :)
 
For the 'every drive is sdb' issue: Linux names drives starting with sda. With your OS drive plus one RAID drive, the solo RAID drive will always end up as sdb. Two RAID drives will show as sdb and sdc, and so on.

When you examine 6 of the drives, the array state shows 1 drive as missing. In the sdd and sde screenshots, the array state is showing 8 active drives. Those two drives are also showing an event count of 603 while the other 6 are at 607. Could be an intermittent hardware failure, or more likely a result of you yanking cables. The RAID will normally survive two dropped drives, unless they are the same slots in each RAID 0 half, since RAID 10 is essentially two RAID 0 arrays mirrored. Meaning the array will work in a degraded state with 1 to 4 dead drives, depending on your "luck". Or put the other way, it will fail with somewhere between 2 and 5 dead drives.
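To compare those values side by side, a rough sketch (assuming the data partitions really are sdb3 through sdi3, as the screenshots suggest):

Code:
# dump the event count and array state recorded on every member partition
sudo mdadm --examine /dev/sd[b-i]3 | grep -E '/dev/|Events|Array State'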

Your md2 raid device is inactive. This is the actual device you access your raid with. What is the output of mdadm -D /dev/md2?

Looking at your pictures, and without the result of the above command, I would probably attempt: mdadm --assemble /dev/md2. If that doesn't work: mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3. You can also stick a --force in if the previous commands won't work, but this has a higher chance of data loss: mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3

Do you have md0 and md1 raid devices too, btw?
And is the md2 device actually built from the #3 partition of each drive, or is the array using the whole drive? If it's the latter, you need to replace all the sdX3 entries with just sdX in the commands above. Hopefully the first assemble command will work fine, but I am making a couple of assumptions based on your screenshots.
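A quick way to check both of those assumptions (just a sketch; exact output varies a little between mdadm versions):

Code:
# list every partition that carries an md superblock and which array it belongs to
sudo mdadm --examine --scan

# raid member partitions show up with FSTYPE linux_raid_member
lsblk -o NAME,SIZE,FSTYPE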

Fair warning: it has been a while since I have messed around with Linux software raid. I can't guarantee that this will solve your issue.
 
As long as the SMART stats on the drives are okay, it looks like your RAID is intact. You should be able to reassemble the RAID given what you've shown.
Looking at your pictures, and without the result of the above command, I would probably attempt: mdadm --assemble /dev/md2. If that doesn't work: mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3. You can also stick a --force in if the previous commands won't work, but this has a higher chance of data loss: mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3
It would actually probably be easier to just run this:
Code:
mdadm --assemble --scan
You can also add the `-v` flag to the above to get more verbose output and it will tell you if it's skipping a drive for one reason or another.

mdadm is smart enough to know that the drives with the same RAID UUID are for the same array. You can try doing what Calenhad suggested with `--force`, but that should be a last resort, and only if you understand why it's failing in the first place; do not run it blindly.
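For reference, a small sketch of both checks (the partition names are taken from the screenshots above, so adjust if yours differ):

Code:
# verbose assemble-by-scan; reports why any device gets skipped
sudo mdadm --assemble --scan -v

# confirm all eight members carry the same Array UUID
sudo mdadm --examine /dev/sd[b-i]3 | grep -E '/dev/|Array UUID'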
 
It would actually probably be easier to just run this:
Code:
mdadm --assemble --scan
mdadm --assemble /dev/md2 implies --scan since only one device is specified. I added the raid device to narrow the operation, in case there is more than one raid device present.
 
mdadm --assemble /dev/md2 implies --scan since only one device is specified. I added the raid device to narrow the operation, in case there is more than one raid device present.
That's fair, although I don't think it would hurt. It won't fiddle with active arrays and it won't bring one active unless all the drives are there. Generally speaking, I'd call --scan safe 99% of the time. It only becomes potentially unsafe if you add the --force flag in the situation you described, in which case my previous statement applies.
You can try doing what Calenhad suggested with `--force`, but that should be a last resort, and only if you understand why it's failing in the first place; do not run it blindly.
Edit: I mean, you could just slap --force on there and YOLO it. That's an option if one is so inclined. :laugh:
 
Firstly, thank you for the replies and help! :)

Secondly, here's the results of the first few commands...
Screenshot from 2023-07-18 15-31-44.png



Screenshot from 2023-07-18 15-38-53.png


Screenshot from 2023-07-18 15-39-15.png


Screenshot from 2023-07-18 15-40-07.png


Screenshot from 2023-07-18 15-41-17.png


Screenshot from 2023-07-18 15-41-24.png


So... I'm not sure why the device is 'busy'; as far as I'm aware there's no disk access going on whatsoever??
For the 'every drive is sdb' issue: Linux names drives starting with sda. With your OS drive plus one RAID drive, the solo RAID drive will always end up as sdb. Two RAID drives will show as sdb and sdc, and so on.

When you examine 6 of the drives, the array state shows 1 drive as missing. In the sdd and sde screenshots, the array state is showing 8 active drives. Those two drives are also showing an event count of 603 while the other 6 are at 607. Could be an intermittent hardware failure, or more likely a result of you yanking cables. The RAID will normally survive two dropped drives, unless they are the same slots in each RAID 0 half, since RAID 10 is essentially two RAID 0 arrays mirrored. Meaning the array will work in a degraded state with 1 to 4 dead drives, depending on your "luck". Or put the other way, it will fail with somewhere between 2 and 5 dead drives.

Your md2 raid device is inactive. This is the actual device you access your raid with. What is the output of mdadm -D /dev/md2?

Looking at your pictures, and without the result of the above command, I would probably attempt: mdadm --assemble /dev/md2. If that doesn't work: mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3. You can also stick a --force in if the previous commands won't work, but this has a higher chance of data loss: mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3

Do you have md0 and md1 raid devices too, btw?
And is the md2 device actually built from the #3 partition of each drive, or is the array using the whole drive? If it's the latter, you need to replace all the sdX3 entries with just sdX in the commands above. Hopefully the first assemble command will work fine, but I am making a couple of assumptions based on your screenshots.

Fair warning: it has been a while since I have messed around with Linux software raid. I can't guarantee that this will solve your issue.
As for the md0 and md1, these are the drives I have showing at the moment....

Screenshot from 2023-07-17 16-27-42.png


Is there anything else I could try before I do anything else? I'm wondering if I could try building it with sdi1 or sdi2, for example? The sdi3 partition, I believe, is just the data on the drive... I'm not 100% sure where the RAID information sits on the drives... I'll have a look...
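For what it's worth, on Synology/Xpenology disks the two small partitions are normally the DSM system and swap arrays (md0/md1), with the large third partition holding the data array, so md2 being built from the sdX3 partitions would be expected. A quick way to confirm on one drive (sdi here is just an example):

Code:
# show sdi's partitions, their sizes and filesystem types
lsblk -o NAME,SIZE,FSTYPE /dev/sdi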

Here's a link I've been following for the recovery so far...

Synology RAID Recovery
Ubuntu ISO Create
Ubuntu Install

I will try the other partitions and see what happens with those, even if it's just going to error rather than do anything.

mdadm --assemble /dev/md2

Didn't seem to do anything, but I'm not sure if that's good or bad... I await your awesome knowledge!!
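A quick way to see whether an assemble attempt actually changed anything is to re-check the array status afterwards:

Code:
cat /proc/mdstat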
 
This is a similar SuperUser question and the person hit the same disk busy message.

This sounds like the raid was shut down uncleanly. Your solution might be the same (adding the --force flag to assemble). If I were you, I'd go for it at this point.
 
You can find the manual page for mdadm usage/options here: https://linux.die.net/man/8/mdadm

cat /proc/mdstat shows that you have just the md2 array, and no md1 or md0. There should be no need to dig further into that.

The output from mdadm -D /dev/md2 shows what appears to be a properly configured raid array. All 8 drives are listed, etc. They are, however, not assigned a raid device number, but this should just be due to the array being inactive.

You can try mdadm --stop /dev/md2; this should deactivate the array and release its resources, hopefully solving the 'device busy' problem. Your array is already flagged as inactive, so this might not do anything, but it is possible the drives are stuck since the array did not go inactive gracefully.

Since the config appears to be intact (as shown by mdadm -D), the next step should still be to reassemble the array. I would personally use mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3 because this skips all the unnecessary drives that get scanned with the --scan option. We already know which drives (partitions) this array should use, and you can see in the output you got from mdadm --assemble --scan above that all the other drives in your system were scanned and none of them matched your raid array.

You will probably have to use the --force option by now, either to get round the 'device busy' problem or some other problem with reassembly, but I would still try without it first.
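Putting that together, the order of operations would roughly be (only reaching for --force as the last resort, as discussed above):

Code:
# release the stuck, inactive array first
sudo mdadm --stop /dev/md2

# then reassemble from the known member partitions
sudo mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3

# only if that still refuses; higher chance of data loss
sudo mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3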
 
If the above doesn't work, see if you have access to the smartctl command; it's part of the smartmontools package. Alternatively, you may already have SMART access in the NAS UI.

RAID 10 is essentially two RAID 0s put together as a mirror, so if one of your drives is bad enough to prevent it being used then you basically lose one side of the raid; however, all your data will still be there on the working side. The problem for me is that I have never used mdadm for RAID 10 before, and your images don't seem to show a clear presentation of how the drives are assigned within the RAID 10, instead just showing all the assigned drives in one list. For that reason I have stayed quiet, so no comment from me on the mdadm side of things, but smartctl -a 'drive label' will show the state of individual drives.
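A rough example of checking a single member drive (assumes smartmontools is installed; substitute each of your drive letters in turn):

Code:
# quick pass/fail health summary
sudo smartctl -H /dev/sdb

# full SMART attributes, self-test results and error log
sudo smartctl -a /dev/sdb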
 
I'll see what I can do this evening guys, thank you for the help :) Had to take a step back from it as it was starting to do my head in :laugh:

If and when this gets fixed, the next question is what to replace the Synology setup with.. I'm wondering if I should stick with it, but will deal with that after the RAID 10 save....
 
I'll see what I can do this evening guys, thank you for the help :) Had to take a step back from it as it was starting to do my head in :laugh:

If and when this gets fixed, the next question is what to replace the Synology setup with.. I'm wondering if I should stick with it, but will deal with that after the RAID 10 save....
Did the Synology box itself fail? What model was it?
 
Firstly, apologies to all who have replied and helped; I've been caught up with a few things, and not getting on the forum much hasn't helped, aside from the usual updates to WCG and FAH etc. But it's all gone well, thankfully!

So, to answer: the RAID is back up and running as of this evening. I managed to get it working with help from a guy called Brian from Superuser.com ( Linky ), and it's working its way through the few TBs of data so I can grab it all off and figure out what my next move is :)

Also, whilst I will grab some screen grabs later, I've got these commands to add, just in case anyone searches and needs some help with this :)

If you are trying to reassemble, do mdadm --assemble --readonly /dev/[md array] [disks...]. What I would like to try, but am hesitant to without a backup, is mdadm --assemble --readonly /dev/[md array] [first 6 disks only]

To Clone the drives - dd if=/dev/[source] of=[destination]

dd if=/dev/[source] skip=1000000000000 bs=1M count=10 | sha256sum

Here is what I am hoping will ultimately work (Don't do now!):
1. mdadm --assemble /dev/md2 [disk 0] ... [disk 5]
2. mdadm --re-add /dev/md2 [disk 6]
3. mdadm --re-add /dev/md2 [disk 7]
4. mdadm --assemble --run --force --update=resync /dev/md2 [disk 6] [disk 7]

Sorry about the confusion on the dd if=/dev/[source] skip=1000000000000 bs=1M count=10 | sha256sum. You do not include the brackets - they were simply placeholders. If you could re-run that as follows (with a real disk in this example), it would be appreciated: dd if=/dev/sde3 skip=1000000000000 bs=1M count=10 | sha256sum

Since /dev/sdb3 and /dev/sdc3 are the removed drives but their mirrored disks remain online, the array should reassemble without a problem. So, if you aren't worried about data loss, we can try taking steps to bring everything back online:
1. mdadm --assemble /dev/md2 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3
2. mdadm --re-add /dev/md2 /dev/sdb3
3. mdadm --re-add /dev/md2 /dev/sdc3
Wait about 5 minutes and take a shot of cat /proc/mdstat to see the status of the array. Also post mdadm --detail /dev/md2.

For re-adding the drives and setting off the array check and rebuild -
Based on that cat /proc/mdstat and mdadm --detail /dev/md2 output, we can take the following two steps:
1. mdadm --stop /dev/md2
2. mdadm --assemble --run --force --update=resync /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3
Wait about 5 minutes and again take a shot of cat /proc/mdstat to see the status of the array. Also post mdadm --detail /dev/md2. Then you can mount the array: mount /dev/md2 [mount point]
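(Side note, not part of Brian's instructions: while the resync runs, its progress can be watched with something like the below.)

Code:
# refresh the array status every few seconds
watch -n 5 cat /proc/mdstat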

For cloning -
dd if=/dev/[source] of=/dev/[target] bs=512K status=progress (without brackets). If you were to clone it to a file on a drive larger than the source (and mounted): dd if=/dev/[source] of=/[targetPath]/[targetFile].dd bs=512K status=progress (not including brackets). Example: dd if=/dev/sda1 of=/media/usb/backup/sda1.dd bs=512K status=progress. The .dd file extension is also optional, but I find it helps ensure you remember what the file is.
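One caveat on the earlier sampling command, again not from Brian: dd's skip= counts blocks of bs= size, so skip=1000000000000 with bs=1M would seek far past the end of an 8TB drive. If the intent was to checksum 10MiB starting roughly 1TB into the partition, GNU dd can take the offset in bytes instead, roughly like this (sde3 is just an example):

Code:
# read 10MiB starting ~1TB into the partition and checksum it
sudo dd if=/dev/sde3 iflag=skip_bytes skip=1000000000000 bs=1M count=10 | sha256sum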


Can't thank Brian enough so massive shout out and thanks to him for making that possible :)

Did the Synology box itself fail? What model was it?
It wasn't the box itself that failed, I don't think; and I use a Synology hack, Xpenology :) I run a slightly older OS version than what is current (I believe it's 6.1.7?? I think the latest is 7.2??), but it does its job, or at least it did up until nearly a few weeks ago now.
Not sure what went on, but it was transferring fine a day before; I'd left it on overnight and then all of a sudden in the morning things were acting strange and it decided not to work when I tried to connect to it. Windows could see it and see the shares, but I couldn't access any of the shared folder data, which was somewhat frustrating :( The model I use is a DS3617xs I believe (might be a 3615??), but it basically does all I need to be honest :) I just need to figure out how to do the RAID again when I set it up.... I'm wondering about getting a 'dump' drive and then maybe using RAID 5 or 6 (currently it's an 8 x 8TB RAID 10 array, but as with RAID 10, I'm losing half my storage space, though the drive performance is good.... decisions, decisions......)

But still, thankfully it's hopefully going to spend the night copying data over and verifying the array :) Hopefully it won't take too long! Saying that, it's not just 24GB of data it's trying to move..... :eek: :laugh:

Thank you everyone for the support and help with this!!
 