
Is there a Linux/Ubuntu god among us?

phill

Hey guys, I'm hoping there's still some hope left and that someone in this amazing forum can help me fix a RAID 10 array :)

On Sunday morning, while I was going through the backup of my Synology box and acting a bit too quickly, I lost access to it. When I try accessing it, it errors but still shows the shared folders I'd made on the system. Like so....

Windows error for Synology.png


I did manage to get this to work by installing Ubuntu and running a few commands I had picked up from the Synology help links -

Screenshot from 2023-07-16 13-47-01.png


Like so..
But for some reason, whilst I was trying to copy across some other folders, I believe two of the drives dropped out and then even more fun array stuff started happening... i.e. it stopped working :(

Screenshot from 2023-07-16 21-39-01.png


When it stopped working, I stupidly (as I was slightly panicked) removed the drive data cables rather than the power and tested each drive, just to make sure they worked, which they did..

I've been trying to find a command or something to help with repairing the array. I've even tried swapping the drives around so that the array matches the first picture, but I haven't had any success with that. It seems that when I try to be clever and test one drive at a time, Ubuntu puts the same label on it, so whenever there's just one array drive in the system it always comes up as sdb3.... If there was a way of reading the information so I could find a serial number for each drive letter, I'd gladly go through and try to put it back how it was, but it seems that would be too easy and, well, I hardly know Ubuntu at all to do something like that.
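A minimal sketch of how to map each sdX letter back to a physical drive's serial number (assumes lsblk is available on the live Ubuntu session and the smartmontools package for the second command; /dev/sdb is just an example device):

Code:
# list every disk with its size, model and serial number
lsblk -d -o NAME,SIZE,MODEL,SERIAL

# or query a single drive directly (needs smartmontools)
sudo smartctl -i /dev/sdb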

I've tried re-enabling the array but I'm not sure what to really do and I don't wish to run the risk of damaging things even further. I have been able to gain information from each of the drives in the array -

RAID 10 Array.png


I would dearly love to get the remaining data from the array. There's quite a bit on there, but most of it, I believe, has now been backed up, so I don't think I'd be losing masses; but I could lose a lot more of my time regaining the data and having to re-sort the few thousand photos I was very luckily able to get back before it went silly, after I had it copying nicely from Ubuntu the first time. I'm not 100% sure what happened, but I believe I might have slightly tapped a cable or something that caused the issue.

Is there anyone out there who can save me from this mess?? Is there anything else you might require? I've got the drives installed on a separate machine now, with the latest Ubuntu install, 22.05 I believe it is?? I can't wait to hear from you :) Massive thank yous in advance :)
 
For the 'every drive is sdb' issue: Linux names drives starting with sda. With your OS drive plus one RAID drive, the solo RAID drive will always end up as sdb. Two RAID drives will show as sdb and sdc, and so on.

When you examine 6 of the drives, the array state shows 1 drive as missing. In the sdd and sde screenshots, the array state is showing 8 active drives. Those two drives are also showing an event count of 603 while the other 6 are at 607. Could be an intermittent hardware failure, or more likely a result of you yanking cables. The RAID will normally survive two dropped drives, unless they are the same slots in each RAID 0 half, since RAID 10 is essentially two RAID 0 arrays mirrored. Meaning the array will work in a degraded state with 1 to 4 dead drives, depending on your "luck". Or put the other way, it will fail with somewhere between 2 and 5 dead drives.
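To compare those values side by side, a rough sketch (assuming the data partitions really are sdb3 through sdi3, as the screenshots suggest):

Code:
# dump the event count and array state recorded on every member partition
sudo mdadm --examine /dev/sd[b-i]3 | grep -E '/dev/|Events|Array State'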

Your md2 raid device is inactive. This is the actual device you access your raid with. What is the output of mdadm -D /dev/md2?

Looking at your pictures, and without the result of the above command, I would probably attempt: mdadm --assemble /dev/md2. If that doesn't work: mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3. You can also stick a --force in if the previous commands won't work, but this has a higher chance of data loss: mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3

Do you have md0 and md1 raid devices too, btw?
And is the md2 device actually built from the #3 partition of each drive, or is the array using the whole drive? If it's the latter, you need to replace all the sdX3 entries with just sdX in the commands above. Hopefully the first assemble command will work fine, but I am making a couple of assumptions based on your screenshots.
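A quick way to check both of those assumptions (just a sketch; exact output varies a little between mdadm versions):

Code:
# list every partition that carries an md superblock and which array it belongs to
sudo mdadm --examine --scan

# raid member partitions show up with FSTYPE linux_raid_member
lsblk -o NAME,SIZE,FSTYPE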

Fair warning: it has been a while since I have messed around with Linux software raid. I can't guarantee that this will solve your issue.
 
As long as the SMART stats on the drives are okay, it looks like your RAID is intact. You should be able to reassemble the RAID given what you've shown.
Looking at your pictures, and without the result of the above command, I would probably attempt: mdadm --assemble /dev/md2. If that doesn't work: mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3. You can also stick a --force in if the previous commands won't work, but this has a higher chance of data loss: mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3
It would actually probably be easier to just run this:
Code:
mdadm --assemble --scan
You can also add the `-v` flag to the above to get more verbose output and it will tell you if it's skipping a drive for one reason or another.

mdadm is smart enough to know that the drives with the same RAID UUID are for the same array. You can try doing what Calenhad suggested with `--force`, but that should be a last resort, and only if you understand why it's failing in the first place; do not run it blindly.
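For reference, a small sketch of both checks (the partition names are taken from the screenshots above, so adjust if yours differ):

Code:
# verbose assemble-by-scan; reports why any device gets skipped
sudo mdadm --assemble --scan -v

# confirm all eight members carry the same Array UUID
sudo mdadm --examine /dev/sd[b-i]3 | grep -E '/dev/|Array UUID'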
 
It would actually probably be easier to just run this:
Code:
mdadm --assemble --scan
mdadm --assemble /dev/md2 implies --scan since only one device is specified. I added the raid device to narrow the operation, in case there is more than one raid device present.
 
mdadm --assemble /dev/md2 implies --scan since only one device is specified. I added the raid device to narrow the operation, in case there is more than one raid device present.
That's fair, although I don't think it would hurt. It won't fiddle with active arrays and it won't bring one active unless all the drives are there. Generally speaking, I'd call --scan safe 99% of the time. It only becomes potentially unsafe if you add the --force flag in the situation you described, in which case my previous statement applies.
You can try doing what Calenhad suggested with `--force`, but that should be a last resort, and only if you understand why it's failing in the first place; do not run it blindly.
Edit: I mean, you could just slap --force on there and YOLO it. That's an option if one is so inclined. :laugh:
 
Firstly, thank you for the replies and help! :)

Secondly, here's the results of the first few commands...
Screenshot from 2023-07-18 15-31-44.png



Screenshot from 2023-07-18 15-38-53.png


Screenshot from 2023-07-18 15-39-15.png


Screenshot from 2023-07-18 15-40-07.png


Screenshot from 2023-07-18 15-41-17.png


Screenshot from 2023-07-18 15-41-24.png


So... I'm not sure why the device is 'busy'; as far as I'm aware there's no disk access going on whatsoever??
For the 'every drive is sdb' issue: Linux names drives starting with sda. With your OS drive plus one RAID drive, the solo RAID drive will always end up as sdb. Two RAID drives will show as sdb and sdc, and so on.

When you examine 6 of the drives, the array state shows 1 drive as missing. In the sdd and sde screenshots, the array state is showing 8 active drives. Those two drives are also showing an event count of 603 while the other 6 are at 607. Could be an intermittent hardware failure, or more likely a result of you yanking cables. The RAID will normally survive two dropped drives, unless they are the same slots in each RAID 0 half, since RAID 10 is essentially two RAID 0 arrays mirrored. Meaning the array will work in a degraded state with 1 to 4 dead drives, depending on your "luck". Or put the other way, it will fail with somewhere between 2 and 5 dead drives.

Your md2 raid device is inactive. This is the actual device you access your raid with. What is the output of mdadm -D /dev/md2?

Looking at your pictures, and without the result of the above command, I would probably attempt: mdadm --assemble /dev/md2. If that doesn't work: mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3. You can also stick a --force in if the previous commands won't work, but this has a higher chance of data loss: mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3

Do you have md0 and md1 raid devices too, btw?
And is the md2 device actually built from the #3 partition of each drive, or is the array using the whole drive? If it's the latter, you need to replace all the sdX3 entries with just sdX in the commands above. Hopefully the first assemble command will work fine, but I am making a couple of assumptions based on your screenshots.

Fair warning: it has been a while since I have messed around with Linux software raid. I can't guarantee that this will solve your issue.
As for the md0 and md1, these are the drives I have showing at the moment....

Screenshot from 2023-07-17 16-27-42.png


Is there anything else I could try before I do anything else? I'm wondering if I could try building it with sdi1 or sdi2, for example? The sdi3 partition, I believe, is just the data on the drive... I'm not 100% sure where the RAID information sits on the drives... I'll have a look...
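For what it's worth, on Synology/Xpenology disks the two small partitions are normally the DSM system and swap arrays (md0/md1), with the large third partition holding the data array, so md2 being built from the sdX3 partitions would be expected. A quick way to confirm on one drive (sdi here is just an example):

Code:
# show sdi's partitions, their sizes and filesystem types
lsblk -o NAME,SIZE,FSTYPE /dev/sdi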

Here's a link I've been following for the recovery so far...

Synology RAID Recovery
Ubuntu ISO Create
Ubuntu Install

I will try the other partitions and see what happens with those, even if it's just going to error rather than do anything.

mdadm --assemble /dev/md2

Didn't seem to do anything, but I'm not sure if that's good or bad... I await your awesome knowledge!!
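A quick way to see whether an assemble attempt actually changed anything is to re-check the array status afterwards:

Code:
cat /proc/mdstat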
 
This is a similar SuperUser question and the person hit the same disk busy message.

This sounds like the raid was shut down uncleanly. Your solution might be the same (adding the --force flag to assemble). If I were you, I'd go for it at this point.
 
You can find the manual page for mdadm usage/options here: https://linux.die.net/man/8/mdadm

cat /proc/mdstat shows that you have just the md2 array, and no md1 or md0. There should be no need to dig further into that.

The output from mdadm -D /dev/md2 shows what appears to be a properly configured raid array. All 8 drives are listed, etc. They are, however, not assigned a raid device number, but this should just be due to the array being inactive.

You can try mdadm --stop /dev/md2; this should deactivate the array and release its resources, hopefully solving the 'device busy' problem. Your array is already flagged as inactive, so this might not do anything, but it is possible the drives are stuck since the array did not go inactive gracefully.

Since the config appears to be intact (as shown by mdadm -D), the next step should still be to reassemble the array. I would personally use mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3 because this skips all the unnecessary drives that get scanned with the --scan option. We already know which drives (partitions) this array should use, and you can see in the output you got from mdadm --assemble --scan above that all the other drives in your system were scanned and none of them matched your raid array.

You will probably have to use the --force option by now, either to get round the 'device busy' problem or some other problem with reassembly, but I would still try without it first.
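Putting that together, the order of operations would roughly be (only reaching for --force as the last resort, as discussed above):

Code:
# release the stuck, inactive array first
sudo mdadm --stop /dev/md2

# then reassemble from the known member partitions
sudo mdadm --assemble /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3

# only if that still refuses; higher chance of data loss
sudo mdadm --assemble --force /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3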
 
If the above doesn't work, see if you have access to the smartctl command; it's part of the smartmontools package. Alternatively, you may already have SMART access in the NAS UI.

RAID 10 is essentially two RAID 0s put together as a mirror, so if one of your drives is bad enough to prevent it being used then you basically lose one side of the raid; however, all your data will still be there on the working side. The problem for me is that I have never used mdadm for RAID 10 before, and your images don't seem to show a clear presentation of how the drives are assigned within the RAID 10, instead just showing all the assigned drives in one list. For that reason I have stayed quiet, so no comment from me on the mdadm side of things, but smartctl -a 'drive label' will show the state of individual drives.
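A rough example of checking a single member drive (assumes smartmontools is installed; substitute each of your drive letters in turn):

Code:
# quick pass/fail health summary
sudo smartctl -H /dev/sdb

# full SMART attributes, self-test results and error log
sudo smartctl -a /dev/sdb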
 
I'll see what I can do this evening guys, thank you for the help :) Had to take a step back from it as it was starting to do my head in :laugh:

If and when this gets fixed, the next question is what to replace the Synology setup with.. I'm wondering if I should stick with it, but will deal with that after the RAID 10 save....
 
I'll see what I can do this evening guys, thank you for the help :) Had to take a step back from it as it was starting to do my head in :laugh:

If and when this gets fixed, the next question is what to replace the Synology setup with.. I'm wondering if I should stick with it, but will deal with that after the RAID 10 save....
Did the Synology box itself fail? What model was it?
 
Firstly, apologies to all who have replied and helped; I've been caught up with a few things, and not getting on the forum much hasn't helped, aside from the usual updates to WCG and FAH etc. But it's all gone well, thankfully!

So, to answer: the RAID is back up and running as of this evening. I managed to get it working with help from a guy called Brian from Superuser.com ( Linky ), and it's working its way through the few TBs of data so I can grab it all off and figure out what my next move is :)

Also, whilst I will grab some screen grabs later, I've got these commands to add, just in case anyone searches and needs some help with this :)

If you are trying to reassemble, do mdadm --assemble --readonly /dev/[md array] [disks...]. What I would like to try, but am hesitant to without a backup, is mdadm --assemble --readonly /dev/[md array] [first 6 disks only]

To Clone the drives - dd if=/dev/[source] of=[destination]

dd if=/dev/[source] skip=1000000000000 bs=1M count=10 | sha256sum

Here is what I am hoping will ultimately work (Don't do now!):
1. mdadm --assemble /dev/md2 [disk 0] ... [disk 5]
2. mdadm --re-add /dev/md2 [disk 6]
3. mdadm --re-add /dev/md2 [disk 7]
4. mdadm --assemble --run --force --update=resync /dev/md2 [disk 6] [disk 7]

Sorry about the confusion on the dd if=/dev/[source] skip=1000000000000 bs=1M count=10 | sha256sum. You do not include the brackets - they were simply placeholders. If you could re-run that as follows (with a real disk in this example), it would be appreciated: dd if=/dev/sde3 skip=1000000000000 bs=1M count=10 | sha256sum

Since /dev/sdb3 and /dev/sdc3 are the removed drives but their mirrored disks remain online, the array should reassemble without a problem. So, if you aren't worried about data loss, we can try taking steps to bring everything back online:
1. mdadm --assemble /dev/md2 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3
2. mdadm --re-add /dev/md2 /dev/sdb3
3. mdadm --re-add /dev/md2 /dev/sdc3
Wait about 5 minutes and take a shot of cat /proc/mdstat to see the status of the array. Also post mdadm --detail /dev/md2.

For re-adding the drives and setting off the array check and rebuild -
Based on that cat /proc/mdstat and mdadm --detail /dev/md2 output, we can take the following two steps:
1. mdadm --stop /dev/md2
2. mdadm --assemble --run --force --update=resync /dev/md2 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3 /dev/sdg3 /dev/sdh3 /dev/sdi3
Wait about 5 minutes and again take a shot of cat /proc/mdstat to see the status of the array. Also post mdadm --detail /dev/md2. Then you can mount the array: mount /dev/md2 [mount point]
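(Side note, not part of Brian's instructions: while the resync runs, its progress can be watched with something like the below.)

Code:
# refresh the array status every few seconds
watch -n 5 cat /proc/mdstat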

For cloning -
dd if=/dev/[source] of=/dev/[target] bs=512K status=progress (without brackets). If you were to clone it to a file on a drive larger than the source (and mounted): dd if=/dev/[source] of=/[targetPath]/[targetFile].dd bs=512K status=progress (not including brackets). Example: dd if=/dev/sda1 of=/media/usb/backup/sda1.dd bs=512K status=progress. The .dd file extension is also optional, but I find it helps ensure you remember what the file is.
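One caveat on the earlier sampling command, again not from Brian: dd's skip= counts blocks of bs= size, so skip=1000000000000 with bs=1M would seek far past the end of an 8TB drive. If the intent was to checksum 10MiB starting roughly 1TB into the partition, GNU dd can take the offset in bytes instead, roughly like this (sde3 is just an example):

Code:
# read 10MiB starting ~1TB into the partition and checksum it
sudo dd if=/dev/sde3 iflag=skip_bytes skip=1000000000000 bs=1M count=10 | sha256sum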


Can't thank Brian enough so massive shout out and thanks to him for making that possible :)

Did the Synology box itself fail? What model was it?
It wasn't the box itself that failed, I don't think; and I use a Synology hack, Xpenology :) I run a slightly older OS version than what is current (I believe it's 6.1.7?? I think the latest is 7.2??), but it does its job, or at least it did up until nearly a few weeks ago now.
Not sure what went on, but it was transferring fine a day before; I'd left it on overnight and then all of a sudden in the morning things were acting strange and it decided not to work when I tried to connect to it. Windows could see it and see the shares, but I couldn't access any of the shared folder data, which was somewhat frustrating :( The model I use is a DS3617xs I believe (might be a 3615??), but it basically does all I need to be honest :) I just need to figure out how to do the RAID again when I set it up.... I'm wondering about getting a 'dump' drive and then maybe using RAID 5 or 6 (currently it's an 8 x 8TB RAID 10 array, but as with RAID 10, I'm losing half my storage space, though the drive performance is good.... decisions, decisions......)

But still, thankfully it's hopefully going to spend the night copying data over and verifying the array :) Hopefully it won't take too long! Saying that, it's not just 24GB of data it's trying to move..... :eek: :laugh:

Thank you everyone for the support and help with this!!
 