2012R2 first oh shit moment

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
25,887 (3.79/day)
Location
Alabama
System Name Rocinante
Processor I9 14900KS
Motherboard EVGA z690 Dark KINGPIN (modded BIOS)
Cooling EK-AIO Elite 360 D-RGB
Memory 64GB Gskill Trident Z5 DDR5 6000 @6400
Video Card(s) MSI SUPRIM Liquid X 4090
Storage 1x 500GB 980 Pro | 1x 1TB 980 Pro | 1x 8TB Corsair MP400
Display(s) Odyssey OLED G9 G95SC
Case Lian Li o11 Evo Dynamic White
Audio Device(s) Moondrop S8's on Schiit Hel 2e
Power Supply Bequiet! Power Pro 12 1500w
Mouse Lamzu Atlantis mini (White)
Keyboard Monsgeek M3 Lavender, Akko Crystal Blues
VR HMD Quest 3
Software Windows 11
Benchmark Scores I dont have time for that.
A few months ago I rebuilt a server into a much more powerful machine during an infrastructure restructuring. I also took some primary programs that were running on the host OS and split them into VMs to segregate them. (This was a completely new system, so I did bare metal reinstalls of all server OSes, both on the host and inside each VM.)

Well, today was my first oh shit moment. I found that one of the databases for the software got corrupted somehow. I did not have a readily available backup of it. However, I DO bare metal backups of the entire system via Windows Server Backup every night at 3am to a separate drive.

I was blown away by how easy it was to recover. Because I'm not seasoned with recoveries (thankfully?), I shut down the offending VM first. I then restored the entire VM (not the whole bare metal, just that one VM out of the bare metal backup). After it had finished I closed the prompt, went back to Hyper-V Manager and booted the VM back up. It came online instantly and everything was fine.

I was so impressed I just looked at it for a few minutes.

The good news is I won't be replicating the problem. Turns out some issues in the software caused bad data to be written. The root problem has been fixed; I just thought I'd share my story since I was literally blown away by how simple it was and how robust the backup was.


I did pat myself on the back a little for setting it up this way, though. I can only imagine it would have been far more disastrous if all the software we use was running directly on one machine's host OS.

Anyone else have stories to tell about recovery situations? We all know what usually happens during data failures. Does Windows Server Backup genuinely deserve its rep? Did I just get lucky?
 
Joined
Feb 8, 2012
Messages
3,013 (0.68/day)
Location
Zagreb, Croatia
System Name Windows 10 64-bit Core i7 6700
Processor Intel Core i7 6700
Motherboard Asus Z170M-PLUS
Cooling Corsair AIO
Memory 2 x 8 GB Kingston DDR4 2666
Video Card(s) Gigabyte NVIDIA GeForce GTX 1060 6GB
Storage Western Digital Caviar Blue 1 TB, Seagate Baracuda 1 TB
Display(s) Dell P2414H
Case Corsair Carbide Air 540
Audio Device(s) Realtek HD Audio
Power Supply Corsair TX v2 650W
Mouse Steelseries Sensei
Keyboard CM Storm Quickfire Pro, Cherry MX Reds
Software MS Windows 10 Pro 64-bit
Not a Windows Server Backup experience story, but it is VM related.
My experience with VMs running Windows Server is on Amazon Cloud. They do a complete daily snapshot of a system disk image, so restoring a VM to an earlier state is really trivial.
Deployment was the same even in tricky situations: in production we had an auto scaling group of VMs behind a load balancer for horizontal scaling, with substantial load on several machines in the group. We would prepare the next version for deployment on a separate VM, save the system disk image of that VM, and reconfigure the auto scaling group to boot new instances from the new snapshot image. We would then manually start one additional new-version instance in the group, then slowly kill off the old-version instances while letting the auto scaling group repopulate itself with new-version instances.
I remember being pleasantly surprised how stuff "just worked".
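For anyone who wants to see the shape of that rollout, here is a rough sketch in plain Python. It is only a toy simulation: the `AutoScalingGroup` class, `rolling_replace` and the `"v1"`/`"v2"` image names are made up for illustration and do not correspond to any real cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class AutoScalingGroup:
    """Toy model of an auto scaling group (hypothetical, not a real API)."""
    desired: int
    launch_image: str
    instances: list = field(default_factory=list)

    def repopulate(self):
        # The group launches instances from its configured image
        # until it is back at the desired capacity.
        while len(self.instances) < self.desired:
            self.instances.append(self.launch_image)


def rolling_replace(group, new_image):
    """Replace every old-version instance, one at a time."""
    old_image = group.launch_image
    if new_image == old_image:
        return [list(group.instances)]   # nothing to roll out
    group.launch_image = new_image       # new launches boot the new snapshot
    group.instances.append(new_image)    # manually start one extra new instance
    history = [list(group.instances)]
    while old_image in group.instances:
        group.instances.remove(old_image)  # kill one old-version instance
        group.repopulate()                 # group refills itself if it is short
        history.append(list(group.instances))
    return history


group = AutoScalingGroup(desired=3, launch_image="v1")
group.repopulate()
for step in rolling_replace(group, "v2"):
    print(step)
```

The extra manually started instance is what keeps the group at or above its desired capacity while the old instances are being removed.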
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.98/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K @ 4GHz
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (2 x 8GB Corsair Vengeance Black DDR3 PC3-12800 C9 1600MHz)
Video Card(s) MSI RTX 2080 SUPER Gaming X Trio
Storage Samsung 850 Pro 256GB | WD Black 4TB | WD Blue 6TB
Display(s) ASUS ROG Strix XG27UQR (4K, 144Hz, G-SYNC compatible) | Asus MG28UQ (4K, 60Hz, FreeSync compatible)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair AX1600i
Mouse Microsoft Intellimouse Pro - Black Shadow
Keyboard Yes
Software Windows 10 Pro 64-bit
Ah, the value of backups, couldn't agree with you more. Well done Solaris! :toast:
 

Solaris17

That sounds fun. I have yet to play with load balancing; I'm emulating other things in my lab ATM, but I've always found it pretty fascinating.

Thanks! I actually had a few other sysadmins years back tell me it was a bit much to do bare metals every day, which I can honestly maybe agree with? But the drives I use are commissioned specifically for backups, so I mean fuck it.
 
Joined
Nov 10, 2006
Messages
4,665 (0.73/day)
Location
Washington, US
System Name Rainbow
Processor Intel Core i7 8700k
Motherboard MSI MPG Z390M GAMING EDGE AC
Cooling Corsair H115i, 2x Noctua NF-A14 industrialPPC-3000 PWM
Memory G. Skill TridentZ RGB 4x8GB (F4-3600C16Q-32GTZR)
Video Card(s) ZOTAC GeForce RTX 3090 Trinity
Storage 2x Samsung 950 Pro 256GB | 2xHGST Deskstar 4TB 7.2K
Display(s) Samsung C27HG70
Case Xigmatek Aquila
Power Supply Seasonic 760W SS-760XP
Mouse Razer Deathadder 2013
Keyboard Corsair Vengeance K95
Software Windows 10 Pro
Benchmark Scores 4 trillion points in GmailMark, over 144 FPS 2K Facebook Scrolling (Extreme Quality preset)
Our trouble ticket/account management system is basically just a SQL server. Server has, I can't remember if it's 6 or 8 drives. I put it in RAID 5 because almost every other server we have is RAID 0 (with the occasional 1 in the mix) and RAID 5 is an improvement for us (others were built before me).

Well, drive 3 had an electrical failure and not only that, but it starts screwing with drive 2 above it. Drive 2 then starts writing corrupt data to itself (didn't quite realize this yet) and becomes unrecognizable. Pull drive 3 and drive 2 becomes recognizable again, but the OS fails to boot. Two drive failures in a RAID 5 array means data loss, of course, so I move from trying to recover it and just get down to the post-mortem.
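To show why one dead drive is survivable on single parity but two are not, here is a minimal sketch of XOR parity over one stripe. It is a simplification: real RAID 5 rotates the parity block across drives and works on much larger blocks, and the block contents and function names here are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Fill in a lost block (marked None) from the surviving blocks."""
    lost = [i for i, block in enumerate(stripe) if block is None]
    if len(lost) > 1:
        raise ValueError("single parity cannot rebuild more than one lost block")
    repaired = list(stripe)
    if lost:
        survivors = [block for block in stripe if block is not None]
        repaired[lost[0]] = xor_blocks(survivors)
    return repaired


data = [b"tick", b"ets!", b"sql1"]      # hypothetical data blocks
stripe = data + [xor_blocks(data)]      # last block is the parity

# One drive gone: the missing block is the XOR of everything that is left.
one_dead = [stripe[0], None, stripe[2], stripe[3]]
print(rebuild_stripe(one_dead) == stripe)

# Two drives gone: one parity equation, two unknowns, nothing to solve.
two_dead = [stripe[0], None, None, stripe[3]]
try:
    rebuild_stripe(two_dead)
except ValueError as err:
    print("unrecoverable:", err)
```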

Now, the only computer I know of with enough SATA ports to hook up all of the non-failed server hard drives plus a drive to boot off of is my main computer at home (a little mATX build I'm pretty fond of). Found some RAID recovery program that worked really well, despite the half corrupt drive 2, and managed to pull the semi-corrupt SQL database off of it.
We dug up a month or so old backup of the database and managed to overlay the bits of the current database that weren't corrupt over the month old backup. There were some chunks missing, sure, but it could have been worse.
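The overlay idea can be sketched roughly like this. It is purely illustrative: the row layout, the `overlay` helper and the `is_intact` check are assumptions, not the actual tooling used on that database.

```python
def overlay(backup_rows, salvaged_rows, is_intact):
    """Start from the old backup, then overwrite with every salvaged row
    that passes the integrity check."""
    merged = dict(backup_rows)
    for key, row in salvaged_rows.items():
        if is_intact(row):
            merged[key] = row
    return merged


# Hypothetical data: None stands in for a row that came back unreadable.
backup = {1: "ticket A (month old)", 2: "ticket B (month old)"}
salvaged = {1: "ticket A (current)", 2: None, 3: "ticket C (new)", 4: None}

merged = overlay(backup, salvaged, is_intact=lambda row: row is not None)
print(merged)
```

Row 2 falls back to the month-old copy, and row 4 (created after the backup and unreadable in the salvage) is simply gone, which is where the missing chunks come from.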

Drive 2 tested good on its own, so I just replaced drive 3 and built a RAID 6 array instead. Shortly after, the brand new drive 3 failed again (suspecting the port on the backplane now) and I think it took out drive 2 again, but the nice RAID 6 array is keeping it all running (with zero redundancy, I do realize). It's still running, so I get assigned other projects and don't have time to revisit.
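As a quick back-of-the-envelope check on the "zero redundancy" remark, here is a small sketch of how many more failures an array can absorb. The table and function names are made up for illustration; RAID 1 is left out because its tolerance depends on how many mirrors there are.

```python
# Number of simultaneous drive failures each level can survive.
FAILURES_TOLERATED = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}


def remaining_redundancy(level, failed_drives):
    """Return how many more failures the array can take, or None if it is lost."""
    tolerated = FAILURES_TOLERATED[level]
    if failed_drives > tolerated:
        return None
    return tolerated - failed_drives


for level, failed in [("RAID 5", 2), ("RAID 6", 2), ("RAID 6", 3)]:
    print(level, "with", failed, "failed:", remaining_redundancy(level, failed))
```

Two failed drives on RAID 6 leaves the array running with nothing in reserve, while the same two failures on RAID 5 is the data loss case from earlier in the post.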

We back up more often now. :p
 

Solaris17

That's pretty crazy. I haven't dealt with a successful recovery from a RAID failure. I think I've handled one RAID 5 that was gone. The others were understandably dead; the clients were running RAID 0. I can only imagine that was a satisfying feeling.
 
Joined
May 13, 2010
Messages
5,703 (1.12/day)
System Name RemixedBeast-NX
Processor Intel Xeon E5-2690 @ 2.9Ghz (8C/16T)
Motherboard Dell Inc. 08HPGT (CPU 1)
Cooling Dell Standard
Memory 24GB ECC
Video Card(s) Gigabyte Nvidia RTX2060 6GB
Storage 2TB Samsung 860 EVO SSD//2TB WD Black HDD
Display(s) Samsung SyncMaster P2350 23in @ 1920x1080 + Dell E2013H 20 in @1600x900
Case Dell Precision T3600 Chassis
Audio Device(s) Beyerdynamic DT770 Pro 80 // Fiio E7 Amp/DAC
Power Supply 630w Dell T3600 PSU
Mouse Logitech G700s/G502
Keyboard Logitech K740
Software Linux Mint 20
Benchmark Scores Network: APs: Cisco Meraki MR32, Ubiquiti Unifi AP-AC-LR and Lite Router/Sw:Meraki MX64 MS220-8P
Acronis has saved my ass a lot
 