So the RAID 1 pair on our main server (sbs08) had an error, which got worse when the UPS failed in spectacular fashion, leaving the rebuild stuck at 99.83% for a week. This would have been fine if not for the fact that every time the backup process ran it triggered alerts every 4 seconds as it tried to read bad sectors, and before long the server was out of action for all network traffic.
At first we tried Disk2VHD from Sysinternals, but after a few attempts it still would not successfully copy the partition with the bad sectors on it. So we did a bit of pruning: we moved out all the data files we could, then moved other non-core files (like WSUS). Disk2VHD still failed, so we needed a different approach.
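For anyone retrying this, Disk2VHD can also be driven from the command line rather than the GUI, which makes repeated attempts easier to script. A minimal sketch – the drive letter and output path here are hypothetical examples, not the ones we used:

```shell
REM Image only the C: volume to a VHD on a different physical disk
REM (writing the VHD to the failing array would make things worse)
disk2vhd C: D:\images\sbs08-c.vhd
```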
Now to try ShadowProtect. Using the IT Edition we were able to image the drives, but again the second partition (the one with Exchange on it) got stuck. After waiting two hours for ShadowProtect to finish, the data transfer rate had fallen from “30 MB/s with 10 sec to go” to “8 KB/s with 3 sec to go”. It was time to accept some data loss, so we quit the process. We then ran the .spf to .vhd conversion tool and hit a new problem – a corrupt .spf.
From the command line you can run ShadowProtect to convert a corrupt image – one with no EOF marker – into a brand new one which is properly closed at the end, although there is a good chance of data loss. Based on the data rate above, the loss should be only about 24 KB out of 600 GB – not that bad.
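The back-of-the-envelope estimate works out as follows – a quick sketch using the figures reported above (stalled rate times remaining time, versus the size of the volume):

```python
# Transfer stalled at ~8 KB/s with ~3 seconds reported remaining
rate_kb_per_s = 8
seconds_remaining = 3
loss_kb = rate_kb_per_s * seconds_remaining  # estimated unreadable tail: 24 KB

total_gb = 600
total_kb = total_gb * 1024 * 1024  # convert GB to KB
fraction_lost = loss_kb / total_kb

print(f"~{loss_kb} KB lost out of {total_gb} GB "
      f"(about {fraction_lost:.1e} of the volume)")
```

In other words, well under a millionth of the partition – an easy trade against a server that cannot be imaged at all.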
So now to bring up the two new drives in a virtual machine and wait while all the drivers get installed. This always takes a lot longer than you think, but after about 30 minutes we had the correct drivers installed and all the missing hardware removed. That should have been it – email was flowing and network traffic was reasonable – but there were a lot of errors in the log.
A migration this messy left a lot of issues with no apparent cause. The companyweb site would not come up, and the monitoring database would not start. The resolution took the best part of another day; the highlights were:
1) The certificate management database got corrupted – we brought up the old server in isolation and migrated it across.
2) Microsoft##SSEE database corruption – one of the internal tables was corrupted and needed overwriting with a known-good copy.
3) All the networking tweaks the BPA recommends had to be re-done. These seem to have been reset when the physical network cards were replaced with virtual ones. Running the wizards again fixed all of this.
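For anyone facing a similar Microsoft##SSEE repair: the Windows Internal Database instance does not accept normal network connections, only its local named pipe, so you have to point sqlcmd at the pipe explicitly. A sketch of getting a session open (run locally on the server, as an administrator):

```shell
REM Connect to the Windows Internal Database (SSEE) instance
REM over its local named pipe using Windows authentication
sqlcmd -S \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query -E
```

From there you can inspect the suspect databases before attempting any overwrite; taking a backup of the corrupt copy first is cheap insurance.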