On Monday morning, I got to work and discovered that our main VMware server had rebooted and was reporting that fsck had failed and needed to be run manually to fix unrecoverable hard drive errors.
What followed was the most awful day of my IT career so far. Running on this VMware server were two Windows Server 2003 virtual machines on which our company’s day-to-day operations depend – our Active Directory server and our primary file share server.
As it happened, it was a new programmer’s first day on the job. Instead of setting up his dev machine and familiarizing himself with the codebase, he spent 15 hours helping us bring up two new VMs and restore those servers from backup. Then, because the AD database was unrecoverably corrupted, he helped us install a new domain and move every desktop computer in the office to it.
Surprisingly, his wife let him come back to work on Tuesday, which was largely spent cleaning up the mess left behind when you move people to a new domain while their desktop settings are still associated with their old accounts.
Oh, yeah, and two of my team are in Buenos Aires presenting at CakeFest, of which I am dreadfully proud, but it could hardly have been worse timing.
Just the Monday before, our Cisco ASA 5510 firewall went bad. That was a pretty rough day, but by comparison it was positively delightful.
This is the point at which I’m supposed to say “it could be a lot worse”, and, yeah, I suppose it could. We could have been completely unable to restore anything. We could have not had the file server backed up. We could have lost one of the servers that … ahem … we *don’t* have backed up.
But it was plenty bad enough, and I don’t want to have to EVER deal with a day like Monday again.