It has been a long time since I had to swap out a drive in a server with a RAID-5 array, but yesterday I got my chance. An old venerable server ("Ol' Bessie") that I've kept running at least 5 years past its reasonable lifespan was showing the tell-tale signs of a failed hard disk in Event Viewer - System Log. I think this server was built in the mid-90s by DEBCO Computers in Hyde Park, Ohio. It's an old battle tank of a server with 3 redundant power supplies (one of which died 4 years ago). They don't build servers like this anymore -- but I like it because it takes me back to the days when I installed servers just like this one for Spencer Stuart.
Ol' Bessie doesn't have a hardware RAID controller. There are 6 physical hard disks and RAID is managed through the Windows 2003 Disk Manager. 2 of the disks are in a RAID 1 set (the system drive), and the other 4 are in the RAID-5 set. For the work this server does, software RAID is just fine and it has performed well for the 5+ years that I've been this server's steward.
Opening up Disk Manager, I could see that sure enough one of the 4 drives in the RAID-5 set was "Missing". Right clicking on the drive in Disk Manager, I attempted to REACTIVATE to no avail.
No problem, I have a spare hard disk for just this occasion -- still sitting in the unopened box from 2007 when I bought it for just this moment. But wait -- WHICH of the 4 disks is the one that I need to pull out and replace?? Can I identify it with serial numbers? Can I identify it with the LED lights? Sounds simple but it's not.
Like I said, it has been a while since I had to do this -- at least 4 years -- and as such I wasn't feeling real snappy about it. So, like a good geek, I grabbed my copy of Mark Minasi's Windows Server 2003 to see what he had to say about it. Mark's book is great, and it has helped me out a lot over the years, but unfortunately, I found nothing about replacing a failed disk in a software-managed RAID-5 array.
I tried to Bing some responses but could find nothing concrete -- just a bunch of forum posts that had been syndicated into these awful amalgam services. What is it with these "services" that compile all these different forum posts from different messaging sites into one big pile of crap? All I found was the same exact questions being asked over and over again -- but with different branding (and different annoying ads). Of course, none of the questions that I saw repeatedly had any suitable answers. How typical anymore of the Internet -- reminds me of the advent of cable television -- a million channels and there is nothing worth watching. How I yearn for the simpler days of 5 channels with good content, or the days when the Internet wasn't filled with vast miles of crap and ads. But I digress.
Microsoft Technet wasn't helpful either -- I just couldn't find what I needed -- which was "how do I identify the failed physical disk". In my searches, I found several different pieces of software that could identify hard drive serial numbers -- but these all cost money and I didn't want to introduce some malware-infected junk onto my domain controller. Egghead Cafe (and others) also had some code snippets in various languages (Delphi (man, who uses that anymore?), C, and others) that I could compile to figure this out -- great, if only I had an hour or two (and the will) to do that -- but even then it might not tell me the answer to my quandary.
So, I retreated to the little server room and set about figuring it out, no thanks to the Internet or the countless, nameless thousands of IT professionals who have had to do this before. You know you are out there!
The good news: I got it figured out and it's working! Here is how I did it:
1. Go into Device Manager and expand the hard drives. You'll see all of them, minus the one that has failed sitting there all pretty. Right click on each one and choose Properties. Look for the TARGET value. Here you will see a LUN number. Start at the top and work your way down the list of drives. The Target LUN values are zero-based, meaning the first hard disk starts at 0, the next 1, the next 2, etc. As you work your way down, you will find that one drive is missing in the sequence. In my case, I had 6 drives. I saw 0,1,2,4,5. No three. 3 was the missing disk. Then, I got down on my hands and knees, because the server is on the floor, and I counted down from the top -- 0,1,2...THREE. I marked that drive by wiping off a year of dust and then shut the server down.
2. Once shut down, I yanked #3 out and slid in the replacement disk -- which was an exact duplicate of the one I just pulled out.
3. Boot the server up and start up Disk Manager (Computer Management...Disk Manager). Disk Manager recognized right away that something was up and prompted me to import the new disk 3 and convert it to Dynamic.
4. Now, I could see my 3 working drives with their Sky Blue colored stripe to indicate a RAID-5 set, and this one new disk with a black stripe. I could also see at the bottom that it still thought a disk was "missing". No worries about the missing disk -- that's just old crap showing up. Deal with that in a minute. Now, up top above the stripes, right click on Drive F "Raid-5" array and choose REPAIR Volume. Disk Manager wisely prompts to ask if I want to use the newly imported disk to replace the one that was missing. Yes!!
5. Now, the black stripe on disk 3 turns the same color as the others and I can see they are Resynching -- and boy this looks like it will take forever. No problem, let it work.
6. Right click the MISSING disk down low and choose to Remove. Snap, it's gone and Disk Manager is looking all dressed up and pretty again. Job done!
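For anyone who would rather not eyeball the sequence in step 1, the gap-finding part can be sketched in a few lines of Python. This is only an illustration of the counting logic: the function name `missing_targets` is made up for this post, and you still have to read the Target values out of Device Manager by hand and type them in. It assumes, as was the case on my server, that the targets are numbered from 0 with no gaps when all disks are healthy.

```python
def missing_targets(seen, total):
    """Return the zero-based target IDs that are absent from `seen`.

    seen  -- target values read by hand from Device Manager (hypothetical input)
    total -- number of physical disks the server is supposed to have
    Assumes targets run 0..total-1 with no gaps when every disk is present.
    """
    present = set(seen)
    return [target for target in range(total) if target not in present]


if __name__ == "__main__":
    # Values from my server: 6 disks, but only these five showed up.
    print(missing_targets([0, 1, 2, 4, 5], 6))
```

With my numbers this points at target 3, which matches the disk I pulled. If your controller numbers its targets differently, the physical slot may not line up with the count, so double check before yanking anything.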
Tuesday, February 16, 2010
