Designing for Safety - Part 3

by Wayne M. Krakau - Chicago Computer Guide, November 1999
Here’s the third article in my series on designing safer networks. While most people wouldn’t purposely design unsafe ones, lots of folks, including those who should know better, do so through the neglect of basic design issues.

Now, on to a server’s disk subsystem (geekspeak for various combinations of disk drives). The first decision is where to put the disks. If you put them inside the computer enclosure, they are vulnerable to electrical and heat problems that can affect the motherboard, power supply, and other components within that enclosure. This means that if the computer itself fails catastrophically, it could take your disk drives with it. Internal drives also make it difficult, if not impossible, to create an easy-to-use twin server as I described earlier in this series.

Finally, being good at making computers doesn’t mean that a company has expertise in making fast and efficient disk systems. For most high-performance internal disk systems, you are stuck with the proprietary disk and controller combinations offered by the computer manufacturer. These tend to be nowhere near as fast, efficient, or cost effective as aftermarket systems available from manufacturers who specialize in making high-performance disk systems.

Because of these factors, I recommend using an external disk subsystem for the NOS (Network Operating System) drives of a server. An external system is not subject to the internal electrical and temperature variations of the server enclosure. It can also be moved to a twin server (as defined in the first article in this series) by any civilian (that is, a non-computer-geek) with only a trivial amount of training.

There are several choices available in safe disk technologies. At the low end, you can start out with mirrored disks. You use two disk drives attached to a single controller. The two drives hold duplicate data, so that if the first one fails, the other can automatically take over. There is a disadvantage here in that the NOS must do all of the detail work of writing to both drives, thereby incurring a performance penalty of anywhere from 5 to 15% as compared to a single drive.

The next level up from simple drive mirroring is an enhancement of mirroring known as disk duplexing. This method uses two controllers as well as two drives, with each drive connected to its own controller. The most obvious advantage is redundancy of controllers, though that advantage has been reduced over time due to the increase in reliability of controllers. The big advantage is that while plain mirroring is saddled with a performance penalty, duplexing brings a performance increase of 5 to 15% over a single disk.

There are two reasons for this performance enhancement. The first is that the two controllers can take over much of the overhead of writing to both drives, so the NOS doesn’t have to do it. The other is that the NOS can track the relative idleness of each drive, as well as the position of each drive’s heads relative to the desired data, and can split the read commands between the drives to optimize performance. (At least that’s the technique that NetWare uses.)
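To make the read-splitting idea concrete, here is a minimal sketch in Python. It is not NetWare's actual algorithm, which I haven't seen; the Drive class, its head_pos and busy fields, and the pick_drive_for_read function are all hypothetical names invented for illustration. It simply prefers an idle drive and, among the candidates, picks the one whose heads are closest to the requested track.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical model of one half of a duplexed pair."""
    name: str
    head_pos: int       # track currently under the heads (assumed model)
    busy: bool = False  # True while the drive is servicing a request


def pick_drive_for_read(drives, target_track):
    """Choose which drive should service a read.

    Prefer idle drives; if every drive is busy, consider them all.
    Among the candidates, pick the shortest head movement.
    """
    idle = [d for d in drives if not d.busy]
    candidates = idle or list(drives)
    return min(candidates, key=lambda d: abs(d.head_pos - target_track))


def mirrored_write(drives, track):
    """Writes go to every drive so that both copies stay identical."""
    for d in drives:
        d.head_pos = track


if __name__ == "__main__":
    pair = [Drive("A", head_pos=100), Drive("B", head_pos=900)]
    print(pick_drive_for_read(pair, 850).name)  # B is closer to track 850
    pair[1].busy = True
    print(pick_drive_for_read(pair, 850).name)  # B is busy, so A gets the read
```

Note that only reads can be split this way. Every write still has to land on both drives, which is why the gain shows up mainly on read-heavy workloads.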

The ultimate in safe disk subsystems is an external (see above) RAID (Redundant Array of Independent Disks) system, typically using Level 5 RAID. In this system, a minimum of three disks are tied together to hold data. The data is striped across these disks along with redundant (parity) information so that the array can keep running even if one disk fails. In that case, the information that would have been on the failed disk is recreated on the fly using the data and redundant information on the surviving disks. Note that there is a performance penalty when running with a failed drive due to the extra effort involved in recreating the missing data.
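For the curious, the redundant information in Level 5 RAID is XOR parity. Here is a minimal Python sketch of how a missing block can be recreated from the survivors. It models a single stripe with the parity block stored last; a real controller rotates the parity position from stripe to stripe and works on much larger blocks. The function names are my own, chosen for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (stored last here)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe, failed_index):
    """Recreate the block on the failed disk from all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Three-disk array: two data blocks plus one parity block per stripe.
    stripe = make_stripe([b"ABCD", b"wxyz"])
    for failed in range(len(stripe)):
        recovered = rebuild_missing(stripe, failed)
        print(failed, recovered == stripe[failed])
```

Because XOR is its own inverse, the same routine recovers a lost data block or a lost parity block. It also shows where the degraded-mode penalty comes from: every read of the failed disk turns into a read of all the other disks plus a calculation.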

The basic formula for calculating the total available storage in a Level 5 RAID system is N-1, where N is the total number of disks in the array. Therefore, a three-disk array has the effective capacity of two disks. Similarly, a four-disk array has a usable capacity of three disks, and so on. The redundant information is spread across all of the disks, not placed on a single disk within the array, so the array can tolerate any single drive failure. For an extra measure of safety, you can add extra drives, called hot spares, that automatically activate if one of the live drives fails.
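The N-1 rule is easy to turn into a quick calculator. This Python sketch assumes all disks are the same size and that hot spares, which sit idle until needed, are not counted in N. The function name and the 9 GB drive size are just examples.

```python
def raid5_usable_capacity(num_disks, disk_size_gb):
    """Usable capacity of a Level 5 array of equal-sized disks.

    num_disks counts only the live disks in the array, not hot spares.
    """
    if num_disks < 3:
        raise ValueError("Level 5 RAID needs at least three disks")
    return (num_disks - 1) * disk_size_gb


if __name__ == "__main__":
    for n in (3, 4, 7):
        print(n, "disks of 9 GB ->", raid5_usable_capacity(n, 9), "GB usable")
```

The more disks in the array, the smaller the share of space given up to redundancy: one third with three disks, but only one seventh with seven.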

Typically, RAID systems include a hot-swap feature. This means that you can physically disconnect and remove an individual (presumably failed) drive from the array and insert a replacement drive while the array is running. You do, however, have to be careful not to shake the system while swapping drives, because you could destroy more drives while replacing a bad one.

Most traditional RAID systems are in a tower configuration, where multiple drives fit in an enclosure with one or more power supplies. They are available in fixed sizes, with seven drives being the most common. You can also choose between using a hardware RAID controller in the enclosure or one that fits in a slot in the server. Software RAID has become unpopular due to reductions in the cost of hardware-based RAID controllers. I strongly prefer the embedded (inside the RAID system as opposed to inside the file server) RAID controllers, as these work no matter what NOS brand or version runs on the file server. Using an embedded controller eliminates possible compatibility problems and tends to perform faster, too.

The RAID system that I sell most often is the Radion system from Peripheral Technology Group (www.ptgs.com & www.radionsystems.com). It is a modular system that eliminates the limitations of fixed-size RAID enclosures. Each module contains a disk drive, a power supply, a fan, a handle (for pulling it out), and a hot-swap connector. You start out with a stack of three hot-swappable drives with a four-channel hardware RAID controller in the base. You can add up to 24 more drives, distributed across four stacks, with each stack attached to its own separate channel.

Next month I’ll continue covering safety issues. For now, I’ll play with my newest toy. No, it isn’t the latest electronic gadget, it’s a Y2K bug (as in insect) that I got at the local Hallmark card shop. When you drop it (or spank it if you’re into that sort of thing) it makes a loud crashing sound. I’m thinking of sending one to each of my clients who is still procrastinating about Y2K upgrades.

©1999, Wayne M. Krakau