RAID is not a backup!

This post describes the author’s experience of almost losing his data on a RAID disk set. He also explains in good detail why RAID is not a backup and how he rectified the situation. Remember: RAID is not a backup!

When you work with corporate systems, a complete, reliable, and tested backup system is essential. RAID does not protect you against many (or even most) of the disasters that could happen.

RAID is designed to protect against one thing: disk failure. It does not protect against user error, operator error, site destruction, or the many other ways data can be lost. Delete a file by mistake, and the array faithfully deletes it on every disk.

So how do I back things up? I must admit, I’ve improved my backup strategies of late. I currently have several tools that I use and would recommend to you:

  • SpiderOak. This is an online backup service which offers the first 2 GB of backup storage free. They also keep multiple versions of each file, so if you want a file from two versions back, it’ll still be there. This service is worth paying for, I’d say.
  • For my Mac, I’ve used PsyncX periodically (albeit not automated). It has come in handy more than once, as my laptop has died several times: I’ve one of those iBooks that was notorious for video hardware that failed annually (and that Apple would fix for free, but never admitted fault for). If you’ve a Mac, get an external drive and use PsyncX to copy your home directory to it. Also recommended: put your applications in your home directory rather than the system directory, so that restoring your home directory is enough to get your applications back.
  • For UNIX, the closest equivalent to PsyncX is rsync: again, get an external drive and copy your home directory to it regularly.
  • Also, come at it from the other direction: save your configuration by putting it into a cfengine or puppet setup and saving that as well. If the machine fails, running cfengine or puppet on startup will restore the system to its original state.
  • One other item, which may seem a bit unusual, is Thinkfree Office. Thinkfree Office gives you a way to save documents locally and have them mirrored in the Internet cloud, and you can also work on your documents on the web. Of course, this only covers documents that Thinkfree Office can edit.
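
Since the Mac backup above is admittedly not automated, here is a minimal sketch of how the rsync approach could be scripted. It is written in Python purely for illustration; the function names and the mount point are my own assumptions, not part of any tool mentioned here. It uses only rsync’s standard --archive and --dry-run options, and deliberately leaves out --delete so that a file removed by mistake from the home directory stays on the backup drive.

```python
import subprocess
from pathlib import Path


def build_rsync_command(source: Path, destination: Path, dry_run: bool = False) -> list[str]:
    """Return an rsync command line that copies `source` into `destination`."""
    # --archive preserves permissions, timestamps, and symlinks.
    command = ["rsync", "--archive"]
    if dry_run:
        # --dry-run makes rsync report what it would do without copying anything.
        command.append("--dry-run")
    # The trailing slash on the source means "the contents of this directory".
    command += [f"{source.as_posix()}/", destination.as_posix()]
    return command


def run_backup(source: Path, destination: Path) -> None:
    """Run the backup, refusing to start if the target directory is missing."""
    # If the external drive is not mounted, fail loudly instead of
    # quietly filling up the local disk.
    if not destination.is_dir():
        raise FileNotFoundError(f"backup target {destination} is not available")
    subprocess.run(build_rsync_command(source, destination), check=True)
```

Calling run_backup(Path.home(), Path("/mnt/backup/home")) from a scheduled job would then take the place of the manual run; the path /mnt/backup/home is only a placeholder for wherever the external drive is mounted.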

It would seem that cfengine v3 is now available for download – that will have to be a subject for a new article.

Automation: Live and Breathe It!

Automation should be second nature to a system administrator. I have a maxim that I try to live by: “If I can tell someone how to do it, I can tell a computer how to do it.” I put this into practice by automating everything I can.

Why is this so important? If you craft every machine by hand, then you wind up with a number of problems (or possible problems):

  • Each machine is independently configured, and each machine is different. No two machines will be alike – which means instead of one machine replicated one hundred times, you’ll have one hundred different machines.
  • Problems that exist on a machine may or may not exist on another – and may or may not get fixed when found. If machine alpha has a problem, how do you know that machine beta or machine charlie don’t have the same problem? How do you know the problem is fixed on all machines? You don’t.
  • How do you know all required software is present? You don’t. It might be present on machine alpha, but not machine delta.
  • How do you know all software is up to date and at the same revision? You don’t. If machine alpha and machine delta both have a particular piece of software, maybe it is the same version and maybe not.
  • How do you know if you’ve configured two machines in the same way? Maybe you missed a particular configuration requirement – which will only show up later as a problem or service outage.
  • If you have to recover any given machine, how do you know it will be recovered to the same configuration? Often, the configuration may or may not be backed up – so then it has to be recreated. Are the same packages installed? The same set of software? The same patches?
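
Several of these questions can be answered mechanically once each machine’s software inventory is available as data. The sketch below, in Python, assumes the inventories have already been collected as simple name-to-version mappings; how they are collected is left out, and the package names and version numbers are made up for the example.

```python
def compare_inventories(reference: dict[str, str], other: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {package: version} inventories and report the differences."""
    shared = set(reference) & set(other)
    return {
        # Packages the reference machine has but the other machine lacks.
        "missing": sorted(set(reference) - set(other)),
        # Packages only the other machine has.
        "extra": sorted(set(other) - set(reference)),
        # Packages present on both machines at different versions.
        "version_mismatch": sorted(
            name for name in shared if reference[name] != other[name]
        ),
    }


# Hypothetical inventories for machines alpha and delta.
alpha = {"openssl": "1.0.2", "perl": "5.10", "rsync": "3.0.7"}
delta = {"openssl": "1.0.1", "perl": "5.10", "lua": "5.1"}

report = compare_inventories(alpha, delta)
for category, packages in report.items():
    print(category, packages)
```

With these sample inventories the report shows rsync as missing on delta, lua as extra, and openssl at mismatched versions. Run across every pair of machines, a check like this turns “You don’t” into a list you can act on.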

To avoid these problems and more, automation should be a part of every system wherever possible. Automate the configuration, the setup, the reconfiguration, the backups, and so forth. Don’t miss anything, and if you do, add the automation as soon as you notice the gap.

Scripting languages such as Perl, Tcl, Lua, and Ruby are all good for this.

Other tools that help tremendously in this area are automatic installation tools: Red Hat Kickstart (as well as Spacewalk), Solaris JumpStart, HP’s Ignite-UX, and openSUSE AutoYaST. These systems can, if configured properly, install a machine completely unattended.

When combined with a tool like cfengine or puppet, these automatic installations can be nearly complete – from turning the system on for the very first time to full operation without operator intervention. This automated install not only improves reliability, but can free up hours of your time.
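
The idea behind configuration tools of this kind is to describe the state you want and let the tool bring the machine to it, changing nothing when the machine already matches. The Python sketch below illustrates only that idea for a single line in a configuration file; it is not how cfengine or puppet are implemented, and the function name and the example setting are assumptions made for illustration.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        # Already in the desired state: do nothing, so repeated runs are harmless.
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Calling ensure_line(Path("/etc/ssh/sshd_config"), "PermitRootLogin no") on a freshly installed machine would add the setting, and every later run would leave the file alone. That is what makes it safe to run the same automation at each startup.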