Disaster recovery planning

17 April 2008

Planning for a disaster is not as easy as it sounds, and it helps if you have a rampant imagination. Throughout disaster planning, the dominant question is "What if…?" After the planning comes testing: the best plans are worthless if they don’t work in practice.

Consider an Internet server serving web pages. Let’s assume that downtime is not an option; this is a typical starting point. The best approach is to start with the most specific component and work outward to the system as a whole (the complete environment):

  • What if… a disk goes bad?
  • What if… the software stops?
  • What if… memory runs out?
  • What if… the power goes out?
  • What if… the kernel panics?
  • What if… the cluster failover fails?
  • What if… the network switch fails?
  • What if… the network firewall fails?
  • What if… the internet link goes down?
  • What if… the internet provider drops off the grid?

Each one of the questions must be answered and the results tested. To test for power outage, pull the power. For a failed network switch, pull the network cable - and so forth.
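One way to keep track of which questions have been answered and which answers have actually been tested is a simple checklist. The following is a minimal sketch, not a prescribed tool: the `WhatIf` class, the `outstanding` helper, and the sample answers are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WhatIf:
    """One "What if...?" scenario from the plan (hypothetical structure)."""
    question: str         # the failure scenario
    answer: str = ""      # planned response; empty means not yet answered
    tested: bool = False  # True only once the response has been exercised


def outstanding(plan):
    """Return every scenario that is still unanswered or untested."""
    return [w for w in plan if not w.answer or not w.tested]


# Sample entries; the answers shown are assumptions, not recommendations.
plan = [
    WhatIf("a disk goes bad", answer="mirrored disks", tested=True),
    WhatIf("the power goes out", answer="dual power feeds"),
    WhatIf("the network switch fails"),
]

for w in outstanding(plan):
    status = "untested" if w.answer else "unanswered"
    print(f"What if... {w.question}? [{status}]")
```

A scenario drops off the outstanding list only when it has both an answer and a completed test, which mirrors the rule that an untested answer is not yet a finished one.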

Most of the answers will include some form of redundancy - clusters, dual facilities (such as power, network, and internet providers), and so on. However, redundancy is only one solution; there are prevention and alerting as well.
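As an illustration of the alerting side, here is a rough sketch of a check that warns before a disk fills up rather than after. The 90% threshold and the function names are assumptions made for this example; a real monitoring system would run a check like this on a schedule and send the message to whoever is on call.

```python
import shutil


def over_threshold(used, total, warn_fraction=0.90):
    """Return True when used/total meets or exceeds the warning fraction."""
    return total > 0 and (used / total) >= warn_fraction


def disk_alert(path="/", warn_fraction=0.90):
    """Return a warning string if the filesystem holding `path` is nearly
    full, otherwise None. The threshold is an assumed example value."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if over_threshold(usage.used, usage.total, warn_fraction):
        return f"WARNING: {path} is {usage.used / usage.total:.0%} full"
    return None


message = disk_alert(".")
if message:
    print(message)
```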

Each risk must be weighed against the cost to mitigate that risk. However, assuming that the risk is minimal does not eliminate the risk; the biggest problem is not accounting for a risk that eventually happens. There is nothing like downtime of a critical server to get an unforeseen risk taken care of; better to handle the risk before it happens.
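A common way to weigh a risk against the cost of mitigating it is to compare the expected yearly loss with the yearly cost of the mitigation. The sketch below uses that approach as an illustration only; the function names and every figure in it are hypothetical.

```python
def expected_annual_loss(events_per_year, cost_per_event):
    """Expected yearly loss: how often the event happens times what it costs."""
    return events_per_year * cost_per_event


def worth_mitigating(events_per_year, cost_per_event, mitigation_cost_per_year):
    """True when the expected yearly loss exceeds the yearly mitigation cost."""
    loss = expected_annual_loss(events_per_year, cost_per_event)
    return loss > mitigation_cost_per_year


# Hypothetical figures: a switch failure once every five years (0.2 per year),
# 20,000 lost per outage, and 1,500 per year to keep a spare switch ready.
print(worth_mitigating(0.2, 20_000, 1_500))
```

A result of `False` only means the mitigation costs more than the expected loss; the risk itself remains and should still be recorded in the plan.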

Plans also count for little if they have not been tested. If tests are not done, then the actual event will be the first time things have been put to the test - and what if something was missed and the system goes down? During a test, precautions can be taken to make sure that things work as they should; during an unexpected event, it is not possible to back out or prepare, and if things go down they go down hard. Don’t let that happen to you!

And disaster planning is not limited to servers (or virtual servers) - what about the possibility of a server hosting multiple virtual servers going down? What if your server is hacked into? What about your monitoring system failing? What about getting paged? Have you planned contingencies for all of these events?
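For the case of the monitoring system itself failing, one possible answer is a heartbeat: the monitor records a timestamp at regular intervals, and a separate, independent check raises the alarm when that timestamp gets too old. This is a minimal sketch of the staleness test only; the function name and the five-minute limit are assumptions for illustration.

```python
import time


def heartbeat_is_stale(last_beat, max_age_seconds=300, now=None):
    """Return True if the last heartbeat is older than max_age_seconds.

    last_beat and now are Unix timestamps in seconds; now defaults to the
    current time. The 300-second limit is an assumed example value.
    """
    if now is None:
        now = time.time()
    return (now - last_beat) > max_age_seconds


# Example: a heartbeat recorded ten minutes ago counts as stale.
print(heartbeat_is_stale(time.time() - 600))
```

The check has to run somewhere other than the monitoring host, otherwise it goes down together with the system it is watching.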

Plan, then test, test, test - and you will make it.

Entry Filed under: Linux, Security, UNIX.

David Douthitt

David is an experienced UNIX and Linux system administrator, a former Linux distribution maintainer, and author of two books ("Advanced Topics in System Administration" and "GNU Screen: A Comprehensive Manual").
