Disaster recovery planning

17 April 2008

Planning for a disaster is not necessarily as easy as it sounds. It helps if you have a rampant imagination. Throughout disaster planning, the dominant question is What if…? Following the planning, testing is required: the best plans are worthless if they don’t work in practice.

Consider an Internet server serving web pages. Let’s assume that downtime is not an option: this is a typical point to start at. The best thing to do is to start with the most specific to the system (the complete environment) and work out:

  • What if… a disk goes bad?
  • What if… the software stops?
  • What if… memory runs out?
  • What if… the power goes out?
  • What if… the kernel panics?
  • What if… the cluster failover fails?
  • What if… the network switch fails?
  • What if… the network firewall fails?
  • What if… the internet link goes down?
  • What if… the internet provider drops off the grid?

Each one of the questions must be answered and the results tested. To test for power outage, pull the power. For a failed network switch, pull the network cable – and so forth.

Most of the answers will include some form of redundancy – clusters, dual facilities (such as power and network and internet providers), and so on. However redundancy is only one solution; there is prevention and alerts as well.

Each risk must be weighed against the cost to mitigate that risk. However, assuming that the risk is minimal does not eliminate the risk; the biggest problem is not accounting for a risk that eventually happens. There is nothing like downtime of a critical server to get an unforeseen risk taken care of; better to handle the risk before it happens.

It also does not matter if the plans have not been tested. If tests are not done, then the actual event will be the first time things have been put to the test – and what if something was missed and the system goes down? During a test, preventive measures can be taken to make sure that things work as they should – during an unexpected event, it is not possible to back out or prepare; if things go down they go hard. Don’t let that happen to you!

And disaster planning is not limited to servers (or virtual servers) – what about the possibility of a server hosting multiple virtual servers going down? What if your server is hacked into? What about your monitoring system failing? What about getting paged? Have you planned contingencies for all of these events?

Plan, then test, test, test – and you will make it.

Entry Filed under: Linux, Security, UNIX. Tags: , , , , .

Leave a Comment

Required

Required, hidden

Some HTML allowed:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <pre> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Trackback this post  |  Subscribe to the comments via RSS Feed


David Douthitt

David is an experienced UNIX and Linux system administrator, a former Linux distribution maintainer, and author of two books ("Advanced Topics in System Administration" and "GNU Screen: A Comprehensive Manual"). View David Douthitt's profile on LinkedIn Support freedom The Internet Traffic Report monitors the flow of data around the world. It then displays a value between zero and 100. Higher values indicate faster and more reliable connections.

Recent Posts

Top Posts

RSS Sharky’s Column!

Calendar

April 2008
M T W T F S S
« Mar   May »
 123456
78910111213
14151617181920
21222324252627
282930  

Recent Comments

Peter on Using Open Source in the Enter…
Anthony on About
MikeT on Stress Relief: Laugh Out Loud…
yungchin on Sparse files – what, why…
Randal L. Schwartz on Perl Tidbits: Annoyances and…

Category Cloud

BSD Career Conferences Debian Debugging Disaster recovery Fedora FreeBSD HP-UX Legal Linux MacOS X Mobile Computing Networking OpenBSD OpenSolaris OpenVMS Personal Notes Portable Code Presentations Productivity Programming Red Hat Scripting Security Solaris Storage Tips Ubuntu UNIX

Archives

Feeds

Blogroll

Pages

Meta