Some time ago, Elizabeth Zwicky wrote an article for LISA V (1991) titled Torture-testing Backup and Archive Programs (PDF), and followed it up at LISA 2003 with Further Torture: More Testing of Backup and Archive Programs. The articles describe extensive tests of backup clients and archive programs and find that all of them fall short in some way or another, though the programs improved significantly over time.
These articles are real eye-openers: they show why a restore test is a critical part of any backup solution. Without testing a restore, there is no guarantee that an actual restore will be successful.
There are lots of stories about otherwise brilliant backup solutions that failed when a restore was necessary. My favorite is about a fellow who took the magnetic tape backups home as an offsite measure, but kept a massive magnet in the passenger seat of his car. The offsite backups were a great idea, except that he unknowingly erased them every time he took them home… Guess what happened when the offsite backups were needed during a critical restore?
To create a successful backup strategy, you must first choose how to make the backups:
- Gauge how critical the resource is. Do the backups need to be restored in minutes? Or is a restore in hours suitable?
- What kinds of backups will be taken? Full backups nightly? Incremental?
- Gauge the time and space available to take backups. Will the backup put a strain on the network? Is there enough space?
- Choose a program or programs to fulfill your needs and install them.
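The time and space questions above can be roughed out with simple arithmetic before any software is chosen. The following is only a back-of-the-envelope sketch: the function names, the seven-day cycle, and the idea of a fixed daily change rate are all assumptions made for illustration, and real figures depend on compression, deduplication, and how the chosen program handles incrementals.

```python
def transfer_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_gb gigabytes at throughput_mb_s megabytes per second.

    Assumes 1 GB = 1024 MB and a sustained, constant throughput.
    """
    return size_gb * 1024 / throughput_mb_s / 3600


def weekly_storage_gb(full_gb: float, daily_change: float, nightly_full: bool) -> float:
    """Space needed to keep one week of backups.

    nightly_full=True  -> seven full backups.
    nightly_full=False -> one full backup plus six incrementals, each assumed
                          to be daily_change (a fraction, e.g. 0.05) of the full size.
    """
    if nightly_full:
        return 7 * full_gb
    return full_gb + 6 * full_gb * daily_change
```

For a hypothetical 500 GB data set and a link that sustains 50 MB/s, `transfer_hours(500, 50)` comes to roughly 2.8 hours per full backup, which already says something about whether nightly fulls fit in the available window.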
After the infrastructure is in place, a successful backup strategy requires you to:
- Perform a test backup, and measure the time and space taken.
- Perform a test restore (of a portion of the backup). How easy is it? Is it easy to use under pressure? Was it an accurate restore?
- Do a bare metal restore. How long did it take? Is it accurate?
- Perform a restore test from time to time to make sure that backups are good: once is not enough.
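One way to answer "was it an accurate restore?" is to restore into a scratch directory and compare it with the original, file by file. The sketch below does this with SHA-256 checksums. It is a minimal illustration, not a complete verifier: the function names are made up for this example, and it compares regular file contents only, so it will not notice differences in permissions, ownership, timestamps, symbolic links, or special files. It also assumes the source has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    }


def compare_trees(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or different in the restored tree."""
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling `compare_trees(Path("/data"), Path("/restore-test/data"))` (both paths hypothetical) returns three lists; a clean restore leaves all of them empty. Running a check like this on a schedule is one way to make the periodic restore test routine rather than a one-off.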
Only through diligent testing of both backup and restore can you be sure that everything is working properly and that your data is safe.