The information technology (IT) community's response to the aftermath of Hurricanes Katrina and Rita seems to fall into one of two buckets: either "We back up nightly, so there's nothing to worry about" or "We have a disaster response plan, so we know what to do."
Unfortunately for both groups, when disaster strikes again, they will be scrambling just as the Federal Emergency Management Agency did in New Orleans.
Let's be clear. Nightly backups do not constitute a disaster plan. They are nothing more than a way to recover lost information under some circumstances, and a disaster is not one of those circumstances. After a disaster, you will also need equipment to load the backups onto and a location to house that equipment.
As for the written disaster response or business continuity plan, it is a positive first step toward true disaster readiness, but only a first step. Will your IT staff be able to find the plan following a disaster? Will they be able to communicate with each other? Will they be able to travel to an alternate site? What should they do if they encounter one or more of these difficulties?
Assuming you have a formal disaster response plan, there are three things you need to do to achieve readiness - practice, practice and practice. Obviously, you're not going to flood the data center to see how the IT staff reacts to a disaster. Thankfully, there are better and cheaper ways to verify the plan.
Most of us do not think clearly or act calmly when faced with an unexpected emergency. Your team's response must become second nature so that it overrides those natural instincts. It's the small disasters that are of greatest concern, such as a fire in the data center, a broken water pipe or a hazardous materials spill near your facility.
Full readiness requires a three-pronged approach including tabletop, functional and full-scale verifications. The goal of these efforts is to position your IT team to respond quickly and skillfully to any emergency situation. Let's examine each verification step.
The Tabletop Check
Reserve a big conference room for at least half a day. Gather everyone with a role in the IT disaster response effort and give each person a copy of the plan. Define a simple disaster scenario such as a fire in the data center or water damage to the building. (Note: It is usually best to avoid scenarios involving human injury at the outset because such injury always takes priority over system failures.)
Have the team members walk through the scenario and describe what they would do. If they are to call someone, have them place the call and verify the correct contact information.
Appoint someone to capture errors, omissions and changes to the plan.
The Functional Drill
This drill is best done with minimal preparation or advance warning. The team should have completed the tabletop check and the plan should be ready to go.
Inform the IT team of the disaster scenario using whatever means of communication the plan specifies. Have them perform their response functions up to but not including travel to another facility or loading files onto a backup system.
This is primarily a process verification effort and should not disrupt normal operations, though any part of the process that can readily be exercised should be exercised. It's also a good idea to time the response.
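Timing the response can be as simple as noting when each step finishes. The following is a minimal sketch, not a prescribed tool: the class name DrillTimer, its methods and the step names in the usage note are all hypothetical and would be replaced by the steps in your own plan.

```python
import time


class DrillTimer:
    """Records elapsed time for each step of a functional drill (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timer can be checked without waiting.
        self._clock = clock
        self._start = clock()
        self._last = self._start
        self.steps = []  # list of (step name, seconds since the previous mark)

    def mark(self, step_name):
        """Call when a step finishes; stores how long that step took."""
        now = self._clock()
        self.steps.append((step_name, now - self._last))
        self._last = now

    def total(self):
        """Seconds from the start of the drill to the last completed step."""
        return self._last - self._start

    def slowest(self):
        """The step that took longest; raises ValueError if nothing was marked."""
        return max(self.steps, key=lambda step: step[1])
```

During a drill, the person capturing notes would create a DrillTimer when the scenario is announced and call mark("team notified"), mark("roles confirmed") and so on as each step completes. Comparing total() and slowest() from one drill to the next shows whether the response is improving.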
The Full-Scale Exercise
Having completed the above steps, the IT team is now ready to conduct a full-blown simulated disaster. Backup facilities and servers should be brought online as all aspects of the disaster response are exercised.
This may require that the exercise be conducted after hours or on the weekend to avoid disrupting normal operations.
The atmosphere around this exercise will be tense because actions are required in real time and the problems are realistic. The exercise is usually lengthy and complex, requiring careful planning and attention to detail.
It is also a good idea to keep the participants interested by varying the exercises. For example, an element of randomness can be introduced by selecting "chance cards" from a deck as in the game of Monopoly. Such cards might introduce technical complications or require that certain team members be unavailable. These ideas add realism and keep exercises interesting.
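For teams that would rather not print a physical deck, the chance-card idea can be sketched in a few lines of Python. The card texts and the function name draw_cards below are illustrative assumptions, not part of any standard exercise kit; write cards that fit your own environment.

```python
import random

# Hypothetical complication cards; replace with ones that fit your environment.
CHANCE_CARDS = [
    "Primary database administrator is unreachable",
    "Most recent backup tape is unreadable",
    "Cell phone service is down in the area",
    "Alternate site has no Internet connection for two hours",
    "Vendor support line goes to voicemail",
]


def draw_cards(deck, count, seed=None):
    """Draw `count` distinct cards at random without changing the deck."""
    if count > len(deck):
        raise ValueError("not enough cards in the deck")
    rng = random.Random(seed)  # passing a seed makes a draw repeatable for later review
    return rng.sample(deck, count)
```

Calling draw_cards(CHANCE_CARDS, 2) at the start of an exercise yields two complications for the team to work around. Recording the seed with the exercise notes lets the same draw be reproduced when the results are reviewed.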
How often should these exercises be run? Minimally, the Tabletop Check should be run whenever a significant change takes place in the IT architecture or the team.
Running the Functional Drill at least once a year is a good idea because using a real-life disaster scenario improves retention. Finally, the Full-Scale Exercise should also be conducted annually to make sure the plan really works.
Following this regimen will result in about one exercise per quarter at the outset. If the team gets good at responding and the environment doesn't change too rapidly, you may be able to stretch out these exercises to about two per year.
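The cadence described above can be expressed as a small check that reports which exercises are due. This is a sketch under stated assumptions: the function name exercises_due and its parameters are hypothetical, and the 365-day interval is simply one reading of "at least once a year" and "annually".

```python
from datetime import date, timedelta

# Assumption: "annually" is treated as 365 days since the last run.
YEAR = timedelta(days=365)


def exercises_due(today, last_functional, last_full_scale, changed_since_tabletop):
    """Return the names of the exercises that are due.

    last_functional and last_full_scale are dates, or None if never run.
    changed_since_tabletop is True when the IT architecture or the team has
    changed significantly since the last Tabletop Check.
    """
    due = []
    if changed_since_tabletop:
        due.append("Tabletop Check")
    if last_functional is None or today - last_functional >= YEAR:
        due.append("Functional Drill")
    if last_full_scale is None or today - last_full_scale >= YEAR:
        due.append("Full-Scale Exercise")
    return due
```

Running this check once a month against the dates in your exercise records is enough to keep the schedule from slipping.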
Note that I did not refer to any of these exercises as "tests". The goal is not to judge whether the team passed or failed. The goal is to prepare everyone for a real emergency and make ongoing corrections to the plan as new people, processes and technologies become available.
Always keep good records of the team's performance. Follow up with corrective actions where needed and seek to improve the team's response over time.
Now you're ready. There's no way to prevent many common disasters, but if you write down a plan, exercise it and continually improve it, your IT team will be ready and able to keep your company running during a crisis.
Vin D'Amico is Founder and President of DAMICON, your ADJUNCT CIO. He is an expert in leveraging open software to drive growth. DAMICON provides Freelance Technical Writing, IT Disaster Response Planning, and Network Security Management services to firms throughout New England.
This article appeared in Vin's monthly Virtual Business column for the IndUS Business Journal in October 2005.