6 Questions to Ask When Designing a Written Disaster Recovery Plan
Data fuels modern business and decides winners and losers in the digital economy. But collecting, analyzing, and leveraging business and user data is only part of the equation. Prudent organizations must also take steps to protect their systems and data and have a plan in place in case disaster should strike.
Potential causes of data loss and downtime include:
- Power Outages
- Data Storage Corruption
- Natural Disasters (Fire, Flood, Earthquake)
- Distributed Denial-of-Service Attacks
- Malware (Viruses, Ransomware)
The key is to have a comprehensive, written, and regularly reviewed disaster recovery plan: rules and procedures to follow in the event of an outage.
Disaster recovery planning is sometimes conflated with business continuity planning, but the latter is the broader term. Business continuity encompasses every functional part of the business needed to keep it running without disruption (e.g. cash flow, supply chains, workforces, and physical infrastructure), whereas disaster recovery in the IT context focuses on hardening and restoring digital systems.
Additionally, the primary goal of business continuity planning is to ensure the business can meet its obligations despite external challenges. Disaster recovery planning, as its name suggests, is directed at righting the ship after an interruption.
If your organization doesn’t already have a plan in place, now is the time to put one together. Here are six important questions you should examine to get started:
Question 1: What’s connected to and stored on your network?
You can’t restore things to the way they were before a data disaster if you haven’t been keeping track of the daily changes to your systems. That’s why step one for all disaster recovery planning is mapping your network and auditing all the resources using it (and the data flowing through it).
Leave no node unmapped, no dataset unlabeled, and get as full a picture as possible of the present state of the network and all the devices and users connected to it.
Question 2: What can you afford to lose?
When asked which of the data they manage is mission critical, many IT administrators’ instinctive response is: all of it! Naturally, losing any data is stressful and would ideally be avoided completely. But in reality, data loss is inevitable. Hard drives sometimes fail without warning, inclement weather and cyberattacks are both on the rise, and, despite the best efforts and guidance of IT pros, user error can never be completely prevented.
Even if you do everything right, some data is going to be lost, which is why it’s important to recognize that not all data is of equal importance. The encrypted file containing all the users’ credentials, for example, should always have more robust backups and disaster planning than, say, the folder with ancient financial estimates that could have safely been deleted years ago.
Plus, just about every business IT network is bogged down with redundant data because departments are siloed and haven’t been sharing information. A broad IT audit can identify low priority or duplicated data and make it less of a priority in disaster planning, which saves on storage and transfer costs and decreases overall system complexity. It’s not just a matter of saving time and money on recovery operations — it’s simply good data hygiene that will deliver greater operational efficiency across the organization over time.
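One way an audit can surface duplicated data is to group files by a hash of their contents. The following is a minimal sketch of that idea, not a full audit tool: the function names `file_digest` and `find_duplicates` are illustrative, and it assumes the data of interest lives in a directory tree the script can read.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content; keep only groups of 2 or more."""
    by_digest = defaultdict(list)
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            by_digest[file_digest(path)].append(path)
    return {
        digest: sorted(paths)
        for digest, paths in by_digest.items()
        if len(paths) > 1
    }
```

Each group in the result is a set of byte-for-byte identical files, which gives a reviewer a starting list of candidates to consolidate or deprioritize. Deciding which copy is authoritative remains a human judgment.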
Question 3: What are your recovery goals?
There are two standards by which businesses determine their goals for recovering from a data disaster:
- Recovery Point Objectives (RPO): The maximum amount of data (measured as the maximum age of the most recent recoverable copy) that can be lost without the organization suffering an unrecoverable disruption. That is, what is the breaking point for the organization? If it lost a single day’s work, would that be a catastrophic loss, or could it recover quickly? Some businesses are more vulnerable to data loss than others and thus must invest in more granular recovery points.
- Recovery Time Objectives (RTO): The maximum amount of time that can pass without access to business data before a critical break in continuity of service occurs, i.e. how long can the business afford to wait for a recovery process? For even very large systems, rapid recovery can be feasible, but it requires greater planning, oversight, and technical resources.
Some datasets are less time sensitive than others. These can be deprioritized, backed up less frequently, and perhaps protected with much slower and cheaper backup technologies. The same can’t be said of things like datasets needed for regulatory compliance, which must be made available around the clock, even at greater expense.
The balance businesses need to strike is how much data (and of what type) they need to function without a major disruption of service and how long they can go without it. In a perfect world, every backup and recovery plan would cover all data of all types and ensure full recovery instantly upon an outage. In reality, that is neither cost effective nor practical to accomplish, so finding a tailored solution that fits the operational needs, technical and financial resources, and cultural risk tolerance of a business is an ongoing mission.
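To make the two objectives concrete, the sketch below compares a dataset's backup schedule and expected restore time against its RPO and RTO. It rests on a simplifying assumption: the worst-case data loss is roughly one full backup interval, since a failure just before the next backup loses everything written since the last one. The `Dataset` class, its fields, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Dataset:
    name: str
    rpo: timedelta                     # maximum tolerable age of recovered data
    rto: timedelta                     # maximum tolerable time without the data
    backup_interval: timedelta         # how often backups actually run
    estimated_restore_time: timedelta  # ideally measured in a restore drill


def objective_gaps(dataset):
    """Return which objectives the current setup cannot meet."""
    gaps = []
    # Assumption: worst-case loss is about one backup interval.
    if dataset.backup_interval > dataset.rpo:
        gaps.append("RPO")
    if dataset.estimated_restore_time > dataset.rto:
        gaps.append("RTO")
    return gaps


# Hypothetical example: nightly backups against a one-hour RPO.
orders = Dataset(
    name="orders",
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=2),
)
print(objective_gaps(orders))  # ['RPO']
```

Here the two-hour restore fits inside the four-hour RTO, but nightly backups cannot satisfy a one-hour RPO, so either the backup frequency or the objective has to change.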
Question 4: Where can you safely store backups?
Many businesses keep their most used data and systems close at hand, whether in or physically near where they are actively being used. That reduces networking requirements, makes it simple to service the equipment, and provides ample opportunity to monitor and secure everything. But that one location, no matter how protected, can never be the only place critical backups are stored. One major cyberattack or natural disaster could be all it takes to separate the company from access to its IT resources and sensitive user data.
Instead, choose a 3-2-1 backup solution: keep 3 copies of all critical files (the original plus 2 backups), on 2 different types of storage media, with 1 copy stored offsite (typically in a third-party cloud platform). That way, if anything happens to your primary data, you can quickly access the onsite backup. But if a disaster affects both the primary files and the onsite backup, you can still fall back to the offsite copy.
Cloud options for that third copy are increasingly desirable because bandwidth is now cheap and widely available enough to make it affordable to link your systems to a cloud storage provider’s and keep everything synchronized in near real time (or, if that level of backup isn’t required, uploads can be set up to run on whatever schedule the administrator prefers).
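A backup inventory can be checked against the 3-2-1 rule mechanically. The sketch below assumes the common formulation of the rule: at least 3 copies in total, on at least 2 kinds of storage media, with at least 1 copy offsite. The `BackupCopy` record and the sample locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str   # free-form label, e.g. "office NAS"
    media: str      # kind of storage, e.g. "disk", "tape", "cloud"
    offsite: bool   # True if stored away from the primary site


def check_3_2_1(copies):
    """Report which parts of the 3-2-1 rule a set of copies satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({copy.media for copy in copies}) >= 2,
        "one_offsite": any(copy.offsite for copy in copies),
    }


# Hypothetical inventory: the primary data counts as one of the copies.
copies = [
    BackupCopy("production server", "disk", offsite=False),
    BackupCopy("office NAS", "disk", offsite=False),
    BackupCopy("cloud bucket", "cloud", offsite=True),
]
print(check_3_2_1(copies))
```

Reporting each condition separately, rather than a single pass or fail, shows which part of the rule needs attention.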
Question 5: Who is in charge of implementing the disaster recovery plan?
The moment when everything turns upside down is precisely the wrong time for there to be any confusion as to who is responsible for putting things back in order. Every employee in a business has a role to play in recovery. For many, that role will simply be to shut everything down when disaster strikes, avoid clicking on anything, and report their current situation up the chain of command.
Emergencies can be made worse when everyone is pulling in different directions or taking actions on their own initiatives without telling anyone else. A comprehensive disaster recovery plan should delineate unambiguously who is quarterbacking the recovery and the expectations for everyone downstream to support the effort.
Question 6: How often (and how rigorously) will you test and update your backup and recovery plan?
A disaster recovery plan may look comprehensive on paper, but you’ll never know how many points of failure you may have missed until you test it in real-world conditions. Network administrators should simulate every type of outage and attack their systems can be expected to face, to determine whether their existing plan can deliver on the RPO and RTO goals they’ve established.
And, because external threats and the makeup of the internal IT infrastructure are constantly changing and evolving, that testing should become a semi-regular occurrence. How regular depends on the level of threat facing the organization and its overall degree of resilience. Hardened networks that store a minimum of critical data obviously demand less testing than those that are more exposed.
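Part of a restore test can be automated: record checksums of the source files at backup time, then confirm that a test restore reproduces them. The following is a minimal sketch under stated assumptions. The helper names are illustrative, both the source and the restored data are assumed to be ordinary directory trees, and files are read fully into memory, which is only reasonable for small drill datasets.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a whole file at once (fine for small drill datasets)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(root):
    """Map each file's path relative to `root` to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def verify_restore(manifest, restored_root):
    """Compare a restored tree against a manifest taken at backup time."""
    restored = build_manifest(restored_root)
    missing = sorted(set(manifest) - set(restored))
    corrupted = sorted(
        name
        for name in manifest
        if name in restored and restored[name] != manifest[name]
    )
    return {"missing": missing, "corrupted": corrupted}
```

An empty `missing` and `corrupted` list shows only that the restored files match what was backed up. Timing the restore itself is still needed to judge it against the RTO.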
D2 Integrated Solutions is a pragmatic and collaborative IT partner. If you have concerns about the safety of your data and networks, contact us to discuss implementing a holistic disaster recovery plan today.