To be a modern, successful company and succeed in your market, you need to own, manage and analyse huge amounts of data. The amount of information to manage is growing rapidly, as businesses typically need to handle customer bases, mail servers, user workstations, visitor transactions and many other systems.
However, emergencies of many kinds can hit a business as IT systems become more complicated and cyber-attacks become more likely. These risks can harm business continuity, for example by causing OS and disk failures, RAID controller issues and other infrastructure damage. Your datacenter could experience fires, floods, hurricanes, tornadoes and even earthquakes, and Bacula Systems regularly hears of these scenarios happening to IT departments.
An intelligent IT disaster recovery plan (DRP) is a vital part of your IT infrastructure strategy, focusing on protecting the critical business functions that maintain business continuity. It defines the order in which to solve the problems that failures and catastrophes may cause in IT systems. Bacula Systems provides industry-leading disaster recovery software and can also help by providing two important things:
a) An IT disaster recovery planning guide with information on commitments, defining priorities, determining the technical approach, and developing, implementing and testing the DRP;
b) A sample IT disaster recovery plan template which you can use alongside our disaster recovery software solution for your business continuity.
This disaster recovery and business continuity plan is intended to provide insight into the considerations and processes required to establish, test, and implement disaster recovery procedures for crucial IT services at your company. The IT disaster recovery plan basically includes several aspects:
The sample IT disaster recovery plan that you can download from our website gives a clear and detailed description of what IT staff should do in specific contingencies (outages of services or systems). It is not only a plan but also a checklist of what has to be done at each stage of the backup and recovery procedure. IT managers should define potential threats and risks, and estimate downtime and recovery costs for hard disks, workstations and server backups, including backups of virtual machines. Last but not least, C-level executives should consider taking part in creating the disaster recovery plan and choosing a disaster recovery solution so that they understand the potential risks; they should then ensure that the plan is tested across the whole organization and updated whenever the infrastructure changes.
Know your business (BIA)
The hardest, and one of the most important, steps is to define how potential shutdowns of IT systems affect the company’s operations. This step is called BIA, or business impact analysis. If you are unable to determine the impact of unplanned outages, it is impossible to establish a disaster recovery strategy adequate to your situation or to find the best disaster recovery software. Some examples of unplanned outages are:
Depending on its nature, a failure can lead to a loss of income, the disclosure of confidential customer data or even a business stoppage. You need to answer several important questions:
For the example plan, let’s imagine that the priority function is customer order management and fulfillment. In the example infrastructure, this means several systems need to be restored immediately as a priority:
a. Workstation restore.
Windows system restore on the workstations of call center agents. This also includes hard disk data recovery in case several agents experience disk problems.
b. Restore of Exchange server.
Backup and recovery of the Exchange server is important because it is used to communicate with customers about their potential orders and with suppliers to purchase the goods being sold.
c. Restore the database with SQL Server.
The CRM application running on the SQL Server database should stay up and running 99.99% of the time in order to fulfill orders.
d. Linux server data recovery.
The website where customers enter their orders runs on a Linux server, which also needs to be backed up.
This data is used to create the example recovery strategy. The most important functions are restored first, while less important ones (in this case, for example, the file server with internal documents) get lower priority in the recovery process. When documenting each step, the recovery procedure should be evaluated in real-world units (money, hours, customers) and not in abstract terms (“many customers” or “this is an important thing”). Such an analysis can’t be performed without the participation of your financial managers.
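The prioritization described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the sample plan: the class `BusinessSystem`, the helper names and every loss figure are hypothetical placeholders that your financial managers would replace with real estimates. It also shows how much downtime per year a 99.99% availability target actually allows.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


@dataclass
class BusinessSystem:
    name: str
    loss_per_hour: float  # hypothetical estimate, in your own currency


def restore_order(systems):
    """Return systems sorted so the most expensive outage is restored first."""
    return sorted(systems, key=lambda s: s.loss_per_hour, reverse=True)


# All figures below are invented placeholders, not measurements.
systems = [
    BusinessSystem("File server (internal documents)", 50.0),
    BusinessSystem("CRM on SQL Server", 4000.0),
    BusinessSystem("Exchange server", 1500.0),
    BusinessSystem("Linux web server (order entry)", 3000.0),
    BusinessSystem("Call center workstations", 800.0),
]

for rank, system in enumerate(restore_order(systems), start=1):
    print(f"{rank}. {system.name}: {system.loss_per_hour:.0f} lost per hour of downtime")

budget = allowed_downtime_minutes(99.99)
print(f"99.99% availability allows about {budget:.1f} minutes of downtime per year")
```

Expressing each system as a loss per hour keeps the discussion in money and hours rather than in abstract terms, and makes the restore order easy to revisit when the estimates change.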
Assess the risks
While BIA helps to identify the most important business processes and describes the potential impact of their disruption, the risk assessment identifies the internal and external circumstances that may adversely affect those critical processes. Risks are usually divided into two groups:
a. Natural, such as storms, hurricanes, floods, earthquakes and tsunamis.
b. Man-Initiated (intentional and unintentional). These include hacking, fraud, personnel strikes, terrorism, fires, errors in software, hardware failure, human error, power failures, and even Internet or other network failure.
Some risks (especially IT risks) can be evaluated accurately. For example, a monitoring system and a log of user requests both help to assess the probability of IT system failures accurately.
Imagine that, in our example, the main risks identified are viruses on the sales department’s workstations and malfunctioning power supplies. The installed monitoring system allows system administrators to see some failures in advance. For example, it revealed that one workstation’s disk was practically full, and that most of it held critical data. To address this risk while also reducing the risk of viruses, the sales department’s workstations (70 people) were moved to Linux while keeping the ability to connect to a Windows Terminal Server. This in turn allowed the IT department to simplify the recovery of any and all data on the server.
Risk assessment should be based on common sense. For example, for an organization located on the sea coast (in Japan, for instance), the probability of a “Storm” event is much higher than for an office located in Nebraska, USA. Locating offices in the world’s hot spots, such as Istanbul or Cairo, greatly increases the risk of “terrorism” or “bombing”. The risk of “theft” in a well-guarded office center is much lower than in premises on the grounds of a disused plant.
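One common way to make such common-sense judgments comparable is to multiply an estimated yearly probability by an estimated loss per occurrence. The sketch below assumes that simplification (at most one occurrence per year); the `Risk` class and all numbers are hypothetical and serve only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # estimated chance of occurring in a year, 0..1
    impact: float              # estimated loss per occurrence, in your currency

    @property
    def expected_annual_loss(self) -> float:
        # Simplification: probability times impact, one occurrence per year at most.
        return self.annual_probability * self.impact


def rank_risks(risks):
    """Return risks sorted from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


# Placeholder estimates for illustration; replace with your own assessment.
risks = [
    Risk("Virus on sales workstations", 0.50, 20_000),
    Risk("Power supply failure", 0.30, 15_000),
    Risk("Fire in the office", 0.03, 500_000),
    Risk("Storm", 0.01, 200_000),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: expected annual loss {risk.expected_annual_loss:,.0f}")
```

With these invented inputs, a rare but expensive event such as fire can outrank a frequent but cheap one, which is why both the probability and the consequences need an estimate.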
What do these risks threaten?
There are five major effects that may have catastrophic consequences for a business: denial of access, loss of data, loss of human resources, loss of functions and lack of information. For example, losing the internet connection will prevent you from exchanging e-mail, but your data will remain with you. In case of fire or theft, however, the data will no longer be available (or will be, but not to you). This may lead to reduced cash flow, damage to the company’s brand, loss of market share, lower employee morale and increased staff turnover, rising legal costs and so on.
In practice, estimating the probability of an event and its consequences during risk assessment is difficult. In our example, a conversation with the manager suggests that the risk of fire in the head office is fairly high, because the organization rents an office in an industrial building near one of the plants. The main risks are man-made ones that lead to data loss (fire, for example). It is also interesting to discover that, of the 70 salespeople, 50 are actually ordinary operators. Understanding this allows us to simplify the recovery process: the first restore priority goes to the 20 most valuable salespeople.
Formulate the disaster recovery plan
Once the critical functions are identified and the effects of failure are evaluated, you can develop a recovery strategy to help prevent or reduce financial loss. A key principle is that the total cost of recovery cannot exceed the losses it is designed to prevent.
It is necessary to take into account the geographical location of certain services. For example, hosting the web server at a hosting provider complicates the development of the recovery strategy. Likewise, if the organization changes location, it may not be possible to set up internet connectivity quickly over fiber optic or DSL, so the plan should account for acquiring 3G equipment.
Based on this risk assessment, the following recovery strategy could be set out in the initial DRP: quickly assemble a recovery team, inform customers and partners about the problems, and restore revenue-generating services first. Given the high competition in the market, the target recovery time may be set at one business day.
The first sheet of the disaster recovery strategy can consist of these points:
Important note: recovery should not cost more than the damage it prevents. This matters because the systems must be cost-effective; they must support the business, not drain it.
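The cost rule above can be expressed as a simple check. The following is a rough sketch under stated assumptions: the function names are hypothetical, the loss is assumed to grow linearly with downtime, and all figures are invented for illustration.

```python
def loss_avoided(loss_per_hour: float,
                 hours_without_plan: float,
                 hours_with_plan: float) -> float:
    """Loss prevented by recovering faster than would happen without a plan."""
    hours_saved = max(hours_without_plan - hours_with_plan, 0.0)
    return loss_per_hour * hours_saved


def is_cost_effective(recovery_cost: float, avoided: float) -> bool:
    """The recovery cost must not exceed the loss it prevents."""
    return recovery_cost <= avoided


# Hypothetical figures: 4,000 lost per hour, three days of downtime without
# a plan, one 8-hour business day with a plan.
avoided = loss_avoided(loss_per_hour=4000.0,
                       hours_without_plan=72.0,
                       hours_with_plan=8.0)

print(f"Loss avoided: {avoided:,.0f}")
print("Plan costing 60,000 is cost-effective:", is_cost_effective(60_000.0, avoided))
print("Plan costing 300,000 is cost-effective:", is_cost_effective(300_000.0, avoided))
```

Running the same check for each candidate recovery approach gives a quick first filter before a more detailed financial review.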
The next step is to document the recovery strategies and procedures, which form the basis of an IT disaster recovery plan. The DRP should be formalized and officially approved. Keep it simple at first; developing an accurate and detailed disaster recovery plan takes time. At a high level, the disaster recovery plan should indicate the recovery priorities of systems, estimated recovery times, recovery procedures, the location of the backup data and the contacts for key staff involved. Official approval is sometimes required to permit access to the organization’s premises at any time. IT staff may also need the authority to call in accountants to check the recovered systems outside business hours.
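The high-level fields listed above map naturally onto a small record per system. The sketch below is one possible way to hold them and to spot entries that are still incomplete; the `RecoveryItem` class, its field names and the sample entries are all hypothetical and are not taken from the downloadable template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryItem:
    system: str
    priority: int                    # 1 = restore first
    estimated_recovery_hours: float
    procedure: str                   # where the step-by-step procedure is documented
    backup_location: str
    contacts: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of the descriptive fields that are still empty."""
        gaps = []
        if not self.procedure.strip():
            gaps.append("procedure")
        if not self.backup_location.strip():
            gaps.append("backup_location")
        if not self.contacts:
            gaps.append("contacts")
        return gaps


def recovery_sequence(items):
    """Return items in the order they should be restored."""
    return sorted(items, key=lambda item: item.priority)


# Placeholder entries for illustration only.
plan = [
    RecoveryItem("Exchange server", 2, 4.0, "runbook section 2",
                 "offsite storage", ["mail administrator"]),
    RecoveryItem("CRM on SQL Server", 1, 3.0, "runbook section 1",
                 "offsite storage", ["database administrator"]),
    RecoveryItem("File server", 3, 8.0, "", "offsite storage", []),
]

for item in recovery_sequence(plan):
    gaps = item.missing_fields()
    status = "complete" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{item.priority}. {item.system} ({item.estimated_recovery_hours} h) - {status}")
```

Starting with a structure this simple fits the advice to keep the first version of the plan small, and the gap check highlights which entries need more detail later.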
Test the IT disaster recovery plans
Testing the disaster recovery plan always helps to determine which elements are not described and have to be added. Of course, this takes time and hardware for testing, but it is much better than finding gaps in the plan during (or after) a disaster. Each time recovery is tested, weaknesses are revealed and the plan is improved, so your organization gains confidence that the business will keep working despite the trouble.
Conclusions on DRP planning procedure
As the business grows, the IT data recovery plan becomes increasingly urgent. However, even big businesses are keen to implement an economically viable recovery plan. Reducing the cost of investment in the DRP typically leads directly to the use of disaster recovery software like Bacula Enterprise Edition, with its powerful pricing model, exceptional open core benefits and enterprise-grade disaster recovery tools such as Bare Metal Recovery.