To be a modern, successful company and succeed in your market, you need to own, manage and analyse huge amounts of data. The amount of information to be managed is growing rapidly, as businesses typically need to look after customer bases, mail servers, user workstations, visitor transactions and many other systems.
However, as IT systems become more complicated and cyber-attacks become more likely, various emergency situations can happen in business. These risks can harm business continuity by causing problems such as OS and disk failures, RAID controller issues and other infrastructure damage. Your datacenter could experience fires, floods, hurricanes, tornadoes and even earthquakes, and Bacula Systems regularly hears of these scenarios happening to IT departments.
An intelligent IT disaster recovery plan (DRP) is a vital part of your IT infrastructure strategy, focusing on protecting the critical business functions that maintain business continuity. It defines the order in which to solve the problems that potential failures and catastrophes might cause in IT systems. Bacula Systems provides industry-leading enterprise backup and recovery software and can also help with a disaster recovery plan template, which you can use alongside our system recovery software.
The IT disaster recovery plan basically includes several aspects:
- Electricity plan
- Communications plan (internet connection, telephony and messengers)
- Hardware plan
- Backup and restore techniques
IT disaster recovery planning produces a clear and detailed description of what the IT staff should do in case of certain contingencies (outages of services or systems). It is not only a plan, but also a check-list of what has to be done at each stage of the backup and recovery procedure. The IT managers should define potential threats and risks, and estimate the downtime and recovery costs for your hard disks, workstations and server backups, including virtual machine backups. Last but not least, C-level executives should consider being involved in creating the disaster recovery plan in order to understand the potential risks, and should then ensure that the plan is tested across the whole organization and updated when the infrastructure changes.
Know your business (BIA)
The hardest and one of the most important steps is to define how potential shutdowns of IT systems affect the company’s operations. This step is called BIA, or business impact analysis. If you’re unable to determine the impact of unplanned outages, it is impossible to establish an adequate recovery strategy and restore techniques for a given situation. Some examples of unplanned outages are:
1) IT system failures
2) Fires and floods
3) Power outages
4) Vulnerability attacks
Depending on the nature of the failure, it can lead to a loss of income, the disclosure of confidential customer data or even a complete business stoppage. You need to answer several important questions:
- How important is each function to the organization? The answer, of course, depends on whether this particular service affects revenues.
- How long can this business function be interrupted? In other words, how much time will pass from the termination of the service to the point where it begins to negatively affect the flow of income?
- How many actual and potential customers can be lost without a serious impact on the business?
- Which IT systems does the business itself depend on?
For the example version of the plan, let’s imagine that the priority function is customers’ order management and fulfillment. In this specific example infrastructure, that means several systems need to be restored immediately as a priority:
a. Workstation restore.
Windows system restore on the workstations of the call center agents. This also includes hard disk data recovery in case several agents experience disk problems.
b. Restore of Exchange server.
An important part is the backup and recovery of the Exchange server, which is used to communicate with customers about their potential orders and with suppliers to purchase the goods being sold.
c. Restore of the SQL Server database.
The specific CRM application running on the SQL Server database should stay up and running 99.99% of the time in order to fulfill the orders.
d. Linux server data recovery.
The website where customers enter the orders is running on a Linux server which also needs to be backed up.
This data is used to create the example recovery strategy. The most important functions are restored first, while the less important ones (for example, the file server with internal documents in this case) get a lower priority in the recovery process. While documenting each step, the recovery procedure should be evaluated in real-world units (money, hours, customers) and not in abstract terms (“many customers” or “this is an important thing”). Such analysis can’t be performed without the participation of your financial managers.
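The prioritization above can be sketched in a few lines of Python. This is only an illustration: the hourly loss figures and tolerable outage times below are hypothetical placeholders, and in practice they would come from your financial managers. The helper `allowed_downtime_minutes_per_year` simply converts an availability target such as 99.99% into minutes of downtime per year (roughly 52.6 minutes for a 365-day year).

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    loss_per_hour: float        # estimated income lost per hour of outage (hypothetical)
    max_tolerable_hours: float  # outage length before income starts to suffer (hypothetical)


def recovery_order(functions):
    """Shortest tolerable outage first; ties are broken by the higher hourly loss."""
    return sorted(functions, key=lambda f: (f.max_tolerable_hours, -f.loss_per_hour))


def allowed_downtime_minutes_per_year(availability_percent):
    """Convert an availability target (e.g. 99.99) into minutes of downtime per 365-day year."""
    return (1 - availability_percent / 100) * 365 * 24 * 60


# Systems from the example infrastructure; all numbers are made up for illustration.
functions = [
    BusinessFunction("File server (internal documents)", 50.0, 48.0),
    BusinessFunction("CRM on SQL Server", 2000.0, 1.0),
    BusinessFunction("Order website (Linux)", 1500.0, 1.0),
    BusinessFunction("Exchange server", 800.0, 4.0),
    BusinessFunction("Call center workstations", 600.0, 2.0),
]

ordered = [f.name for f in recovery_order(functions)]
for position, name in enumerate(ordered, start=1):
    print(position, name)
```

With these sample figures the CRM and the order website come first and the internal file server comes last, which matches the priorities described in the example.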
Assess the risks
While BIA helps to identify the most important business processes and describes the potential impact of their disruption, the risk assessment determines the internal and external circumstances that may adversely affect those critical processes. Risks are usually divided into two groups:
a. Natural, such as storms, hurricanes, floods, earthquakes and tsunamis.
b. Man-made (intentional and unintentional). These include hacking, fraud, personnel strikes, terrorism, fires, errors in software, hardware failure, human error, power failures, and even Internet or other network failure.
Some of the risks (especially IT risks) can be evaluated accurately. For example, a monitoring system and a journal of user requests both help to assess the probability of IT system failures.
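As a rough sketch of how a journal of incidents can be turned into a probability, the Python snippet below counts the incidents per system over an observation period and estimates the chance of at least one failure per year. The journal entries and the 730-day period are invented for illustration, and the calculation assumes that failures occur independently at a constant rate (a Poisson model), which is a simplification rather than a property of any particular monitoring tool.

```python
import math
from collections import Counter


def annual_failure_probability(incident_count, observed_days):
    """Probability of at least one failure in a year, assuming a constant failure rate."""
    rate_per_year = incident_count / observed_days * 365
    return 1 - math.exp(-rate_per_year)


# Hypothetical journal: one entry per recorded incident during the observation period.
journal = [
    "workstation disk", "power supply", "workstation disk",
    "power supply", "power supply", "mail server",
]
OBSERVED_DAYS = 730  # hypothetical: two years of records

counts = Counter(journal)
probabilities = {
    system: annual_failure_probability(count, OBSERVED_DAYS)
    for system, count in counts.items()
}

for system, probability in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{system}: {probability:.0%}")
```

The absolute numbers matter less than the ranking: the systems that fail most often in the journal are the ones to look at first.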
Imagine that in our example the main risks identified are viruses on the sales department’s workstations and malfunctioning power supplies. The installed monitoring system allows system administrators to see some failures in advance. For example, the monitoring system revealed that a workstation’s disk was practically full, and that most of it was populated with critical data. To address this risk while also reducing the risk of viruses, the sales department’s systems (70 people) were moved to Linux, while maintaining the ability to connect to a Windows Terminal Server. This in turn allowed the IT department to simplify the recovery of any and all data on the server.
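A check like the "disk practically full" alert in this example can be approximated with Python’s standard library. This is a minimal sketch and not how any specific monitoring product works; the 90% threshold and the checked path are assumptions chosen for illustration.

```python
import shutil


def disk_usage_percent(path):
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def is_nearly_full(path, threshold_percent=90.0):
    """True when the used share reaches the (assumed) warning threshold."""
    return disk_usage_percent(path) >= threshold_percent


# Check the filesystem that holds the current directory.
percent = disk_usage_percent(".")
status = "WARNING: nearly full" if is_nearly_full(".") else "ok"
print(f"{percent:.1f}% used, {status}")
```

Run on a schedule on each workstation or server, a check of this kind gives administrators the advance warning described above.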
The assessment of risks should be based on common sense. For example, for an organization located on the sea coast (Japan, for instance), the probability of a “Storm” event is much higher than for an office located in Nebraska, USA. Locating offices in the world’s hot spots, such as Istanbul or Cairo, greatly increases the risk of “terrorism” or “bombing”. The risk of “theft” in a well-guarded office center is much lower than in premises on the grounds of a disused plant.
What do these and other risks threaten?
There are five major effects that may have catastrophic consequences for businesses: denial of access, loss of data, loss of human resources, loss of functions and lack of information. For example, losing the internet connection means you cannot exchange e-mail, but your data remains with you. In case of fire or theft, however, the data will no longer be available (or will be, but not to you). This may lead to reduced cash flow, damage to the company’s brand, loss of market share, lower employee morale and increased staff turnover, rising legal costs and so on.
In practice, estimating the probability of an event and its consequences during risk assessment is difficult. In our example, a conversation with the manager suggests that the risk of fire in the head office is significant, because the organization rents an office in an industrial building near one of the plants. The main risks are man-made ones that lead to data loss (fire, for example). It is also interesting to discover that of the 70 salespeople, 50 are actually simple operators. Understanding this allows us to simplify the recovery process: the first restore priority goes to the 20 most valuable salespeople.
Formulate the plan
Once the critical functions are identified and the effects of failure are evaluated, you can develop a recovery strategy to help prevent or reduce financial loss. A key principle is that the total cost of recovery cannot exceed the losses it is designed to prevent.
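The cost rule can be written down as simple arithmetic: compare the yearly cost of the recovery measures with the expected loss they prevent. The sketch below assumes that the expected loss is the product of the yearly probability of the event, the length of the outage and the hourly loss; all figures are hypothetical and only show the shape of the calculation.

```python
def expected_annual_loss(annual_probability, outage_hours, loss_per_hour):
    """Expected yearly loss = probability of the event x outage length x hourly loss."""
    return annual_probability * outage_hours * loss_per_hour


def is_cost_effective(annual_recovery_cost, loss_without_plan, loss_with_plan):
    """The plan pays off only if it costs no more than the loss it prevents."""
    return annual_recovery_cost <= loss_without_plan - loss_with_plan


# Hypothetical figures: a 20% yearly chance of the event, a 40-hour outage
# without a plan versus 8 hours with one, and 2000 lost per hour of downtime.
loss_without_plan = expected_annual_loss(0.2, 40, 2000)
loss_with_plan = expected_annual_loss(0.2, 8, 2000)
prevented_loss = loss_without_plan - loss_with_plan

print(f"Prevented loss per year: {prevented_loss:.0f}")
print("Budget of 10000 justified:", is_cost_effective(10000, loss_without_plan, loss_with_plan))
print("Budget of 15000 justified:", is_cost_effective(15000, loss_without_plan, loss_with_plan))
```

With these sample numbers, a recovery budget of 10000 per year is justified while 15000 is not, because the plan prevents an expected loss of about 12800.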
It is necessary to take into account the geographical location of certain services. For example, hosting the web server at an external provider complicates the development of the recovery strategy. Likewise, moving the organization to a new location may mean that an internet connection via fiber optic or DSL cannot be set up quickly, so the plan should account for acquiring 3G equipment.
Based on this risk assessment, the following recovery strategy could be developed in the initial DRP: quickly assemble a recovery team, inform customers and partners about the problems, and restore the revenue-generating services first. Given the high competition in the market, the target recovery time may be set at one business day.
The first sheet of the disaster recovery strategy can consist of these points:
1) Make the owners and managers of the company aware of the problem;
2) Acquiring suitable server hardware, at least 10 workstations and 2 printers from the regular supplier requires a prior agreement with that supplier;
3) Installing the equipment in the warehouse and relocating the personnel requires placing 8 additional desks and laying network cabling;
4) Installing the system and applications on the server requires automatic file backup software like Bacula Enterprise Edition;
5) Recovering the CRM requires regular backups and check-lists of the recovery process for single files;
6) Installing the client-bank payment systems requires an agreement with the bank;
7) Using free mailboxes until the full system recovery is complete requires notifying customers and partners in advance and managing client contact lists.
Important note: the recovery should not cost more than the damage itself. This is important, since the systems must be cost-effective: they must support the business, not drain it.
The next step is to document the recovery strategies and procedures, which form the basis of a disaster recovery plan. The DRP should be formalized and officially approved. Keep it simple at first: developing an accurate and detailed disaster recovery plan takes time. At a high level, the disaster recovery plan should indicate the recovery priorities of the systems, estimated recovery times, recovery procedures, the location of the backup data and contacts for the key staff involved. Official approval is sometimes required to permit access to the organization’s premises at any time. IT staff may also need the authority to call in accountants to check the recovered systems outside business hours.
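One possible way to keep the high-level part of the plan consistent is to store it as structured records and check that nothing is left blank. The sketch below is only an illustration: the class name, field names and sample entries are hypothetical and do not come from any Bacula tool or template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryEntry:
    system: str
    priority: int                    # 1 = restore first
    estimated_recovery_hours: float
    procedure: str                   # reference to the step-by-step check-list
    backup_location: str
    contacts: List[str] = field(default_factory=list)


def missing_fields(entry: RecoveryEntry) -> List[str]:
    """Return the names of the fields that are still empty for this system."""
    missing = []
    if not entry.procedure.strip():
        missing.append("procedure")
    if not entry.backup_location.strip():
        missing.append("backup_location")
    if not entry.contacts:
        missing.append("contacts")
    return missing


# Hypothetical draft of a plan; some fields are deliberately left empty.
plan = [
    RecoveryEntry("File server", 3, 8.0, "", "offsite storage", ["IT administrator"]),
    RecoveryEntry("CRM on SQL Server", 1, 4.0, "check-list: CRM restore",
                  "offsite storage", ["IT administrator", "Sales manager"]),
    RecoveryEntry("Exchange server", 2, 4.0, "check-list: Exchange restore", "", []),
]

plan.sort(key=lambda entry: entry.priority)
incomplete = {e.system: missing_fields(e) for e in plan if missing_fields(e)}

for system, gaps in incomplete.items():
    print(f"{system}: missing {', '.join(gaps)}")
```

A simple completeness check like this makes it easy to start with a short plan and see which details still have to be filled in as it grows.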
Test the plan
Testing the plan always helps to determine which elements are not described and have to be added. Of course, this takes time and test hardware, but it is much better than finding gaps in the plan during (or after) a disaster. Each time the recovery is tested, weaknesses are revealed and the plan is improved, so your organization gains confidence that the business will keep working despite the troubles.
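The outcome of a recovery test can be recorded and compared with the targets in the plan. The following sketch assumes you note the measured restore time for each system during the test; the system names, targets and measurements are hypothetical.

```python
def find_gaps(targets, measured):
    """Compare measured restore times (hours) with targets; report misses and untested systems."""
    gaps = {}
    for system, target_hours in targets.items():
        actual_hours = measured.get(system)
        if actual_hours is None:
            gaps[system] = "not tested"
        elif actual_hours > target_hours:
            gaps[system] = f"took {actual_hours} h, target {target_hours} h"
    return gaps


# Hypothetical targets (one business day) and results of the latest test run.
targets = {"CRM": 8, "Exchange": 8, "Website": 8}
measured = {"CRM": 6.5, "Exchange": 11.0}

gaps = find_gaps(targets, measured)
for system, problem in gaps.items():
    print(f"{system}: {problem}")
```

Here the Exchange restore overran its target and the website restore was never exercised, so both would be added to the list of improvements before the next test.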
As the business grows, the data recovery plan and strategy become increasingly urgent. However, even big businesses are keen to implement an economically viable recovery plan. Reducing the cost of investment in the DRP typically leads directly to the usage of systems like Bacula Enterprise Edition with its powerful pricing model, exceptional open core benefits and enterprise grade tools, such as Bare Metal Recovery.
- Don’t know about Bacula Enterprise Edition’s capabilities? See the full list of our data backup and recovery features.
- Are you an MSP? Take a look at our backup to cloud offering for MSPs.
- Need virtual machine backup for all hypervisors in your infrastructure? Take a look at our virtual machine backup solutions.
- Need database backup support for all DB types? Take a look at our database backup solution, including backup of MySQL databases, PostgreSQL DB backup, backup of Oracle databases, SAP backup and others.