### 1. Framework for a Backup Policy

#### 1.1 Deﬁnition

A backup policy helps manage users’ expectations and provides specific guidance on the “who, what, when, and how” of the data backup and restore process. Collecting this information before it is needed helps prevent the problems and
delays that may be encountered when a user needs data from a backup. There are several benefits to documenting your data backup policy:
• It helps clarify the policies, procedures, and responsibilities.
• It will deﬁne:
– where backups are located
– who can access backups and how they can be contacted
– how often data should be backed up
– what kind of backups are performed
• Other policies or procedures that may already exist or that supersede the policy (such as contingency plans) are identiﬁed.
• A schedule for performing backups is well-deﬁned.
• It will identify who is responsible for performing the backups and their contact information. This should include more than one person, in case the primary backup operator is unavailable.
• It will define who is responsible for checking that backups have been performed successfully, and how and when they will perform this checking.
• A policy ensures data can be completely restored.
• A training plan for those responsible for performing the backups and for the users who may need to access the backups should be mandatory.
• The data backup is at least partially, if not fully, automated.
• The policy will ensure that more than one copy of the backup exists and that it is not located in the same location as the originating data.
• It will ensure that a variety of media are used to back up data, as each media type has its own inherent reliability issues.
• It will ensure that anyone new to the project or oﬃce can be given the documentation which will help inform them and provide guidance.
Defining a data backup policy gives you an overview of your infrastructure as well as your backup needs, in order to create JobDefs, Schedules, Pools, and Jobs that match your environment.

#### 1.2 Example

1. Scope of Policy
2. Purpose
3. Legal and Regulatory Obligations
4. Policy
(a) Schedule of every important piece to be backed up
(b) Storage of the ﬁrst line Data Backup – Disk
(c) Transport and storage of tapes
(d) Tape Rotation and Storage
(e) Regular data backup veriﬁcation
(f) Data recovery test in case of disaster recovery – scope and schedule
(g) Restoration request process
(h) Backup logs management
(i) Backup monitoring
(j) Backup Failure Management
(k) Disposal of redundant/damaged tapes
5. Reporting – Role and Responsibilities
(a) Backup and data recovery
(b) Veriﬁcations
(c) Disaster Recovery situation
(d) Policy Implementation
(e) Policy Review

#### 1.3 Notes

The information given in this Best Practice article is mainly for informational purposes. It needs to be adapted to your enterprise infrastructure and to the relevant policy and regulatory obligations of your company.

### 2. Naming Resources

#### 2.1 Naming Resources

There are many resources you will need to configure in Bacula, so it is very
good practice to define and standardize their naming. The goal is to avoid
having, e.g., Jobs named Job1, Job2, … which do not describe what they do and will
be difficult to trace back after several months of use. Your configuration will gain
clarity.

#### 2.2 Naming Examples

Here are some naming examples for your conﬁguration:
◾ Jobs will have the name of the client, the main function and a trailing -job:

ActiveDirectory-SystemState-job
DataServ-MySQL-job
RedHat6-vSphere-job

– Copy/Migration Jobs can have a trailing -copyjob or -migrationjob:

ActiveDirectory-SystemState-copyjob
DataServ-MySQL-migrationjob
RedHat6-vSphere-copyjob

– FileSets can be named according to what they do with a trailing -fs

WindowsAllDrives-fs
WindowsC-fs
Home-fs

– JobDefs can be named according to the group of clients they apply to, with a trailing -jd

MainProduction-jd
TestMachines-jd
DedupHomes-jd
Database-jd

– Pools can be named according to the location or name of the Storage Daemon, the Media Type, the frequency, the level of the backup and a trailing -pool

ParisWeeklyDiskFull-pool
NewYorkDailyDedupFull-pool
SafeHouseMonthlyVirtualFull-pool
bacula-sd2-ZFS-Differential-pool

– Clients could be named with the hostname and a trailing -fd

hrserver.domain-fd
ceolaptop.domain-fd

– Schedules might contain the frequency and/or the purpose, plus a trailing -sched

DailyMorningCatalog-sched
DailyEveningVM-sched

– Console should be named with the hostname and a description followed by a trailing -console

NewYorkSDrestrictedRestore-console
VPSalesBackup-console

– Director/Storage Daemon(s)/Clients should be named with their location and hostname. If they are dedicated, one can add a description. At the end we add -dir / -sd / -fd

Europe-bacula1-dir
Iceland-baculabkp-DR-dir
NewYork-storage1-dedup-sd
Houston-clt2314-vsphere-fd

– Autochangers will be easy to find if their name starts with the Storage Daemon’s name, then a description and a trailing -autochanger

NewYork-storage1-dedup-xfs-autochanger

– Devices should refer to the naming of the Storage Daemon they are attached to, as well as, in the case of a Device inside an autochanger, its name and a number. A -device can be added at the end.

NewYork-storage1-dedup-xfs-autochanger-tape1-device
NewYork-storage1-dedup-xfs-autochanger-tape2-device

◾ Messages resources can be customized per daemon or per job, with a trailing -messages:

Europe-bacula1-dir-messages
ActiveDirectory-SystemState-job-messages
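As a sketch, the convention above ties together in the configuration like this. All names, addresses, and passwords below are illustrative examples, not a recommended setup:

```
# Hypothetical Director configuration fragment following the naming convention
Client {
  Name = hrserver.domain-fd
  Address = hrserver.domain
  Password = "xxx"              # placeholder
  Catalog = MyCatalog
}

Job {
  Name = DataServ-MySQL-job
  Client = hrserver.domain-fd
  FileSet = Home-fs
  JobDefs = MainProduction-jd
  Pool = ParisWeeklyDiskFull-pool
  Schedule = DailyEveningVM-sched
  Messages = Europe-bacula1-dir-messages
}
```

With such names, job listings and log files immediately reveal which client, function, and pool a job belongs to.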

#### 2.3 Notes

◾ Please don’t use spaces in names; it will work, but it requires the directives to be quoted.
◾ You can also deﬁne acronyms, like DR for Disaster Recovery, EX for External storage or countries such as FR for France.
◾ Use the “Description” directive to add even more details for your resources (when possible).
◾ Stick to your naming convention regarding “-”, “_”, and format. Do not rename resources after they have been set and used (see 2.1 on naming resources for more details).
◾ Resource names are limited to a ﬁxed number of characters. Currently the limit is deﬁned as 127 characters.
◾ Add a comment with a # to directives whose values appear directly in the configuration file (like File Retention = 94608000 # 3 years in seconds) so that their meaning is clear. For example:

ResourceDirective = Value # This is a comment explaining the value

Or use BWeb’s description ﬁelds.
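To keep such comments accurate, retention values in seconds can be computed with a small helper. The function below is our own sketch, not part of Bacula:

```python
# Convert a retention period to seconds, for use in directives such as
#   File Retention = 94608000  # 3 years in seconds
def years_to_seconds(years: int, days_per_year: int = 365) -> int:
    """Return the number of seconds in `years`, ignoring leap days."""
    return years * days_per_year * 24 * 3600

print(years_to_seconds(3))  # 94608000, i.e. 3 years in seconds
```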

### 3. Schedules and Retentions

Schedules and retentions play a major role in calculating the requirements of a backup system. The number of tapes and the disk space are closely related to these two parameters. For instance, a 10MB backup taken every hour with a retention period of one month would require at least 7200MB of disk space to hold the full backup cycle.
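The arithmetic behind this example can be sketched as follows (a back-of-the-envelope estimate assuming a 30-day month and no compression or deduplication):

```python
# Space needed to keep one month of hourly 10 MB backups
backup_size_mb = 10    # size of each backup
backups_per_day = 24   # one backup every hour
retention_days = 30    # retention period of one month

required_mb = backup_size_mb * backups_per_day * retention_days
print(required_mb)  # 7200 MB, as stated above
```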

#### 3.1 Set up

Bacula will do exactly what you tell it to do, which is why you should take extra care with your Schedules and Retentions. If the frequency of your Schedule is high and the retentions are long, you will need a lot of disk space. In addition, the catalog will be large, as its size is directly proportional to the number of jobs and the retention periods.
On the other hand, if you misjudge your retention and scheduling, your last Full backup can be recycled before a new Full has started. The easiest way to deal with schedules and retentions is to use a calendar or a spreadsheet and color the days.

#### 3.2 Schedule Example

Let’s say that we have a pool “Storage1-WeeklyFull-pool” with all retentions set to 7 days. Further we have an incremental pool “Storage1-DailyInc-pool” with the same retention times. The Full backups are scheduled each Monday night and Incrementals daily from Tuesday to Sunday:

Example for a simple schedule with 7 days of retention for Full and Incremental backups.

We can see the Full backups in red and the Incremental backups in blue. The retention times are indicated by a lighter color. The first Full F01 will expire on the same day we take the next Full F02. This is dangerous: if for any reason the Full job F02 does not succeed, no Fulls are available at that point in time. In this case, setting the Full pool retention to 8 or 9 days allows the backup administrator to run a Full again manually (disabling that day’s Incremental while it runs), or to configure Maximum Full Interval to 7 days to force the promotion of an Incremental to a Full if no Fulls are available in the catalog for the last 7 days.
If you want to guarantee to your users that they can “go back in time” for 7 days, the retention times in the example scheme depicted above are too short. Imagine a user on day 11 (Thursday) in the morning (I09 will be taken in the late evening) who wants a file back from the week before, and the only thing he knows is that he last saw the file on Friday of the previous week. Since the retention time of Full F01 has expired, all the Incrementals that followed it can no longer be used to construct the file tree, because Bacula needs a Full and all following Incrementals for this. At this point in time you can only go back to the Monday of the same week, when the most recent Full was taken; you cannot go back 7 days in time. To accomplish this at all times with the above scheme, the retention time for the Full should be at least 15 days. Of course, the file we are looking for may have changed on day 5 (Friday of the week before) and been backed up in Incremental I04, but this is a special case: if the file did not change, it would not be in that backup. To cover the general situation and allow going back in time for 7 days in all scenarios, the only way is to choose retention times that are long enough, according to your backup policies.
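The reasoning above can be condensed into a small rule of thumb. The function below is our own sketch, not a Bacula feature; it assumes Fulls every `full_interval_days` with daily Incrementals in between, and that a restore needs the preceding Full plus all subsequent Incrementals:

```python
def min_full_retention(full_interval_days: int, guarantee_days: int) -> int:
    """Minimum Full-pool retention (in days) so users can always go back
    `guarantee_days` days: the oldest useful Full must survive one full
    cycle plus the guaranteed window, plus one day of safety margin."""
    return full_interval_days + guarantee_days + 1

# Weekly Fulls with a guaranteed 7-day restore window:
print(min_full_retention(7, 7))  # 15 days, matching the example above
```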

#### 3.3 Notes

The example above is a simple one; there are other ways of scheduling: using a specific week number (beware of 53-week years), or depending on a specific weekday (the 4th Sunday, for example; beware of months with 5 Sundays). These variations must be well verified to avoid overlapping or, worse, missing a Full backup and thus creating a gap, in which case you would be unable to restore in certain situations. When planning schedules, pay attention to corner cases: sometimes a year has 53 weeks and a month 5 weeks, and if the retention period does not take these situations into account, the previous Full backup may be pruned before it should be. It is recommended to configure Bacula in such a way that two Fulls may coexist for a while. It is better to be safe than sorry.
Along with storage space, the catalog also grows proportionally to the number of ﬁles and the frequency of the backups.
Be sure that the Storage Daemon is able to handle the aggregate bandwidth of the backups (the network and the storage speed) and all their checksums (CPU) in order to complete within the backup window.

### 4. Partition Scheme

#### 4.1 Requirements

Bacula will require some space for temporary ﬁles and logs.
By default, the directory /opt/bacula/working is used to store the following things:
◾ BWeb log ﬁles
◾ SD attributes
◾ DIR, FD, SD trace and dump ﬁles
◾ Plugin log ﬁles and other plugin-speciﬁc data
It is best practice to have the following partitioning to avoid ﬁlling your root space.

#### 4.2 Proposal

◾ at least 10GB+ for /opt; we recommend this value as /opt/bacula/working is heavily used for caching purposes, plugin logs, and mounts.
◾ a separate partition for your catalog
◾ a separate partition for spooling (if you use spooling to tapes), sized at your spool size (the Spool Size directive) + 10GB. Remember that attribute spooling will go by default to /opt/bacula/working; use the Spool Directory directive to point data spooling at this partition in your configuration.
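For the tape-spooling case, the relevant directives live in the Storage Daemon’s Device resource. The following fragment is a sketch only; the device name, paths, and sizes are example values to adapt:

```
Device {
  Name = LTO8-drive0-device
  Media Type = LTO8
  Archive Device = /dev/nst0
  Spool Directory = /opt/spool   # dedicated spooling partition
  Maximum Spool Size = 500G      # must fit on that partition, with headroom
}
```

Attribute spooling still goes to /opt/bacula/working, so keep the /opt headroom mentioned above.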

#### 4.3 Notes

Bear in mind that partitioning your system guarantees an efficient load on each partition and avoids filling up the root partition.
Regarding the space allotted for disk-based backup, please configure it to be extensible by using LVM, ZFS, or any other mechanism, or even by assigning jobs to a dedicated partition and using copy and migration jobs afterwards to keep useful jobs.

### 5. Disaster Recovery Plan

#### 5.1 Template

Your Disaster Recovery plan should include Bacula’s conﬁguration and backups. In order to do so, you can have oﬀ-site backups including your conﬁguration.
The needs are simple: the backups (for example, the last Fulls of your systems), the content of /opt/bacula/etc, and the catalog.
The backups can be copied or migrated using Copy/Migration jobs from one site to a safe site using schedules.
If you are using tapes, you can remove a set of tapes after the Full backups and put them in a safe.
The very important part of this plan is that the catalog and the configuration files end up together in one single volume.

#### 5.2 Example

This is an example which you need to adapt to your own system. First, set up a job that will back up /opt/bacula/etc/* and your catalog dump.
We advise keeping all Bacula Enterprise Edition configuration (the content of /opt/bacula/etc) as well as the catalog backup in a dedicated pool, to ease the recovery process: the job will not be mixed in with dozens of other jobs. Catalog and configuration files are all you need to recover your Bacula backup environment.
This way we set up a specific pool for our Bacula configuration and catalog backup, with one job per volume. Of course you can tweak this example to back up more files and build your own DR plan (e.g. TLS keys, etc.).
Here is an example of a Pool on hard drives backing up a disaster recovery job:

Pool {
  Name = "DisasterRecovery-pool"
  ActionOnPurge = Truncate
  AutoPrune = yes
  FileRetention = 86400     # Adjust to the schedule of the Disaster Recovery job; here it is recycled every day
  JobRetention = 86400      # Adjust here too
  LabelFormat = "Disaster-${JobName}-${NumVols+:p/3/0/r}"
  PoolType = "Backup"
  Recycle = no              # This needs some housekeeping to delete old volumes
  Storage = "OnDisk"
  UseVolumeOnce = no
  MaximumVolumeJobs = 1     # Adjust here if you don't want 1 job per volume
  VolumeRetention = 86400   # Adjust this too, according to your job's schedule and DR plan
  VolumeUseDuration = 86400 # Adjust again here, with the same constraints as above
}

Here is an example Job and FileSet for a Disaster Recovery plan

Fileset {
  Name = "DisasterRecovery-fs"
  Include {
    Options {
      Signature = MD5
    }
    File = "/opt/bacula/working/bacula.sql"  # where the Bacula catalog dump goes
    File = "/opt/bacula/etc"                 # Director's config files
    # you can add other files like keys or the content of /etc to make this FileSet more complete and adapted to your environment
  }
}

Job {
  Name = "DisasterRecovery-job"
  Type = "Backup"
  Client = "baculaServer"   # change to the name of the fd running on the Bacula DIR
  Fileset = "DisasterRecovery-fs"
  JobDefs = "Default-jd"
  Level = "Full"            # a full backup is preferable
  Messages = "Standard"
  Pool = "DisasterRecovery-pool"  # the pool we just defined, to hold all config and catalog dumps
  Priority = 11             # Adjust to your priorities so this job runs last, after all jobs of your backup window
  Runscript {
    Command = "/opt/bacula/scripts/make_catalog_backup bacula bacula"
    RunsOnClient = no
    RunsWhen = Before
  }
  Runscript {
    Command = "/opt/bacula/scripts/delete_catalog_backup"
    RunsOnClient = no
    RunsWhen = After
  }
  Schedule = "NightAfterJoba"
  Storage = "OnDisk"
  WriteBootstrap = "/opt/bacula/working/catalog-backup.bsr"  # very important; set this to be able to send it by mail afterwards
}

This way, in case of disaster, you reinstall the same version of Bacula; then you just need to grab the last volume from the DisasterRecovery-pool and use bextract to extract all its content. You re-inject the catalog dump into your database, copy the config files into /opt/bacula/etc/, and everything is back online. (Please test with /opt/bacula/bin/bacula-dir -t -u bacula -g bacula.)
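The recovery steps can be outlined as below. This is a sketch only: the volume name follows the LabelFormat of the example pool, FileStorageDevice stands for your SD device name, and the catalog re-injection assumes PostgreSQL; adapt each to your installation:

```
# 1. Reinstall the same Bacula version, then extract the last DR volume:
bextract -V Disaster-DisasterRecovery-job-001 FileStorageDevice /tmp/recover

# 2. Re-inject the catalog dump into the database (PostgreSQL example):
psql -U bacula bacula < /tmp/recover/opt/bacula/working/bacula.sql

# 3. Put the configuration files back in place:
cp -a /tmp/recover/opt/bacula/etc/. /opt/bacula/etc/

# 4. Verify the Director configuration:
/opt/bacula/bin/bacula-dir -t -u bacula -g bacula
```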

### 6. SD Preconfigured for Growth

#### 6.1 Scalable Setup Running Multiple Parallel Jobs

Some industries, MSPs, or environments may need to separate backed-up data between end customers or departments, in order to apply a different retention period to each, to calculate the volume of backed-up data to be billed, to apply quotas, etc. A physical separation with different directories or the “Archive” directive may be considered, but every addition of a new customer or department, and every subsequent deletion, may require a configuration modification on the Storage Daemon and thus a restart. As a reminder, configuration changes on the Director can be loaded via the bconsole command “reload”, so no daemon restart (which would cancel the running backup jobs) is needed there.

#### 6.2 Example

A logical separation implemented in Pool resources on the Director side is more flexible than a separation through directories and Device resources on the SD side. The Pool resource offers a way to create such a logical separation per entity. Every entity may have one or many Pools with different retentions and different volume names, defined for example by the “LabelFormat” directive.
The Storage Daemon’s capacity to handle all backup and restore jobs running in parallel must be well sized. Since Device and Autochanger definitions in the Storage Daemon can handle volumes from many different Pool resources in a very flexible way, the capacity to run parallel backup and restore jobs should be at least doubled, to plan for growth without the need to restart the Storage Daemon.

#### 6.3 Possible solutions

If you intend to run 200 backup jobs in parallel every night, for a total of 100 customers or departments, with the need to run at most five restore jobs at a time, the following could be considered.
Every Device resource could be set to run a maximum of 1 job per volume to avoid data interleaving, so the “Maximum Concurrent Jobs” directive would be set to 1. This means 400 Device resources would have to be set up, planning for growth in handling more jobs by adding new entities while avoiding any Storage Daemon restart when entities are added. These 400 Device resources could be grouped into Autochanger resources, with 20 Device resources per Autochanger. Every Device grouped in an Autochanger should point to a single directory on the physical storage, and the load should be spread between different physical storage systems in a big infrastructure if needed.
Five additional Device resources should be added to every Autochanger resource for restore purposes, and the “Read Only” directive should be added to these Device resources.
On the Director configuration side, considering for example that only Full and Incremental backups are offered to entities, 200 Pools should be created for the 100 customers: one Pool for Full backups and a second for Incremental backups. The “Label Format” should be set according to the retention period, the backup level, and the customer name.
The calculation of the backed-up volume, or whether a customer’s quota has been reached, could be done by any bash script listing the volumes by name and backup type. Such an implementation is very scalable. As soon as about 350 Devices are in use, an alert could be triggered so that more Device and Autochanger resources can be added during the next maintenance window of the Storage Daemon. In addition, if customer backup data needs to be deleted before its retention is over, there will be no need to change the Storage Daemon configuration, and the deletion of the customer’s volumes will be easy to handle (every deletion of backup data should be done very carefully). It will also be necessary to remove the client, jobs, and pools associated with this client from the configuration files and catalog database, but the Storage Daemon configuration can be reused by any other customer.
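A minimal sketch of such a Storage Daemon layout is shown below. Names, paths, and sizes are examples; a real deployment would template the 20 backup Devices rather than write them by hand:

```
Autochanger {
  Name = NewYork-storage1-group1-autochanger
  Device = Group1-backup1-device   # ... repeat up to Group1-backup20-device
  Device = Group1-restore1-device  # read-only, reserved for restores
  Changer Device = /dev/null       # virtual disk autochanger
  Changer Command = ""
}

Device {
  Name = Group1-backup1-device
  Device Type = File
  Media Type = File1
  Archive Device = /backup/group1  # single directory per Autochanger
  Maximum Concurrent Jobs = 1      # one job per volume, no interleaving
  Automatic Mount = yes
  Random Access = yes
  Label Media = yes
}

Device {
  Name = Group1-restore1-device
  Device Type = File
  Media Type = File1
  Archive Device = /backup/group1
  Read Only = yes                  # restore-only device
  Automatic Mount = yes
  Random Access = yes
}
```

On the Director side, each customer then gets a Full Pool and an Incremental Pool whose label format encodes the customer name, level, and retention.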

#### 6.4 Notes

◾ Having your clients’ backups associated with specific pools will give you a more scalable solution.
◾ The storage device used by a pool/client backup can be used by another pool/client.
◾ The storage device that was used by a decommissioned pool/client can be used for the new backups of a new pool/client conﬁguration.
