
Testing and monitoring

1. Testing backup and restore

1.1 Verify your configuration

After you have created a new job, it is strongly recommended to test it: run the backup and then restore its data to confirm that everything is correctly configured, so that the procedure can be documented in your disaster recovery procedures.
Important information can be gathered during this test:
◾ How long does the job take?
◾ What is the load on the client, the network and the Storage Daemon?
◾ Does the job run successfully?
◾ Can the data be restored?
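After the test run, this information can be gathered directly from bconsole; a minimal sketch (the jobid and the resource names are examples only, adapt them to your configuration):

list jobs
llist jobid=1
status storage=OnDisk
status client=baculaServer

list jobs and llist jobid= show the status, start and end times, number of files and bytes of the job, while the two status commands show the load and the running jobs on the Storage Daemon and on the client.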

1.2 How to test

In order to test your backup job, from bconsole simply type:

run job=MyNewJob
and follow the on-screen options.
Once the job has finished successfully, type again into the console:
restore
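and select the files to restore following the on-screen options. As a hedged, non-interactive example, a whole job can also be restored to an alternate location in one command (the jobid and path are examples only):

restore jobid=1 all done where=/tmp/bacula-restores yes

Restoring to a directory such as /tmp/bacula-restores instead of the original location makes it easy to compare the restored data with the original files without overwriting anything.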

1.3 Notes

• It is very important to run a test backup and restore to measure the impact on the SD and the network, and to ask for help early if issues arise during such tests.
• You can also use a Verify Job to verify your backup (a hedged example follows).
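As a sketch only (all resource names are examples and must match your own configuration), a Verify job that compares the volume contents of the last backup with the catalog could look like this:

Job {
  Name = "MyNewJob-verify"
  Type = Verify
  Level = VolumeToCatalog
  Verify Job = "MyNewJob"       # the backup job whose last run will be verified
  Client = "baculaServer"
  FileSet = "MyNewJob-fs"
  Storage = "OnDisk"
  Pool = "Default"
  Messages = "Standard"
}

It can then be run from bconsole with run job=MyNewJob-verify once the backup has completed.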

2. Config files, version control and test before reload/restart

2.1 Why?

You will surely make many modifications to your configuration files in /opt/bacula/etc over time, which is why it is very good practice to:
◾ include this folder in your backup policy (see chapter 3.5)
◾ and/or put this folder under a revision control system of your choice, e.g. CVS, SVN, Git or Mercurial (see the sketch after this list)
This will ensure the integrity of your configuration, especially if several system administrators are involved in the backup process.
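As a minimal sketch, assuming Git is installed on the Director host, the configuration folder can be placed under revision control like this:

cd /opt/bacula/etc
git init
git add .
git commit -m "initial Bacula configuration"

Committing after each change (and reviewing git diff before doing so) keeps a full history of who changed what in the configuration.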

2.2 Configuration Backup Job

Here is a simple FileSet/Job definition for the configuration backup:

FileSet {
  Name = "DisasterRecovery-fs"
  Include {
    Options {
      signature = MD5
    }
    File = /opt/bacula/working/bacula.sql   # where the Bacula catalog dump goes
    File = /opt/bacula/etc
    # you can add other files like keys, content of /etc to make this FileSet
    # more complete and adapted to your environment
  }
}

Job {
  Name = "DisasterRecovery-job"
  Type = "Backup"
  Client = "baculaServer"                   # change to the name of the fd running on the Bacula DIR
  Fileset = "DisasterRecovery-fs"
  JobDefs = "Default-jd"
  Level = "Full"                            # full backup is preferable
  Messages = "Standard"
  Pool = "DisasterRecovery-pool"            # the pool we just defined to hold all config and catalog dumps
  Priority = 15                             # adjust to your priorities so this job runs last, after all jobs of your backup window
  Runscript {
    Command = "/opt/bacula/scripts/make_catalog_backup bacula bacula"
    RunsOnClient = no
    RunsWhen = Before
  }
  Runscript {
    Command = "/opt/bacula/scripts/delete_catalog_backup"
    RunsOnClient = no
    RunsWhen = After
  }
  Schedule = "NightAfterJobs"
  Storage = "OnDisk"
  WriteBootstrap = "/opt/bacula/bsr/catalog-backup.bsr"  # important, can be sent by email
}

2.3 Check for configuration errors

After each modification, always check the configuration to avoid issues when reloading/restarting the Bacula Director process:

# /opt/bacula/bin/bacula-dir -t -u bacula -g bacula
Correct any errors that are displayed, or contact Support.

2.4 Notes

The Director will not start if there are parsing errors in your configuration. The same applies if you use reload in bconsole while there are errors in your configuration. Please test your configuration and your modifications. You may use the breload Bacula Enterprise command, which tests your configuration, makes a compressed archive of it, and reloads the Director configuration. It is a convenient way to ensure that you reload the configuration only if it is well formed.

3. Test Tape Autochanger/Drives

3.1 Testing Tape Autochanger for the best performance

Testing the tape autochanger is a very important task in every Bacula Enterprise setup, and it should be done before running any production backup because it will:
• identify any connectivity, hardware or configuration issues preventing data from being backed up efficiently and safely
• assure the best performance of Bacula Enterprise with the tested tape autochanger
• confirm the best settings, so no further changes will be required that would make the overall Bacula Enterprise configuration more complex.

3.2 Testing a Single Device or Tape Autochanger

Tests should be performed with the btape utility to verify the Autochanger configuration and to check that the mt and mtx commands run correctly. Preferably, the btape tests are done before going to production. Additionally, a special speed test performs raw-data and random-data tests on your tape device with your current configuration to measure the performance of your device.

3.3 Performing a Writing/Reading Device Check

Before running the btape tests, a working SD configuration to connect to the Tape Library must be in place. Please refer to the Bacula Enterprise Main manual for how to set this up. The Storage Daemon connected to the Tape Library needs to be shut down before running the btape test, so that no backup jobs will interfere. As an example, the first tape drive (LTO-drive0) will be used for this test.
Important Note: A blank, empty tape media that has not been used before with other third-party vendor or legacy backup tools must be used for these tests.
The commands used below are examples only; you will need to adapt them to your current settings and environment.

service bacula-sd stop
Then, the status of the tape library should be reviewed:
mtx -f /dev/tape/by-serial/changer-device status
The tape drive to test, for example drive “0”, should be empty. If not, please unload it with mtx or your tape library interface. Then, a new tape media should be loaded into the tape drive to be tested (Drive “0”, index=0 in the Device{} resource in bacula-sd.conf or in the related BWeb resource), for example from slot 22 in the command below:
mtx -f /dev/tape/by-serial/changer load 22 0
Then, btape should be run:
/opt/bacula/bin/btape -v -c /opt/bacula/etc/bacula-sd.conf LTO-drive0
where “LTO-drive0” is the name of the tape drive to test in your SD configuration, at index=0 in this example. Finally, the “Autochanger Test” inside btape should be started.
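A hedged sketch of the btape session (the available commands can be listed with help; quit leaves the utility):

*help
*test
*quit

The test command runs the general Bacula tape function tests and, on a drive configured with Autochanger = yes, it should also propose the autochanger test mentioned above.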
If any error happens with this first test, it must be fixed. Once fixed, please continue with the next step performing a speed check (The file_size parameter is important in order to write a file bigger than the Maximum File Size defined in the Device{} resource).
speed file_size=20
Two directives can be fine-tuned in the device configuration to achieve a faster writing process to a tape drive.
Maximum File Size:
For LTO-5 tapes, values between 8GiB and 12GiB are indicated, and between 8GiB and 24GiB for LTO-6. The larger the file size, the slower a single-file restore becomes, as the tape may need to read a whole tape file before extracting the requested file. The larger the file size, the faster writing is, as an EOF mark is written less frequently.
Maximum Block Size:
The following values should be tested: 128K, 256K, 512K. The largest value should not be exceeded in order to avoid Kernel problems. For LTO-5 and LTO-6, 256K and 512K are usually appropriate.
Important Notes:

• If the Maximum Block Size setting is changed to a lower value after media have been written with production data, those media will become incompatible and I/O errors will occur.
• Minimum Block Size should never be used, as it will just waste tape space. If you think you should use it, please contact us.
• Organize your tests as pairs of directives, for example Maximum File Size = 8GiB; Maximum Block Size = 131072, then Maximum File Size = 8GiB; Maximum Block Size = 262144, etc. (8GiB-524288, 12GiB-131072, 12GiB-262144, 12GiB-524288, etc.)
• When the test plan is defined, both directives should be changed accordingly in the Device{} resource of the Storage Daemon configuration file for the chosen drive, as in the sketch below.
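As an illustration only (the device name comes from the example above, the Archive Device path is a placeholder to adapt, and the other directives of your existing Device resource stay unchanged), one pair of the test plan could be set like this:

Device {
  Name = LTO-drive0
  Media Type = LTO
  Archive Device = /dev/tape/by-serial/drive0-nst   # example path, adapt to your system
  Autochanger = yes
  Drive Index = 0
  Maximum File Size = 12G                           # first value of the tested pair (12GiB)
  Maximum Block Size = 262144                       # second value of the tested pair (256K)
}

Keep a note of each pair together with the throughput measured by btape, so the best combination can be selected at the end of the test plan.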
Once modified, please run the btape speed tests according to the Maximum File Size of your selected pair. After each test with a pair, you will need to restart the btape utility to load the configuration changes. Here is an example with a Maximum File Size given in GiB:

speed file_size=20

3.4 Notes

The file_size value should be bigger than the Maximum File Size directive. In order to find the “best pair settings” closest to a production environment, calculate the average throughput of the “Test with zero data and Bacula block structure” results and of the following “Test with random data, should give the minimum throughput” results across all three single tests.
Please find more details about the btape command in the Bacula Enterprise Utility Programs documentation, available in your download area, or by contacting us.
If you have any questions, please write to Bacula Systems Support Team.

4. System Monitoring

4.1 Things to Monitor

As a central part of your system, the Bacula services will report in your job logs issues related to your network, your disk space, and the load of your database.
It is important to carry out these parallel tasks to ensure correct backup operations:
• monitor space
• monitor jobs
• monitor your infrastructure
You need to monitor space because Bacula cannot check for you, before launching several parallel jobs, whether there is enough room on tape/disk to store them. You must ensure the available space is adequate for storing jobs, but also on the Director, as a full filesystem will prevent backups from running correctly. For example, spooled data goes by default into /opt/bacula/working, and the same is true for spooled attributes. If this partition is full, these jobs will not succeed. You might want to use a tool like Nagios to create alarms if the pool space or the partition space for /opt is low, as in the sketch below. Also remember that filesystems filled above 80% are prone to performance issues.
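A hedged example of simple space checks that such a tool could run (the paths and the pool name are examples):

df -h /opt/bacula/working
echo "list volumes pool=DisasterRecovery-pool" | /opt/bacula/bin/bconsole

The first command watches the working/spool partition; the second lists the volumes of a pool with their current usage, so an alarm can be raised before the pool or the partition fills up.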
Job monitoring helps the Backup Administrator decide what to do when a job fails or a warning occurs, and makes it easier to sort and prioritize backup issues so they can be resolved in a way suited to your environment. Job logs and the Bacula log file can be parsed by monitoring software in order to re-run a job or take any other action you consider necessary (email/scripts/snapshots/reschedule the job) if an error or a warning occurs, as in the example below.
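A minimal sketch of such parsing, using bconsole in batch mode (the jobid is an example; a real script would loop over the recent jobs):

echo "llist joblog jobid=18" | /opt/bacula/bin/bconsole | grep -E "Termination|Non-fatal FD errors|SD Errors"

If the termination line is anything other than “Backup OK”, or if an error counter is not 0, the monitoring tool can raise an alert, send an email or re-run the job.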

4.2 Example output

When you parse joblogs you can see:

Non-fatal FD errors: 0
SD Errors: 0
joberrors: 0
With Non-fatal FD errors: 0 and SD Errors: 0, everything went smoothly; there are no issues to report and probably no action needs to be taken.
In the next case, you can see 1 non-fatal FD error and a “Backup OK – with warnings” termination:
Non-fatal FD errors: 1
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK — with warnings
Human intervention is needed on this job.
It should help to have a look at the joblog (llist joblog jobid=). For the above job, we can see the cause of the warnings, for example:
time: 2016-12-16 12:24:24
logtext: debianserver-fd JobId 18: Could not stat “/home/bacula”: ERR=No such file or directory
Here we can see that the /home/bacula directory requested by the backup Job’s FileSet does not exist on the Client’s filesystem.

4.3 Notes

Check all your job logs regularly to find potential problems and, in case of an error, update your error-parsing scripts/software to take them into account and trigger the appropriate action(s). A job is correctly done when the Backup Termination is “Backup OK” and the Non-fatal FD/SD errors are equal to 0.
