Best 9 OpenShift ETCD Backup & Restore Solutions


Updated 17th January 2025, Rob Morrison

Contents

Built-in OpenShift backup capabilities
OpenShift cluster backup and recovery
OpenShift application backup and recovery
Third-party OpenShift backup options
IBM Spectrum Protect Plus
CronJob
Kasten K10
Storware Backup and Recovery
Portworx Backup
Cloudcasa
Dell EMC PowerProtect Data Manager
Bacula Enterprise
Conclusion

Red Hat OpenShift lets users provision, instantiate, run, and manage a modular bundle comprising a computing platform and a number of applications. This significantly reduces the complexity of building and maintaining the infrastructure needed to develop and launch applications and services.

OpenShift represents a specific approach to how virtual machines, containers, and platform-as-a-service environments can work together; the platform as a whole is useful for both companies and individuals that are in a position to take advantage of it.

Since OpenShift consists of a family of software, it is important to mention its flagship – OpenShift Container Platform.

OpenShift Container Platform, or OSCP, is a hybrid cloud platform as a service that works with Linux containers managed by Kubernetes. More often than not it is referred to as “OpenShift”, even though there are multiple other products available within that same family – including OpenShift Online, OpenShift Dedicated, and others. For the sake of continuity, we are going to refer to the OpenShift Container Platform simply as “OpenShift” for the rest of this article, because the majority of the following services and solutions work with OSCP.

OpenShift is designed to be modular and flexible, which brings a number of advantages to the deployments that use it. However, this flexibility has its own caveats. The first is that such an approach, while very effective, can also be prone to errors and failures. As such, a good backup system is a requirement for most, if not all, OpenShift deployments, especially when valuable, persistent data is being created or processed.

The need for a backup system creates another significant issue for OpenShift deployments: not all backup solutions are capable of adequately protecting OpenShift environments, and some are unable to do it at all.

As such, a backup and restore solution should be fully compatible with OpenShift in the first place. Typically, backup features such as backup automation, support for different backup locations and media, and data migration capabilities are an important addition to the existing OpenShift backup and recovery capability.

Built-in OpenShift backup capabilities

There are not many comprehensive backup solutions on the market that can also perform OpenShift backup and restore operations. First, though, we have to look at how much Red Hat itself can do, so the built-in backup functionality is what we are going to cover first.

OpenShift cluster backup and recovery

Two possible pathways exist here – backup and recovery operations for clusters and those same operations for applications. Cluster backup operations rely in their entirety on etcd data – a key-value store of OSCP (OpenShift Container Platform) that holds information on all of the resource objects within a cluster.

A cluster administrator may have to shut down or restart clusters in specific situations, be it for maintenance purposes or just to save resources by running fewer clusters at once. The process of shutting down an OpenShift cluster is sometimes called “graceful shutdown”, and it includes shutting down the nodes of the cluster with a dedicated shutdown command, as well as shutting down all of the dependencies that are no longer used with the cluster being turned off.
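As a rough illustration of the shutdown step described above, the sketch below only builds the per-node commands and does not run them. It assumes the commonly documented pattern of running "shutdown" on each host through "oc debug"; the node names and the one-minute delay are hypothetical placeholders, and the exact command should be checked against the documentation for your OpenShift version.

```python
from typing import List


def build_shutdown_commands(node_names: List[str], delay_minutes: int = 1) -> List[List[str]]:
    """Return one `oc debug` argv per node that schedules a host shutdown.

    Assumption: each node is shut down via
    `oc debug node/<name> -- chroot /host shutdown -h <minutes>`.
    """
    if delay_minutes < 0:
        raise ValueError("delay_minutes must be non-negative")
    commands = []
    for name in node_names:
        commands.append([
            "oc", "debug", f"node/{name}", "--",
            "chroot", "/host", "shutdown", "-h", str(delay_minutes),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical node names, printed for review rather than executed.
    for argv in build_shutdown_commands(["master-0", "worker-0"]):
        print(" ".join(argv))
```

Printing the commands first, instead of executing them straight away, leaves room to take an etcd backup and to review the node list before anything is powered off.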

There is also another process called “graceful restart”, which naturally is a direct opposite of a graceful shutdown since it is starting the cluster instead of stopping it. This particular process, however, can be a little more complicated than its opposite, with steps such as:

Turning on any previously disabled cluster dependencies;
Starting the cluster machines themselves;
Verifying if the control plane nodes are ready;
Verifying if the worker nodes are ready;
Verifying if the cluster as a whole started properly or not.
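The readiness checks in the list above can be sketched as a small helper that reads the JSON printed by "oc get nodes -o json" and reports which nodes are not yet ready. This is a minimal sketch: the function name is hypothetical, and it assumes that control plane nodes carry either the "node-role.kubernetes.io/master" or the "node-role.kubernetes.io/control-plane" label.

```python
import json
from typing import Dict, List

# Assumption: control plane nodes are identified by one of these labels.
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/master",
    "node-role.kubernetes.io/control-plane",
)


def not_ready_nodes(nodes_json: str) -> Dict[str, List[str]]:
    """Group the names of nodes whose Ready condition is not "True" by role."""
    data = json.loads(nodes_json)
    result: Dict[str, List[str]] = {"control-plane": [], "worker": []}
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        labels = meta.get("labels", {})
        is_control_plane = any(label in labels for label in CONTROL_PLANE_LABELS)
        role = "control-plane" if is_control_plane else "worker"
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            result[role].append(meta.get("name", "<unknown>"))
    return result


if __name__ == "__main__":
    import sys

    # Usage: oc get nodes -o json | python check_nodes.py
    print(not_ready_nodes(sys.stdin.read()))
```

An empty list for both roles means every node reported as ready; it does not by itself prove that the cluster operators have also recovered.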

If the cluster failed to start properly, the only solution would be to use an etcd backup to restore the cluster’s previous state.

While it is true that the graceful shutdown should allow the cluster to be restarted with minimal effort, an etcd backup should be performed before each and every cluster shutdown event. Etcd backups play an important role in disaster recovery scenarios, and it would not be possible to restore a faulty OSCP without an etcd backup involved in the process.

The etcd backup process itself is fairly simple and includes three main steps – starting a debug session, changing your root directory to /host, and launching a script called “cluster-backup.sh” while also inputting the backup location.
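To make the three steps more concrete, here is a minimal sketch that collapses them into a single non-interactive command. It only builds the command. The script location "/usr/local/bin/cluster-backup.sh" and the backup directory "/home/core/assets/backup" are assumptions based on common OpenShift 4 setups, and the node name is a hypothetical placeholder.

```python
import shlex
from typing import List

# Assumption: the backup script lives at this path on control plane hosts.
BACKUP_SCRIPT = "/usr/local/bin/cluster-backup.sh"


def build_backup_command(node_name: str, backup_dir: str = "/home/core/assets/backup") -> List[str]:
    """Build an argv that opens a debug session, chroots to /host, and runs the backup script."""
    if not node_name:
        raise ValueError("node_name must not be empty")
    if not backup_dir.startswith("/"):
        raise ValueError("backup_dir must be an absolute path on the host")
    return [
        "oc", "debug", f"node/{node_name}", "--",  # step 1: debug session on the node
        "chroot", "/host",                         # step 2: switch the root directory to /host
        BACKUP_SCRIPT, backup_dir,                 # step 3: run the script with a backup location
    ]


if __name__ == "__main__":
    # Hypothetical control plane node name; the command is printed, not executed.
    print(shlex.join(build_backup_command("master-0")))
```

The resulting list can be passed to subprocess.run once the node name and paths have been confirmed for the cluster in question.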

The cluster restore process, on the other hand, is somewhat more complicated, mainly because cluster restores via etcd backup are considered a “last-resort option” that should normally only be used when nothing else works for recovering the cluster’s working state. It involves several steps, such as:

Selecting a control plane host that’s going to be performing restoration operations;
Setting up SSH connectivity with all of the control plane nodes, including the one that’s going to be used for recovery;
Manually copying the etcd backup to the one host that was chosen as the recovery one;
Accessing the host in question;
Running the restore script “cluster-restore.sh”;
Checking if the nodes are in the “ready” state;
Restarting the kubelet service for all of the control plane hosts, including the recovery one;
Approving pending CSRs;
Verifying that the single-member control plane is up and running;
Deleting and recreating all of the control plane machines, with the exclusion of the one chosen for recovery purposes.
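One of the steps above, approving pending CSRs, is easy to get wrong by hand when many requests are waiting. The sketch below reads the JSON printed by "oc get csr -o json", treats any request without an "Approved" or "Denied" condition as pending, and builds the matching "oc adm certificate approve" commands. The helper names are hypothetical, and each request should still be reviewed before it is approved.

```python
import json
from typing import List


def pending_csr_names(csr_json: str) -> List[str]:
    """Return the names of CSRs that have been neither approved nor denied."""
    data = json.loads(csr_json)
    pending = []
    for item in data.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            pending.append(item["metadata"]["name"])
    return pending


def build_approve_commands(names: List[str]) -> List[List[str]]:
    """Build one `oc adm certificate approve <name>` argv per pending CSR."""
    return [["oc", "adm", "certificate", "approve", name] for name in names]


if __name__ == "__main__":
    import sys

    # Usage: oc get csr -o json | python pending_csrs.py
    for argv in build_approve_commands(pending_csr_names(sys.stdin.read())):
        print(" ".join(argv))
```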

This brings us to the second part of this long process. It begins with logging in from a separate terminal window as a user with the “cluster-admin” role. The following operations are:

Forcing etcd redeployment;
Verifying if the nodes are up to date;
Forcing new rollouts to the control plane (this should reinstall Kubernetes API to all of the nodes since an internal load balancer is used to connect the kubelet to the API server);
And last, but not least, is verifying that all of the newly installed control plane hosts are working and joined the cluster.
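The forced redeployment and rollout steps above are usually expressed as merge patches against the cluster operators. The sketch below builds such a patch command without running it. It assumes that the operator resources are named "etcd", "kubeapiserver", "kubecontrollermanager", and "kubescheduler", and that changing "spec.forceRedeploymentReason" triggers a new rollout; both points should be verified against the documentation for your OpenShift version.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: these operator resources accept a forceRedeploymentReason field.
REDEPLOYABLE = ("etcd", "kubeapiserver", "kubecontrollermanager", "kubescheduler")


def build_force_redeploy_command(resource: str, now: Optional[datetime] = None) -> List[str]:
    """Build an `oc patch` argv that sets a unique forceRedeploymentReason."""
    if resource not in REDEPLOYABLE:
        raise ValueError(f"unsupported resource: {resource}")
    # A timestamp makes the reason unique, so the operator sees a real change.
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    patch = {"spec": {"forceRedeploymentReason": f"recovery-{stamp}"}}
    return ["oc", "patch", resource, "cluster", "--type=merge", "-p", json.dumps(patch)]


if __name__ == "__main__":
    # etcd comes first, followed by the other control plane components.
    for name in REDEPLOYABLE:
        print(build_force_redeploy_command(name))
```

Each patch should be followed by a check that the nodes have reached the latest revision before moving on to the next component.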

It is easy to see how the restoration process for the etcd backup is rather long and difficult, which is one of the reasons why it is treated as a last-resort option.

Now that we have covered how cluster backup and recovery works, let’s look at the application backup and recovery process.

OpenShift application backup and recovery

The process of creating application backups in OpenShift can also be rather complicated, but we are not going to go into it especially deeply, because the main focus points of this article are etcd and clusters. However, we are still going to examine the logic of it.

Backup and restore operations for applications within OpenShift can be done with the help of OADP, or OpenShift API for Data Protection. It uses Velero to perform both backup and restore tasks for resources and internal images, while also being capable of working with persistent volumes via Restic or with snapshots.

There is a very specific list of storage types that can be used with OADP. This list includes MS Azure, AWS, GCP, Multicloud Object Gateway, as well as several S3-compatible object storage variations (MinIO, NooBaa, etc.). At the same time, snapshot backups can only be performed for AWS, Azure, GCP, and CSI snapshot-enabled cloud storage (Ceph FS, Ceph RBD, etc.), since only those support the necessary native-level snapshot API.

The backup process itself involves creating a CR, or custom resource, for either Backup or Restore purposes. These CRs can be used to perform Restic backups, to schedule backups, and to set up hooks that run either before or after the backup/restore operation.
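To give an idea of what such a custom resource looks like, the sketch below assembles a Velero Backup CR as a plain dictionary and prints it as JSON, which "oc apply -f" accepts alongside YAML. The "openshift-adp" namespace, the "default" storage location, the 30-day TTL, and the application namespace are all assumptions for illustration. The "defaultVolumesToRestic" field belongs to older Velero releases; newer ones use "defaultVolumesToFsBackup" instead.

```python
import json
from typing import Dict, List


def build_backup_cr(
    name: str,
    namespaces: List[str],
    storage_location: str = "default",  # assumption: name of a BackupStorageLocation
    ttl: str = "720h0m0s",              # assumption: keep the backup for 30 days
    use_restic: bool = True,
) -> Dict:
    """Assemble a velero.io/v1 Backup custom resource as a dictionary."""
    if not namespaces:
        raise ValueError("at least one namespace is required")
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumption: OADP is installed in the openshift-adp namespace.
        "metadata": {"name": name, "namespace": "openshift-adp"},
        "spec": {
            "includedNamespaces": list(namespaces),
            "storageLocation": storage_location,
            "ttl": ttl,
            # Older Velero field; newer releases use defaultVolumesToFsBackup.
            "defaultVolumesToRestic": use_restic,
        },
    }


if __name__ == "__main__":
    # Hypothetical backup and application namespace names.
    print(json.dumps(build_backup_cr("shop-backup", ["shop"]), indent=2))
```

A Restore CR follows the same pattern, with a reference to the backup name in its spec.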

Third-party OpenShift backup options

While the built-in backup and recovery methods do work, they have potential issues, mainly because they lack additional features and may be somewhat complicated for a lot of users. This is why we are also going to go over third-party OpenShift ETCD backup solutions, starting with IBM Spectrum Protect Plus.

IBM Spectrum Protect Plus

IBM Spectrum Protect Plus is a comprehensive data protection solution that offers a variety of features in the field of data security – including backup, recovery, retention, and replication for all kinds of target locations, be it databases, applications, virtual machines, SaaS workloads or even containers.

IBM Spectrum Protect Plus is a fairly versatile solution, which is why it can also work with OpenShift systems. It is capable of protecting not only persistent volumes but also other OpenShift cluster-dependent resources. It does have prerequisites for OpenShift backups, but they are not particularly strict and mostly come down to keeping the different parts of the environment at sufficiently recent versions.

IBM Spectrum Protect Plus allows its users to register OpenShift clusters manually, create backups of OpenShift container data, and restore them from either a snapshot or a regular backup copy. It can also restore specific namespace-scoped and cluster-scoped resources, or override retention settings for specific backups or snapshots by expiring OpenShift backup job sessions.

CronJob

Unlike the rest of this list, CronJob is not exactly a backup solution on its own. In Kubernetes and OpenShift, a CronJob is a resource that runs a task on a repeating schedule, much like cron does in Unix-based systems. In this particular context, a user on GitHub uses a CronJob to perform a series of operations that results in an OpenShift backup being created with the same method that we have mentioned earlier (running cluster-backup.sh).

This specific CronJob creates a pod that runs the aforementioned script to create the backup itself. It also copies the entire backup to a pre-configured PV and then expires the backup job itself to avoid future conflicts. It creates two separate files – one being the collection of the static pods as a whole with their private keys and certificates, and the other being the etcd snapshot. As such, this particular “method” can be dubbed as an OpenShift ETCD backup solution.
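The general shape of such a CronJob can be sketched as follows. This is not the manifest from the GitHub project mentioned above, only a simplified illustration: the image, the PVC name, the namespace, and the script and backup paths are all assumptions, and a real deployment would also need a service account that is allowed to run privileged pods on control plane nodes.

```python
import json
from typing import Dict

# Assumptions: script path and backup directory on the control plane host.
BACKUP_CMD = (
    "chroot /host /usr/local/bin/cluster-backup.sh /home/core/assets/backup"
    " && cp -r /host/home/core/assets/backup/. /backup/"
)


def build_backup_cronjob(name: str, schedule: str, image: str, pvc_name: str,
                         namespace: str = "etcd-backup") -> Dict:
    """Assemble a batch/v1 CronJob that runs the backup script and copies the result to a PVC."""
    if len(schedule.split()) != 5:
        raise ValueError("schedule must have five cron fields")
    pod_spec = {
        "restartPolicy": "Never",
        # Run on a control plane node, where the backup script and etcd live.
        "nodeSelector": {"node-role.kubernetes.io/master": ""},
        "tolerations": [{"key": "node-role.kubernetes.io/master",
                         "operator": "Exists", "effect": "NoSchedule"}],
        "containers": [{
            "name": "etcd-backup",
            "image": image,
            "command": ["/bin/sh", "-c", BACKUP_CMD],
            "securityContext": {"privileged": True},
            "volumeMounts": [{"name": "host", "mountPath": "/host"},
                             {"name": "backup", "mountPath": "/backup"}],
        }],
        "volumes": [{"name": "host", "hostPath": {"path": "/"}},
                    {"name": "backup", "persistentVolumeClaim": {"claimName": pvc_name}}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",    # never run two backups at once
            "successfulJobsHistoryLimit": 1,  # let old finished jobs be cleaned up
            "jobTemplate": {"spec": {"template": {"spec": pod_spec}}},
        },
    }


if __name__ == "__main__":
    # Hypothetical image and PVC names; 02:00 every day.
    manifest = build_backup_cronjob("etcd-backup", "0 2 * * *",
                                    "example.invalid/tools:latest", "etcd-backup-pvc")
    print(json.dumps(manifest, indent=2))
```

The copy step at the end of the command is what moves the snapshot and the static pod resources from the host onto the persistent volume.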

Kasten K10

Kasten K10 is a cloud-native data management platform that offers backup, recovery, application mobility, and disaster recovery for Kubernetes applications. It also integrates with a variety of database types, supports multiple cloud storage providers, and works with most Kubernetes distributions.

K10 can also perform OpenShift backup and recovery operations with relative simplicity. The biggest difference here would be the creation of a Secret – Kasten’s own version of a backup list that includes information about etcd pod labels, etcd cluster endpoint, and everything else that needs to be backed up. Other than that, the process is somewhat similar to how K10 usually performs backups – creating a Blueprint and using a Secret with a Blueprint to run the backup task. The restoration process, on the other hand, is similar to what we have discussed in the “built-in methods” section of this article.

Storware Backup and Recovery

Storware Backup and Recovery is cloud-native data protection software that works extremely well when it comes to creating backups of persistent volumes or metadata attached to OpenShift pods. It is a certified Red Hat OpenShift Operator, providing consistency in its data protection efforts, combined with features such as scheduling, backup policy modification, backup automation, and so on. It is a great OpenShift ETCD backup and restore solution that covers multiple different planes of OpenShift’s data – the aforementioned etcd, metadata, persistent volumes, and more.

Portworx Backup

Portworx is a data services platform that is focused on developing solutions for Kubernetes users to help with running applications in containers without interruptions and data loss events. It offers easier access to application backups with consistency and multiple features, such as backup scheduling, RBAC, user management, and more. Portworx can also offer disaster recovery capabilities, CSI implementation, cluster-wide encryption, snapshots, as well as many other features for OpenShift clusters and applications.

Cloudcasa

Cloudcasa is a resilient and powerful backup service with great scalability and a user-friendly interface. It can offer multi-cloud data protection, multiple cyber-resiliency options and several different backup types within your OpenShift environments (Kubernetes resources, etcd backups and CSI snapshots). Cloudcasa can also perform several different levels of recovery operations, be it granular or cluster-level, which makes it a rather convenient and useful OpenShift ETCD backup and restore solution.

Dell EMC PowerProtect Data Manager

Dell EMC is a technological powerhouse that offers a variety of products and services for multiple different markets. Dell EMC’s PowerProtect Data Manager is a versatile data protection solution that supports multiple environment types and can work with a multitude of different storage locations. It also supports OpenShift environments, offering protection for its workloads, while also being a reliable and useful backup and restore solution. It integrates OpenShift’s environments into its own centralized interface, providing the capability to assign protection policies to namespaces, clusters and everything else that OpenShift has.

Bacula Enterprise

Bacula Enterprise is a unique, highly secure enterprise backup solution that supports an especially broad range of environments and features via its “modules” system. One such module was created specifically for comprehensive OpenShift backup and restore operations, providing a variety of data protection features to OpenShift environments. This particular module offers features such as cluster state safeguarding, effective cluster resource redeployment, application recovery capabilities, targeted PV data restoration, configuration transfer for other operations, and so on. Not only can Bacula Enterprise help with safeguarding your OpenShift data as a whole, it can also facilitate disaster recovery plans, offer additional security to your data, help with cluster migration tasks, offer environment replication capabilities, and a lot more. It is worth noting that its subscription-based licensing model and high scalability make it advantageous for medium-large and large deployments.

Conclusion

OpenShift environments are extremely useful in specific application areas, but the platform is also relatively new, meaning that safeguarding its data can be somewhat problematic. Given that Kubernetes environments of all types are increasingly being deployed, backing up their data, especially their persistent data, is increasingly important. This article has discussed different OpenShift backup solutions, both built-in and third-party, and may help readers find a solution for their specific use case.

About the author

Rob Morrison is the marketing director at Bacula Systems. He started his IT marketing career with Silicon Graphics in Switzerland, performing strongly in various marketing management roles for almost 10 years. In the next 10 years Rob also held various marketing management positions in JBoss, Red Hat and Pentaho ensuring market share growth for these well-known companies. He is a graduate of Plymouth University and holds an Honours Digital Media and Communications degree, and completed an Overseas Studies Program.