How to backup Kubernetes?

  • February 8, 2021, Rob Morrison

While an assumption about Kubernetes previously being used mainly by DevOps teams may have been somewhat correct, many companies are now actively deploying containers in operational environments. They are also increasingly choosing container-centric approaches over traditional VMs. This is due to the various advantages of flexibility, performance and cost that containers can often provide. However, as containers move into the operations side of the IT environment, there is increasing concern about the security aspects of containers in a mission-critical environment, including their persistent data in the context of backup and restore processes.

Originally, the overwhelming majority of containerized apps were stateless, allowing them to have a much easier deployment process on a public cloud. But that changed in time, with a lot more stateful applications being deployed in containers than before. This change is why backup and recovery in Kubernetes is now an important topic for a lot of organizations.

Important features of a competent Kubernetes backup solution

The dynamic nature of Kubernetes environments makes it harder for more traditional backup systems and techniques to work well in the context of Kubernetes nodes and applications. Both RPO and RTO may need to be far stricter, since many of these applications are business-critical and are expected to be up and running at all times.

This leads us to three features that are highly recommended for every enterprise in general, and a clear necessity when it comes to best-practice Kubernetes backups:

  • Disaster recovery;
  • Backup and restore;
  • Local high availability.

In a Kubernetes environment, the meaning of these three aspects of backup can differ slightly from their usual definitions:

Local high availability as a feature is more about failure prevention/protection from within a specific data center or across availability zones (if we’re talking about the cloud, for example). A “local” failure is the one that occurs in the infrastructure/node/app used to run the application. In a perfect scenario, your Kubernetes backup solution should be able to react to this failure by keeping the app working, essentially meaning no downtime to the end user. One of the most common examples of a local failure is a stuck cloud volume that happens after a node failure.

In this perspective, local high availability as a feature can be considered a foundation of the overall data protection system. For one, to perform such a task, your solution needs to offer some sort of a data replication system locally and it also has to be in the data path in the first place. It is important to mention that providing local availability via backup restoration is still considered backup and restore and not local high availability, due to the overall recovery time.

Backup and restore is another important part of a Kubernetes backup system. In most use cases it backs up the entire application offsite from a local Kubernetes cluster. The context of Kubernetes also brings up another important consideration: whether the backup software “understands” what is included in a Kubernetes app, such as:

  • App configuration;
  • Kubernetes resources;
  • Data

A correct Kubernetes backup needs to save all of the parts above as a single unit for it to be useful in the Kubernetes system after restoring it. Targeting specific VMs, servers or disks does not mean that the software “understands” Kubernetes apps. Ideally, Kubernetes backup software should be able to back up specific applications, specific groups of applications, as well as the entire Kubernetes namespace. That’s not to say that it is completely different from the regular backup process: Kubernetes backups can also benefit greatly from the standard features of a regular backup, including retention, scheduling, encryption, tiering, and so on.

Disaster Recovery (DR) capability is likely essential to any organization using Kubernetes in a mission-critical situation, just as it is with any other technology. First, DR needs to “understand” the context of Kubernetes backups, just like backup and restore. It can also have different levels of both RTO and RPO, and different levels of protection according to these levels. For example, there could be a strict zero RPO requirement, which means that no data loss at all is acceptable, or there could be a 15-minute RPO with somewhat less strict requirements. It’s also not uncommon for different apps to have completely different RTO and RPO within the same cluster.
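To make the RPO and RTO discussion a little more concrete, here is a minimal Python sketch that checks a periodic backup schedule against per-application targets. The application names, the class and the function are hypothetical and only serve as an illustration; the one assumption baked in is that the worst-case data loss of a periodic backup equals one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    """Recovery targets for one application (names are illustrative)."""
    app: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable time to restore service


def meets_objective(obj, backup_interval, estimated_restore_time):
    """Return True if a periodic backup schedule satisfies both targets.

    With periodic backups, the worst-case data loss is one full backup
    interval (a failure that happens just before the next backup runs).
    """
    return backup_interval <= obj.rpo and estimated_restore_time <= obj.rto


orders = RecoveryObjective("orders-db", rpo=timedelta(minutes=15), rto=timedelta(hours=1))
payments = RecoveryObjective("payments", rpo=timedelta(0), rto=timedelta(minutes=5))

# A 15-minute schedule is enough for the first application.
print(meets_objective(orders, timedelta(minutes=15), timedelta(minutes=30)))  # True
# No periodic schedule can satisfy a zero RPO, however short the interval.
print(meets_objective(payments, timedelta(minutes=1), timedelta(minutes=2)))  # False
```

The second result is the reason why a zero RPO usually calls for synchronous replication in the data path rather than for more frequent backups.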

Another important distinction of a Kubernetes-specific disaster recovery system is that it should also be able to work with metadata to some extent (labels, app replicas, etc.). An inability to provide this feature could easily lead to a disjointed recovery in general, as well as general data loss or an additional downtime.

Data types that need to be backed up in Kubernetes

Like any complex system, Kubernetes and Docker have a number of specific data types that are needed to rebuild the entire environment properly in case of a disaster. To make it easier, it’s possible to split all of the data and config file types into two categories: configuration and persistent data.

Configuration (and desired-state information) includes:

  • Kubernetes etcd database
  • Docker files
  • Images from Docker files

Persistent data (changed or created by containers themselves) are:

  • Databases
  • Persistent volumes

Kubernetes etcd database

The etcd database is an integral part of the system and contains the information about cluster state. It can be backed up either manually or automatically, depending on your backup solution. The manual method is the etcdctl snapshot save snapshot.db command, which creates a single file with the name snapshot.db.

Another method of doing the same thing is via triggering that same command right before creating a backup of the directory that this file would appear in. This is one of the ways of integrating this specific backup into the entire environment.
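As a rough illustration of such a pre-backup step, the Python sketch below builds and runs the etcdctl snapshot command. The endpoint, the file paths and the function names are assumptions made for this example; the --endpoints, --cacert, --cert and --key options are standard etcdctl flags, but the values your cluster needs will differ.

```python
import os
import subprocess


def etcd_snapshot_command(snapshot_path, endpoint="https://127.0.0.1:2379",
                          cacert=None, cert=None, key=None):
    """Build the etcdctl argument list for saving a snapshot."""
    cmd = ["etcdctl", f"--endpoints={endpoint}"]
    # TLS options are only added when provided (most clusters require them).
    if cacert:
        cmd.append(f"--cacert={cacert}")
    if cert:
        cmd.append(f"--cert={cert}")
    if key:
        cmd.append(f"--key={key}")
    cmd += ["snapshot", "save", snapshot_path]
    return cmd


def save_etcd_snapshot(snapshot_path, **kwargs):
    """Run the snapshot command; raises CalledProcessError on failure."""
    env = dict(os.environ, ETCDCTL_API="3")  # select the v3 API explicitly
    subprocess.run(etcd_snapshot_command(snapshot_path, **kwargs),
                   check=True, env=env)


# Show the command a pre-backup hook would run (hypothetical path).
print(" ".join(etcd_snapshot_command("/var/backups/etcd/snapshot.db")))
```

A backup job could call save_etcd_snapshot() as its first step and then back up the directory that holds the snapshot file.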

Docker files

Since Docker containers themselves are run from images, these images have to be based on something – and those are, in turn, created from Docker files. For a correct Docker configuration it is recommended to use a sort of a repository as a version-control system for the entirety of your Docker files (GitHub, for example). For the sake of easier pulling of earlier versions, all Docker files should be stored in one specific repository that allows users to pull older versions of those files if necessary.

A separate repository is also recommended for the YAML files associated with all of the Kubernetes deployments, which exist in the form of text files. Backing up these repositories is also a must, using either third-party tools or the built-in capabilities of something like GitHub.

It’s important to mention that you can still recover Docker files to be backed up, even if you’re running containers from images without their Docker files. The docker image history command lists the instructions that created each layer of an image, and a Docker file can be reconstructed from that output, although the result is only an approximation (the contents of files added to the image are not shown). There are also several third-party tools that can do the same.
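The following Python sketch shows one way the output of docker image history could be turned into rough Docker file lines. It assumes the CreatedBy column has already been collected (for example with docker image history --no-trunc --format "{{.CreatedBy}}" IMAGE), and the function name and sample data are invented for illustration. The exact text Docker records varies between builder versions, so treat the result as a starting point rather than a faithful copy of the original file.

```python
def approximate_dockerfile(created_by_lines):
    """Turn `docker image history` output into rough Dockerfile lines.

    `created_by_lines` is the CreatedBy column, newest layer first.
    """
    nop_prefix = "/bin/sh -c #(nop) "
    run_prefix = "/bin/sh -c "
    result = []
    for line in reversed(created_by_lines):  # oldest layer first
        line = line.strip()
        if not line:
            continue
        if line.startswith(nop_prefix):
            # Metadata instructions (CMD, ENV, ADD ...) follow "#(nop)".
            result.append(line[len(nop_prefix):].strip())
        elif line.startswith(run_prefix):
            # Plain shell commands came from RUN instructions.
            result.append("RUN " + line[len(run_prefix):].strip())
        else:
            # Some builders already record the instruction itself.
            result.append(line)
    return result


# Invented sample in the classic history format, newest layer first.
history = [
    '/bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon off;"]',
    "/bin/sh -c apt-get update && apt-get install -y nginx",
    "/bin/sh -c #(nop) ADD file:abc123 in / ",
]

for instruction in approximate_dockerfile(history):
    print(instruction)
```

Note that the ADD line only shows a file reference, not the base image name, which is one reason why the reconstructed file still needs manual review.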

Images from Docker files

Docker images themselves should also be backed up in a repository. Either a private or a public repository can be used for that purpose, and various cloud providers offer private repositories to their clients, too. If you’re missing the image that your container runs from, the docker commit command can create an image from the running container.

Databases

Integrity is also crucial when dealing with databases that containers use to store their data. In some cases, shutting down the container in question before creating a backup of the data is possible, but then again, the downtime required is likely to result in a lot of problems for the company in question.

Another method of doing database backups inside of containers is via connecting to the database engine itself. A bind mount should be used beforehand to attach a volume that can be backed up in the first place, and then you can use the mysqldump command (or similar) to create a backup. The resulting dump file should then be backed up by your backup system as well.
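Here is a minimal sketch of that idea in Python, assuming a MySQL container. The container name, database name and target directory are placeholders, and the sketch assumes that mysqldump can find its credentials inside the container (for example through an option file). Instead of writing to a path inside the container, this variation streams the dump to a host directory, which could be the host side of the bind mount or any other directory your backup system covers.

```python
import subprocess
from pathlib import Path


def dump_command(container, database):
    """Build a `docker exec` call that runs mysqldump inside the container."""
    # --single-transaction gives a consistent dump for InnoDB tables
    # without locking the database for the whole run.
    return ["docker", "exec", container,
            "mysqldump", "--single-transaction", database]


def dump_database(container, database, backup_dir):
    """Stream the dump into a file in a directory covered by your backups."""
    target = Path(backup_dir) / f"{database}.sql"
    with open(target, "wb") as out:
        subprocess.run(dump_command(container, database),
                       stdout=out, check=True)
    return target


# Show the command that would be executed (hypothetical names).
print(" ".join(dump_command("mysql-prod", "shop")))
```

Calling dump_database("mysql-prod", "shop", "/srv/backups/mysql") right before the regular backup job would leave a fresh shop.sql file for that job to pick up.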

Persistent volumes

There are several ways for containers to gain access to persistent storage. Traditional Docker volumes reside in a directory under Docker’s own data directory (/var/lib/docker/volumes by default on Linux). Bind mounts, on the other hand, can be any host directory that is mounted inside of a container. Although traditional volumes are preferred in the Docker community, the two are much the same when it comes to backing up data. A third option is mounting an NFS directory or a single object as a volume inside of a container.

All three of these methods have the same problem when it comes to backing up data: the backup is not consistent if the data inside of a container changes mid-backup. Of course, it’s always possible to gain consistency by stopping the containers that use the volume before backing up, but in most cases downtime for such systems is basically out of the question for the sake of business continuity.

There are some ways of backing up data within containers that are method-specific. For example, traditional Docker volumes could be mounted to another container that would not be changing any of the data until the backup process is complete. Or if you’re using a bind-mounted volume, it’s possible to create a tar archive of the entire volume and then back up the archive.
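The tar approach for a bind-mounted directory can be sketched with Python’s standard tarfile module. The directory and file names below are made up, and the demonstration works on a temporary directory so that it can be run anywhere. Keep in mind that archiving a directory while a container is still writing to it does not solve the consistency problem described above.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_directory(source_dir, archive_path):
    """Pack a directory into a gzip-compressed tar archive."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps archive paths relative to the directory name.
        tar.add(source, arcname=source.name)
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names of an archive, e.g. for verification."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "appdata"
    data.mkdir()
    (data / "state.txt").write_text("example")
    archive = archive_directory(data, Path(tmp) / "appdata.tar.gz")
    print(list_archive(archive))
```

The resulting archive is a single file, which makes it easy to hand over to whatever backup system is already in place.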

Unfortunately, all of those options are really difficult to pull off when it comes to Kubernetes. For that exact reason it’s always recommended to store stateful information in the database and outside the container filesystem.

That being said, if you’re using a bind-mounted directory or an NFS-mounted file system as a persistent storage, it’s also possible to back up that data using the regular methods, like a snapshot. This should get you much more consistency than the traditional file-level backup of the same volume.

Kubernetes backup solution market

With these three features in mind, let’s look at a few examples of Kubernetes backup and recovery solutions. The examples we use here are Kasten, Portworx, Cohesity, OpenEBS and Rancher Longhorn.

Kasten K10 (recently acquired by Veeam) is a backup and restore solution that also takes pride in its mobility and disaster recovery systems. The backup process with Kasten is simplified thanks to its ability to automatically discover applications, as well as many other features, such as data encryption, role-based access control and a user-friendly interface. At the same time, it can work with many different data services, such as MySQL, PostgreSQL, MongoDB, Cassandra, AWS, and so on.

Local high availability is not available with it, since Kasten does not directly support replication within a single cluster and relies on the underlying data storage systems instead. Disaster recovery is also only partially “there”, since Kasten can’t achieve zero RPO due to the lack of a data path component. Also noteworthy is the fact that Kasten’s backups are asynchronous only, which typically means an additional delay between operations.

Portworx is a data management company that develops a cloud-based storage platform for managing and accessing the data of Kubernetes projects, and PX-Backup is its backup product. It is another example of a data management solution, and despite its limitations as such, one of the key benefits of using Portworx is the high availability of data.

Backup and recovery operations, an understanding of Kubernetes apps, local high availability and disaster recovery, among other features, make Portworx a good solution for Kubernetes backup if you’re looking for one that specializes in Kubernetes-related tasks.

Another significant part of PX-Backup is its scalability, allowing for on-demand backups / scheduled backups of hundreds of applications at once. The solution also supports multi-database setups and can restore apps directly to the cloud services, such as Amazon, Google, Microsoft, etc.

Cohesity is a relatively popular competitor in the field of general backup and recovery, but their Kubernetes-related capabilities still have some room to grow. First of all, Kubernetes is a relatively new addition for them, and they’ve added the “understanding” for Kubernetes apps from the get-go, but at the same time it only works for all of the applications within the same namespace, and you can’t protect specific apps within that one namespace.

On the other hand, there are also rapid recovery capabilities, app-tier incremental backups based on policies, data state consolidation and many other capabilities.

OpenEBS is another example of a solution that has managed to achieve some results with only one of the three features from our list, and in this case it’s all about Local high availability.

At the same time, OpenEBS can also integrate with Velero, creating a combined Kubernetes solution that excels in Kubernetes data migration. OpenEBS on its own can only back up individual applications (the direct opposite of what Cohesity does). Other features include multi-cloud storage, its open-source nature and a long list of supported Kubernetes platforms, from AWS and DigitalOcean to Minikube, Packet, Vagrant, GCP, and more.

However, this may not cover every user’s needs, since some users might need namespace-level backups in specific use cases.

Rancher Longhorn is the last of our examples, and this one is probably the least known out of them all. Its community is relatively small for an open-source solution, and it does not allow for complete Kubernetes backups with metadata and resources to make app-aware recovery happen. However, there is one unique feature about it that stands out, and it’s called DR Volume. DR Volume can be set up as both a source and a destination, making the volume active in a new cluster that’s based on the latest backed up data.

What sets Rancher apart from the rest is its ability to work with many different container platform types and to allow for different backup methods; it already supports Kubernetes Engine, Docker deployments and K3s distributions. Docker containers, for example, have to create a tarball that could act as a backup for Rancher.

As is clear in this blog, the topic of Kubernetes is still relatively new and the market is still trying to catch up to the full list of features that any Kubernetes-based system demands from the get-go. The entire nature of Kubernetes makes apps into a very different animal from what they were before, and this brings us to the current list of solutions that excel in one thing and struggle to catch up in the other.

Clearly, Kubernetes is a rapidly growing technology area, so it’s safe to say that there will soon be more solutions coming along, with the current ones likely becoming even better than they are right now. One example of a new, powerful Kubernetes solution is Bacula Enterprise.

The Kubernetes backup solution from Bacula Enterprise

The very nature of Kubernetes environments makes them at once very dynamic and potentially complex. Backing up a Kubernetes cluster should not add unnecessarily to complexity. And of course it is usually important - if not critical - for System Administrators and other IT personnel to have centralized control over the complete backup and recovery system of the entire organization, including any Kubernetes environments. In this way, factors such as compliance, manageability, speed, efficiency and business continuity become much more realistic. At the same time, the agile approach of development teams should not be thus compromised in any way.

Bacula Enterprise is unique in this space because it is a comprehensive enterprise solution for complete IT environments (not just Kubernetes) that also offers natively integrated Kubernetes backup and recovery, including multiple clusters, whether the applications or data reside outside or inside a specific cluster. Every company’s Operations Department recognizes the need to have a proper recovery strategy when it comes to cluster recovery, upgrades and other situations. A cluster that is in an unrecoverable state can be reverted to a stable state with Bacula if both the configuration files and the persistent volumes of the cluster were backed up correctly beforehand.

Another way of showing Bacula’s working methods is by using the picture below:

bacula enterprise kubernetes module schematic

One of the prime advantages of Bacula’s Kubernetes module is the ability to back up various Kubernetes resources, including:

  • Pods;
  • Services;
  • Deployments;
  • Persistent volumes.

Features of Bacula Enterprise’s Kubernetes module

The way this module works is that the solution itself is not a part of the Kubernetes environment, but instead accesses the relevant data inside the cluster via Bacula pods that are attached to single Kubernetes nodes in a cluster. The deployment of these pods is automatic and it works on an “as needed” basis.

Some other features that the Kubernetes backup module provides are:

  • Kubernetes backup and restore for persistent volumes;
  • Restoration of a single Kubernetes configuration resource;
  • The ability to restore configuration files and/or data from persistent volumes to the local directory;
  • The ability to back up the resource configuration of Kubernetes clusters.

It’s also worth noting that Bacula readily supports multiple cloud storage platforms simultaneously, including the likes of AWS, Google, Glacier, Oracle Cloud and Azure, at the level of native integration. Hybrid cloud capabilities are thus built in, including advanced cloud management and automated cloud caching features, allowing for an easy integration of either public or private cloud services to support various tasks.

Solution flexibility is particularly important nowadays, with a lot of companies and enterprises becoming ever more complex in terms of different hypervisor families and containers. This significantly raises the demand for flexibility across all of the database vendors involved. Bacula’s capabilities in this regard are strong, combining its broad compatibility list with various technologies to reach especially high flexibility standards without locking in to one vendor.

The complexity of every aspect of an organization’s work keeps rising, and it’s more often than not easier and more cost-efficient to use one solution for the entire IT environment rather than several solutions at once. Bacula is designed to do exactly this, and is also able to provide both a traditional web-based interface for your configuration needs, as well as the classic command line type of control. These two interfaces can even be used simultaneously.

Bacula’s Kubernetes backup plugin allows for two main target types for restore operations:

  • Restore to a local directory;
  • Restore to cluster.

Regular and/or automated backups are highly recommended to ensure the best possible backup and recovery environment for containers. Testing your backups from time to time should be mandatory for your System Administrator, as well. In the next picture, you’ll see a part of the restoration process, namely the Restore Selection part, in which you can choose what files and/or directories you want to restore:

restore selection area

Another part of the restoration process that you’ll encounter is the advanced restore options page, which looks like this:

advanced restore options

Here you can specify multiple different options, such as output format, K8S config file path, endpoint port, and more.

It’s also easy to watch over the entire restoration process after the customization is complete, thanks to the restore job log page writing every action one by one:

restore log

Another important capability of the Kubernetes module is the Plugin Listing feature, which offers plenty of useful information about your available Kubernetes resources, including namespaces, persistent volumes, and so on. To do that, the module uses a special .ls command with a specific plugin=<plugin> parameter.

Bacula’s Kubernetes module offers a variety of features, some of which are:

  • Fast and efficient cluster resource redeployment;
  • Kubernetes cluster state safeguarding;
  • Saving configurations to be used in other operations;
  • Keeping amended configurations as secure as possible and restoring the exact same state as before.

Although it is a common practice, it is strongly recommended to avoid paying your vendor based on data volume. It makes no sense to be held to ransom now or in the future by a provider that is ready to take advantage of your organization in this way. Instead, take a close look at Bacula Systems’ licensing models, which remove customers’ exposure to data growth charges while making it far easier for customers’ procurement departments to forecast future costs, too. This more reasonable approach from Bacula comes from its open source roots and resonates well in a DevOps environment.

Velero & Bacula Enterprise: What’s the difference?

That’s not to say there are no other solutions on the market, both premium and free-of-charge. Velero is one example.

Velero (previously called Heptio Ark) is a free open-source backup and restore solution that mainly focuses on working with Kubernetes clusters / persistent volumes. It has the ability to work with a number of different cloud platforms via specific plugins, and you can choose if you want to run it on premises or within the public cloud platform of your choosing.

The main three target fields of Velero’s capabilities are:

  • Production cluster replication for the purpose of testing or development;
  • General backup and restore capabilities for Kubernetes clusters;
  • Cluster migration feature.

The idea of how Velero works is all about two main parts – a server working within your cluster and a local client represented by a command line for your operating needs. It’s also quite unique in the way it works with Kubernetes clusters, as well.

The way it works is that the Kubernetes API is used to capture the specific state of clusters and perform the restoration process when necessary. This is different from what the majority of other solutions do – they access Kubernetes etcd databases directly and interact with the data in question through that (Bacula Pods is one such example). The advantages of doing everything via API are as follows:

  • Even if the resources that are exposed via API are stored in a separate database, they can still be quickly and efficiently backed up and/or restored;
  • Backups can be somewhat selective, capturing specific subsets of the resources of a cluster, filtered by resource type, namespace, etc., which provides that much more flexibility regarding the data that you want to back up;
  • It’s not a rare occurrence for users of managed Kubernetes offerings to have no access to the underlying etcd database, making direct backups and restores basically impossible and forcing them to use various workarounds.
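To illustrate what selective, API-level backups mean in practice, here is a small Python sketch that filters a list of resources by namespace, resource type and label. This is not Velero’s implementation: the function and the sample data are hypothetical, and the resources are plain dictionaries shaped like Kubernetes manifests. Velero offers comparable filters on its command line, such as --include-namespaces, --include-resources and --selector.

```python
def select_resources(resources, namespaces=None, kinds=None, labels=None):
    """Return the resources that match every filter that was given."""
    selected = []
    for res in resources:
        meta = res.get("metadata", {})
        if namespaces and meta.get("namespace") not in namespaces:
            continue
        if kinds and res.get("kind") not in kinds:
            continue
        res_labels = meta.get("labels", {})
        # Every requested label must be present with the same value.
        if labels and any(res_labels.get(k) != v for k, v in labels.items()):
            continue
        selected.append(res)
    return selected


# Invented sample data standing in for objects returned by the API.
cluster = [
    {"kind": "Deployment",
     "metadata": {"name": "web", "namespace": "shop", "labels": {"app": "web"}}},
    {"kind": "Secret",
     "metadata": {"name": "db-pass", "namespace": "shop", "labels": {"app": "db"}}},
    {"kind": "Deployment",
     "metadata": {"name": "ci", "namespace": "tools", "labels": {"app": "ci"}}},
]

for res in select_resources(cluster, namespaces={"shop"}, kinds={"Deployment"}):
    print(res["kind"], res["metadata"]["name"])  # Deployment web
```

A backup tool that works directly on the etcd database cannot filter this way as easily, because it captures the database as a whole.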

When it comes to a direct comparison between Velero and Bacula, it’s safe to say that each has its own advantages and benefits.

Bacula is much more comprehensive in terms of being a broad, enterprise backup and recovery solution, and offers an especially wide range of features and technologies that you would expect from a heavy-lifting, enterprise-grade solution. Therefore, Bacula offers a complete single-platform backup solution for medium-large and large enterprises. Bacula also has “BWeb”, a comprehensive web interface to the many features that it provides. Bacula is probably the solution an IT Director would choose when he or she needs to back up complex, changing IT environments using a single, modern platform.

Velero on the other hand, is specific in a sense that it doesn’t try to cover every aspect of backing up all applications, data and storage types, but instead focuses only on working with Kubernetes. Some users might find that more attractive rather than an all-in-one solution. Then there’s also the unique approach that Velero takes to work with data and backups – via API. And the last, but definitely not the least – it’s free and open source. Despite all of the advantages that Bacula has, it is designed to be a high-end solution for medium and large enterprises, and that, of course, is not representative of all users of Kubernetes.

About the author

Rob Morrison

Rob Morrison is the marketing director at Bacula Systems. He started his IT marketing career with Silicon Graphics in Switzerland, performing strongly in various marketing management roles for almost 10 years. In the next 10 years Rob also held various marketing management positions in JBoss, Red Hat and Pentaho ensuring market share growth for these well-known companies. He is a graduate of Plymouth University and holds an Honours Digital Media and Communications degree, and completed an Overseas Studies Program.
