
Ceph Backup and Restore Strategy

Updated 11th June 2024, Rob Morrison

What is Ceph?

Ceph is a free and open-source software-defined storage platform. It provides file, block, and object storage using its distributed cluster foundation as the infrastructure. Ceph does not have to rely on a third-party file system, because its own storage backend, “BlueStore”, manages both SSDs and HDDs directly.

Even though the original prototype of Ceph was created as far back as 2005 (as a summer project, no less!), it was not as popular during most of its development years as it is now, despite its overall impressive feature set. Ceph does not require specific hardware to operate and can replicate data with fault tolerance, offering powerful data availability using a combination of snapshots, replication, erasure coding, cloning, and other capabilities.
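To put the replication versus erasure coding trade-off into numbers, here is a minimal Python sketch of the raw capacity each scheme consumes. It assumes a replicated pool that keeps `size` full copies of every object and an erasure-coded pool that splits each object into `k` data chunks plus `m` coding chunks; the function names and the 100 TB figure are illustrative only, and real clusters also need headroom for metadata and full ratios.

```python
def replicated_raw_tb(usable_tb: float, size: int) -> float:
    """Raw capacity used by a replicated pool that stores `size` full copies."""
    return usable_tb * size


def erasure_coded_raw_tb(usable_tb: float, k: int, m: int) -> float:
    """Raw capacity used by an erasure-coded pool with k data and m coding chunks."""
    return usable_tb * (k + m) / k


if __name__ == "__main__":
    usable = 100.0  # hypothetical amount of user data, in TB
    # Both layouts below survive the loss of any two copies or chunks.
    print("3x replication:", replicated_raw_tb(usable, size=3), "TB raw")
    print("EC k=4, m=2:   ", erasure_coded_raw_tb(usable, k=4, m=2), "TB raw")
```

In this example erasure coding halves the raw footprint, at the cost of extra CPU and network work during writes and recovery.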

Some of the biggest modern-world examples of Ceph infrastructures include the cloud provider DigitalOcean, the French cloud computing company OVH, and CERN, the intergovernmental research organization that operates the largest particle physics laboratory on the planet.

Why is it necessary to back up Ceph clusters

One of the most significant advantages of Ceph’s infrastructure is its self-managing and self-healing nature, which, when configured correctly, can significantly reduce overall management costs for both small and large deployments.

Some users might think that this kind of redundancy is good enough to stop using traditional data security methods, such as backup and recovery systems and operations. Unfortunately, that is not the case with Ceph, and there are several different types of situations and use cases that its redundancy cannot cover:

  • Disasters are the most obvious example of what Ceph cannot work around. Many disasters, such as a fire or a flood in the data center, affect an entire storage cluster at once, compromising the whole infrastructure and leaving nothing to restore from.
  • Modern ransomware has evolved multiple times by now and can seek out and encrypt backup copies and entire storage systems. Since Ceph replicates encrypted data like any other write, its redundancy framework offers no protection in this case.
  • Ceph cannot prevent every data corruption event, either, since it is not uncommon for corruption to spread through entire systems and infrastructures, affecting all of the redundant copies.
  • Human errors might be one of the most significant categories of issues that most security measures are not capable of covering in their entirety. The very nature of Ceph’s replication system means that practically any change in the original data is replicated in all of its redundancy copies – including accidental alteration or deletion of important information.

As such, various security measures, such as backups, are still needed, even in an environment as redundancy-heavy as Ceph.

How backup tasks are performed in Ceph

Ceph provides block storage through a component called RBD (RADOS Block Device), which stores block device images (volumes) across the nodes of a Ceph cluster and serves both workloads and data from them. Due to the very specific nature of Ceph’s infrastructure, there are at least two different ways to back up or copy these volumes:

  • Direct virtual disk attachment using a proxy VM.
  • RBD volume exporting directly from Ceph’s infrastructure.
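As an illustration of the second approach, the sketch below builds the two `rbd` CLI calls a full export typically involves: taking a snapshot so the exported data stops changing, then exporting that snapshot to a file. It is a minimal Python sketch that only assembles the command lines (it does not run them), and the pool, image, snapshot, and path names are hypothetical placeholders.

```python
from __future__ import annotations

import shlex


def full_export_commands(pool: str, image: str, snap: str, dest: str) -> list[list[str]]:
    """Return the rbd command lines for a point-in-time full export of one image."""
    snap_spec = f"{pool}/{image}@{snap}"
    return [
        # Freeze a consistent view of the volume; writes continue on the live image.
        ["rbd", "snap", "create", snap_spec],
        # Stream the frozen snapshot (not the live image) to a local file.
        ["rbd", "export", snap_spec, dest],
    ]


if __name__ == "__main__":
    # Hypothetical names used purely for illustration.
    for cmd in full_export_commands("rbd", "vm-disk-01", "full-2024-06-11",
                                    "/backups/vm-disk-01.full.img"):
        print(shlex.join(cmd))
```

Exporting the snapshot rather than the live image is what keeps the copy consistent; passing each argument list to `subprocess.run(cmd, check=True)` would actually execute it.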

Both approaches have their fair share of disadvantages, especially when it comes to backup types beyond a simple full copy (such as incremental backups).

The main problem here is that, unless an image has the object-map and fast-diff features enabled, Ceph cannot perform changed block tracking without going over the entire block device. However, Ceph does offer its own workaround for this: the ability to extract a so-called “snapshot difference”, which is produced by taking two snapshots of the same volume at different points in time and comparing them block by block.
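The snapshot-difference workflow can be sketched with the `rbd export-diff` and `rbd import-diff` subcommands. The minimal Python sketch below again only assembles command lines; it assumes the full export was taken from a snapshot named `base_snap` and that each diff was exported relative to the previous snapshot, and all names are hypothetical. On restore, `rbd import-diff` expects the diff’s starting snapshot to exist on the target image, which is why the chain recreates the base snapshot right after importing the full export.

```python
from __future__ import annotations

import shlex


def incremental_commands(pool: str, image: str, prev_snap: str,
                         new_snap: str, dest: str) -> list[list[str]]:
    """Snapshot the image again and export only what changed since prev_snap."""
    image_spec = f"{pool}/{image}"
    return [
        ["rbd", "snap", "create", f"{image_spec}@{new_snap}"],
        # The diff file records both snapshot names plus the changed extents.
        ["rbd", "export-diff", "--from-snap", prev_snap,
         f"{image_spec}@{new_snap}", dest],
    ]


def restore_chain_commands(pool: str, image: str, full_file: str,
                           base_snap: str, diff_files: list[str]) -> list[list[str]]:
    """Rebuild an image from a full export followed by diffs, oldest first."""
    image_spec = f"{pool}/{image}"
    cmds = [
        ["rbd", "import", full_file, image_spec],
        # import-diff refuses to apply a diff whose start snapshot is missing.
        ["rbd", "snap", "create", f"{image_spec}@{base_snap}"],
    ]
    # Each import-diff also creates the diff's end snapshot on the target image.
    cmds += [["rbd", "import-diff", diff, image_spec] for diff in diff_files]
    return cmds


if __name__ == "__main__":
    # Hypothetical names used purely for illustration.
    for cmd in incremental_commands("rbd", "vm-disk-01", "mon", "tue",
                                    "/backups/vm-disk-01.mon-tue.diff"):
        print(shlex.join(cmd))
```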

This approach can be more cumbersome than traditional incremental backup methods, but it becomes a necessity for larger environments that cannot afford to rely on full backups alone because of storage capacity constraints.
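A quick back-of-the-envelope calculation shows why. The sketch below compares keeping only full backups against one full backup per week plus daily incrementals; the 10 TB image, 5% daily change rate, and 14-day retention are hypothetical figures, and compression and deduplication are ignored.

```python
def full_only_tb(image_tb: float, retained_backups: int) -> float:
    """Storage used when every retained backup is a full copy."""
    return image_tb * retained_backups


def full_plus_incremental_tb(image_tb: float, retained_backups: int,
                             daily_change_rate: float, full_every: int) -> float:
    """Storage used by one full every `full_every` days and incrementals in between."""
    total = 0.0
    for day in range(retained_backups):
        if day % full_every == 0:
            total += image_tb                      # periodic full backup
        else:
            total += image_tb * daily_change_rate  # only the changed data
    return total


if __name__ == "__main__":
    print("full only:          ", full_only_tb(10.0, 14), "TB")
    print("weekly full + daily:", full_plus_incremental_tb(10.0, 14, 0.05, 7), "TB")
```

Under these assumptions the incremental scheme needs 26 TB instead of 140 TB, and the gap widens as retention grows.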

At the same time, both of these approaches only work under a very specific set of circumstances, namely a virtualization layer and backup software that explicitly supports Ceph’s infrastructure. Luckily, these are not the only backup methods available for Ceph.

Some backup solutions are capable of protecting RBD volumes on a very basic level, the same way as other applications or VMs are protected. This particular approach also works for Ceph clusters, although the list of software that can perform such a feat is still surprisingly small.

Backup software with Ceph support

As we have mentioned before, some solutions can protect RBD volumes without the virtualization layer, similar to how regular apps or VMs are protected.

Storware

Storware is one such solution, with its vProtect Storage Provider capability offering policy-based data protection for practically any volume type necessary. It can support Ceph’s volumes, as well as multiple other Kubernetes/OpenStack environments. Storware supports direct RBD exporting and direct disk attachment via a hypervisor as two of its main approaches to backups, leaving the choice in the user’s hands.

The overall software also has plenty of other use cases and capabilities. Storware is a VM-centric backup solution with support for many other storage mediums; it is a relatively user-friendly platform that excels at managing snapshots and can provide a reasonable amount of customization for its users.

Customer ratings:

  • G2: 4.7/5 score based on 28 customer reviews

Advantages:

  • A simple and convenient user interface with a pleasant user experience.
  • The software itself is agentless, greatly simplifying the overall deployment process.
  • Speaking of the deployment process, configuring Storware for the first time is also not a difficult task for most users.

Shortcomings:

  • While the software itself is very customizable, some of its areas, such as the notification system, are surprisingly rigid in that regard.
  • Setting up Storware is not a complicated process, but much of the credit for that goes to third-party learning materials on the Internet rather than to Storware’s own materials.
  • Retention policy configuration can be a hassle for multiple reasons.

Pricing (at the time of writing):

  • The pricing information that Storware provides on its public website is somewhat limited.
  • There are at least two different pricing plans to choose from:
    • Freemium, which is a free version of the solution that is very limited in its capabilities – it can only work with up to 1 TB of data, up to 10 VMs, up to 10 apps, and so on. On the bright side, it does feature every single one of Storware’s capabilities and features.
    • Free Trial, a timed version of the solution that is available for just 60 days; it includes everything Storware has to offer with no hardware or software limitations whatsoever (aside from the time limit).
  • Everything else in regard to the solution’s pricing can only be acquired after requesting a personalized quote.

My personal opinion on Storware:

Storware was originally designed as a VM-centered backup and recovery solution with plenty of capabilities in this regard, but its Ceph compatibility is the highlight here. The software can extract snapshot differences directly from Ceph’s API for better performance. It also offers plenty of basic backup-oriented features by default, including backup automation, policy-based backup protection, different backup types, and more. The ability to manage RBD volumes outside of the virtualization platform also significantly helps Storware manage complex infrastructures composed of multiple data storage types.
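Pulling snapshot differences “directly from Ceph’s API” generally means calling librbd instead of the CLI. The minimal Python sketch below collects changed extents through `diff_iterate()` from Ceph’s `rbd` Python bindings; it is not Storware’s code, and the pool, image, and snapshot names plus the config path passed to `changed_extents_from_cluster()` are hypothetical. The core function only assumes an object offering `size()` and `diff_iterate()`, so it can be exercised without a cluster.

```python
from __future__ import annotations

from typing import NamedTuple


class Extent(NamedTuple):
    offset: int
    length: int
    exists: bool  # False: the range was discarded/zeroed since the base snapshot


def changed_extents(image, from_snap: str) -> list[Extent]:
    """Return the extents that differ between `from_snap` and the opened image view."""
    extents: list[Extent] = []

    def on_extent(offset: int, length: int, exists: bool) -> int:
        extents.append(Extent(offset, length, bool(exists)))
        return 0  # 0 tells librbd to keep iterating

    # Walk the whole image; librbd reports only the ranges that changed.
    image.diff_iterate(0, image.size(), from_snap, on_extent)
    return extents


def changed_extents_from_cluster(pool: str, image_name: str, from_snap: str,
                                 to_snap: str,
                                 conffile: str = "/etc/ceph/ceph.conf") -> list[Extent]:
    """Open image_name@to_snap read-only on a live cluster and diff it against from_snap."""
    import rados  # Ceph's Python bindings, imported lazily so the rest runs anywhere
    import rbd

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            image = rbd.Image(ioctx, image_name, snapshot=to_snap, read_only=True)
            try:
                return changed_extents(image, from_snap)
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Only the returned extents need to be read and shipped to backup storage, which is where the performance gain over a full re-read comes from.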

Trilio

Trilio is a good alternative to the previous software. It offers the capability to pull RBD snapshots from Ceph’s infrastructure, generate the “snapshot difference” for incremental backups, and convert this information into a more convenient and versatile file format that can be protected with other capabilities of Trilio. It uses its own TrilioVault solution to offer business continuity and data protection in a single package.
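One generic way to turn a snapshot difference into a self-contained file is to roll the changed extents into a copy of the previous full image, producing a new “synthetic” full raw image. The sketch below illustrates that idea in memory under simplifying assumptions; it is not Trilio’s file format or code, and `apply_extents` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Iterable, Optional, Tuple

# (offset, length, data); data=None means the range was discarded and reads as zeros.
ExtentData = Tuple[int, int, Optional[bytes]]


def apply_extents(base: bytes, image_size: int, extents: Iterable[ExtentData]) -> bytes:
    """Return a new full image: `base` resized to image_size with the changes applied."""
    # Truncate or zero-pad the previous full image to the current volume size.
    out = bytearray(base[:image_size].ljust(image_size, b"\0"))
    for offset, length, data in extents:
        if offset < 0 or offset + length > image_size:
            raise ValueError(f"extent {offset}+{length} is outside the image")
        if data is None:
            out[offset:offset + length] = bytes(length)  # discarded range -> zeros
        elif len(data) != length:
            raise ValueError("extent data does not match its declared length")
        else:
            out[offset:offset + length] = data
    return bytes(out)
```

A real tool would stream to disk instead of holding whole images in memory, but the merge logic stays the same.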

Trilio’s primary audience is cloud-native environments and their users, with Kubernetes and OpenStack being the most valuable elements of this focus group. Beyond backup and recovery, Trilio also offers data resiliency and data security: it can perform disaster recovery, assist with evidence-based compliance, and more.

Advantages:

  • Trilio can work with multiple APIs that simplify creating and restoring backups for container and cloud applications, with the OpenStack Nova and Cinder APIs being the most prominent examples.
  • The software can greatly reduce the manual effort necessary to perform backup and recovery tasks by offering policy-driven backup capabilities and customizable automation features.
  • It can also capture entire applications and containers at once, ensuring that dependencies and other connections are not lost during or after the backup-recovery process.

Shortcomings:

  • Trilio’s focus on containerized and cloud-native environments makes it of little use for traditional data storage configurations, which significantly limits the solution’s potential reach and overall usefulness.
  • Technologies such as OpenStack evolve very quickly, forcing software such as Trilio to adapt to new changes as fast as possible, which not all users will be comfortable with.
  • Protecting cloud-native environments as a whole is very challenging, and as a result the data protection capabilities Trilio does offer are not particularly rich or detailed.

Pricing (at the time of writing):

  • Trilio does not offer any official pricing information on its public website. The only way to obtain such information is to request a personalized quote from the company.

My personal opinion on Trilio:

Trilio is a rather helpful solution with a very specific set of use cases – namely, assisting cloud-native and containerized environments with data protection efforts. Trilio can seamlessly integrate with Ceph’s distributed storage capabilities, offering a plethora of features for data integrity, minimal downtime, maximum performance, and so on. The software itself is relatively user-friendly and intuitive, which makes it very easy to deploy and work with, even for less experienced users. Ceph is not the only infrastructure type that can benefit from Trilio’s integration, either, since TrilioVault supports all OpenStack environment types.

Bacula Enterprise

Another potential solution, specifically for backing up data held by Ceph-based cloud storage providers, is Bacula Enterprise, whose Bacula Cloud Driver supports most of the technologies that Ceph and S3 are built on.

Adding cloud storage to the infrastructure lets Bacula apply many of its backup and recovery capabilities with impressive customization, high security levels, and a massive number of features to work with. Ceph volumes themselves are backed up using a combination of RBD export capabilities and Bacula’s bpipe feature, which allows data volumes to be backed up via snapshots (differential and incremental backup types are also supported).
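Bacula’s bpipe plugin takes a catalog path, a reader command whose standard output is backed up, and a writer command that receives the data on standard input at restore time, all separated by colons. The Python sketch below assembles such a FileSet for a single RBD snapshot. It assumes the snapshot already exists (for example, created by a job’s `ClientRunBeforeJob` script), all names are hypothetical, and it is a minimal illustration of the idea rather than a tested production configuration.

```python
from __future__ import annotations


def bpipe_plugin_line(catalog_path: str, reader_cmd: str, writer_cmd: str) -> str:
    """Build a bpipe Plugin directive: bpipe:<catalog path>:<reader>:<writer>."""
    for field in (catalog_path, reader_cmd, writer_cmd):
        # ':' separates bpipe fields and '"' would end the quoted directive early.
        if ":" in field or '"' in field:
            raise ValueError(f"field must not contain ':' or '\"': {field!r}")
    return f'Plugin = "bpipe:{catalog_path}:{reader_cmd}:{writer_cmd}"'


def rbd_fileset(name: str, pool: str, image: str, snap: str, restore_image: str) -> str:
    """FileSet that streams an existing RBD snapshot through bpipe."""
    reader = f"rbd export {pool}/{image}@{snap} -"   # '-' = write the image to stdout
    writer = f"rbd import - {pool}/{restore_image}"  # '-' = read the image from stdin
    plugin = bpipe_plugin_line(f"/ceph/{pool}/{image}.raw", reader, writer)
    return "\n".join([
        "FileSet {",
        f'  Name = "{name}"',
        "  Include {",
        "    Options { signature = MD5 }",
        f"    {plugin}",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    print(rbd_fileset("ceph-vm-disk-01", "rbd", "vm-disk-01", "bacula", "vm-disk-01-restored"))
```

Restoring into a new image name, as the writer command does here, avoids overwriting the original volume by accident.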

Customer ratings:

  • TrustRadius: 9.8/10 points based on 54 customer reviews
  • G2: 4.7/5 points based on 56 customer reviews

Advantages:

  • Outstanding customer support with 24/7 availability.
  • Multiple available integrations and compatible platforms, including disk, tape, cloud, VM, applications, databases, containers, etc.
  • Impressive performance for both backup and recovery operations.

Shortcomings:

  • The modular licensing system combines a basic subscription with an additional price tag for some specific modules, which some users may find unsatisfactory.
  • The total number of features can result in a moderate learning curve, even while using Bacula’s BWeb GUI instead of a command line interface.
  • Bacula’s update cycle is relatively short. Plenty of updates are rolled out regularly, and many of them are mandatory to improve the system’s security and stability.

Pricing (at the time of writing):

  • There is no official pricing information for Bacula Enterprise on the official website.
  • However, the general pricing model is explained there, and it can be a bit overwhelming at first:
    • BSBE (Bacula Small Business Edition): 2 contracts, 20 agents, includes BWeb and web support.
    • Standard Tier: 2 contracts, 50 agents, no more than 4 days of waiting for customer support queries.
    • Bronze Tier: 2 contracts, 200 agents, customer support answer deadlines from 6 hours to 4 days, phone support.
    • Silver Tier: 3 contracts, 500 agents, customer support answer deadlines from 4 hours to 2 days, a deduplication plugin.
    • Gold Tier: 5 contracts, 2000 agents, customer support answer deadlines from 1 hour to 2 days.
    • Platinum Tier: 5 contracts, 5000 agents, the shortest possible waiting time for customer support, PostgreSQL catalog support, free Administrator courses (1 training seat).

My personal opinion on Bacula Enterprise:

Bacula Enterprise is an exciting backup and recovery solution with a myriad of different capabilities and features. Not only can it work with the majority of existing storage types and infrastructure variations, but it can even work with some of the less common infrastructures, such as Ceph and OpenStack. It is a versatile, flexible, and highly customizable solution with an extraordinary feature set. There is even a free version called Bacula Community, which offers many of the features for free but can be somewhat more challenging to work with and master. The sheer number of features can be a bit overwhelming, but the overall package is extremely competent and well worth the time and effort.

Backy2

Alternatively, there are also relatively small backup solutions with explicit Ceph support. Backy2 is one such example, offering open-source backup software with a block-based infrastructure that can work with practically any block-based device, including Ceph RBD volumes. It is a rather case-specific but powerful backup solution with data compression, deduplication, and encryption, making it both secure and space-efficient.

Backy2 can perform its backups on a block level, which significantly reduces the time per backup while also decreasing storage space requirements. It can be integrated with multiple different infrastructures and can assist with backup management to a certain degree by providing data integrity checking capabilities, backup scheduling, and more. Unfortunately, it is also extremely case-specific, which makes finding any information or feedback about it a very challenging process.
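Block-level deduplication of the kind Backy2 advertises can be illustrated in a few lines: split the volume into fixed-size blocks, hash each block, and store every unique block only once (compressed), keeping an ordered list of hashes as the backup’s block map. This is a generic Python sketch, not Backy2’s implementation; the block size, SHA-256, and zlib are assumptions chosen for brevity.

```python
from __future__ import annotations

import hashlib
import zlib


def backup_blocks(data: bytes, block_size: int, store: dict[str, bytes]) -> list[str]:
    """Store each unique block once (compressed) and return the ordered block map."""
    block_map: list[str] = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # identical blocks are only written once
            store[digest] = zlib.compress(block)
        block_map.append(digest)
    return block_map


def restore_blocks(block_map: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the original volume from its block map."""
    return b"".join(zlib.decompress(store[digest]) for digest in block_map)
```

Backing up a second version of the same volume against the same `store` only adds the blocks that changed, which is where the space savings come from.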

cephbackup

Other Ceph backup projects also exist, such as this simple tool from GitHub that can perform both full and incremental backups using the same process we have gone over multiple times now. However, it is worth noting that this particular tool has not been updated for over seven years now, which puts both its compatibility and its overall feasibility in question.

A recommended approach to Ceph data security

Large-scale RBD backups have been a prevalent topic for years now, without any simple solution. However, plenty of ideas have surfaced in recent years, such as this 2018 panel presented by an employee of OVH, one of the biggest Ceph users by far.

The panel covers some basic information about Ceph and its capabilities, as well as the scalable backups that no third-party backup solution could offer at the time (not even Duplicity, the backup solution OVH was using then).

The person presenting the panel also discussed all the challenges that backup scalability brings in the context of Ceph, such as file size limitations and image size limitations.

The solution to the issues of the existing setup turned out to be surprisingly simple. The idea was to use Ceph’s cloning capabilities to create an identical copy of the system at a given moment, which could act as the first line of defense and replace the original if something went wrong. Ceph’s own performance made this method extremely fast, faster than any backup method OVH had used before.
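The exact commands used in the panel are not reproduced here, but the cloning step generally maps onto a handful of `rbd` subcommands. The hedged Python sketch below builds them: snapshot the image, protect the snapshot (required for clones on older Ceph releases, optional on newer ones using the v2 clone format), clone it, and optionally flatten the clone. All pool, image, and snapshot names are hypothetical.

```python
from __future__ import annotations

import shlex


def clone_commands(pool: str, image: str, snap: str, clone_pool: str,
                   clone_name: str, protect: bool = True,
                   flatten: bool = False) -> list[list[str]]:
    """Build the rbd command lines for a point-in-time clone of one image."""
    src = f"{pool}/{image}@{snap}"
    dst = f"{clone_pool}/{clone_name}"
    cmds = [["rbd", "snap", "create", src]]
    if protect:
        # Older clone format: the parent snapshot must be protected before cloning.
        cmds.append(["rbd", "snap", "protect", src])
    cmds.append(["rbd", "clone", src, dst])
    if flatten:
        # Copy all parent data into the clone so it no longer depends on the snapshot.
        cmds.append(["rbd", "flatten", dst])
    return cmds


if __name__ == "__main__":
    for cmd in clone_commands("rbd", "vm-disk-01", "daily", "rbd-copies", "vm-disk-01-daily"):
        print(shlex.join(cmd))
```

Without `flatten`, a clone shares unmodified data with its parent snapshot, so it is fast to create but not an independent copy; flattening copies that data at the cost of time and space.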

However, that is not all. While Ceph’s cloning capabilities are fast, they are not that powerful when it comes to security and protection measures. As such, it was still recommended to use existing third-party backup tools as the “second line of defense”, so to speak.

This approach combines Ceph’s cloning speed with the additional security that third-party backup software adds – even if it does significantly reduce performance.

Conclusion

Ceph is an impressive storage framework that supports extreme scalability for practically any storage volume in a distributed environment. It uses its own advanced redundancy features to offer a high level of protection against many different data-oriented issues. However, it is far from perfect, which is why backup and recovery solutions are still necessary alongside Ceph’s clusters.

While there are several different solutions that support Ceph clusters to a certain degree, most of them have their fair share of disadvantages that make them somewhat incomplete when used as the sole backup solution. However, combining Ceph’s own replication capabilities with third-party backup features does offer a respectable level of protection with multiple security layers, which seems to be about the best Ceph can offer right now.

Frequently Asked Questions

Is Ceph’s self-healing infrastructure enough of a security measure on its own?

While Ceph offers a highly redundant infrastructure as one of its main capabilities, that is far from enough to cover all possible attack vectors and disaster scenarios, from human error to sophisticated ransomware threats.

What is the main approach of Ceph when it comes to both internal and third-party backup operations?

Most Ceph backup operations, whether internal or third-party, are performed using either a direct disk mount via a proxy VM or an RBD volume export. RBD volumes can also be accessed directly when no virtualization layer is involved in the backup process, which is something that some backup solutions can capitalize on.

About the author
Rob Morrison
Rob Morrison is the marketing director at Bacula Systems. He started his IT marketing career with Silicon Graphics in Switzerland, performing strongly in various marketing management roles for almost 10 years. In the next 10 years Rob also held various marketing management positions in JBoss, Red Hat and Pentaho ensuring market share growth for these well-known companies. He is a graduate of Plymouth University and holds an Honours Digital Media and Communications degree, and completed an Overseas Studies Program.