You don’t think much about OpenStack backup and recovery when it comes to cloud. After all, you build your applications on ephemeral storage and compute. These resources are not expected to persist across reboots and power cycles, and your application is built to tolerate these failures. You create your workloads on the fly from a persistent store such as an object store, perform your computation, and save the results back to the persistent store. Life is beautiful.

Unfortunately, only a small subset of applications is written from the ground up to fit the cloud paradigm, and you need an army of smart people to build your applications for the cloud. However, the cloud paradigm is here to stay. Its elasticity, scalability, and self-service aspects appeal to many IT managers, who are actively looking to host their traditional IT applications on open source cloud platforms such as OpenStack. These applications must persist across failures, so OpenStack backup and recovery is an important part of their business continuity strategy.

OpenStack backup and recovery did not get much attention until recently. Just as with any other cloud service, a backup and recovery service must enable tenants to define data protection policies for their workloads. Likewise, IT managers are looking for a scalable solution that can grow with their cloud.

However, traditional solutions are built to manage tens, perhaps hundreds, of applications. These solutions are centrally administered, and the backup administrator usually has intimate knowledge of the workloads he or she manages. Such solutions are not a natural fit for the cloud, so there is a need for a solution built from the ground up that shares the same attributes as your cloud.

OpenStack has been gaining popularity as the cloud of choice for IT managers who want to build their own on-premises cloud. Nova and Cinder do expose a few APIs for backing up VMs and storage, for example to Swift, but they fall short of a comprehensive OpenStack backup and recovery solution.

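Consider a simple workload of two VMs, VM1 and VM2, each with an attached storage volume (Storage Volume1 and Storage Volume2). To perform a regular backup of this workload using the existing OpenStack APIs, one has to perform the following steps: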

  • Pause VM1 and VM2
  • Detach Storage Volume1 and Storage Volume2 from their respective VMs
  • Snapshot VM1 and VM2 and store the images in Glance
  • Call the Cinder backup API to back up Storage Volume1 and Storage Volume2 to Swift
  • Keep track of these copies’ URIs in an Excel spreadsheet
  • Attach Volume1 and Volume2 back to VM1 and VM2
  • Resume VM1 and VM2
  • Repeat the above steps as needed
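The manual procedure above can be sketched as a short orchestration script. This is a minimal illustration under stated assumptions, not an OpenStack client: `CloudClient`, `FakeClient`, and `backup_workload` are hypothetical names, and each client method merely stands in for the corresponding Nova, Glance, or Cinder operation. Even in sketch form it shows the fragility of the approach: the VMs stay paused for the whole copy, and a failure midway must be unwound carefully so that volumes are reattached and VMs resumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class CloudClient(Protocol):
    """Hypothetical interface: each method stands in for one Nova, Glance, or Cinder operation."""

    def pause(self, vm: str) -> None: ...
    def unpause(self, vm: str) -> None: ...
    def detach(self, vm: str, volume: str) -> None: ...
    def attach(self, vm: str, volume: str) -> None: ...
    def snapshot_vm(self, vm: str) -> str: ...        # returns an image ID
    def backup_volume(self, volume: str) -> str: ...  # returns a backup ID


@dataclass
class FakeClient:
    """Hypothetical in-memory stand-in that records each call so the ordering can be inspected."""

    calls: List[str] = field(default_factory=list)

    def pause(self, vm: str) -> None:
        self.calls.append(f"pause {vm}")

    def unpause(self, vm: str) -> None:
        self.calls.append(f"unpause {vm}")

    def detach(self, vm: str, volume: str) -> None:
        self.calls.append(f"detach {volume} from {vm}")

    def attach(self, vm: str, volume: str) -> None:
        self.calls.append(f"attach {volume} to {vm}")

    def snapshot_vm(self, vm: str) -> str:
        self.calls.append(f"snapshot {vm}")
        return f"image-{vm}"

    def backup_volume(self, volume: str) -> str:
        self.calls.append(f"backup {volume}")
        return f"backup-{volume}"


def backup_workload(client: CloudClient, attachments: Dict[str, str]) -> Dict[str, str]:
    """Run the manual steps for a workload given as {vm: volume}.

    Returns a catalog of copy IDs, the job the spreadsheet does in the manual procedure.
    """
    catalog: Dict[str, str] = {}
    paused: List[str] = []
    detached: List[Tuple[str, str]] = []
    try:
        for vm in attachments:
            client.pause(vm)
            paused.append(vm)
        for vm, volume in attachments.items():
            client.detach(vm, volume)
            detached.append((vm, volume))
        for vm, volume in attachments.items():
            catalog[vm] = client.snapshot_vm(vm)
            catalog[volume] = client.backup_volume(volume)
    finally:
        # Undo only the steps that actually completed, newest first,
        # so a failed copy never leaves a VM paused or a volume detached.
        for vm, volume in reversed(detached):
            client.attach(vm, volume)
        for vm in reversed(paused):
            client.unpause(vm)
    return catalog


if __name__ == "__main__":
    client = FakeClient()
    print(backup_workload(client, {"VM1": "Volume1", "VM2": "Volume2"}))
```

Run against `FakeClient` with `{"VM1": "Volume1", "VM2": "Volume2"}`, the function returns a catalog mapping each VM to its image ID and each volume to its backup ID. Both VMs remain paused until every copy finishes, which is exactly the kind of disruption the requirements below rule out.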

As you can see, this is not an effective OpenStack backup and recovery solution. When evaluating solutions, you should apply the following OpenStack backup and recovery requirements:

  • Tenant-Administered Backup and Recovery: Just like any other service in the cloud, a backup and recovery service must present easy-to-consume policies that tenants can choose and apply to their workloads.
  • Non-Disruptive Backups: Backups must not disrupt running workloads; the backup process must be non-intrusive with respect to both availability and performance.
  • Instant Restore: Cloud workloads can be huge, and the recovery of a workload from the backup must be as quick as possible. Waiting for the entire dataset to be copied from backup media to production will severely impact the recovery SLA of the service.
  • Backup/Recovery of Single and Multi-VM Workloads: Cloud workloads can span multiple VMs, so the backup process must be able to back up workloads that span multiple VMs.
  • Validate Backups: This is another feature that cloud backup can implement using on-demand cloud resources. The backup service must provide a means for tenants to quickly replay a workload from backup media so that they can periodically validate that their backups are sound.
  • Efficient Data Transfers of Backup Images: Incremental backups and source-side deduplication significantly reduce the amount of data transferred and improve the backup process.
  • Disaster Recovery: The backup service must include a disaster recovery element, too. Cloud resources are highly available and periodically replicated to multiple geographic locations. Replicating backup media to multiple locations likewise enables the backup service to restore a workload even in the event of an outage at one location.
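To make the efficient-data-transfer requirement concrete, here is a minimal sketch of incremental backup with source-side deduplication. It assumes fixed-size chunking with SHA-256 content addressing, which is one common technique and not necessarily how any particular product works. `ChunkStore`, `backup`, `restore`, and `CHUNK_SIZE` are hypothetical names, and the in-memory dictionary stands in for a real backup target such as Swift.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; tiny for illustration, real systems use KiB- to MiB-sized chunks


class ChunkStore:
    """Hypothetical content-addressed backup target, keyed by chunk digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.chunks

    def put(self, digest: str, data: bytes) -> None:
        self.chunks[digest] = data


def backup(image: bytes, store: ChunkStore) -> Tuple[List[str], int]:
    """Back up an image; return (manifest of chunk digests, bytes actually sent)."""
    manifest: List[str] = []
    sent = 0
    for offset in range(0, len(image), CHUNK_SIZE):
        chunk = image[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        # Source-side dedupe: ask the target first and send only chunks it lacks.
        if not store.has(digest):
            store.put(digest, chunk)
            sent += len(chunk)
        manifest.append(digest)
    return manifest, sent


def restore(manifest: List[str], store: ChunkStore) -> bytes:
    """Rebuild an image by concatenating its chunks in manifest order."""
    return b"".join(store.chunks[d] for d in manifest)
```

Backing up `b"AAAABBBBCCCCDDDD"` and then `b"AAAABBBBXXXXDDDD"` into the same store sends 16 bytes the first time and only 4 the second, because only the changed chunk is new. Each manifest on its own is enough to rebuild its image, so every backup restores like a full backup even though it was transferred incrementally. Fixed-size chunks keep the sketch simple; content-defined chunking copes better with inserted bytes, at the cost of extra complexity.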

To learn more about our approach to OpenStack backup and recovery, visit our product page.

Murali Balcha


Founder and CTO
