Can your Business tolerate data loss in OpenStack?
Outages happen, data gets corrupted, and accidents that lose data do occur.
As more and more enterprises move OpenStack projects from the evaluation phase into production, organizations are being challenged on their backup and recovery readiness.
An ideal cloud application uses ephemeral compute and storage, creating workloads on the fly from a persistent (object) store, performing computation, and saving the results back to the persistent store. In that situation, you are fine without a backup and recovery solution.
But the majority of enterprise applications were not written for the cloud, so backup and recovery is an important consideration for businesses that need to recover applications from data loss and data corruption.
So should I take a piecemeal approach or adopt a comprehensive solution?
Piecemeal Backup strategy
As Sébastien Han discusses in his blog, OpenStack offers bits and pieces of APIs for implementing a backup solution, depending on which release you are running. However, these APIs alone are not sufficient to implement your own backup solution. Each OpenStack deployment is unique, because OpenStack itself offers multiple ways to implement a cloud: users have a choice of hypervisors, storage subsystems, network vendors, projects such as Ironic, and OpenStack distributions, all of which influence how a backup solution should be implemented.
The first option is block-level (Cinder) backups. In the Kilo release, Cinder gained incremental snapshots, a significant improvement for Cinder backups, but these backups are disruptive: the application has to be taken offline to take a snapshot. Nova, on the other hand, offers instance snapshots, but instance snapshots do not capture attached Cinder volumes; Nova snapshots are uploaded to Glance, whereas Cinder backups go to Swift. Further, Nova snapshots are not incremental, which limits the efficiency of your backup solution. The other option is file-level backups. This approach provides file-level granularity, but it requires running an agent in each of the VM instances, and for an application workload that spans multiple virtual machines, file-level backups become overly complex to manage.
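To make the contrast concrete, here is a minimal sketch of the per-service calls described above, using the standard Cinder and Nova CLIs. The volume and instance names are hypothetical, and the `--incremental` flag assumes a release with incremental backup support:

```shell
# Block-level: back up a Cinder volume. The application should be quiesced
# or taken offline first for a consistent copy; the backup is stored in the
# configured backup driver's store (e.g. Swift).
cinder backup-create --name db-vol-full db-volume
cinder backup-create --name db-vol-incr --incremental db-volume

# Instance-level: snapshot a Nova instance. The image is uploaded to Glance,
# but attached Cinder volumes are NOT included, and each snapshot is full,
# not incremental.
nova image-create app-server-1 app-server-1-snap
```

Note that stitching these two mechanisms together (plus in-guest agents for file-level granularity) is exactly the piecemeal complexity the next section argues against.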
Comprehensive Backup strategy
The founders of Trilio Data proposed the Raksha project, which calls for non-disruptive, application-aware, tenant-driven, policy-based backups of cloud workloads consisting of one or more virtual machines, with integration into the Horizon dashboard.
Here the user has the flexibility to use Nova and/or Cinder volumes, and a choice of storing incremental and full backups on Swift, Ceph, or NFS.
The other aspects that are worth considering are as follows:
- Do I have critical workloads that use OpenStack Ironic to provision bare metal, and do I need backups for these workloads? A best practice for a number of NoSQL databases is to run these scale-out databases on bare-metal instances.
- In most cases, customers use OpenStack Heat and/or DevOps tools to configure and manage complex cloud workload deployments. Should the backup and recovery solution be tied to these orchestration tools? For application workloads where additional VM instances are added over time, the application topology changes, and it is important for the backup policy to recognize that change so the next scheduled snapshot creates a consistent backup that reflects it.
- Would I use backups to bring up a staging or test environment, with not only the configuration but also production data, to accelerate application release cycles?
- Can I restore a bare-metal snapshot to a VM, and vice versa?
- Is there a place for backups in a disaster recovery strategy, where backups are used to restore to a remote site?
Let us know your thoughts as you plan your backup and recovery strategy. If you happen to be at the OpenStack Summit in Vancouver, we can discuss the Trilio solution in person. Please reach out to us at firstname.lastname@example.org.