Some of the attendees who stopped by our booth asked us whether their scale out databases need to be backed up, so I reflected on this conundrum for a couple of days. Yes, these databases keep multiple copies of the data, so why would they need backup? I know I have a good answer, but how do I put it in terms that make the need for backup self-explanatory? Does the built-in replication of these products alone ensure business continuity in all failure scenarios? To answer this question, let’s look back at the evolution of computer systems.

  1. Before the advent of disk RAID, systems ran on single disks with no redundancy whatsoever. RAID offered data protection against disk failures by maintaining redundant data. Since the mid-1980s, RAID has been the industry standard protection against disk failures. But having additional copies of the data at the disk controller level did not eliminate the need for regular backups of your applications.
  2. To improve redundancy at the system level, clustering technology ensured application availability in case of system failures. However, clustering did not eliminate the need for regular backups either.
  3. To protect against site failures, storage vendors implemented synchronous and asynchronous replication. People still perform backups.
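To make the first point concrete, here is a minimal sketch of parity-style redundancy, assuming a toy array of three data blocks and one XOR parity block. The block contents and the `xor_blocks` helper are made up for illustration and do not reflect any real RAID controller. It shows that parity can rebuild a block lost to a disk failure, but that it rebuilds a bad write just as faithfully.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, each living on its own disk.
data_disks = [b"ACCT", b"0042", b"PAID"]
parity = xor_blocks(data_disks)  # stored on a fourth disk

# Disk 1 fails: rebuild its block from the surviving blocks plus parity.
rebuilt = xor_blocks([data_disks[0], data_disks[2], parity])

# An application bug overwrites the block. Parity is recomputed for the
# bad data, so a later rebuild returns the bad data, not the original.
data_disks[1] = b"9999"
parity = xor_blocks(data_disks)
rebuilt_after_bug = xor_blocks([data_disks[0], data_disks[2], parity])

print(rebuilt, rebuilt_after_bug)
```

The redundancy answers the question "what if a disk dies?" but has nothing to say about "what if the data written was wrong?", which is where backups come in.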

So how do these features stack up against the scale out databases that we have all come to love so dearly of late?

| Features | Traditional IT | Scale out databases/File Systems |
| --- | --- | --- |
| Parity Based Protection | RAID-5/6 | Erasure Coding |
| Clusters | Operating System Based Clustering | Clustering functionality built into the application |
| Site wide replication | Synchronous replication of storage | Replication between racks in a datacenter |
| Geo Replication | Asynchronous replication to two or more geographical locations | Datacenter wide replication |

Caveat on site wide replication: Not all databases understand the data center topology. For example, Cassandra supports different snitches that enable replication at various levels.

This new generation of scale out databases and file systems provides the same level of availability as traditional systems at a fraction of the cost. However, backup and snapshotting technologies provide a point-in-time copy that serves a different set of use cases. To start with, backups and snapshots protect against data corruption and unauthorized modification of data. Replication faithfully propagates every write, including a bad one, to all copies, whereas a point-in-time copy lets you go back to the state before that write.
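The difference between replication and a point-in-time copy can be sketched with a toy model. Everything here is hypothetical: `ReplicatedStore`, its methods, and the key names are invented for illustration and are not the API of any real database. The sketch assumes writes are applied synchronously to every replica.

```python
import copy

class ReplicatedStore:
    """Toy key-value store that applies every write to all replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.snapshots = {}

    def write(self, key, value):
        # Replication: every replica receives the write, good or bad.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica_index=0):
        return self.replicas[replica_index][key]

    def snapshot(self, name):
        # Point-in-time copy, kept apart from the live replicas.
        self.snapshots[name] = copy.deepcopy(self.replicas[0])

    def restore(self, name):
        # Roll every replica back to the saved state.
        for i in range(len(self.replicas)):
            self.replicas[i] = copy.deepcopy(self.snapshots[name])

store = ReplicatedStore()
store.write("balance:alice", 100)
store.snapshot("nightly")

# A buggy or unauthorized write reaches every replica.
store.write("balance:alice", -1)
values_after_bad_write = [store.read("balance:alice", i) for i in range(3)]

# Only the point-in-time copy still holds the good value.
store.restore("nightly")
values_after_restore = [store.read("balance:alice", i) for i in range(3)]

print(values_after_bad_write, values_after_restore)
```

After the bad write, all three replicas agree on the wrong value, so no amount of replica comparison would detect or repair it. The snapshot is the only place the earlier state survives.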

Snapshots also enable other use cases such as forensics and business intelligence. So do you still need a backup? The answer is: it depends on your IT risk assessment and your regulatory requirements. If you are already doing backups, you may as well continue to do them for these new applications too.

Murali Balcha

Founder and CTO