Trilio is releasing a series of significant updates to TrilioVault this year, including cloud storage support and expanded platform coverage for KVM-based infrastructure (such as OpenStack-based private clouds and Red Hat Virtualization).
Tucked into the Version 3.0 release was support for S3-compatible storage as a backup target. In fact, our support for S3-compatible cloud storage is groundbreaking, and is part of our larger multi-cloud vision. Here’s why this is such a significant step.
How S3 Became the De Facto Standard in Cloud Storage
Amazon’s introduction of S3 storage in 2006 set cloud computing in motion. Since its release, Amazon S3 has become synonymous with cloud storage (or object storage) and is now the standard against which every implementation of cloud storage is measured. It led the revolution in cloud computing that forced other vendors to scramble and respond with their own versions of cloud storage to compete in this new market. Though the implementations differ considerably, virtually every cloud storage technology now supports an S3-compatible API.
Cloud storage has become central to the overall storage strategy of modern data centers, and it appeals to cloud architects for many reasons:
- Objects are accessed over the standard HTTP protocol. Current network architectures handle HTTP traffic very well, so object store access does not require rearchitecting the network. As a result, it’s easy to:
- Access object stores over the internet. There is a plethora of tools for working with web URLs; at minimum, all you need is a browser or Linux tools such as curl or wget.
- Implement high-performance object storage using out-of-the-box Linux tools, such as load balancers.
- Use the readily available, mature object store implementations from the open-source community.
- Highly scalable in capacity and performance.
- Highly available and durable.
- Cost-effective, because cloud storage can be implemented on a pool of standard white-box servers.
- Secure.
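The HTTP point is worth making concrete: every object in an S3-compatible store is an ordinary web resource, addressable by URL, so any HTTP client (a browser, curl, wget, or a few lines of code) can reach it. A minimal sketch of how a bucket and key map to a URL; the endpoint, bucket, and key names here are hypothetical:

```python
# Each object in an S3-compatible store maps to a plain HTTP(S) URL.
# Path-style addressing puts the bucket in the path; the endpoint,
# bucket, and key below are made-up examples, not real resources.

def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build the path-style URL for an object in an S3-compatible store."""
    return f"https://{endpoint}/{bucket}/{key}"

url = object_url("s3.example.com", "backups", "vm-42/disk.qcow2")
print(url)  # https://s3.example.com/backups/vm-42/disk.qcow2
```

Fetching a (publicly readable) object is then just `curl <url>`; no special client library or network rearchitecture is needed, which is exactly why object stores slot into existing HTTP-friendly networks so easily.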
Caching & Other Challenges with Cloud Storage
Despite all these advantages, cloud storage is not a file system. It loosely supports the write-once-read-many (WORM) data access model. These systems don’t support random access to objects, which makes modifying existing objects (especially large objects) unwieldy.
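The cost of that missing random access is easy to see in code. The toy in-memory store below (a stand-in for any real object store or SDK, not an actual S3 client) only supports whole-object put and get, so changing even a single byte forces a full read-modify-write of the object:

```python
class ToyObjectStore:
    """In-memory stand-in for an object store: whole-object put/get only,
    mimicking the WORM-style access model described above."""
    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data          # always replaces the entire object

    def get(self, key: str) -> bytes:
        return self._objects[key]

def patch_object(store: ToyObjectStore, key: str, offset: int, data: bytes) -> None:
    """Modify a byte range the only way an object store allows:
    read the whole object, splice in the change, rewrite the whole object."""
    old = store.get(key)                   # full download, even for a tiny change
    new = old[:offset] + data + old[offset + len(data):]
    store.put(key, new)                    # full upload

store = ToyObjectStore()
store.put("disk.img", b"AAAAAAAA")
patch_object(store, "disk.img", 3, b"XX")  # 2-byte change, 8-byte rewrite
print(store.get("disk.img"))  # b'AAAXXAAA'
```

For an 8-byte object this is harmless; for a multi-terabyte backup image, rewriting the whole object to change a few blocks is exactly the unwieldiness described above.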
Consequently, organizations typically leverage cloud storage to house only static data. Some popular use cases include:
- Hosting static web sites
- Sharing documents
- Backup and archival
- Enabling multi-cloud infrastructure, particularly for enterprises
However, applications often need to consume cloud storage, which creates an impedance mismatch between the type of access it provides and what applications expect. Most enterprise applications, including legacy backup engines, are written for POSIX-based file systems, such as Linux VFS, or network file systems, such as NFS or CIFS. These applications cannot consume object storage directly.
To solve this, storage vendors started offering NFS gateways that act as caching devices: they present file semantics to enterprise applications while persisting the files as objects in the object store.
These caching devices implement various caching techniques to present objects as files to applications. However, they reintroduce the very inefficiencies that cloud storage set out to resolve in the first place, including stateful protocols (NFS/CIFS) that won’t scale to large environments and capacity limits that restrict how much object storage can be addressed.
These NFS gateways fill a short-term gap for applications, but they simply aren’t a long-term, sustainable solution. Not to mention, you’re forced to buy custom hardware, pay license fees, and dedicate floor space, all of which throw your ROI/TCO out the window, and you’re back where you started with NFS. These appliances also require a valid license for their proprietary file-to-object mapping just to retrieve the files stored as objects in the object store.
Consequently, the usefulness of the object store is severely restricted, and many organizations use it solely as a dumping ground for data that is not needed on a daily basis, if they use it at all.
Trilio’s Approach to Cloud Storage
At Trilio, we believe in the efficiencies and scalability of the cloud and want to address the challenges of cloud storage head-on. We knew that, in order to unlock the true value of cloud storage for enterprises, we would have to address two important use cases with TrilioVault:
- Efficiently implement cloud storage as a backup target without resorting to caching devices or sacrificing the efficiencies that cloud storage brings to enterprises.
- Enable multi-cloud, enterprise-friendly access to cloud storage. Once enterprise data is in cloud storage, it must be accessible and usable, for a variety of use cases, from resources in different clouds.
TrilioVault’s Natively Scalable Architecture
TrilioVault is a highly distributed and available backup engine that is well-suited for modern private and public clouds, and employs an innovative approach to supporting cloud storage as a backup target.
Part of this innovative approach is the TrilioVault architecture, which includes both a control plane and a data plane. The control plane is implemented as one or more VMs that mostly comprise a highly efficient orchestration layer. The data plane is spread across the compute nodes as simple, stateless Python modules called Data Movers.
Every compute node runs a Data Mover that is responsible for backup and recovery of the VMs running on that node. Assuming your cloud implements an efficient VM placement algorithm, the data plane mirrors that placement and scales with your cloud. The control plane can be scaled independently based on the number of VMs in the cloud, regardless of the amount of data being backed up.
TrilioVault Solution for Supporting S3 Cloud Storage
Whether private or public, clouds are meant to scale, and the backup solution you choose should not impose constraints on your cloud architecture. That means introducing NFS gateways into the mix simply isn’t an option. Trilio’s solution is an object mount module that supports cloud storage as a backup target, a first for a data protection solution.
Trilio’s object mount module is lightweight yet very efficient. It provides file-like access to objects without compromising scale or performance, and it does not require intermediary hardware appliances such as an NFS caching device.
The Trilio object mount is also stateless, which allows it to provide file-like access to objects from multiple compute nodes and easily scale with your cloud. It uses minimal resources on each compute node because it:
- Does not require any staging area
- Uses very limited physical memory
- Consumes very few CPU cycles on the compute nodes
Consequently, the object mount module has no limitations on the number or size of the files it serves: a 10 TB file is accessed as efficiently as a 10-byte file, and one million files are accessed as efficiently as 10 files.
Additionally, the Trilio object mount offers a full range of functionality, including file-level restores, file search, and synthetic full backups, all of which require efficient random access to files stored in the object store.
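One way to picture how file-like random access can map onto object storage without a staging area is the standard HTTP Range request: a read at a given file offset becomes a ranged GET against the backing object, so only the requested bytes ever move over the network. The sketch below is illustrative of that general technique, not Trilio’s actual implementation:

```python
# A file read at (offset, length) can be served by an HTTP ranged GET
# against the backing object. Range headers use inclusive byte bounds,
# per the HTTP range-requests specification.

def range_header(offset: int, length: int) -> str:
    """HTTP Range header value for reading `length` bytes starting at
    `offset` (both ends of the byte range are inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"

# Reading 4 KiB at a 1 GiB offset touches exactly 4 KiB of the object,
# no matter whether the object holds 10 bytes or 10 TB.
print(range_header(0, 100))        # bytes=0-99
print(range_header(1 << 30, 4096)) # bytes=1073741824-1073745919
```

Because each read is an independent, self-describing HTTP request, nothing needs to be cached or staged between requests, which is what lets a mount built this way stay stateless across many compute nodes.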
Cloud Storage Performance
Trilio object mount performance is comparable to native cloud storage API performance: our initial tests show only 20% overhead compared to native API calls made with the Boto3 library. Twenty percent seems like a fair trade-off given that our object mount:
- Is stateless
- Can be deployed on multiple compute nodes
- Uses very few resources on the compute node
- Does not require hardware such as an NFS gateway
The only thing that could prevent you from realizing the full potential of your cloud storage back-end (if S3 is hosted on a public cloud such as AWS) is your service provider. Last mile to the cloud!
Hybrid and Multi-Cloud Enablement
Gone are the days when enterprises were tied to a given platform or vendor for decades. Today, enterprises have dozens of options when it comes to public and private clouds, and they must be able to change their underlying platform at will to meet the ever-changing needs of their business.
Central to hybrid cloud enablement is the need for vendor freedom, which means your critical business applications are not tied to any one platform and have the ability to be deployed anywhere. To achieve this goal:
- All business applications must be captured or described in a platform-independent data format
- Application data must persist in cloud storage and be accessible from anywhere
- Backup images must be platform-neutral (like Trilio’s) and saved in an open format that can be managed with out-of-the-box Linux tools
This new support for S3 as a backup target is a critical step in helping our customers build and scale their multi-cloud infrastructure without monetary or resource inefficiencies.