Companies around the world are still scrambling to comply with the new GDPR, and the question has inevitably arisen of how backups fit into the EU’s new rule set.
For those who need a refresher: the GDPR was established to enhance EU citizens’ ability to control what data companies may hold about them. Many organizations have taken steps to become GDPR compliant and have put core processes and procedures into place to attain, maintain, and prove GDPR compliance. However, almost as many companies have not even started. In fact, most organizations do not yet see themselves as being fully compliant.
A hallmark of the GDPR is the right of individuals to request that any organization holding their data erase it. The obvious problem becomes: how can an organization that rightfully relies on backups to ensure the integrity and security of its data also ensure that those backups don’t hold data on individuals who have requested to be “erased”?
Challenges with the GDPR and Backups
The so-called “Right to Be Forgotten” or the “Right to Erasure” is a compliance hurdle on its own, requiring deletion of all personal data upon request. It becomes significantly more complex when layered against years of backups, particularly for organizations with manual tracking and tagging mechanisms. GDPR regulations ultimately mean that organizations must be able to identify and purge an individual’s data on each and every backup, so the challenges to compliance are significant.
Tagging Personal Data in Backups
Identifying personal data in backups is not easy; it will require tagging and/or scripting across a large volume of historical snapshots. Several factors complicate this further:
- Backups are usually in a proprietary format.
- Systems are often fragmented. Is everything within an organization getting backed up to the same place and in the same format? If not, tagging and scripting will need to be built out for each system in place.
- Many organizations still rely on scripting and then track backups manually in spreadsheets. In that setting, the task of identifying an individual’s data is almost insurmountable.
- Backups are not always easily searchable, and may require that data be restored before it can be deleted. Moving backups back to a production environment for the purpose of identifying and erasing data will present a new set of challenges.
- Because of the sheer volume of backups, it would be nearly impossible to comb through to find every reference to an individual. Many data protection companies are tackling this via tagging or searching functionality, but that only scratches the surface of the effort required here.
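One way to picture the tagging approach mentioned above is a small index, kept outside the backups themselves, that records which data subjects each backup contains. The sketch below is only an illustration under stated assumptions: `BackupCatalog`, the backup names, and the pseudonymous subject IDs are all hypothetical, and a real backup product would expose its own catalog or search interface instead.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BackupCatalog:
    """Hypothetical index of which data subjects appear in which backup."""

    _by_subject: dict = field(default_factory=lambda: defaultdict(set))

    def record(self, backup_id, subject_ids):
        # Assumed to run at backup time, while the data is still
        # readable in production and no restore is needed.
        for subject_id in subject_ids:
            self._by_subject[subject_id].add(backup_id)

    def backups_containing(self, subject_id):
        # Sorted so the result is stable when used in a report.
        return sorted(self._by_subject.get(subject_id, ()))


catalog = BackupCatalog()
catalog.record("crm-2018-05-01", ["subject-17", "subject-42"])
catalog.record("crm-2018-05-02", ["subject-42"])
catalog.record("billing-2018-05-01", ["subject-17"])

print(catalog.backups_containing("subject-42"))
```

With an index like this, an erasure request can be scoped to a short list of backups without restoring anything first. It only helps for backups taken after the index exists, so historical snapshots still need the tagging or scripting effort described above.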
The administrative burden associated with erasing individual data from backups is also significant:
- Policies and procedures will need to be built to ensure proper tracking of where changes to backups have been made and what they are.
- The burden of creating proof for the data subject and regulators must also be considered. Central reporting will be required to assure internal and external stakeholders that the right data has been identified and erased from all backups.
- From a larger organizational compliance standpoint, most companies are required to retain backups for a set time period. Which requirement supersedes the other — an individual’s right under GDPR, or an organizational requirement that backups be held for their mandated time period?
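The tracking and reporting burden in the list above can be sketched as an append-only log with one entry per action taken on a backup. This is a minimal illustration, not a prescribed format: the field names, the `req-001` request ID, and the `outstanding` helper are assumptions made for the example. The log refers to a request ID rather than a name so that the proof of erasure does not itself hold the erased data.

```python
import json
from datetime import datetime, timezone


def audit_entry(request_id, backup_id, action, when=None):
    """Build one JSON line recording what was done to which backup."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "request_id": request_id,
            "backup_id": backup_id,
            "action": action,
            "at": when.isoformat(),
        },
        sort_keys=True,
    )


def outstanding(backups_in_scope, entries):
    """Return the in-scope backups that have no 'erased' entry yet."""
    records = [json.loads(line) for line in entries]
    done = {r["backup_id"] for r in records if r["action"] == "erased"}
    return sorted(set(backups_in_scope) - done)


log = [audit_entry("req-001", "crm-2018-05-01", "erased")]
print(outstanding(["crm-2018-05-01", "crm-2018-05-02"], log))
```

A report built from such a log can show both what has been erased and which backups still need attention for a given request.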
An Ongoing Debate: When Does the Right to Erasure Apply?
Partially because of the complex nature of backups and GDPR compliance — and partially because the whole point of backups is to retain a historical record — there’s an ongoing debate within the data protection community about the degree to which the Right to Erasure applies to historical data. ESG’s Christophe Bertrand highlights the antagonistic nature of these two requirements, saying, “The regulation is asking to backup and protect the data at the same time it is asking, in certain circumstances, to forget the data. So you have a bit of a contradiction.”
There are essentially two schools of thought on how to tackle this contradiction:
- Organizations must ensure that data is identified and wholly erased where requested, including on backup systems.
- Backup data exists in a manner that makes it non-sensitive and non-relational to any one individual, and is thus exempt from the GDPR.
If we look at backups in their simplest form, they generally capture points in time. They provide snapshots of changes since the previous backup was performed, and thus hold only a fragment of any one individual’s data. This ultimately means that there is no real ability to relate these small, incremental changes to a larger story (i.e. to an individual who has requested that their data be erased).
The other side of the argument, artfully presented by Andy Barratt of Coalfire, states quite simply that the GDPR doesn’t pertain to backups because the privacy issues it addresses only matter when data is on a production system. This means that organizations only need to identify and erase an individual’s data when a backup is restored and used. This, of course, implies that a number of processes and precautions must be undertaken before, during, and after a backup recovery in order to maintain compliance. That introduces a great deal of risk, depending on your organization’s ability to consistently and reliably follow erasure procedures after each and every restore.
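The erase-on-restore approach can be sketched as a ledger of erasure requests that is replayed against restored data before it goes back into use. This is a simplified illustration under stated assumptions: `ErasureRequest`, `apply_erasures`, and the row layout with a `subject_id` field are hypothetical, and real restored data would rarely be a flat list of rows.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ErasureRequest:
    subject_id: str
    received: date


def apply_erasures(restored_rows, ledger):
    """Drop rows for any subject in the ledger.

    Returns the rows that were kept and the subject IDs that were erased.
    """
    to_erase = {request.subject_id for request in ledger}
    kept = [row for row in restored_rows if row["subject_id"] not in to_erase]
    erased = sorted({row["subject_id"] for row in restored_rows} & to_erase)
    return kept, erased


# The ledger is assumed to live outside the backups, so it survives a restore.
ledger = [ErasureRequest("subject-42", date(2018, 6, 1))]
restored = [
    {"subject_id": "subject-17", "email": "a@example.com"},
    {"subject_id": "subject-42", "email": "b@example.com"},
]

kept, erased = apply_erasures(restored, ledger)
print(len(kept), erased)
```

The risk described above shows up clearly here: the approach only works if this step runs on every restore, and if the ledger is complete and stored somewhere the restore cannot overwrite.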
The Bottom Line
Regardless of your perspective, the GDPR is likely to have a lasting impact on your backup strategy and on your organization’s ability to seamlessly return to a working point in time when needed.
Of course, the ideal solution is to build these considerations into your backup strategy from the beginning. Unless your organization is very new and has access to data privacy experts from the outset, it’s unlikely that you’ll have the luxury of streamlining data erasure as it applies to your backups. For most of us, the only path forward is to modify our backup strategy and create processes that enable our organizations to respond to these requests as nimbly and effectively as possible.