Tiered Storage for Appendable Workloads

While tinkering with a few things at work I stumbled on a combination of tools and ideas that makes for a pretty decent tiered backup strategy for appendable workloads using hot, warm, and cold storage.

Assumptions and Setup

First I’ll explain the basic setup and then add a wrinkle into the mix with deletions, because once the overall setup and approach are clear, the reconciliation process for deletions is an obvious extension.

I’m going to assume there is some appendable data store that has all the necessary indices for doing efficient range scans. We will leverage that for the backup and restore process by checkpointing a high watermark, which lets us avoid rescanning anything below the checkpoint. This lets us efficiently get slices of data from the store, and in particular grab the latest slice of data and store it nearby in warm storage for fast recovery. I’m going to denote the slices by S-0, S-1, …, S-N, with the latest slice being the one with the largest index.
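To make the slice bookkeeping concrete, here is a minimal sketch in Python, assuming records are keyed by a monotonically increasing ID with a range index on it. The names Record, Slice, and take_slice are illustrative, not from any particular library.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    id: int        # monotonically increasing key with a range index on it
    payload: str

@dataclass
class Slice:
    low: int               # exclusive lower watermark
    high: int              # inclusive upper watermark
    records: List[Record]

def take_slice(store: List[Record], low_watermark: int, high_watermark: int) -> Slice:
    # A real store would use its range index here; a list scan keeps the sketch simple.
    records = [r for r in store if low_watermark < r.id <= high_watermark]
    return Slice(low=low_watermark, high=high_watermark, records=records)
```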

So our data store consists of S-0 + S-1 + ... + S-N, and we assume there is a copy of all the slices somewhere in cold storage. The way we got there is that at each point in time we took the latest slice, put it in warm storage, and then eventually aged it out to cold storage, i.e. S-0 -> W: S-0 -> C: S-0. By repeating this process we incrementally got all the way to S-N being in warm storage, and when we add another slice we will age S-N out to cold storage as well.
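A rough sketch of that aging step, reusing the Slice type from above; warm and cold here are just stand-ins for the real tiers (say, nearby disk versus an object store).

```python
def add_slice(warm: list, cold: list, new_slice: Slice) -> None:
    # Everything currently in warm storage is older than new_slice,
    # so age it out to cold storage first.
    while warm:
        cold.append(warm.pop())   # S-i moves from W to C
    warm.append(new_slice)        # the new slice is now the warm one
```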

The reason for the cold and warm storage split is that we want fast recovery for the latest slice: the assumption is that the latest data is the most relevant, and we can tolerate some lag in recovering the older slices from cold storage. Hopefully most of this is pretty obvious, but now let’s add a wrinkle into the mix by worrying about deletions.

Slight Wrinkle: Deletions

The wrinkle is that once we have archived a slice we no longer have indexed access to it and would need to restore it to warm storage to operate on it. This means that if we archive a slice and then delete from that slice, our tiered storage is now technically inconsistent, because when we restore that slice from backup we’ll have reverted the deletion. We want to avoid this inconsistency and there is a simple way to do it: we keep an index of deletions that also supports fast range queries, and when loading the data we use that index to skip any items in the slice that have been deleted. In theory we could even use another offline process to synchronize the deletes in batches across the archived slices, but that is more of a business requirement and is not technically necessary for the backup and restore process.
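Here is a sketch of that restore-time filtering, assuming the deletion index supports a fast "deleted IDs in range" query; it is modelled below as a sorted list plus bisect, and Slice/Record are the types from the earlier sketch.

```python
import bisect
from typing import Iterable, List, Set

def deleted_in_range(deleted_ids: List[int], low: int, high: int) -> Set[int]:
    # deleted_ids must be sorted; returns the deletions falling inside (low, high].
    lo = bisect.bisect_right(deleted_ids, low)
    hi = bisect.bisect_right(deleted_ids, high)
    return set(deleted_ids[lo:hi])

def load_slice(s: Slice, deleted_ids: List[int]) -> Iterable[Record]:
    tombstones = deleted_in_range(deleted_ids, s.low, s.high)
    return (r for r in s.records if r.id not in tombstones)
```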

Implementation Sketch: MySQL, Postgres, etc.

Assuming you have this kind of workload profile, there is a simple way to retrofit this tiered storage and backup mechanism on top of it with a combination of range dumping and database triggers for deletions.

The range dumping is hopefully obvious. Assuming you are at position N, there is some background process that periodically polls for changes and dumps the data from N + 1 up to N + k, then updates its watermark to N + k. It’s also possible to dump every item and do the range aggregation from N + 1 to N + k with another offline process, but that’s an implementation detail and doesn’t materially affect the overall design. This takes care of the warm storage component, and once a slice is aged out it is moved to cold storage in the obvious way.
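A hedged sketch of the range dumper, using sqlite3 from the standard library as a stand-in for MySQL or Postgres. The table and column names (events, id, payload, assumed to be TEXT) and the warm storage helper are assumptions for illustration only.

```python
import json
import sqlite3
import time

SLICE_SIZE = 10_000   # k: how many rows make up one slice
POLL_SECONDS = 60

def write_to_warm_storage(name: str, blob: str) -> None:
    # Stand-in for whatever the warm tier really is (local disk, nearby object store).
    with open(name, "w") as f:
        f.write(blob)

def dump_next_slice(conn: sqlite3.Connection, watermark: int) -> int:
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? AND id <= ? ORDER BY id",
        (watermark, watermark + SLICE_SIZE),
    ).fetchall()
    if not rows:
        return watermark                       # nothing new; keep the old watermark
    new_watermark = rows[-1][0]
    write_to_warm_storage(f"slice-{watermark}-{new_watermark}.json", json.dumps(rows))
    return new_watermark

def run(conn: sqlite3.Connection, watermark: int) -> None:
    while True:
        watermark = dump_next_slice(conn, watermark)
        time.sleep(POLL_SECONDS)
```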

The deletions are handled with triggers and another process that does the centralized aggregation for indexing. We add a deletion trigger so that whenever something is deleted we insert the ID of the deleted item into a deletions table, and we use the same range dumping idea to grab the deleted IDs and ship them to a central index.
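And a sketch of the deletion side, again with sqlite3 standing in for MySQL/Postgres; the trigger syntax differs per engine but the shape is the same. The events and deleted_events tables and the shipping helper are assumptions.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS deleted_events (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- gives a watermark for dumping deletions
    event_id   INTEGER NOT NULL,
    deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER IF NOT EXISTS record_deletion
AFTER DELETE ON events
BEGIN
    INSERT INTO deleted_events (event_id) VALUES (OLD.id);
END;
"""

def install_deletion_tracking(conn: sqlite3.Connection) -> None:
    conn.executescript(DDL)

def send_to_central_index(event_ids: list) -> None:
    # Stand-in for shipping the IDs to the central deletion index.
    print("shipping deleted ids:", event_ids)

def ship_deletions(conn: sqlite3.Connection, watermark: int) -> int:
    # Same range dumping idea, applied to the deletions table.
    rows = conn.execute(
        "SELECT seq, event_id FROM deleted_events WHERE seq > ? ORDER BY seq",
        (watermark,),
    ).fetchall()
    if rows:
        send_to_central_index([event_id for _, event_id in rows])
        watermark = rows[-1][0]
    return watermark
```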

With the above two pieces we have everything necessary to perform a restoration in case there is a catastrophic failure of hot storage. To restore the data we bring up a new database and use warm storage to populate it, querying the central index to make sure we don’t load any data that has been deleted. Alongside the warm restoration we start pulling in data from cold storage and restore it with the same process we used for warm storage.
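Finally, a sketch of that restore path under the same assumptions: a fresh database with an events table, warm and cold slices serialized as the JSON blobs produced by the dumper above, and the set of deleted IDs fetched from the central index.

```python
import json
import sqlite3

def restore_slice(conn: sqlite3.Connection, slice_blob: str, deleted_ids: set) -> None:
    rows = json.loads(slice_blob)
    # Skip anything the central deletion index says is gone.
    live = [(rid, payload) for rid, payload in rows if rid not in deleted_ids]
    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", live)
    conn.commit()

def restore(conn: sqlite3.Connection, warm_blobs, cold_blobs, deleted_ids: set) -> None:
    for blob in warm_blobs:   # fast path: the most recent data comes back first
        restore_slice(conn, blob, deleted_ids)
    for blob in cold_blobs:   # slower backfill of the older slices from cold storage
        restore_slice(conn, blob, deleted_ids)
```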