Caching w/ strace

I once used strace and a Ruby script to bypass a really long step in a build pipeline. The high-level idea was pretty simple: run the build process under strace and see which files it read from and wrote to. After “profiling” the build this way, the inputs were hashed, and that hash was used as the key for a bundle of the outputs.
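To give a feel for the profiling step, here is a minimal sketch of how a strace log could be turned into lists of inputs and outputs. It is not the original script. The `classify_opens` name, the regex, and the rule that anything opened for writing counts as an output are assumptions made for illustration, and it expects a log captured with something like `strace -f -e trace=file -o build.strace <build command>`.

```ruby
require 'set'

# Matches open/openat calls in strace output, for example:
#   1234 openat(AT_FDCWD, "src/main.c", O_RDONLY|O_CLOEXEC) = 3
OPEN_CALL = /\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)",\s*([A-Z_|0-9x]+)[^)]*\)\s*=\s*(-?\d+)/

WRITE_FLAGS = %w[O_WRONLY O_RDWR O_CREAT].freeze

# Split the files a traced build opened into inputs (only read) and
# outputs (opened with any write-ish flag). Hypothetical helper.
def classify_opens(lines)
  inputs = Set.new
  outputs = Set.new
  lines.each do |line|
    match = OPEN_CALL.match(line)
    next unless match

    path = match[1]
    flags = match[2].split('|')
    result = match[3].to_i
    next if result < 0 # failed opens (ENOENT and friends) touched nothing

    if flags.any? { |flag| WRITE_FLAGS.include?(flag) }
      outputs << path
    else
      inputs << path
    end
  end
  # A file the build wrote and later read back is an intermediate, not an input.
  { inputs: inputs - outputs, outputs: outputs }
end
```

Calling `classify_opens(File.foreach('build.strace'))` would then give the two sets to feed into the hashing and bundling steps.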

The core of the script was a utility class and some convenience methods for computing hashes by shelling out to find, tar, and shasum.
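The original helpers shelled out to find, tar, and shasum. As a self-contained stand-in, the sketch below does the same job with Ruby's standard library (`Find` and `Digest::SHA256`) instead. The `hash_paths` name and the choice to mix each file's path into the hash are assumptions, not details from the original script.

```ruby
require 'digest'
require 'find'

# Hash every regular file under the given files or directories.
# Files are visited in sorted order so the result does not depend on
# directory traversal order. Hypothetical helper.
def hash_paths(paths)
  files = paths.flat_map do |path|
    next [path] if File.file?(path)

    Find.find(path).select { |entry| File.file?(entry) }
  end

  digest = Digest::SHA256.new
  files.uniq.sort.each do |file|
    digest << file << "\0" # renaming a file changes the key
    digest << Digest::SHA256.file(file).hexdigest << "\n"
  end
  digest.hexdigest
end
```

Because the paths are part of the hash, it makes sense to pass paths relative to the project root, so that two checkouts in different locations still produce the same key.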

These helpers were used in conjunction with some other files in various folders, which were evaluated at runtime to compute the aggregate hash of the inputs.
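How those per-folder files were structured is not shown here, so the following is only a sketch of the aggregation idea. It assumes each folder ends up contributing a name and a hash, and the hypothetical `aggregate_hash` folds them into a single key.

```ruby
require 'digest'

# Fold named component hashes into one aggregate key. Sorting by name
# keeps the key stable no matter what order the components were found in.
def aggregate_hash(component_hashes)
  digest = Digest::SHA256.new
  component_hashes.sort.each do |name, hash|
    digest << name << '=' << hash << "\n"
  end
  digest.hexdigest
end
```

Any change to a single component hash changes the aggregate, which is what makes it usable as a cache key.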

The aggregate hash was then used as the key for a tar file containing the outputs.
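A bundling step along these lines could look like the sketch below. The `pack_outputs` name, the `<key>.tar` naming scheme, and the `root:` and `cache_dir:` parameters are all hypothetical. It shells out to tar and writes to a temporary name first, so a half-written archive never appears under its final key.

```ruby
require 'fileutils'

# Bundle the given output paths (relative to root) into <cache_dir>/<key>.tar
# and return the archive path. Hypothetical helper.
def pack_outputs(key, outputs, root:, cache_dir:)
  cache_dir = File.expand_path(cache_dir)
  FileUtils.mkdir_p(cache_dir)
  archive = File.join(cache_dir, "#{key}.tar")
  partial = "#{archive}.partial"

  # -C makes tar store paths relative to the build root.
  ok = system('tar', '-cf', partial, '-C', root, *outputs)
  raise "tar failed for #{key}" unless ok

  File.rename(partial, archive) # publish only complete archives
  archive
end
```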

The resulting cache file was then moved around and shared across hosts, so whoever else needed the outputs could just compute the aggregate hash, then download and unpack the matching file. The end result was that build times for that step dropped from 5-30 minutes to less than 10 seconds on average, and given how many times a day we ran that step, that added up to quite a bit of savings.
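The consuming side can be sketched the same way. How the files actually traveled between hosts is not described, so this example assumes the archives sit in a directory every host can read (a network mount, say). `restore_outputs` and its parameters are hypothetical names.

```ruby
require 'fileutils'

# Unpack <cache_dir>/<key>.tar into root if it exists.
# Returns true on a cache hit and false on a miss. Hypothetical helper.
def restore_outputs(key, root:, cache_dir:)
  archive = File.join(File.expand_path(cache_dir), "#{key}.tar")
  return false unless File.exist?(archive)

  FileUtils.mkdir_p(root)
  ok = system('tar', '-xf', archive, '-C', root)
  raise "unpack failed for #{key}" unless ok

  true
end
```

A build wrapper would call this first and only fall through to the slow step, followed by bundling its outputs, when it returns false.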

You might be wondering why the build process wasn’t “fixed” to do the caching on its own, but that’s a story for another time.