Cybernetics of software

Recently I was thinking about how to structure software systems so as to minimize the cognitive burden on the people using and managing them. The more I dug into the idea, the more obvious it became that this is already a solved problem; there just isn’t enough mindshare around the solution, which exists under the heading of hierarchical planning and process control. To demonstrate, here’s an example from a domain I’m familiar with: build systems, release engineering, and infrastructure software in general.

The hierarchical nature is most obvious in infrastructure software because the layers of the hierarchy are staring you in the face. To deploy software you need a platform to deploy it to. To have a platform you need a computer with an operating system running on it. To have a computer with an operating system you need storage devices and networks properly configured with a pre-packaged OS, or else you need to PXE boot and bootstrap. There are some non-trivial feedback loops visible even at this stage. How did the pre-packaged OS get to the storage device or the network? Well, with other computers. It’s turtles all the way down if you stare long enough, but we need to cut things off at some point or we will never get past the bootstrapping issue. So we will assume we successfully descended the chain and bootstrapped our way back up to a working computer with an operating system on it. We treat this assemblage of a computer with a bare-minimum operating system as an atomic unit in the higher levels of the planning and control hierarchy. The higher levels assume this part of the platform is configured and working as intended and treat any low-level details, e.g. PXE boot, as irrelevant. In other words, the higher levels only care whether the atomic unit works, not how it got there or how it can be fixed if it breaks.
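To make the atomic-unit framing concrete, here’s a minimal sketch in Python of how a higher layer might model a provisioned machine: an opaque thing that either works or doesn’t. The hostnames and the TCP-reachability health check are illustrative assumptions, not a prescription.

```python
from dataclasses import dataclass
import socket


@dataclass
class Node:
    """A provisioned machine as seen from higher levels: an opaque, atomic unit."""
    hostname: str
    ssh_port: int = 22

    def is_healthy(self, timeout: float = 3.0) -> bool:
        """The only question higher levels ask: does the unit work?

        How the node got here (PXE boot, pre-imaged disk, etc.) is
        deliberately invisible at this layer.
        """
        try:
            with socket.create_connection((self.hostname, self.ssh_port), timeout):
                return True
        except OSError:
            return False


# Higher layers plan against the units that work and ignore the rest.
fleet = [Node("build-01.example.net"), Node("build-02.example.net")]
usable = [n for n in fleet if n.is_healthy()]
```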

Now we are ready to deploy the software, but to deploy the software we must build it. To build it we must find the source and gather the dependencies. To gather the source and the relevant dependencies we need to know the revisions/versions of all the pieces. We recursively traverse these version/revision constraints until we have the entire dependency tree, along with a plan for a build chain from the leaves of the tree up to our software at the root. After building everything we package the artifacts and ship the package over the network to a repository that contains all the necessary metadata, so that higher levels in the planning and control hierarchy can query the artifacts and treat the build process as an irrelevant detail.
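Here’s a sketch of that traversal, assuming every package’s pinned dependencies are already known (the package names and versions below are made up). A post-order walk yields exactly the leaves-first build chain described above; a real resolver would also have to handle constraint solving, cycles, and caching.

```python
# Pinned dependencies per (package, version); in reality this would come
# from lockfiles or a metadata service. All names/versions are made up.
DEPS = {
    ("app", "1.4.0"): [("libfoo", "2.1"), ("libbar", "0.9")],
    ("libfoo", "2.1"): [("libbaz", "3.0")],
    ("libbar", "0.9"): [("libbaz", "3.0")],
    ("libbaz", "3.0"): [],
}


def build_order(root):
    """Post-order walk of the dependency tree: leaves first, root last.

    The 'seen' set makes sure shared dependencies are built only once.
    """
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in DEPS[pkg]:
            visit(dep)
        order.append(pkg)

    visit(root)
    return order


# [('libbaz', '3.0'), ('libfoo', '2.1'), ('libbar', '0.9'), ('app', '1.4.0')]
print(build_order(("app", "1.4.0")))
```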

So far we have the computer and the software in a position where we can combine them to get some desired outcome. In the good old days we’d be almost done at this point: to finish things we’d just move the packaged artifact to where it needs to be and invoke the installer. But, alas, we no longer live in the good old days. We live in the age of distributed systems, where having the software where we need it is not enough, so we need another layer in the hierarchy.

To get to a working system we must orchestrate the rollout process in a consistent fashion. This layer is usually called the orchestration layer, and it works a lot like a sequence of database migrations: we compare the current state with our desired state and then gradually drive the system toward the desired state. This usually involves invoking the installer for the various pieces and then restarting various processes by sending them the proper signals. In Unix-land we accomplish this with signals like SIGUSR1, SIGINT, SIGTERM, etc. Notice again that there are non-trivial connections between the higher and lower levels. The software itself needs to understand some details of the operating system and of how the higher-level orchestration process will interact with it, e.g. by associating the relevant semantics with SIGUSR1. In a real production environment we’d also have several auxiliary systems, each with its own hierarchy of planning and control, in charge of watching this entire process to make sure things don’t deviate too far from homeostasis. Homeostasis is very domain specific and there isn’t anything general I can say about it, other than that most auxiliary systems are designed to mitigate large deviations from it.
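A reconciliation loop in miniature might look like the following sketch. The installer command, the pidfile location, and the choice of SIGUSR1 are all illustrative assumptions standing in for whatever the real environment uses.

```python
import os
import signal
import subprocess


def pid_of(service: str) -> int:
    """Stand-in: read the service's pidfile. A real system would more
    likely ask an init system or supervisor."""
    with open(f"/run/{service}.pid") as f:
        return int(f.read())


def reconcile(desired: dict[str, str], current: dict[str, str]) -> None:
    """Diff the desired state against the current state and converge."""
    for service, want in desired.items():
        have = current.get(service)
        if have == want:
            continue  # already in the desired state, nothing to do
        # Install the new artifact; "installer" stands in for whatever
        # package manager or deploy tool is actually in use.
        subprocess.run(["installer", service, want], check=True)
        # Tell the running process to pick up the change. The process must
        # agree on what this signal means (reload config, graceful restart, ...).
        os.kill(pid_of(service), signal.SIGUSR1)
        current[service] = want


# Both maps would normally be queried from the hosts / a metadata service.
reconcile(
    desired={"api": "2.3.1", "worker": "2.3.1"},
    current={"api": "2.3.0", "worker": "2.3.1"},
)
```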

Most software shops I’ve worked at have some crude approximation of this hierarchical planning and control process, but there is almost no automation and most parts of the system are humans manually enacting algorithms. I think there is a better way to do all this: reify the entire system as a hierarchical planning and control graph implemented in software, so that humans end up as another layer in the hierarchy instead of as components in the lower levels.
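As a rough sketch of what that reification could look like, consider a task graph where each node is either an automated procedure or a human decision point, so people slot into the hierarchy as just another kind of component rather than as hidden glue. Everything here (the task names, the yes/no prompt) is illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """A node in the planning/control graph. 'run' is either an automated
    procedure or a prompt for a human operator; the graph doesn't care."""
    name: str
    run: Callable[[], bool]  # returns True on success
    deps: list["Task"] = field(default_factory=list)


def execute(task: Task) -> bool:
    """Depth-first: a task runs only after all of its dependencies succeed."""
    if not all(execute(d) for d in task.deps):
        return False
    return task.run()


def human(prompt: str) -> Callable[[], bool]:
    """A human is just another component in the hierarchy."""
    return lambda: input(f"{prompt} [y/n] ").strip().lower() == "y"


provision = Task("provision", run=lambda: True)  # automated
build = Task("build", run=lambda: True)          # automated
signoff = Task("release sign-off", run=human("Ship it?"))
deploy = Task("deploy", run=lambda: True, deps=[provision, build, signoff])

execute(deploy)
```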