Ad-Hoc YAML DSLs and Productivity

tl;dr: Badly designed DSLs create unnecessary complexity and cause more problems than they actually solve.

One of my frustrations with DevOps and cloud infrastructure tools is that most of them are badly designed DSLs that eschew the features of modern programming languages. Modules, data structures, functions, imperative control flow constructs, debuggers, linters, standard versioning/deployment practices, and rich library ecosystems are all missing. Of course, it is hard to do any real work without these features, so the folks using these tools eventually come to the same conclusion and re-invent non-standard analogs to get by. The re-invention usually ends up being some kind of templating system built with a real language. Two obvious examples I can think of are Ansible with its Jinja templating and Terraform with the ad-hoc variable interpolation mechanism built into its HCL configuration language. Oh, and I almost forgot Kubernetes and Helm, which layers Go templates on top of Kubernetes YAML manifests.

The arguments tool designers bring up for why they made yet another DSL are usually some variation of “YAML (or FooBarLang) is declarative, so it reduces complexity”. On the surface this seems to make sense, because declarative solutions in theory reduce complexity by hiding details, but when you start actually trying to solve problems the shortcomings become obvious. When real-world use cases are brought up, along with the tool’s inability to address them, the response ends up being some variation of “You’re using the tool wrong”. Again, this kind of makes sense until you dig deeper and realize that it’s not really an answer. Tools must always be subordinate to human intentions. If a tool cannot accomplish a goal or requires extraordinary workarounds, then it’s not the user’s fault; the logical conclusion is that the tool is badly designed. If I want to write a loop and can’t for some reason, then that’s a lack of foresight on the tool designer’s part and not my problem. There could be several valid reasons I’d want to use a loop (or recursion), but because DSLs are not really programming languages, I don’t have any real recourse other than to figure out how to work around the limitation.

This state of affairs leads to a whole lot of inefficiency and waste. People contort their thinking to fit the shortcomings of the tool and end up squandering the true potential of a programmable cloud. We can create computers with HTTP requests and coordinate and orchestrate them with similar ease, but because the tools used to provision them are not programming languages, all that flexibility is lost. The workarounds needed to make the tools powerful enough to truly use the dynamic capabilities of the cloud end up being more overhead than most people are willing to put up with, so they never get around to it. Most then fall back on some patchwork of automation expressed in these DSLs plus manual playbooks to provision and configure their systems, which wastes a whole lot of money and human effort.

I personally don’t think this situation is acceptable. We have the means to do better, and that is the main reason I started Cloudbootup. I plan to work on fixing the problem because internalizing the shortcomings of existing tools as virtues does not benefit any of us. The cloud is a programmable wonderland, and we should work toward realizing that potential.