ruby dsl tricks: reifying references

Ruby is great for writing DSLs because it has first-class support for two of their most important ingredients: contexts and code blocks. With instance_eval, the same block of code can be evaluated in different contexts to produce different effects, but most often we want to evaluate the block in the “freest” possible context to build an AST (abstract syntax tree). I’m almost certain there is a connection here with initial and terminal algebras in category theory, but someone smarter than me will have to chase that analogy. Today I’m just going to demonstrate how to reify references so that we can support cyclic structures in our DSL.

Most Ruby DSLs have the following general form
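A hypothetical script in that general form might look like the sketch below. Only Context, resource, property_n and ref come from the text; property_1, property_2 and the ref[...] lookup syntax are assumptions made for illustration. The block is held in a Proc so the snippet runs on its own, whereas in practice it would be handed straight to Context.new.

```ruby
# Hypothetical DSL script, stored in a Proc so this snippet runs by itself.
# In practice the block would be passed to the context:
#   Context.new(&EXAMPLE_SCRIPT)
EXAMPLE_SCRIPT = proc do
  resource(:a) do
    property_1 'value of a'
    # :b has not been declared yet, so this can only be a reference
    property_n ref[:b][:property_1]
  end

  resource(:b) do
    property_1 'value of b'
    # refers back to :a, which makes the two resources mutually dependent
    property_2 ref[:a][:property_1]
  end
end
```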

Notice the two things I mentioned: contexts and code blocks. The first thing we do is evaluate a top-level code block within an instance of Context, and within that instance we have sub-contexts defined by the resource keyword. We give each resource a unique name so that other resources can refer back to it and extract properties from the referent. You might be wondering why we don’t just use plain Ruby variables to extract properties. That would also work, but a variable has to be assigned before it can be used, so you could not declare cyclic references like the ones above. Notice how resource(:b) is not yet defined when we set property_n to a property of resource(:b). By reifying references we get a form of lazy evaluation: the reference is recorded as data and resolved only later, once every resource has been defined, which is what makes cyclic references possible.

Let’s now fill in some of the structure behind the scenes
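A minimal sketch of what that structure could look like, assuming a Resource class for the sub-contexts and a method_missing hook that records any one-argument call as a property. Both are illustrative choices, not necessarily how the original was written.

```ruby
class Context
  attr_reader :resources

  def initialize(&block)
    @resources = {} # tracks the sub-contexts by name
    instance_eval(&block) if block
  end

  # The `resource` keyword: opens a named sub-context.
  def resource(name, &block)
    @resources[name] = Resource.new(name, self, &block)
  end
end

class Resource
  attr_reader :name, :parent, :properties

  def initialize(name, parent, &block)
    @name = name
    @parent = parent # pointer back to the enclosing context
    @properties = {}
    instance_eval(&block) if block
  end

  # Any one-argument call such as `property_1 'x'` records a property.
  def method_missing(prop, *args)
    return super unless args.length == 1

    @properties[prop] = args.first
  end
end

context = Context.new do
  resource(:a) do
    property_1 'some value'
  end
end

puts context.resources[:a].properties[:property_1] # prints "some value"
```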

Hopefully the basic pattern is pretty obvious. We create a context that defines certain words we can use within it, and then we pass a block of code to be evaluated within that context. The newly defined context is then tracked in some collection, and any nested sub-contexts get a pointer back to the parent context. Hopefully the AST-like structure is starting to become visible. Before I can define the ref keyword I have to define the structure that holds the references.
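One way such a structure could be sketched is a chain of links, where each link caches its children by symbol. The class name Reference, the [] operator for extending a chain and the path helper are all assumptions for illustration.

```ruby
class Reference
  attr_reader :parent, :symbol

  def initialize(parent = nil, symbol = nil)
    @parent = parent
    @symbol = symbol
    @children = {} # memoized child links, keyed by symbol
  end

  # Extending a chain by the same symbol always returns the same object.
  def [](symbol)
    @children[symbol] ||= Reference.new(self, symbol)
  end

  # Symbols from the root down to this link, e.g. [:b, :property_1].
  def path
    parent ? parent.path + [symbol] : []
  end
end

root = Reference.new
first = root[:b][:property_1]
second = root[:b][:property_1]
puts first.equal?(second) # prints "true": both chains are one object
```

Because children are only ever created through [], there is no way to build two distinct objects for the same sequence of symbols under a given root.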

The way I’ve defined reference chains forces memoization, and because references are so simple, the memoization in turn forces canonicalization. If the symbols in two reference chains match, then by the above construction those chains are one and the same object. This is handy because we only have to resolve each chain once. If we didn’t memoize, we could end up with two reference chains that refer to the same resource, and we’d either have to do the resolution twice or do some kind of post-processing to canonicalize the references, comparing each one against the set of chains we’d already resolved. Anyway, that is a bunch of headaches, and it is better to do it this way. Let’s now incorporate this into our DSL.

So now the top-level context holds the initial reference, and all we have to do when creating a reference is get back to the root context and return the root reference so that the sub-contexts can extend it by chaining.
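Here is one way the pieces might fit together, as a sketch under stated assumptions rather than the original code. The names root_ref and resolve, the two-symbol chain shape [resource_name, property_name] and the identity-keyed cache are all hypothetical.

```ruby
# Memoized reference chain: the same symbols always yield the same object.
class Reference
  attr_reader :parent, :symbol

  def initialize(parent = nil, symbol = nil)
    @parent = parent
    @symbol = symbol
    @children = {}
  end

  def [](symbol)
    @children[symbol] ||= Reference.new(self, symbol)
  end

  def path
    parent ? parent.path + [symbol] : []
  end
end

class Context
  attr_reader :resources, :root_ref

  def initialize(&block)
    @resources = {}
    @root_ref = Reference.new          # the initial reference
    @resolved = {}.compare_by_identity # one cached result per chain
    instance_eval(&block) if block
  end

  def resource(name, &block)
    @resources[name] = Resource.new(name, self, &block)
  end

  # Assumes chains of the shape [resource_name, property_name] and that
  # property values do not form a cycle among themselves.
  def resolve(reference)
    return @resolved[reference] if @resolved.key?(reference)

    name, prop = reference.path
    value = @resources.fetch(name).properties.fetch(prop)
    @resolved[reference] = value.is_a?(Reference) ? resolve(value) : value
  end
end

class Resource
  attr_reader :name, :parent, :properties

  def initialize(name, parent, &block)
    @name = name
    @parent = parent
    @properties = {}
    instance_eval(&block) if block
  end

  # The `ref` keyword: climb to the root context, return its root reference.
  def ref
    context = parent
    context = context.parent while context.respond_to?(:parent)
    context.root_ref
  end

  def method_missing(prop, *args)
    return super unless args.length == 1

    @properties[prop] = args.first
  end
end

context = Context.new do
  resource(:a) do
    property_1 'value of a'
    property_n ref[:b][:property_1] # :b is declared further down
  end

  resource(:b) do
    property_1 'value of b'
    property_2 ref[:a][:property_1]
  end
end

a = context.resources[:a]
puts context.resolve(a.properties[:property_n]) # prints "value of b"
```

Resolution is deferred until the whole script has been evaluated, so the order in which resources are declared does not matter.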

All of this recently came up while I was working on a DSL to define and provision cloud resources. It took me a few iterations to settle on the structure outlined above. I don’t think there is anything novel here, but hopefully it will be useful to someone else as well. I was certainly pretty happy with myself for coming up with it. Getting it to the point where the starting example runs without errors is left as an exercise for the reader.