The missing Terraform VM

Igor Zalutski
6 min read · Dec 20, 2023

Terraform is unique among languages in that, in addition to the configuration language itself, there is also a thing called state, which stores the low-level settings of every resource described in code. This approach seems needlessly complex, but it evolved for a reason. One of the first infrastructure-as-code tools, CloudFormation from AWS, did not have state, only code — and ironically that made working with it harder, not easier, because there was no way to tell whether the code matched the actual configuration of the cloud account.

One thing, however, is surprisingly similar between Terraform and other languages: the concept of dependencies. Naturally, some pieces of your codebase might need other parts of it to work properly — that’s what dependencies are all about. And in Terraform, just as in compiled programming languages like Java or C++, there are also two kinds of dependencies. Static dependencies (or implicit, or compile-time) can be resolved at the time of writing the code (e.g. modules); whereas dynamic ones can be resolved only at runtime (or apply-time, in Terraform’s case).

Disclaimer: this article is written by Digger, an open-source CI/CD orchestrator for Terraform.

Admittedly, applying concepts of “real” programming languages to Terraform is a bit of a stretch. It is a configuration language after all — HCL is not too dissimilar from JSON at its core, designed to be used mostly in a declarative fashion. Yes, you can have functions and other goodies; still, a configuration language it is. And yet trying to imagine what the concepts of general-purpose programming languages could mean in the Terraform world seems to be quite entertaining.

The two concepts I want to borrow from other languages for the purposes of this article are runtime and compile-time, in the context of dependencies. Let’s assume for a minute that there is such a thing as a “terraform VM”, and perhaps that terraform code is compiled into some sort of bytecode, like Java or C#. It even makes sense in a way — the VM being the combination of the terraform provider and the target cloud provider that “executes” it. The plan artifact, then, is the bytecode.

Static dependencies: modules

In our “terraform VM” model that we just made up, modules resemble compile-time (static) dependencies of compiled programming languages. One source file requires another to produce a working build; and if something is wrong in that “dependency link”, it can be known at the time of writing the code — hence static.

Initially Terraform didn’t have native support for modules, so tools like Terragrunt came to the rescue. In some sense Terragrunt resembles macros in the C language — it’s essentially automated copy-pasting of the same piece of code to reduce duplication (Don’t Repeat Yourself, aka DRY). Simple and powerful! But native support for modules at the language level is of course much better: modules can now be distributed independently, and the “compiler” (the terraform CLI) can resolve the correct version using a centralised registry.
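As a sketch of what that registry resolution looks like in practice (the module shown is a real, widely used registry module; the version constraint and input values are illustrative):

```hcl
# A root module pulls a versioned module from the public registry;
# `terraform init` (our "compiler") resolves and downloads it.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # resolved against the registry, like a package manager

  name = "demo"
  cidr = "10.0.0.0/16"
}
```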

Runtime dependencies: inputs / outputs

The second kind of dependency in Terraform is less obvious, so let’s start with a practical example. It is good practice to split your terraform codebase into smaller pieces, each with its own state, to reduce blast radius. This means that if something goes wrong with one piece, the rest keeps working. So it’s standard practice to have a state file for a “base layer” like networking, plus a piece of state for each logical part of the application. For example, a service together with the queue and database it needs will have its own piece of state. And all of that is multiplied by the number of environments (typically at least 3 — dev, staging, prod).

For the sake of simplicity, let’s imagine our terraform project has just two pieces of state — Base, which defines the networking setup (VPC etc), and App, which defines everything else. How does App know which VPC it should be deployed into?
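A minimal layout for this split might look like the following (directory names and backend keys are illustrative); each directory is a separate root module with its own state file:

```
infra/
├── base/          # VPC, subnets, routing — has its own state
│   ├── main.tf
│   └── backend.tf # e.g. S3 key "base/terraform.tfstate"
└── app/           # service, queue, database — has its own state
    ├── main.tf
    └── backend.tf # e.g. S3 key "app/terraform.tfstate"
```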

Another good practice is to define the parameters of each module as variables. This is effectively encapsulation — hiding implementation details from the consumer of the library. In programming languages, interfaces and function signatures serve a similar purpose. So our App module will define a variable named vpc_id. Where does it come from?
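On the App side, that is just a plain declaration — a minimal sketch:

```hcl
# app/variables.tf — App declares its dependency on a VPC
# without knowing where the ID will come from.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC to deploy the app into"
}
```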

It has to come from the Base part of our project. But it isn’t known at the time of writing the code for Base. You first need to run terraform apply for Base, and only once the apply succeeds will you see the ID of the newly created VPC in the outputs (which are also stored in the state file).
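On the Base side, exposing the ID is a one-line output (the resource name is illustrative):

```hcl
# base/outputs.tf — this value only exists after `terraform apply`
# succeeds; it is then recorded in Base's state file.
output "vpc_id" {
  value = aws_vpc.main.id
}
```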

These are the runtime dependencies of our made-up model. One piece of state implicitly depends on another, but the dependency can only be resolved at runtime (or rather, apply time). A bit like instantiating a class in Java using a DI (dependency injection) container that loads the right library based on configuration or runtime data.

Surely there is a way to pass those outputs from one state to another in Terraform? Turns out, the answer is no — at least not at the language level. The reason is probably that since state is deliberately separated from code, every piece of terraform code is only concerned with one piece of state, assuming it’s all there is. So you need something else (let’s call it “terraform VM”) to oversee the “links” of runtime dependencies between different pieces of state.
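The closest thing to a workaround within Terraform itself is the `terraform_remote_state` data source, which reads another configuration’s state directly. But it hardwires the backend location into the consumer and does nothing about apply ordering, so the need for an external orchestrator remains. A sketch, assuming Base keeps its state in S3 (bucket, key, and region are illustrative):

```hcl
# app/main.tf — read Base's outputs straight from its state file.
data "terraform_remote_state" "base" {
  backend = "s3"
  config = {
    bucket = "my-terraform-states" # illustrative
    key    = "base/terraform.tfstate"
    region = "us-east-1"
  }
}

# Wire the runtime-resolved value into the App module.
module "app" {
  source = "./modules/app"
  vpc_id = data.terraform_remote_state.base.outputs.vpc_id
}
```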

Terragrunt again helps here: you can use inputs with the dependency.* syntax to take outputs from one state and use them as inputs in another. This is handy: you can now run terragrunt run-all apply (formerly apply-all) and it will take care of the ordering, first applying the Base project, then taking vpc_id from its outputs and feeding it as an input to App.
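A sketch of the Terragrunt side (paths are illustrative):

```hcl
# app/terragrunt.hcl — declare a dependency on the Base stack
# and wire its output into App's inputs. `terragrunt run-all apply`
# applies Base first, then App.
dependency "base" {
  config_path = "../base"
}

inputs = {
  vpc_id = dependency.base.outputs.vpc_id
}
```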

The missing runtime: “terraform VM”

Let’s stretch our parallel with programming languages even further: is there a pattern for runtime dependency resolution that we could borrow from other programming languages? One seems to fit pretty well: the service locator, or run-time linker.

This is effectively what Terragrunt does with its dependency syntax. But it doesn’t do so explicitly; instead, Terragrunt makes runtime dependencies look compile-time, while in reality they are not.

Programming languages seem to have undergone similar evolutionary steps. First the need for dynamic dependencies arises; it is initially addressed by some kind of templating / generation, like macros in C. Then, as codebases grow, every substantially large project becomes mostly dynamic, and harder and harder to debug. Everything depends on everything, and it’s impossible to trace at the time of writing the code. So dependency management is “shifted left” (yes, the same principle as in devops best practice) — whatever can be a static dependency should be static, to minimise the confusion caused by runtime dependencies.

What seems to be missing in the terraform ecosystem is a set of constructs, outside the scope of a single state, that deal with “runtime” matters like dependencies, TFVars, and so on. Terraform Cloud and its alternatives offer variations of this as part of their cloud-based offerings; but that seems like overkill. Why do I have to use a third-party cloud-based application for features that belong in the language itself? Why can’t I install something like a “TDK” (by parallel with the JDK) and have the same sort of experience on my laptop? And then perhaps deploy a “TFCat” (by parallel with Tomcat) into my K8S cluster or EC2 instance to run it centrally?

