ARM Templates vs Terraform vs Pulumi - Infrastructure as Code in 2021

26 January 2021 • 13 min

A few years ago Pulumi introduced code-native programming languages for Infrastructure as Code (IaC), bringing it closer to developers and their existing skillsets. Fast-forward to 2021, and Microsoft and HashiCorp are playing catch-up to Pulumi and to each other. To help you choose an IaC technology, let’s look at each one’s IaC programming languages for short-term developer happiness and its code re-use for long-term productivity.

Just want a summary? Watch the Ask Me Anything (AMA) style answer

Features Comparison Table

I have created the feature comparison table below, but I only discuss some of these features in this article, not all of them. The table should be a good springboard to help you learn more about each technology.

Feature	ARM	Terraform	Pulumi
Language	JSON + Bicep	HCL/DSL	Code Native, e.g. JavaScript, Python
Languages (in preview)	Bicep DSL	CDK for Terraform, Python and TypeScript Support	-
Clouds	Azure-only	Agnostic + on-prem	Agnostic + on-prem
Preview Changes	`az deployment … what-if`	`terraform plan`	`pulumi preview`
Rollback Changes	Rollback	Revert code & Re-deploy	Revert code & Re-deploy
Infrastructure Clean Up	No	`terraform destroy`	`pulumi destroy`
Deployment History	Deployment History	SCM + Auditing*	SCM + Auditing*
Code Re-Use	Hosted JSON URIs	Modules + Registry*	Code-Native Packages, e.g. npm or pip
State Files	No State File	Plain-text	Encrypted

* refers to a premium feature from vendor, i.e. Terraform Cloud or Pulumi Enterprise.

Rather than walk through every feature, I want to focus on optimizing your choice for developer happiness, which is strongly tied to productivity. People choose human-friendly Domain Specific Languages (DSLs) and code-native languages because if they can code faster and deploy more often, they are more productive - and thus happier.

So let’s do a comparison from these 2 perspectives:

Happiness Today - how quickly can I as an engineer work with each technology’s flavor of Infrastructure as Code?
Happiness Tomorrow - as my application and company grow, how easily can I scale my IaC with re-usable components?

ARM Templates

As a Microsoft engineer, I should point out the major reasons to use Azure Resource Manager (ARM) before I elaborate on why I personally don’t use it:

First Party Support
Because ARM is Azure exclusive, all Azure resources are supported, from the simple resource group to complicated policies and blueprints. And your deployments are most likely to work out of the box as expected.
No state file required
ARM queries the Azure APIs directly for the current state, so you do not have to worry about securing a state file as you do with other IaC technologies.
Deployment Histories included
Deployment history is included out of the box. While your IaC in git history shows your _intended changes_, Azure can tell you the changes that were actually deployed.

ARM Improvements in 2021

The following gaps in ARM existed before 2020 and are the major reasons I never properly learned it. But Microsoft has caught up with its competitors and is filling them:

Detect Drift with what-if
Last year Microsoft implemented the what-if command, the equivalent of terraform plan, which lets you preview infrastructure changes before you deploy. This lets you check whether destructive changes might happen.
JSON is for machines
If I want to author infrastructure, I don’t think in JSON, which is why writing it feels so unnatural. See below for more details, including the new Bicep DSL.
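To make the point about destructive changes concrete, here is a minimal TypeScript sketch of how a CI step might scan what-if results for deletions. The WhatIfChange shape, the destructiveChanges helper and the sample resource IDs are assumptions for illustration, trimmed down from what the command actually reports. The change type names follow the Azure documentation as I understand it.

```typescript
// Change types that ARM what-if can report for a resource.
type ChangeType = "Create" | "Delete" | "Ignore" | "NoChange" | "Modify" | "Deploy";

// Hypothetical, trimmed-down shape of a single what-if change entry.
interface WhatIfChange {
  resourceId: string;
  changeType: ChangeType;
}

// Keep only the changes that would remove an existing resource.
function destructiveChanges(changes: WhatIfChange[]): WhatIfChange[] {
  return changes.filter((change) => change.changeType === "Delete");
}

// Made-up sample data, not real command output.
const sampleChanges: WhatIfChange[] = [
  { resourceId: "/example/storageAccounts/logs", changeType: "Modify" },
  { resourceId: "/example/storageAccounts/legacy", changeType: "Delete" },
  { resourceId: "/example/storageAccounts/new", changeType: "Create" },
];

// Only the "legacy" entry is flagged for review.
const toReview = destructiveChanges(sampleChanges);
```

A pipeline could stop and ask for manual approval whenever toReview is not empty. This sketch only looks at deletions; a stricter check might also inspect Modify entries.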

ARM’s Biggest Pain Point - JSON

The main reason I don’t use ARM is that I don’t like writing JSON. When I write code I often use comments, and the /* */ syntax in ARM feels like a cheat. To illustrate, this is an example ARM Template for an Azure Storage Account:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": {
      "type": "string"
    },
    "containerName": {
      "type": "string",
      "defaultValue": "logs"
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]"
    }
  },
  "functions": [],
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2019-06-01",
      "name": "[parameters('storageAccountName')]",
      "location": "[parameters('location')]",
      "sku": {
        "name": "Standard_LRS",
        "tier": "Standard"
      },
      "kind": "StorageV2",
      "properties": {
        "accessTier": "Hot"
      }
    },
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
      "apiVersion": "2019-06-01",
      "name": "[format('{0}/default/{1}', parameters('storageAccountName'), parameters('containerName'))]",
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
      ]
    }
  ]
}

I’ve been at Microsoft for over 1.5 years and I still can’t write ARM templates. The reality is I will probably skip ARM and instead learn to write Bicep.

ARM Bicep DSL

Bicep is a Domain Specific Language (DSL) that compiles to standard ARM template JSON. Looking at this example from the GitHub project repo, you may see similarities to Terraform’s HashiCorp Configuration Language (HCL):

// Bicep 💪
param storageAccountName string
param containerName string = 'logs'
param location string = resourceGroup().location

resource sa 'Microsoft.Storage/storageAccounts@2019-06-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
    tier: 'Standard'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
  }
}

resource container 'Microsoft.Storage/storageAccounts/blobServices/containers@2019-06-01' = {
  name: '${sa.name}/default/${containerName}'
}

Although I personally would prefer storageaccount over sa, I am overall quite excited about Bicep.

ARM & Bicep Summary - Promising Future

If we can get a DSL like Terraform’s but also get first-party support for Azure features sooner, that could be an IaC game changer for Azure-only workloads. Azure has also filled the preview gap with the az deployment… what-if command, which was really missing.

The code re-use strategy with modules is still very experimental. See this discussion about sharing references across modules. This is the last major gap for me personally before I would consider using Bicep in production.

Everything is still experimental but very promising.

Terraform

Terraform is my favorite IaC technology and what I personally use because it’s so human-friendly, cloud-agnostic and solid. These are the major features of Terraform:

HashiCorp Configuration Language - Human Friendly DSL
Reading and writing HCL flows naturally and is a joy. More details below.
Cloud Agnostic
Although the cloud vendor providers are rather specific, mastering Terraform helps you master IaC for any cloud.
Preview Infrastructure Changes
Run terraform plan and check that you don’t accidentally blow up your infrastructure. Also use the -detailed-exitcode flag, so you can adjust your CI/CD builds based on whether configuration drift was detected.
Clean Up Infrastructure
Run terraform destroy and easily remove any infrastructure, great for clean up after an experiment, or for starting over if something breaks beyond repair. This works because Terraform keeps a record of your infrastructure in a state file.
Code Re-use with Modules
This is so easy that it’s fun to write modules. The DSL is easy to understand and I can have local and hosted modules, either in git or a Terraform Registry (public or private). This is the deciding factor and most important Terraform advantage over its competitors. See the Code Re-Use section later in this article for details.
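As a rough illustration of the -detailed-exitcode idea from the list above, a build script can branch on the exit code of terraform plan: 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded and found changes. The function names and the drift-check policy below are my own assumptions for this sketch, not part of Terraform.

```typescript
// Possible outcomes of `terraform plan -detailed-exitcode`.
type PlanOutcome = "no-changes" | "changes-present" | "error";

// Translate the process exit code into a readable outcome.
function interpretPlanExitCode(exitCode: number): PlanOutcome {
  switch (exitCode) {
    case 0:
      return "no-changes"; // plan succeeded, nothing to change
    case 2:
      return "changes-present"; // plan succeeded and found a diff
    default:
      return "error"; // 1 is the documented error code; treat anything else as an error too
  }
}

// Hypothetical policy: a scheduled drift check fails unless the plan is empty.
function shouldFailDriftCheck(exitCode: number): boolean {
  return interpretPlanExitCode(exitCode) !== "no-changes";
}
```

In a real pipeline you would pass in the exit code of the terraform process. For a normal deployment build, as opposed to a drift check, you would probably treat "changes-present" as the expected case.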

HashiCorp Configuration Language - Terraform’s DSL

Ok, let’s look at the main reason I chose Terraform: HashiCorp Configuration Language (HCL), the human-friendly DSL. This is an example from the Terraform Documentation:

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "storageaccountname"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "GRS"

  tags = {
    environment = "staging"
  }
}

It’s like reading English. I LOVE it.

(Dis)advantages vs ARM

These are the most common arguments I hear against Terraform when compared to ARM:

Not every Azure Resource exists outside ARM
There isn’t a Terraform Provider for every ARM type. Or even if there is, e.g. Azure Policy, you’re still just writing ARM JSON inside another language.
State File in plain text 🧐
If you create resources with credentials, e.g. a database or create service principals, these secrets are stored in plain text in your Terraform state file.

Plain-text state files scare many people. Personally I am less concerned and accept this trade-off because I have confidence in my code quality, CI/CD governance, and security, e.g. I use short-lived tokens and scoped permissions.

If your security team cannot live with this, then delete the state file after the resources are created, keeping in mind that Terraform can then no longer manage those resources. No file, no problem 🤷‍♀️ Some tasks, like creating scoped service principals at scale, are so much easier with Terraform because it can talk to both the ARM and the Azure Active Directory APIs. Create the credentials, immediately throw them in Key Vault and delete the state file afterwards. I’m pragmatic.
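To make the plain-text concern more tangible, here is a small TypeScript sketch that walks a parsed state file and lists attributes whose names look like secrets. The interfaces are a simplified assumption about the JSON state layout (resources, each with instances and their attributes), and the name pattern, helper name and sample values are hypothetical.

```typescript
// Simplified, assumed shape of a parsed Terraform state file.
interface StateInstance {
  attributes: Record<string, unknown>;
}
interface StateResource {
  type: string;
  name: string;
  instances: StateInstance[];
}
interface TerraformState {
  resources: StateResource[];
}

// Hypothetical heuristic: attribute names that hint at a credential.
const SUSPICIOUS_NAME = /password|secret|key|token/i;

// List "type.name.attribute" for every non-empty string that looks like a secret.
function findPlainTextSecrets(state: TerraformState): string[] {
  const hits: string[] = [];
  for (const resource of state.resources) {
    for (const instance of resource.instances) {
      for (const attribute of Object.keys(instance.attributes)) {
        const value = instance.attributes[attribute];
        if (SUSPICIOUS_NAME.test(attribute) && typeof value === "string" && value !== "") {
          hits.push(`${resource.type}.${resource.name}.${attribute}`);
        }
      }
    }
  }
  return hits;
}

// Made-up sample, not a real state file.
const sampleState: TerraformState = {
  resources: [
    {
      type: "azurerm_storage_account",
      name: "example",
      instances: [
        { attributes: { name: "storageaccountname", primary_access_key: "not-a-real-key" } },
      ],
    },
  ],
};

// exposed is ["azurerm_storage_account.example.primary_access_key"]
const exposed = findPlainTextSecrets(sampleState);
```

A check like this can help you decide which state files need stricter storage, or which ones to delete once the credentials are safely in Key Vault.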

Terraform in TypeScript and Python - New since 2020

In July 2020 HashiCorp introduced Cloud Development Kit (CDK) for Terraform, which lets you write IaC in code native languages like TypeScript and Python.

This is an example from their GitHub repo:

import { Construct } from 'constructs';
import { App, TerraformStack } from 'cdktf';
import { AzurermProvider, VirtualNetwork } from './.gen/providers/azurerm'

class MyStack extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    new AzurermProvider(this, 'AzureRm', {
      features: [{}]
    })

    new VirtualNetwork(this, 'TfVnet', {
      location: 'uksouth',
      addressSpace: ['10.0.0.0/24'],
      name: 'TerraformVNet',
      resourceGroupName: '<YOUR_RESOURCE_GROUP_NAME>'
    })
  }
}

const app = new App();
new MyStack(app, 'typescript-az');
app.synth();

Because it’s TypeScript, it’s very familiar to JavaScript engineers like myself.

But I personally prefer HashiCorp Configuration Language (HCL) because it is meant for humans. As a human it is much easier for me to read and scan. It’s like HCL speaks to me, meeting me halfway. With TypeScript, even though I know JavaScript, I still have to read the code in its entirety.

That is my personal preference. Maybe JavaScript speaks more to you 🤓

Pulumi

And finally we have Pulumi, the new kid on the IaC block who introduced the concept of code-native IaC. Pulumi’s largest value proposition is that engineers don’t have to learn a new programming language.

And looking at this Pulumi example from their documentation, it looks much cleaner than the CDK for Terraform:

import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure";

const exampleResourceGroup = new azure.core.ResourceGroup("exampleResourceGroup", {location: "West Europe"});
const exampleAccount = new azure.storage.Account("exampleAccount", {
    resourceGroupName: exampleResourceGroup.name,
    location: exampleResourceGroup.location,
    accountTier: "Standard",
    accountReplicationType: "GRS",
    tags: {
        environment: "staging",
    },
});

It probably looks cleaner because it’s been around longer and Pulumi has had ample time to fine-tune its abstraction to make it as close to a friendly DSL as possible. And this kind of friendly abstraction layer is an art form. So kudos to Pulumi for achieving this 👌

Encrypted State File

Like Terraform, Pulumi also uses a state file to keep track of your infrastructure, which helps it do configuration drift detection and clean up resources.

Unlike Terraform, however, Pulumi encrypts the secret values stored in its state, which is more secure.

Give Pulumi a Chance

Sorry I am not covering Pulumi further. I don’t use it so I am not going to pretend to be an expert. I did some research because one of my YouTube subscribers asked me to do this comparison. This does not mean I do not recommend Pulumi.

If you are still deciding which IaC technology is right for you, you should also consider Pulumi, especially if you want to write IaC in a code-native programming language like JavaScript, Python, etc.

Code Re-Use

So now you have had an introduction to the “flavors” of Infrastructure as Code. You may even have a favorite. We can imagine ourselves writing a bit of code. Now let’s imagine scaling that IaC to many environments and applications. How can we leverage code re-use?

ARM Template Links

If you want to create a template for re-use, you need to pass a URI to the main template. It is not possible to pass a local file. Even if you can use a protected link, you still have to publish the template first, which makes development and iteration of templates painfully slow.

This is what a templateLink looks like:

"resources": [
  {
    "type": "Microsoft.Resources/deployments",
    "apiVersion": "2019-10-01",
    "name": "linkedTemplate",
    "properties": {
      "mode": "Incremental",
      "templateLink": { // Painful 😖
        "uri": "https://mystorageaccount.blob.core.windows.net/AzureTemplates/newStorageAccount.json",
        "contentVersion": "1.0.0.0"
      },
      "parametersLink": { // Painful 😖
        "uri": "https://mystorageaccount.blob.core.windows.net/AzureTemplates/newStorageAccount.parameters.json",
        "contentVersion": "1.0.0.0"
      }
    }
  }
]

And don’t forget to append a SAS token to the URI to access the JSON file… now it’s clear why I don’t use ARM, right?
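To show what that extra step involves, here is a minimal TypeScript sketch that builds the linked deployment resource with a SAS token appended to the template URI. The helper names and the placeholder token are assumptions for illustration. In a real template the token would more likely arrive as a secure parameter and never be hard-coded.

```typescript
interface TemplateLink {
  uri: string;
  contentVersion: string;
}

interface LinkedDeployment {
  type: "Microsoft.Resources/deployments";
  apiVersion: string;
  name: string;
  properties: {
    mode: "Incremental" | "Complete";
    templateLink: TemplateLink;
  };
}

// Append a SAS token to a blob URI, with or without a leading "?" on the token.
function withSasToken(uri: string, sasToken: string): string {
  const token = sasToken.replace(/^\?/, "");
  const separator = uri.indexOf("?") === -1 ? "?" : "&";
  return uri + separator + token;
}

// Build the nested deployment resource that points at the hosted template.
function linkedDeployment(name: string, templateUri: string, sasToken: string): LinkedDeployment {
  return {
    type: "Microsoft.Resources/deployments",
    apiVersion: "2019-10-01",
    name: name,
    properties: {
      mode: "Incremental",
      templateLink: {
        uri: withSasToken(templateUri, sasToken),
        contentVersion: "1.0.0.0",
      },
    },
  };
}

// Placeholder token, not a real SAS signature.
const deployment = linkedDeployment(
  "linkedTemplate",
  "https://mystorageaccount.blob.core.windows.net/AzureTemplates/newStorageAccount.json",
  "?sv=PLACEHOLDER&sig=PLACEHOLDER"
);
```

Even with a helper like this, every template change still means uploading the file and generating a fresh link before you can test it.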

Terraform Modules

As an engineer I need to be able to work with local code when I am initially experimenting or for quick debugging. In Terraform, it’s really easy to create modules, which can be local or published to an external registry.

# Custom Module example
module "dev_cluster" {
  source              = "./../aks-cluster"
  name                = "dev-cluster"  
  vm_size             = "Standard_D2s_v3"   # ca. 68 EUR/mo.
  ssh_public_key      = "~/.ssh/id_rsa.pub"
  vnet_address_space  = ["10.100.0.0/25"]
  aks_subnet_prefixes = ["10.100.0.0/28"]
}

From the example it is clear how I can re-use infrastructure modules to easily create different deployment environments that vary slightly. For example, I can use the same custom aks-cluster module to create a cluster for production and choose more expensive Virtual Machines.

You can also publish your modules to the public Terraform Registry or a private registry in Terraform Cloud.

Pulumi Packages

Because Pulumi uses code native programming languages, you would leverage the language’s code re-use techniques. For example in JavaScript you create packages that you could publish to a registry as a node module.

This piece of example code is from a Pulumi Blog article that describes re-use in detail:

/**
 * Static website using Amazon S3, CloudFront, and Route53.
 */
export declare class StaticWebsite extends pulumi.ComponentResource {
  readonly contentBucket: aws.s3.Bucket;
  readonly logsBucket: aws.s3.Bucket;
  readonly cdn: aws.cloudfront.Distribution;
  readonly aRecord?: aws.route53.Record;

  constructor(name: string, contentArgs: ContentArgs,
              domainArgs?: DomainArgs, opts?: pulumi.ResourceOptions);
}

Then you could use it like this:

// If you have published it to an npm registry
import { StaticWebsite } from "static-website-aws";

// OR reference a local file instead (use only one of the two imports)
// import { StaticWebsite } from "./static-website-aws";

// Then
const website = new StaticWebsite("browserhack", {
  pathToContent: "./browserhack",
  custom404Path: "/404.html",
});

Which IaC makes you most happy?

So now you’ve seen how programming Infrastructure as Code in ARM Templates, Terraform and Pulumi compare to each other.

You know my opinions. Which one is your favorite? I’d love to know, especially if you are using Pulumi in production. Let me know via @jng5 on Twitter or on YouTube.