<p><em>Julie Ng’s Blog - writing about web development and fitness from an American expat in Germany (<a href="http://julie.io/writing">julie.io/writing</a>)</em></p>
<h1 id="setup-git-multiple-gpg-and-yubikeys">Setup git commits and authentication with multiple GPG keys and YubiKeys</h1>
<p><em>Julie Ng · 2024-01-13 · last updated 2024-01-20</em></p>
<p class="lead">Since I work as an architect in the compliance-driven financial industry, I have been <a href="https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification">signing my git commits so that people cannot impersonate me</a> in source code. I had always defaulted to a single personal GPG key that I used for <em>both</em> personal and work. But suddenly I needed to juggle two keys.</p>
<div class="article-image">
<img src="/assets/images/2024/yubikeys.jpg" alt="Separate YubiKeys for personal and work" style="max-width:600px">
<span>Separate YubiKeys for personal and work use</span>
</div>
<p>Please note that technically, it is entirely possible to juggle multiple GitHub users for commits without any GPG keys or YubiKeys. This article is <em>only applicable</em> if you have multiple GitHub accounts set up with signed commits.</p>
<h4 id="result-preview">Result preview</h4>
<p>All the steps below describe how to ultimately end up with a green “Verified” badge on my work commits, using this setup:</p>
<pre><code class="language-bash">❯ gpg --list-secret-keys
[keyboxd]
---------
sec>  rsa4096 2018-05-27 [SC]
      121E4BXXXXXXXXXXXXXXXXXXXXXXXXXXX
      Card serial no. = 0006 121XXXX
uid           [ultimate] Julie Ng <redacted>
ssb>  rsa4096 2018-05-27 [E]
ssb   rsa2048 2019-09-28 [A]

sec>  rsa4096 2024-01-13 [SC] [expires: 2026-01-12]
      5F9DE7XXXXXXXXXXXXXXXXXXXXXXXXXXX
      Card serial no. = 0006 1095XXXX
uid           [ultimate] Julie Ng <redacted@microsoft.com>
ssb>  rsa4096 2024-01-13 [E] [expires: 2026-01-12]
ssb>  rsa4096 2024-01-13 [A] [expires: 2026-01-12]
</code></pre>
<p>Note that the two keys have different <code>Card serial no.</code>s, and that a <code>></code> after <code>sec</code> or <code>ssb</code> indicates the key is on a smartcard, i.e. not on my computer. See the <a href="https://www.gnupg.org/documentation/manuals/gnupg24/gpg.1.html">gpg manual</a> for details.</p>
<h2 id="use-case-and-problem">Use case and problem</h2>
<h3 id="increased-security">Increased security</h3>
<p>The basic use case is that I have two (2) accounts I need to juggle:</p>
<ul>
<li>a personal account for personal open-source work that is public on GitHub.com</li>
<li>a work account that is managed by Microsoft for any internal repositories</li>
</ul>
<p>The problem is that my work account is managed by <a href="https://docs.github.com/en/enterprise-cloud@latest/admin/identity-and-access-management/understanding-iam-for-enterprises/about-enterprise-managed-users">GitHub Enterprise Managed Users (EMU)</a>, which means the identities are synced to an external identity provider.</p>
<p>The account is completely managed in our internal Entra ID tenant, which of course does not include my personal email. Because the account is externally managed (and merely synced to GitHub enterprise), there is no possibility for me to <a href="https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-personal-account-on-github/managing-email-preferences/verifying-your-email-address">“verify” my personal email</a>, so commits signed with my personal email always show up as “unverified”:</p>
<div class="article-image">
<img src="/assets/images/2024/ghe-gpg-unverified.png" alt="Unverified commits with personal email" style="max-width:600px" />
<span>"Unverified" commit when signed with personal email</span>
</div>
<p>This is clearly visible via the yellowish “Unverified” badge, as well as my username not having a photo. It is a problem for pedantic Julie to have “unverified” associated with my work. So I fixed it by adding a second key.</p>
<h4 id="why-a-second-yubikey">Why a second YubiKey?</h4>
<p>I need a second physical YubiKey because my existing key is already full. <a href="https://support.yubico.com/hc/en-us/articles/4404456942738-FAQ">YubiKey’s OpenPGP application can only hold up to three private keys</a>: separate private keys for encryption, signing, and authentication.</p>
<h4 id="why-a-usb-type-a-key">Why a USB Type-A key?</h4>
<p>I have two different YubiKey types because originally I bought the Type-A version for use with a work computer, a Windows Surface without USB-C ports. I have since abandoned the PC and use my personal Mac, which is company managed for work. Now I need a USB adapter to use this key. But I’m too cheap to buy another YubiKey.</p>
<h2 id="generate-gpg-keys-with-work-email">Generate GPG keys with work email</h2>
<h3 id="step-1---generate-new-keys">Step 1 - Generate new keys</h3>
<p>As noted above, I bought the Type-A key ages ago. I had a private key on it. But it was borked. So I just generated new ones following <a href="https://developer.okta.com/blog/2021/07/07/developers-guide-to-gpg">Okta: Developers Guide to GPG and YubiKey</a>.</p>
<pre><code class="language-bash">gpg --full-generate-key
</code></pre>
<p>I created the new key using my work email address. See the <a href="https://developer.okta.com/blog/2021/07/07/developers-guide-to-gpg">Okta guide</a> for full steps.</p>
<h3 id="step-2---move-private-keys-to-yubikey">Step 2 - Move private keys to YubiKey</h3>
<p>See the <a href="https://developer.okta.com/blog/2021/07/07/developers-guide-to-gpg">Okta guide</a> for full steps.</p>
<pre><code class="language-bash">gpg --list-keys
gpg --edit-key <KEY-ID>

# then at the interactive prompt:
gpg> keytocard
</code></pre>
<h3 id="step-3---export-public-key-and-add-to-github">Step 3 - Export public key and add to GitHub</h3>
<p>Now the private keys are stored on the physical key, which we’ll need to sign our commits. We want to share the public key with GitHub so they can verify our signatures. First we export it.</p>
<pre><code class="language-bash">gpg --armor --export USER@COMPANY.com > public.key
</code></pre>
<p>Then copy and paste the contents of <code>public.key</code> into GitHub, following this documentation: <a href="https://docs.github.com/en/authentication/managing-commit-signature-verification/adding-a-gpg-key-to-your-github-account">Adding a GPG key to your GitHub account</a>.</p>
<p>Then delete the file <code>rm public.key</code> for good housekeeping.</p>
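<p>Step 4 below needs the long key ID. A self-contained sketch of pulling it out of a <code>gpg --list-secret-keys --keyid-format=long</code> listing with <code>sed</code> (the <code>sec</code> line and its fingerprint here are made-up placeholders, and the pattern is keyed to the <code>rsa4096</code> algorithm from the example):</p>
<pre><code class="language-bash"># sample "sec" line as printed with --keyid-format=long (placeholder ID)
sample='sec   rsa4096/5F9DE7ABCDEF1234 2024-01-13 [SC]'

# the long key ID is the hex string after the algorithm prefix "rsa4096/"
keyid=$(printf '%s\n' "$sample" | sed -n 's#.*rsa4096/\([A-F0-9]*\).*#\1#p')
echo "$keyid"    # 5F9DE7ABCDEF1234
</code></pre>
<p>The same value works for both <code>git config user.signingkey</code> and <code>gpg --edit-key</code>.</p>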
<h3 id="step-4---configure-repository-with-work-user">Step 4 - Configure repository with work user</h3>
<p>The majority of my work is public open source or personal, so my global git settings use that email.</p>
<p>So I have to configure each internal repository manually, by going to the internal project folder and configuring git with the <code>--local</code> flag:</p>
<pre><code class="language-bash">git config --local user.email <WORKEMAIL@microsoft.com>
git config --local user.signingkey <KEYID>
git config --local commit.gpgsign true
</code></pre>
<p>My name is the same, so I only need to configure the email and specify the work-specific <code><KEYID></code>. Voilà, you’re done: whenever you make a <code>git commit</code>, you will be prompted to insert your YubiKey and unlock it with the PIN.</p>
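<p>You can sanity-check the local override in a throwaway repository. A minimal sketch, assuming git is installed; the email and key ID are placeholders:</p>
<pre><code class="language-bash"># create a scratch repo and set the work identity locally
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config --local user.email "work@example.com"
git -C "$repo" config --local user.signingkey "ABCDEF1234567890"
git -C "$repo" config --local commit.gpgsign true

# --local settings win over --global ones inside this repo
git -C "$repo" config --local user.email    # work@example.com
</code></pre>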
<h2 id="configure-authentication-with-multiple-accounts">Configure authentication with multiple accounts</h2>
<p>Now we can sign commits on our local workstations. But our multiple accounts will also need multiple authentication mechanisms. How do we juggle that?</p>
<p>First, you need to <a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens">create personal access tokens</a> for <strong><em>both</em></strong> your personal and work accounts on the GitHub website. After that there are multiple options to juggle the tokens.</p>
<h4 id="warning---never-store-your-personal-access-token-in-your-repositorys-remote-url">Warning - NEVER store your personal access token in your repository’s remote url</h4>
<p>Never store tokens in a URL. Although it works, the token is visible in plain text and not secure. Many bad articles on the internet suggest this awful method:</p>
<pre><code class="language-bash"># Do NOT do this - insecure!
git remote add origin https://<USER>:<PAT>@github.com…
</code></pre>
<p>I have also seen this in my work with customers. Do yourself a favor and take some time to understand how configuration, authentication and security work for git. It is one of those technology/company agnostic skills that’s valuable for life. So invest the time.</p>
<h4 id="option-1---git-credential-manager">Option 1 - <code>git-credential-manager</code></h4>
<p>The most straightforward way, and one that works for all git providers (not just GitHub), is to use <a href="https://github.com/git-ecosystem/git-credential-manager">git-credential-manager (GCM)</a>. It works and integrates well with operating system password managers, but be sure to configure it to use encrypted stores, also for caching. The real challenge is managing multiple users. See this <a href="https://github.com/git-ecosystem/git-credential-manager/blob/release/docs/multiple-users.md">GCM documentation on how to manage multiple users</a>, which is fairly complicated.</p>
<p>I do not use git-credential-manager because I want to use my YubiKey and there are better options for Mac users. So this option is not for me.</p>
<h4 id="option-2---github-cli">Option 2 - GitHub CLI</h4>
<p>If you only need GitHub, a newer and better way is to use the <a href="https://cli.github.com/manual/gh_auth_login">GitHub CLI</a>, which will <em>automatically</em> cache your credentials securely (if possible). See <a href="https://docs.github.com/en/get-started/getting-started-with-git/caching-your-github-credentials-in-git">Caching your GitHub credentials in Git</a>.</p>
<p>I need more than GitHub. And I want to use my YubiKey. So this option is also not for me.</p>
<h4 id="option-3---permanent-authentication-with-encrypted-netrc">Option 3 - Permanent authentication with encrypted <code>.netrc</code></h4>
<p>From a great developer experience perspective, I do not want to juggle multiple credentials. <strong>It should just work</strong> <em>and</em> be secure. This setup requires understanding that under the hood, git uses curl. And curl supports <a href="https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html">netrc</a>. The <a href="https://github.com/git/git/tree/master/contrib/credential/netrc"><code>git-credential-netrc</code> helper is built-in</a> and does not require additional software like the other options described above.</p>
<p>Although I do not need additional software, I do need to encrypt the <code>.netrc</code> file:</p>
<pre><code class="language-bash"># Encrypt the .netrc file (using personal key in example)
gpg --encrypt --recipient <user@email> -o .netrc.gpg .netrc
</code></pre>
<p>and then configure git to use this file:</p>
<pre><code class="language-bash">git config --global credential.helper 'netrc -f ~/.netrc.gpg -v'
</code></pre>
<p>Read on to learn about what’s in that <code>.netrc</code> file.</p>
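<p>If you want to verify the encryption round trip before touching real tokens, here is a sketch using a throwaway keyring and a disposable key (gpg ≥ 2.1 assumed; <code>demo@example.com</code> and the empty passphrase exist only for this demo):</p>
<pre><code class="language-bash"># throwaway keyring so the demo never touches your real keys
work=$(mktemp -d)
export GNUPGHOME="$work/gnupg"
mkdir -p "$GNUPGHOME" && chmod 700 "$GNUPGHOME"

# disposable, passphrase-less key for the demo only
gpg --batch --pinentry-mode loopback --passphrase '' \
  --quick-generate-key "Demo <demo@example.com>" default default 1y

# encrypt a toy .netrc to that key, then decrypt to verify
printf 'machine github.com\nlogin demo\npassword pat\n' > "$work/.netrc"
gpg --encrypt --recipient demo@example.com -o "$work/.netrc.gpg" "$work/.netrc"
gpg --decrypt --quiet "$work/.netrc.gpg" | head -1
</code></pre>
<p>For the real file you would of course encrypt to your own key, as shown earlier, and delete the plaintext afterwards.</p>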
<h4 id="managing-multiple-accounts">Managing multiple accounts</h4>
<p>Identity-based authentication is always complicated. If you read the <a href="https://git-scm.com/docs/gitcredentials">official git doc on configuring credentials</a>, you’ll understand what GCM and the GitHub CLI are doing under the hood - setting custom configurations with hostname or path matching. If you are using GitHub Enterprise Server, you’ll have a separate host, e.g. <code>github.mycompany</code>, which is easier to configure. If you have GitHub Enterprise Cloud however, you will need path matching because both accounts use <code>github.com</code>.</p>
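<p>Under the hood, git describes each credential lookup to its helper as <code>key=value</code> lines on stdin, terminated by a blank line. With path matching enabled, the request includes a <code>path</code> line, which is what makes per-organization matching possible. A sketch of such a request (the org and repo names are placeholders):</p>
<pre><code class="language-bash"># what git writes to a credential helper's stdin for a work repo
printf 'protocol=https\nhost=github.com\npath=work-org/repo.git\n\n'
</code></pre>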
<div class="alert alert-yellow">
<p><strong>Updated 20 January 2024</strong><br />
This article has been updated to fix problems with my workflow. Be sure to include <code>machine github.com</code> <em>twice</em> in the <code>.netrc</code> file and configure git default credentials as described below.</p>
</div>
<pre><code class="language-bash"># .netrc (when decrypted)
machine github.com
login <personal-user>
password <personal-pat>
machine github.com
login <work-user>
password <work-pat>
</code></pre>
<h2 id="week-later-more-complicated-than-thought">1 week later… more complicated than thought</h2>
<p>I did not thoroughly test. I discovered it worked with an unencrypted <code>.netrc</code>, but <strong>not</strong> with the encrypted <code>.netrc.gpg</code>, which was maddening to debug.</p>
<p>Ultimately I added the <code>-d</code> flag to the netrc credential helper and saw it could not pick up the user when I only listed <code>machine github.com</code> once. Based on that I made a few changes to get all of this working.</p>
<h3 id="auto-toggle-users">Auto-toggle users</h3>
<ol>
<li>Added second <code>machine github.com</code> line to the <code>.netrc</code> file</li>
<li>Specified default users in the global <code>~/.gitconfig</code>:</li>
</ol>
<pre><code class="language-bash"># .gitconfig - comments for article only
[credential]
  helper = netrc -f ~/.netrc.gpg -v
  # default to personal user
  user = julie-ng
  # credential helpers should match paths, not just hosts
  useHttpPath = true

# specify work user for work specific repos
[credential "https://github.com/<WORK_ORG>/*"]
  user = <WORK_USER>
</code></pre>
<p>Finally in my desperate debugging, I had also specified the user explicitly in the git remote URLs, for example:</p>
<pre><code class="language-bash"># if not using .gitconfig above
git remote set-url origin https://<WORK_USER>@github.com/<WORK_ORG>/…
</code></pre>
<p>I have since removed the users from my remote URLs now that everything is in the global git configuration. Finally 😅</p>
<h2 id="conclusion">Conclusion</h2>
<p>So we jumped through all the hoops of…</p>
<ul>
<li>creating multiple GPG keys</li>
<li>storing our private keys on multiple YubiKeys</li>
<li>configuring our git clients to handle multiple authentication credentials</li>
</ul>
<p>…just so I can get a green “Verified” badge and see my photo 😅</p>
<div class="article-image">
<img src="/assets/images/2024/ghe-gpg-verified.png" alt="Verified commits with work email" style="max-width:600px" />
<span>"Verified" commit when signed with work email</span>
</div>
<p>In all seriousness, the hoops are worth it <em>to me</em> to ensure no one can impersonate me. And if someone got access to my computer, it would be impossible for them to get at my credentials without the physical YubiKeys and the PINs to unlock them.</p>
<h1 id="infra-as-code-monorepo">Infrastructure as Code and Monorepos - a Pragmatic Approach</h1>
<p><em>Julie Ng · 2022-01-05 · last updated 2024-01-20</em></p>
<p class="lead">As engineers move beyond “hello world” samples, they can struggle to extend the code to multiple deployment targets and to create automation pipelines. How can we structure code for re-use and automation <em>and</em> ensure we won’t accidentally deploy to production?</p>
<figure class="figure-center">
<img src="/assets/images/2021/iac-monorepo-how.png" alt="Infrastructure as Code monorepo for multiple environments" width="540">
</figure>
<p>There are many ways to do this. In this article I will share one solution that uses a monorepo to deploy and manage multiple Kubernetes clusters. The source code is public, maintained and available at <a href="https://github.com/julie-ng/cloudkube-aks-clusters">github.com/julie-ng/cloudkube-aks-clusters</a>.</p>
<p><strong>Disclaimer:</strong> this article is a mix of best practices and a walkthrough of a specific high-trust use case. Your requirements may differ.</p>
<h2 id="what-is-a-monorepo">What is a Monorepo?</h2>
<p>In the context of cloud infrastructure automation, a monorepo approach refers to a single repository that holds <strong>both</strong>:</p>
<ul>
<li>deployment templates</li>
<li>deployment configuration</li>
</ul>
<p>which has the following consequences.</p>
<h4 id="advantages">Advantages</h4>
<ul>
<li>Easier to understand</li>
<li>Faster to debug when configuration is next to template code</li>
</ul>
<h4 id="disadvantages">Disadvantages</h4>
<ul>
<li>Urge to “copy & paste” code, because debugging in one repository is easier than correlating code across two repositories with separate git histories</li>
<li>A single repo means only 1 security boundary in git</li>
</ul>
<p>Because you cannot use folders as a security boundary in git, anyone with write-access to the monorepo can trigger deployments, incl. to production. It is possible to introduce a <em>soft</em> boundary by using a combination of <a href="https://www.atlassian.com/git/tutorials/making-a-pull-request">Pull Request workflow</a> and <a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/about-protected-branches">protected branches</a>. But organizations with stricter requirements to remove write-access from developers will adopt the multi-repo approach.</p>
<h4 id="my-use-case">My Use Case</h4>
<p>In my <a href="https://github.com/julie-ng/cloudkube-aks-clusters">cloudkube-aks-clusters</a> project, I do not need such a security boundary because it’s just me and thus a high trust scenario.</p>
<h2 id="leverage-software-modules-for-multiple-environments">Leverage Software Modules for Multiple Environments</h2>
<p>Do not use copy and paste <em>ever</em>. For work-in-progress code, leverage <a href="https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell">git branches</a>. If you are not experienced with creating software modules, start with a single giant file to make progress. Once you can deploy the infrastructure you need (but do not wait until it’s perfect), refactor into modules to follow software programming best practice and <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY your code</a>.</p>
<p>Once you DRY your code, you will have an <strong>abstraction</strong> for your environment, which will include all the compute and data infrastructure for your workloads. Your abstraction will have different syntax depending on the language you choose. If you’re using Pulumi and JavaScript, your abstraction may look something like this:</p>
<pre><code class="language-javascript">// example IaC Module pseudo-code
const AppEnvironment = require('custom-module')

const dev = new AppEnvironment({
  name: 'dev',
  postgresVersion: '14.1'
})

const prod = new AppEnvironment({
  name: 'prod',
  postgresVersion: '13.5'
})
</code></pre>
<p>I prefer <a href="https://www.terraform.io/">Terraform</a>, but the concepts of software modules and parameters are generic and will also apply to <a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/bicep/modules">Azure Bicep modules</a> and the modules ecosystem of the <a href="https://www.pulumi.com/">Pulumi</a> language you choose, e.g. <a href="https://docs.npmjs.com/about-packages-and-modules">npm packages or npm modules</a> for JavaScript.</p>
<h3 id="why-are-custom-abstractions-necessary">Why are Custom Abstractions necessary?</h3>
<p>Official modules, e.g. the <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster">official Microsoft-managed Terraform resource</a> for an Azure Kubernetes cluster, are bare-bones by design. Many small modules allow for the greatest flexibility in customizing your architecture. To get started you may use the official provider to create and deploy a resource, e.g. the Kubernetes cluster. In real life, you will eventually need to add your own specific requirements.</p>
<h4 id="my-kubernetes-cluster-requirements">My Kubernetes Cluster Requirements</h4>
<p>For example, my <a href="https://github.com/julie-ng/cloudkube-aks-clusters/tree/main/modules/aks-cluster">aks-cluster</a> module adds some security and automation resources on top of my Kubernetes cluster:</p>
<ul>
<li><a href="https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview">Virtual Networks</a> for cluster integration</li>
<li>Headless <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/overview#security-principal">security principal</a> to be used by cluster <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/">ingress controller</a> to fetch TLS certificates</li>
<li>Headless <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/overview#security-principal">security principal</a> to be used in CI/CD automation</li>
<li>An <a href="https://docs.microsoft.com/en-us/azure/key-vault/general/basic-concepts">Azure Key Vault</a> for Kubernetes <a href="https://docs.microsoft.com/en-us/azure/aks/csi-secrets-store-driver">secrets integration</a></li>
<li>etc.</li>
</ul>
<p>Note these resources are created <em>per environment</em> to follow <a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege">Principle of Least Privilege</a>, which is one of <strong><em>my</em></strong> specific requirements. Your requirements may differ.</p>
<h3 id="separate-configuration-files-per-environment">Separate Configuration Files per Environment</h3>
<p>Once we’ve created an IaC module, we can re-use the same code for multiple deployment environments. This is done using separate config files per environment. The IaC configuration only needs to know which parameters to set, for example this excerpt from my <a href="https://github.com/julie-ng/cloudkube-aks-clusters/blob/main/environments/dev/dev.cluster.tfvars">dev.cluster.tfvars</a>:</p>
<pre><code class="language-hcl"># module config (excerpt)
name = "cloudkube-dev"
env = "dev"
hostname = "dev.cloudkube.io"
kubernetes_version = "1.20.9"
</code></pre>
<p>Treat your module like software and provide documentation so engineers know how to use it. At a minimum you should document required parameters and default values.</p>
<p><strong>Pro Tip</strong> - if you are using Terraform you can autogenerate module documentation with <a href="https://terraform-docs.io/">terraform-docs.io</a>. See this generated <a href="https://github.com/julie-ng/cloudkube-aks-clusters/blob/main/modules/aks-cluster/README.md">README.md</a> summary, which saves me the trouble of having to open and read multiple Terraform files. Be aware the docs are only as good as your coding.</p>
<h2 id="use-subfolders-per-environment-configuration">Use Subfolders Per Environment Configuration</h2>
<p>Now that we have re-usable IaC modules, the next challenge is to setup automation pipelines that do not <em>unintentionally</em> deploy to production. This is a common fear for engineers getting started with DevOps and CI/CD.</p>
<p>The hurdle here is to understand that while most beginner pipeline documentation focusses on <a href="https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#onpushpull_requestbranchestags">branch triggers</a>, pipelines also have <a href="https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#onpushpull_requestpaths">path triggers</a> and you will need <em>both</em>. Unlike application pipelines however, your deployment target will be determined by <strong>paths</strong> not branches.</p>
<p>To better understand this, let’s walk through an example.</p>
<h3 id="leverage-path-based-pipeline-triggers">Leverage Path based Pipeline Triggers</h3>
<p>Given the following file tree structure (with example multi-region production scenario)…</p>
<pre><code class="language-text">environments/
├── dev/
├── prod-northeurope/
├── prod-westeurope/
└── staging/
</code></pre>
<p><em>and</em> given the following pipeline triggers…</p>
<pre><code class="language-yaml"># azure-pipelines/production.yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - 'environments/prod-northeurope/*'
      - 'environments/prod-westeurope/*'
</code></pre>
<p>I could then create pipelines that <strong><em>only</em></strong> run against production environments <strong>IF</strong>…</p>
<ul>
<li>a commit is pushed to the <code>main</code> branch</li>
<li>a change is made to configuration files inside <code>prod-northeurope</code> and <code>prod-westeurope</code> subfolders.</li>
</ul>
<h3 id="work-in-progress-changes">Work in Progress Changes</h3>
<p>In this scenario, I can actively make changes to the Terraform module code in the <a href="../modules/"><code>modules/</code></a> folder, but automated deployments using the triggers above will <strong>not run against production until</strong> changes are made to the <code>environments/prod…</code> folders.</p>
<p>To better illustrate the various triggers, let’s map the corresponding deployments into a table.</p>
<table class="has-border">
<thead>
<tr>
<th style="text-align: left">Pipeline</th>
<th style="text-align: left">Branch</th>
<th style="text-align: left">Path</th>
<th style="text-align: left">Deployment Target</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><code>ci.yaml</code></td>
<td style="text-align: left"><code>*</code></td>
<td style="text-align: left"><code>*</code></td>
<td style="text-align: left">-</td>
</tr>
<tr>
<td style="text-align: left"><code>cd.yaml</code></td>
<td style="text-align: left"><code>dev</code></td>
<td style="text-align: left"><code>modules/*</code></td>
<td style="text-align: left">DEV</td>
</tr>
<tr>
<td style="text-align: left"><code>cd.yaml</code></td>
<td style="text-align: left"><code>dev</code></td>
<td style="text-align: left"><code>environments/dev/*</code></td>
<td style="text-align: left">DEV</td>
</tr>
<tr>
<td style="text-align: left"><code>cd.yaml</code></td>
<td style="text-align: left"><code>main</code></td>
<td style="text-align: left"><code>modules/*</code></td>
<td style="text-align: left">Staging</td>
</tr>
<tr>
<td style="text-align: left"><code>cd.yaml</code></td>
<td style="text-align: left"><code>main</code></td>
<td style="text-align: left"><code>environments/staging/*</code></td>
<td style="text-align: left">Staging</td>
</tr>
<tr>
<td style="text-align: left"><code>cd-production.yaml</code></td>
<td style="text-align: left"><code>main</code></td>
<td style="text-align: left"><code>environments/prod-northeurope/*</code></td>
<td style="text-align: left">Production (North Europe)</td>
</tr>
<tr>
<td style="text-align: left"><code>cd-production.yaml</code></td>
<td style="text-align: left"><code>main</code></td>
<td style="text-align: left"><code>environments/prod-westeurope/*</code></td>
<td style="text-align: left">Production (West Europe)</td>
</tr>
</tbody>
</table>
<h3 id="leverage-resource-tagging-and-iac-versioning">Leverage Resource Tagging and IaC Versioning</h3>
<p>Sometimes changes may be under-the-hood improvements, e.g. refactoring the infrastructure as code. But you should still deploy to production to confirm that the infrastructure does not change. You can test this <em>without</em> changing the infrastructure by using tags. See this <a href="https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/decision-guides/resource-tagging/?toc=/azure/azure-resource-manager/management/toc.json#resource-tagging-patterns">Azure documentation</a> for examples of common tagging patterns. Resource tagging is a generic concept also offered by other cloud providers.</p>
<p>Using tags is straight-forward and a general good practice. Just as I tag my resources <code>env:staging</code>, I could also tag them <code>iac-version:1.28</code> and bump the versions according to your schema. I prefer <a href="https://semver.org/">semantic versioning</a>.</p>
<h2 id="infrastructure-as-code-rollbacks">Infrastructure as Code Rollbacks</h2>
<p>Rollbacks are a part of real life cloud engineering. And they ARE scary. But over time and with experience, it is straight-forward to rollback configuration changes <em>with confidence</em>.</p>
<h4 id="dev-and-staging-rollbacks">Dev and Staging Rollbacks</h4>
<p>Non-production rollbacks are expected to be messy because they are not versioned.</p>
<p>Personally, I generally only track these by the git branch heads. So if I need to undo a change, I need to change the code. Because I don’t like waiting minutes for CI builds, I tend to apply Terraform changes locally and check in the code <em>after</em> I have the result I want. So when the pipeline runs against the remote backend, it won’t find any configuration changes and will not execute <code>terraform apply</code>.</p>
<p>The trade-off here is the risk of the “it works on my machine” effect. In the <a href="https://github.com/julie-ng/cloudkube-aks-clusters">cloudkube-aks-clusters</a> repo, I’m the only contributing engineer and thus the risk is low. Your mileage will vary.</p>
<h4 id="production-rollbacks">Production Rollbacks</h4>
<p>The key here is <em>discipline</em> when using git. In general for both application and infrastructure workloads, I tend to track production code with <em>both</em></p>
<ul>
<li><code>production</code> or <code>main</code> branch heads, i.e. tips</li>
<li>git tags in <a href="https://semver.org/">semantic versioning</a> format. See example <a href="../CHANGELOG.md">CHANGELOG.md</a> created with <a href="https://github.com/conventional-changelog/standard-version">standard-version</a> from this <a href="https://github.com/julie-ng/cloudkube-aks-clusters">cloudkube-aks-clusters</a> repo.</li>
</ul>
<p>Using tags, I have a clearer overview of intended deployments. In the simplest scenario, if I am at v0.3.0 but want to roll back to v0.2.1, I would return the code to that previous point (preferably without a force push) and re-deploy.</p>
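<p>That force-push-free return trip can be done with <code>git revert</code> over the range of commits after the tag. A toy sketch, assuming git is installed (the file contents and tag names are placeholders):</p>
<pre><code class="language-bash"># toy repo with two tagged "releases"
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.email demo@example.com
git -C "$repo" config user.name Demo
echo 'cluster_version = "0.2.1"' > "$repo/main.tf"
git -C "$repo" add . && git -C "$repo" commit -qm "release: 0.2.1"
git -C "$repo" tag v0.2.1
echo 'cluster_version = "0.3.0"' > "$repo/main.tf"
git -C "$repo" commit -qam "release: 0.3.0"
git -C "$repo" tag v0.3.0

# revert everything after v0.2.1 - history moves forward, content moves back
git -C "$repo" revert --no-edit v0.2.1..HEAD
cat "$repo/main.tf"    # cluster_version = "0.2.1"
</code></pre>
<p>History stays linear and auditable, which matters when a pipeline (or an auditor) replays it later.</p>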
<h4 id="deployments-outside-of-pipeline-runs">Deployments outside of Pipeline Runs</h4>
<p>Most engineers understand that for example the v0.3.0 deployment might be numbered deployment #86 and the rollback is deployment #87. But there can also be a deployment gap, for example:</p>
<table class="has-border">
<thead>
<tr>
<th style="text-align: left">Deployment #</th>
<th style="text-align: left">Trigger</th>
<th style="text-align: left">Details</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">86</td>
<td style="text-align: left">git push</td>
<td style="text-align: left">Pipeline Deploy of v0.3.0</td>
</tr>
<tr>
<td style="text-align: left">87</td>
<td style="text-align: left">Nightly Scheduled Run</td>
<td style="text-align: left">Resolve configuration drift that someone did in Cloud Provider UI</td>
</tr>
<tr>
<td style="text-align: left">88</td>
<td style="text-align: left">Manual or git push</td>
<td style="text-align: left">Pipeline Deploy or Manual Rollback to v0.2.1</td>
</tr>
</tbody>
</table>
<p>In this example an engineer intentionally triggers deployments #86 and #88. But deployment #87 runs in between and may be triggered outside the normal engineer workflow. It can be an engineer who configured their own scheduled nightly runs. It can also be a change that is enforced centrally, for example if an organization uses <a href="https://docs.microsoft.com/en-us/azure/governance/policy/concepts/effects">Azure Policy</a> to strictly enforce governance of their cloud real estate.</p>
<p>To have predictable and reliable infrastructure, you need to be aware of all the ways deployments can happen, including the ones outside of your control.</p>
<h2 id="when-should-you-not-use-a-monorepo">When should you <em>not</em> use a Monorepo?</h2>
<p>Starting with a monorepo is the quickest way to deployment. It’s simple but also very versatile for the experienced engineer.</p>
<p>So when should you not use a monorepo? That will be a future article ;-) Follow me on <a href="https://twitter.com/jng5">Twitter</a> and <a href="https://www.youtube.com/c/JulieNgTech/">YouTube</a> to be notified when it gets published.</p>
<hr />
<p><em>P.S. Props to anyone who went through the <a href="https://github.com/julie-ng/cloudkube-aks-clusters">julie-ng/cloudkube-aks-clusters</a> code and noticed it does <strong>not</strong> have any pipelines. That’s another way to approach security - remove automation altogether ;-)</em></p>
<p><em>In all seriousness, if you’re looking for infrastructure as code pipelines, check out the <a href="https://github.com/Azure/devops-governance/tree/main/azure-pipelines">azure/devops-governance</a> repo, which follows a very similar pipeline structure to the one described above. That project includes pipelines because they deploy to a Visual Studio Enterprise subscription on my personal Azure AD tenant. The Kubernetes clusters project described in this article is deployed to a Microsoft internal Azure subscription. So there are no automation pipelines in this public repository - better safe than sorry.</em></p>
CI/CD Review - How DevOps in Real Life & Mature Organizations workshttp://julie.io/writing/ci-cd-review/2021-03-01T01:00:00+01:002024-01-20T10:41:06+01:00Julie Ng<p class="lead">People love checklists because they give the illusion of an easy success. But DevOps is not straight-forward and looks different for each team and application. That is why I conduct reviews with Azure customers as an engineer at Microsoft in an interview-style discussion. And like an interview, I’m not challenging your answers, but your thought process.</p>
<p><em>Want some more context and answers? Watch me walkthrough some of these questions and share examples from real life.</em></p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/e4lJmgd_4DA" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<h3 id="goal-of-the-review">Goal of the Review</h3>
<p>This exercise focuses on DevOps in practice, not in theory. After going through the questions, you should be able to better gauge <em>your confidence</em> in <em>your practices</em> meeting the requirements of <em>your use case</em>. It will also help you figure out what is the next practice you want to improve upon.</p>
<p>Most of us don’t meet every requirement and do everything listed below all the time for every project. Re-visit this exercise every now and then and continuously improve.</p>
<p>Keep in mind this conversation is <strong>cloud-agnostic and therefore for everyone</strong>, not just Microsoft Azure customers. In fact, if you know me personally you will know my favorite CI/CD tool is Jenkins.</p>
<p>The questions are organized into the following categories, loosely structured around Microsoft’s Well Architected Framework:</p>
<ul>
<li><a href="#release-management">Release Management</a></li>
<li><a href="#pipelines">Pipelines</a></li>
<li><a href="#security">Security</a></li>
<li><a href="#governance">Governance</a></li>
<li><a href="#cost-optimization">Cost Optimization</a></li>
</ul>
<h2 id="release-management">Release Management</h2>
<h4 id="what-is-your-versioning-scheme">1. What is your versioning scheme?</h4>
<ul>
<li>Can you tell me which versions of your code are on Dev vs QA vs Production?</li>
<li>The <a href="https://semver.org/">Semantic Versioning</a> format of <code>MAJOR.MINOR.PATCH</code> is the most common in Open Source Software.</li>
</ul>
<h4 id="do-you-use-naming-conventions">2. Do you use naming conventions?</h4>
<ul>
<li><strong>Branch names</strong>
Common examples include:
<ul>
<li><code>feat/*</code></li>
<li><code>fix/*</code></li>
<li><code>main</code></li>
<li><code>production</code></li>
<li><code>qa</code></li>
</ul>
</li>
<li><strong>Commit messages</strong>
<a href="https://www.conventionalcommits.org/en/v1.0.0/">Conventional Commits</a> is a common standard in Open Source. Examples include:
<ul>
<li><code>docs(readme): add instructions</code></li>
<li><code>chore(deps): update</code></li>
<li><code>chore(release): 0.7.0</code></li>
<li><code>feat(signup): add new button</code></li>
<li><code>fix(ui): misaligned header</code></li>
</ul>
</li>
</ul>
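<p>A lightweight way to enforce such a convention is a git <code>commit-msg</code> hook. The sketch below hard-codes a sample message so it runs standalone; in a real hook the message comes from the file git passes as <code>$1</code>, and the regex covers only a slice of the full Conventional Commits grammar:</p>
<pre><code class="language-shell">#!/bin/sh
# Illustrative .git/hooks/commit-msg (simplified, not the full spec)
msg="feat(signup): add new button"      # in a real hook: msg=$(cat "$1")
if echo "$msg" | grep -Eq '^(feat|fix|docs|chore|refactor|test)(\([a-z-]+\))?(!)?: .+'; then
  echo "commit message OK"
else
  echo "commit message does not follow Conventional Commits"
  exit 1
fi
</code></pre>
<p>Tools like commitlint do the same job with a complete parser, but a one-line regex is often enough to get a team started.</p>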
<h4 id="do-you-have-a-change-log">3. Do you have a Change Log?</h4>
<ul>
<li>Is it automated?</li>
<li>It is absolutely OK to start with a manual change log.</li>
</ul>
<p>This is an example changelog from my <a href="https://github.com/julie-ng/azure-nodejs-demo/blob/main/CHANGELOG.md">azure-nodejs-demo</a> project, which is generated with <a href="https://www.npmjs.com/package/standard-version">Standard Version</a>:</p>
<p><img src="/assets/images/2021/auto-changelog-example.png" alt="Automated Changelog" class="has-border" width="450" /></p>
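<p>For a Node.js project, the release step behind such a changelog is typically a one-line script in <code>package.json</code> (a sketch; the script name is a convention, not a requirement):</p>
<pre><code class="language-json">{
  "scripts": {
    "release": "standard-version"
  }
}
</code></pre>
<p>Running <code>npm run release</code> then bumps the version based on your Conventional Commits, updates <code>CHANGELOG.md</code> and creates a tagged release commit.</p>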
<h4 id="are-you-linking-commits-to-features-bugs-etc-in-your-dev-planning-tool-eg-azure-boards-or-github-issues">4. Are you linking commits to features, bugs, etc. in your dev planning tool (e.g. Azure Boards or GitHub Issues)?</h4>
<p>Note how in the example change log above the features and bug fixes are linked to specific commits. It’s easier than it looks. For more info, see your provider’s documentation:</p>
<ul>
<li><a href="https://docs.github.com/en/github/writing-on-github/autolinked-references-and-urls">GitHub: Autolinked references and URLs</a></li>
<li><a href="https://docs.gitlab.com/ee/user/project/issues/crosslinking_issues.html">GitLab: Crosslinking Issues</a></li>
<li><a href="https://docs.microsoft.com/en-us/azure/devops/notifications/add-links-to-work-items?view=azure-devops#link-to-work-items-from-pull-requests-commits-and-comments">Azure DevOps: Link to work items from pull requests, commits, and comments</a></li>
</ul>
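<p>For example, on GitHub a commit message that references an issue number is linked automatically, and closing keywords resolve the issue once the commit reaches the default branch (the issue number here is made up):</p>
<pre><code class="language-plaintext">fix(ui): misaligned header

Fixes #123
</code></pre>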
<h4 id="can-you-describe-your-git-branching-workflow">5. Can you describe your git branching workflow?</h4>
<p>There is no single “correct” answer, even for the same use case. A developer team must decide together and <em>commit</em> to following it. One of the most frustrating periods in my career was trying to force my co-workers to work in the pedantic way I do. Unsurprisingly, I was not very popular. We were working to port a legacy application to the cloud and eventually the team learned to appreciate git submodules, after they gained experience with <em>how</em> to use them. It was my mistake to not let them learn at their own pace.</p>
<h5 id="pro-tips">Pro Tips</h5>
<ul>
<li>Your branch workflow should be documented. Consider also drawing this out.</li>
<li>To test your mastery, see if you can explain your workflow <em>without</em> notes and sketch the workflow from scratch. Start with a simple monolithic project, then do the same with more complex situations, e.g. with:
<ul>
<li>dependencies on other services</li>
<li>distinct environments, e.g. <code>staging</code>, <code>uat</code> and <code>production</code></li>
<li>infrastructure - if you own it and have infrastructure as code</li>
</ul>
</li>
</ul>
<h5 id="resources-getting-started-with-git-workflows">Resources: Getting Started with Git Workflows</h5>
<ul>
<li>This <a href="https://www.atlassian.com/git/tutorials/comparing-workflows">comparing workflows article</a> from Atlassian is a good place to start.</li>
<li><a href="https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow">OneFlow</a> is also popular, more recent and worth mentioning.</li>
</ul>
<h2 id="pipelines">Pipelines</h2>
<p>Please note these questions will be very <strong>workload specific</strong>. If you are trying to measure your own expertise, try mapping out answers for both simple and complex workloads.</p>
<h4 id="do-your-pipelines-generate-assets-eg-binaries-builds">6. Do your pipelines generate assets, e.g. binaries, builds?</h4>
<ul>
<li>How are they archived?</li>
<li>How are they distributed? How many people in your organization have access?</li>
<li>Have you built artifacts that contain secrets or certificates? Have you secured them? Note: obviously you should not do this. But sometimes you have to deal with a legacy application.</li>
</ul>
<h4 id="when-your-pipeline-runs-how-many-environments-does-it-deploy-to">7. When your pipeline runs, how many environments does it deploy to?</h4>
<p>One push should trigger deployment(s) for a <em>single</em> environment. Confirm that you have used <code>condition</code>s and triggers properly to ensure production is not accidentally deployed to.</p>
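<p>In Azure Pipelines, for example, such a guard could look roughly like this (the branch and stage names are illustrative, and a real stage would also contain jobs):</p>
<pre><code class="language-yaml">trigger:
  branches:
    include:
      - main                # only pushes to main start this pipeline

stages:
  - stage: deploy_production
    # belt and suspenders: never run this stage from another branch or a PR build
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
</code></pre>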
<h4 id="do-you-schedule-your-pipelines-to-run-regularly-to-ensure-it-still-works">8. Do you schedule your pipelines to run regularly to ensure it <em>still</em> works?</h4>
<ul>
<li>Are you just running unit tests?</li>
<li>Are you also deploying to (non-prod) environments?</li>
</ul>
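<p>In Azure Pipelines, such a scheduled verification run can be declared alongside the push trigger; the schedule below is just an example:</p>
<pre><code class="language-yaml">schedules:
  - cron: "0 3 * * *"       # every night at 03:00 UTC
    displayName: Nightly verification run
    branches:
      include:
        - main
    always: true            # run even when there are no new commits
</code></pre>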
<h4 id="how-do-you-re-use-pipeline-code">9. How do you re-use pipeline code?</h4>
<p>If you are just starting with DevOps, ignore this. An additional abstraction layer will not help you master the one measure that matters: how often you deploy. If you choose to go this route, I would ask you:</p>
<ul>
<li><em>WHY</em>? What do you hope to achieve?</li>
<li>What is your versioning model?</li>
<li>Is this a public or private library? If private, how do you secure it?</li>
<li>Who owns and <em>maintains</em> this code?</li>
</ul>
<p>If you want to pursue knowledge transfer in your organization, I can tell you based on first-hand experience at Allianz Germany that this is more daunting than it appears. If you don’t create and communicate your ownership and collaboration model correctly from the beginning, you’ll end up with dozens of forks, trying to support outdated versions - and maybe worse off than if you didn’t have libraries to begin with.</p>
<h5 id="vendor-documentation">Vendor Documentation</h5>
<ul>
<li><a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/process/templates?view=azure-devops">Azure DevOps Pipeline Templates</a></li>
<li><a href="https://www.jenkins.io/doc/book/pipeline/shared-libraries/">Jenkins Pipeline Libraries</a></li>
<li><a href="https://docs.github.com/en/actions/creating-actions/creating-a-composite-run-steps-action">GitHub Actions - Composite Run Steps Action</a></li>
</ul>
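<p>As one answer to the versioning question above, Azure DevOps lets a consuming pipeline pin a shared template repository to a tag, so teams opt in to upgrades instead of being surprised by them (the repository and file names here are made up):</p>
<pre><code class="language-yaml">resources:
  repositories:
    - repository: templates
      type: git
      name: MyProject/pipeline-templates
      ref: refs/tags/v1.2.0           # pinned version of the shared library

steps:
  - template: steps/build.yml@templates
</code></pre>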
<h4 id="pull-requests---do-they-trigger-pipelines-which-ones">10. Pull Requests - do they trigger pipelines? Which ones?</h4>
<ul>
<li>
<p>As I explain in <a href="https://www.youtube.com/watch?v=lCCRmxYZUs8">this YouTube video, Pull Requests are a security backdoor</a>. Therefore, make sure you go through all your pull request workflows and pipeline code to ensure they only run when you intend them to run.</p>
</li>
<li>
<p>Are you <em>sure</em> production is not accidentally deployed?
This is an important sanity check question. I often ask myself this too to ensure I verify my assumptions and work before moving on.</p>
</li>
</ul>
<h3 id="deployment-strategies">Deployment Strategies</h3>
<h4 id="what-is-the-difference-between-your-dev-and-prod-environments-how-does-it-affect-your-confidence-to-deploy-to-production">11. What is the difference between your dev and prod environments? How does it affect your confidence to deploy to production?</h4>
<ul>
<li>Some people are comfortable with just a dev and production environment. Other teams want a more stable “staging” environment before production. Which group do you belong to? Why?</li>
<li>Most people think about source code when it comes to pre-production. What about your data? Do you have test data that is as close to production as possible? How?</li>
</ul>
<h4 id="what-is-your-production-rollout-strategy">12. What is your production rollout strategy?</h4>
<p>It is <strong>very much OK to deploy manually</strong> to production, regardless of whether your organization is new to CI/CD. Some organizations disallow automatic deliveries (to production) for compliance reasons.</p>
<p>If you practice continuous delivery (and most of us are not Netflix), here are the most common options:</p>
<ul>
<li>Rolling Updates</li>
<li>Blue/green deployments</li>
<li>Canary deployments</li>
</ul>
<p>If you choose automatic deliveries, I would challenge you further on the following questions 13-15 that also relate to deployment.</p>
<h4 id="how-do-you-update-your-database-when-you-release-a-new-feature-to-your-data-models">13. How do you update your database when you release a new feature to your data models?</h4>
<ul>
<li>Do you migrate the database first and then release the new code? Or vice versa? Why?</li>
<li>Is this done via your software Framework, e.g. <a href="https://guides.rubyonrails.org/active_model_basics.html">Active Model</a> or <a href="https://docs.microsoft.com/en-us/ef/">Entity Framework</a>? Or are you writing SQL scripts?</li>
<li>What happens if you have 2 versions of your application running against the same database?</li>
<li>How do you <em>revert a database migration</em>? Also a part of Question 15.</li>
<li>Do you have model validations in your software? Do you know if existing production data is still valid? How?</li>
</ul>
<h4 id="how-do-you-know-if-a-deployment-succeeded">14. How do you know if a deployment succeeded?</h4>
<ul>
<li>Do you have automated end to end tests? What is your coverage percentage?</li>
<li>Are you testing by hand?</li>
<li>Sometimes a deployment is successful but the server returns a 50x. How would you catch this? What role does monitoring play here?</li>
</ul>
<h4 id="how-do-you-perform-rollbacks">15. How do you perform rollbacks?</h4>
<p>Let’s assume a security bug was deployed in your last release…</p>
<ul>
<li>How do you rollback code <em>and</em> the database if needed?</li>
<li>Will your users notice? In what way?</li>
<li>How will you document and version this rollback?</li>
</ul>
<p>Think about consequences of just overwriting existing code. Can you really just do a simple <code>git revert</code>?</p>
<h4 id="do-production-deployments-need-to-be-approved-manually">16. Do production deployments need to be approved manually?</h4>
<p>If so, how are you achieving this? Examples include:</p>
<ul>
<li>Pull Requests</li>
<li>Approvals and Release Gates</li>
</ul>
<h2 id="security">Security</h2>
<h4 id="credentials-and-secrets">17. Credentials and Secrets:</h4>
<ul>
<li>Where are your credentials stored?</li>
<li>Can they be exposed as plain text in any way? What happens if a developer tries to <code>echo $SECRET</code> in a pipeline?</li>
</ul>
<p>It’s important to understand that giving access to run a pipeline is giving access to the secret. Once in the build job (via pipeline as code), a rogue developer could send the credential off to another location if she wanted to. Therefore it’s important to discuss the role of pull requests here and how to separate credentials across environments.</p>
<h4 id="how-are-you-separating-and-storing-configuration">18. How are you separating and storing configuration?</h4>
<ul>
<li>What is saved in git?</li>
<li>What is configured in environment variables?</li>
<li>Which credentials are stored in the build server?</li>
<li>Which credentials are stored in a secret management service, e.g. <a href="https://azure.microsoft.com/en-us/services/key-vault/">Azure Key Vault</a> or <a href="https://www.vaultproject.io/">HashiCorp Vault</a>?</li>
<li>How do you ensure development environments only have access to development credentials and ditto with production?</li>
</ul>
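<p>As one concrete pattern for the last two questions: in Azure Pipelines, secrets can be pulled at runtime from Key Vault through a service connection that is scoped per environment, so a dev pipeline physically cannot read production secrets (all names below are illustrative):</p>
<pre><code class="language-yaml">steps:
  - task: AzureKeyVault@2
    inputs:
      azureSubscription: arm-connection-dev   # dev pipelines only get the dev connection
      KeyVaultName: kv-myapp-dev
      SecretsFilter: DbPassword               # fetch only what this job needs
</code></pre>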
<h2 id="governance">Governance</h2>
<h4 id="are-you-using-a-single-identity-plane-across-cicd-and-the-cloud">19. Are you using a single identity plane across CI/CD and the cloud?</h4>
<p>Basically, I am asking whether you have the same RBAC for direct access to the cloud API as for CI/CD, starting with git. If not, you may have a back door somewhere, because you would need to keep your RBAC rules in sync manually.</p>
<p>I have seen scenarios where developers were not allowed to access production environments from Azure Portal and Azure CLI. But if they knew how to trigger the pipelines, they could potentially take down production anyway.</p>
<h4 id="how-have-you-documented-rbac-and-acls">20. How have you documented RBAC and ACLs?</h4>
<p>Governance is complex, even for smaller teams. Maybe you can answer my questions today - but what about next month? That’s why you need to document.</p>
<p>See <a href="https://docs.microsoft.com/en-us/azure/devops/organizations/security/permissions-access?toc=%2Fazure%2Fdevops%2Fsecurity-access-billing%2Ftoc.json&bc=%2Fazure%2Fdevops%2Fsecurity-access-billing%2Fbreadcrumb%2Ftoc.json&view=azure-devops">“Default permissions and access for Azure DevOps”</a> as an example.</p>
<h4 id="how-are-you-ensuring-only-authorized-developers-can-deploy-to-production">21. How are you ensuring only authorized developers can deploy to production?</h4>
<p>This is an open ended question designed to test how well you understand your workflow. If you are in my customer session, I would ask you to share your screen and show access controls. I don’t go over them line by line but look at:</p>
<ul>
<li>Branch Protection configuration, e.g. require pull requests, are force pushes allowed?</li>
<li>Pull Request configuration, e.g. who can approve, passing build requirements, etc.</li>
</ul>
<h4 id="how-do-you-handle-access-to-shared-protected-resources-if-applicable">22. How do you handle access to shared protected resources? (if applicable)</h4>
<p>In larger organizations, there may be shared resources, e.g. an artifact registry that is managed outside the developer team.</p>
<ul>
<li>Who has write access? To which scope?</li>
<li>Which resources must be shared and how do you ensure that developers have read-only access?</li>
</ul>
<h4 id="are-you-signing-your-commits-to-verify-identity-if-applicable">23. Are you signing your commits to verify identity? (if applicable)</h4>
<p>Note: git only checks integrity, not authentication. The only way to verify authorship of a commit is to sign commits. Unfortunately Azure DevOps does not support this. But GitHub does.</p>
<p><img src="/assets/images/2021/github-verified-commit.png" alt="GitHub shows verified commits" class="has-border" /></p>
<h2 id="cost-optimization">Cost Optimization</h2>
<h4 id="do-you-clean-up-artifacts">24. Do you clean up artifacts?</h4>
<p>Some build jobs will produce an artifact for every run. How do you clean up the ones that never make it to production and store the ones that do?</p>
<h4 id="do-you-use-a-different-environment-for-development-that-is-sized-accordingly">25. Do you use a different environment for development that is sized accordingly?</h4>
<p>To save costs, your non-production environments should be configured for less performance.</p>
<h2 id="conclusion">Conclusion</h2>
<p>So how did you do? After going through this list you should be able to measure your own personal confidence in your CI/CD workflow to meet <em>your</em> requirements. Not every question in this list may be (or should be) relevant to you. If you are unsure, do not worry. Sometimes you just need to stop for a second and document what you are currently doing.</p>
<p>The most important thing is to realize where you stand now, and where you want to be.</p>
<p><em>List of questions last updated 28 February 2021</em>.</p>
ARM Templates vs Terraform vs Pulumi - Infrastructure as Code in 2021http://julie.io/writing/arm-terraform-pulumi-infra-as-code/2021-01-26T01:00:00+01:002024-01-20T10:41:06+01:00Julie Ng<!-- After I published my article about [Azure Pipelines and Terraform Best Practices](/writing/terraform-on-azure-pipelines-best-practices/) last week, a friend asked me - what about [Pulumi](https://pulumi.com)? If I chose Terraform for its DSL, wouldn't I love Pulumi's JavaScript code even more? Well it's 2021 and Microsoft and HashiCorp have been watching and copying. Let's take an updated look at Infrastructure as Code in 2021.
{:.lead} -->
<div>
<p class="lead">A few years ago Pulumi introduced code-native programming language for Infrastructure as Code (IaC), bringing it closer to the developer and their existing skillset. Fast-forward to 2021 and Microsoft and HashiCorp are playing catch-up to Pulumi and to each other. To help you choose IaC technology, let’s look at IaC programming languages for short-term developer happiness and code re-use for long-term productivity.</p>
<!-- 1. [Optimizing for Developer Happiness](#optimizing-for-developer-happiness)
1. [Azure Resource Manager (ARM) Templates](#azure-resource-manager-arm-templates)
* [Biggest Pain Point - JSON](#arms-biggest-pain-point---json)
* [Bicep DSL](#arm-bicep-dsl)
1. [Terraform - Human Friendly IaC](#terraform---human-friendly-iac)
* [Terraform in TypeScript and Python - New since 2020](#terraform-in-typescript-and-python---new-since-2020)
1. [Pulumi - Code Native IaC](#pulumi---code-native-iac)
1. [Code Re-Use Comparison](#code-re-use-comparison)
* [ARM Templates - Linked and Painful](#arm-templates---linked-and-painful)
* [Terraform - Re-usable Modules](#terraform---re-usable-modules)
* [Pulumi - Re-usable Packages](#pulumi---re-usable-packages)
1. [Which IaC makes you most happy?](#which-iac-makes-you-most-happy) -->
<p><em>Just want a summary? Watch the Ask Me Anything (AMA) style answer</em></p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/KHvVWdqvAvI" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<h3 id="features-comparison-table">Features Comparison Table</h3>
<p>Although I have created a feature comparison table below, I discuss many of the features, but not all of them. This should be a good springboard to help you learn more about each technology.</p>
</div>
<table class="table is-comparison">
<thead>
<tr>
<th style="text-align: left">Feature</th>
<th style="text-align: center">ARM</th>
<th style="text-align: center">Terraform</th>
<th style="text-align: center">Pulumi</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Language</td>
<td style="text-align: center">JSON + <a href="https://github.com/Azure/bicep">Bicep</a></td>
<td style="text-align: center">HCL/DSL</td>
<td style="text-align: center">Code Native, e.g. JavaScript, Python</td>
</tr>
<tr>
<td style="text-align: left">Languages (in preview)</td>
<td style="text-align: center"><a href="https://github.com/Azure/bicep">Bicep</a> DSL</td>
<td style="text-align: center"><a href="https://www.hashicorp.com/blog/cdk-for-terraform-enabling-python-and-typescript-support">CDK for Terraform</a>, Python and TypeScript Support</td>
<td style="text-align: center">-</td>
</tr>
<tr>
<td style="text-align: left">Clouds</td>
<td style="text-align: center">Azure-only</td>
<td style="text-align: center">Agnostic + on-prem</td>
<td style="text-align: center">Agnostic + on-prem</td>
</tr>
<tr>
<td style="text-align: left">Preview Changes</td>
<td style="text-align: center"><a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/template-deploy-what-if?tabs=azure-powershell"><code>az deployment … what-if</code></a></td>
<td style="text-align: center"><a href="https://www.terraform.io/docs/cli/commands/plan.html"><code>terraform plan</code></a></td>
<td style="text-align: center"><a href="https://www.pulumi.com/docs/reference/cli/pulumi_preview/"><code>pulumi preview</code></a></td>
</tr>
<tr>
<td style="text-align: left">Rollback Changes</td>
<td style="text-align: center"><a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/rollback-on-error">Rollback</a></td>
<td style="text-align: center">Revert code & Re-deploy</td>
<td style="text-align: center">Revert code & Re-deploy</td>
</tr>
<tr>
<td style="text-align: left">Infrastructure Clean Up</td>
<td style="text-align: center">No</td>
<td style="text-align: center"><a href="https://www.terraform.io/docs/cli/commands/destroy.html"><code>terraform destroy</code></a></td>
<td style="text-align: center"><a href="https://www.pulumi.com/docs/reference/cli/pulumi_destroy/"><code>pulumi destroy</code></a></td>
</tr>
<tr>
<td style="text-align: left">Deployment History</td>
<td style="text-align: center"><a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/deployment-history?tabs=azure-portal">Deployment History</a></td>
<td style="text-align: center">SCM + <a href="https://www.hashicorp.com/blog/hashicorp-terraform-cloud-audit-logging-with-splunk">Auditing</a>*</td>
<td style="text-align: center">SCM + <a href="https://www.pulumi.com/docs/intro/console/collaboration/auditing/">Auditing</a>*</td>
</tr>
<tr>
<td style="text-align: left">Code Re-Use</td>
<td style="text-align: center"><a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/linked-templates#linked-template">Hosted JSON URIs</a></td>
<td style="text-align: center"><a href="https://learn.hashicorp.com/collections/terraform/modules">Modules</a> + <a href="https://learn.hashicorp.com/tutorials/terraform/module-private-registry">Registry</a>*</td>
<td style="text-align: center">Code-Native Packages, e.g. npm or pip</td>
</tr>
<tr>
<td style="text-align: left">State Files</td>
<td style="text-align: center">No State File</td>
<td style="text-align: center">Plain-text</td>
<td style="text-align: center">Encrypted</td>
</tr>
</tbody>
</table>
<div>
<p><em>* refers to a premium feature from vendor, i.e. Terraform Cloud or Pulumi Enterprise.</em></p>
<p>Instead I want to focus on optimizing your choice for developer happiness, which is strongly tied to productivity. People choose human-friendly Domain Specific Languages (DSLs) and code-native languages because if they can code faster and deploy more often, they are more productive - and thus happier.</p>
<p>So let’s do a comparison from these two perspectives:</p>
<ul>
<li><strong>Happiness Today</strong> - how quickly can I as an engineer work with each technology’s flavor of Infrastructure as Code?</li>
<li><strong>Happiness Tomorrow</strong> - as my application and company grows, how easily can I scale my IaC with re-usable components?</li>
</ul>
<h2 id="arm-templates">ARM Templates</h2>
<p>As a Microsoft engineer, I should point out the major reasons to use Azure Resource Manager (ARM) before I elaborate on why I personally don’t use it:</p>
<ul>
<li>
<p><strong>First Party Support</strong><br />
Because ARM is Azure exclusive, all Azure resources are supported, from the simple resource group to complicated policies and blueprints. And your deployments are most likely to work out of the box <em>as expected</em>.</p>
</li>
<li>
<p><strong>No state file required</strong><br />
ARM queries the Azure APIs directly for the current state, so you do not have to worry about securing a state file like with other IaC technologies.</p>
</li>
<li>
<p><strong>Deployment Histories included</strong><br />
<a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/deployment-history?tabs=azure-portal">Deployment history</a> is included out of the box. While you have IaC with your <em>intended changes</em> in your git history, Azure can tell you the actual deployed changes.</p>
</li>
</ul>
<h3 id="arm-improvements-in-2021">ARM Improvements in 2021</h3>
<p>The following were gaps in ARM that existed before 2020 and the major reasons I never properly learned it. But Microsoft has caught on to the competition and is filling the following gaps:</p>
<ul>
<li>
<p><strong>Detect Drift with <code>what-if</code></strong><br />
Last year Microsoft implemented the <code>what-if</code> command, the equivalent of <code>terraform plan</code>, which lets you preview infrastructure changes before you deploy - including whether destructive changes will happen.</p>
</li>
<li>
<p><strong>JSON is for machines</strong><br />
If I want to author infrastructure, I don’t think in JSON, which is why it feels so unnatural. See below for more details, including new DSL <a href="https://github.com/Azure/bicep">Bicep</a>.</p>
</li>
</ul>
<h4 id="arms-biggest-pain-point---json">ARM’s Biggest Pain Point - JSON</h4>
<p>The main reason I don’t use ARM is because I don’t like writing JSON. When I write code I often use comments and the <code>/* */</code> syntax in ARM feels like a cheat. To illustrate, this is an <a href="https://github.com/Azure/bicep/blob/main/docs/examples/101/storage-blob-container/main.json">example ARM Template</a> for an Azure Storage Account:</p>
<pre><code class="language-json">{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": {
      "type": "string"
    },
    "containerName": {
      "type": "string",
      "defaultValue": "logs"
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]"
    }
  },
  "functions": [],
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2019-06-01",
      "name": "[parameters('storageAccountName')]",
      "location": "[parameters('location')]",
      "sku": {
        "name": "Standard_LRS",
        "tier": "Standard"
      },
      "kind": "StorageV2",
      "properties": {
        "accessTier": "Hot"
      }
    },
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
      "apiVersion": "2019-06-01",
      "name": "[format('{0}/default/{1}', parameters('storageAccountName'), parameters('containerName'))]",
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
      ]
    }
  ]
}
</code></pre>
<p>I’ve been at Microsoft for over 1.5 years and I still can’t write ARM templates. The reality is I will probably skip ARM and instead learn to <em>write</em> Bicep.</p>
<h3 id="arm-bicep-dsl">ARM Bicep DSL</h3>
<p><a href="https://github.com/Azure/bicep">Bicep</a> is a Domain Specific Language (DSL) that compiles to standard ARM template JSON. Looking at this <a href="https://github.com/Azure/bicep/blob/main/docs/examples/101/storage-blob-container/main.bicep">example from the GitHub project repo</a>, you may see similarities to Terraform’s HCL DSL:</p>
<pre><code class="language-clike">// Bicep 💪
param storageAccountName string
param containerName string = 'logs'
param location string = resourceGroup().location

resource sa 'Microsoft.Storage/storageAccounts@2019-06-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
    tier: 'Standard'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
  }
}

resource container 'Microsoft.Storage/storageAccounts/blobServices/containers@2019-06-01' = {
  name: '${sa.name}/default/${containerName}'
}
</code></pre>
<p>Although I personally would prefer <code>storageaccount</code> over <code>sa</code>, I am overall quite excited about Bicep.</p>
<h3 id="arm--bicep-summary---promising-future">ARM & Bicep Summary - Promising Future</h3>
<p>If we can get a DSL like Terraform’s, but with first-party support for new Azure features sooner, that could be an IaC game changer for Azure-only workloads. Azure has also filled the preview gap with the <code>az deployment… what-if</code> command, which was sorely missing.</p>
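<p>For reference, the preview from the Azure CLI looks roughly like this. A sketch with placeholder names; check the <code>az deployment</code> command group for the sub-command matching your deployment scope:</p>
<pre><code class="language-bash"># Preview changes to a resource group before deploying (names are examples)
$ az deployment group what-if \
  --resource-group my-resource-group \
  --template-file main.json
</code></pre>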
<p>The code re-use strategy with modules is still very experimental. See <a href="https://github.com/Azure/bicep/discussions/1170">this discussion about sharing references across modules</a>. This is the last major gap for me personally before I would consider using Bicep in production.</p>
<p>Everything is still experimental but very promising.</p>
<h2 id="terraform">Terraform</h2>
<p>Terraform is my favorite IaC technology and what I personally use because it’s so human-friendly, cloud-agnostic and solid. These are the major features of Terraform:</p>
<ul>
<li>
<p><strong>HashiCorp Language - Human Friendly DSL</strong><br />
Reading and writing HCL flows naturally and is a joy. More details below.</p>
</li>
<li>
<p><strong>Cloud Agnostic</strong><br />
Although the cloud vendor providers are rather specific, mastering Terraform helps you master IaC for <em>any cloud</em>.</p>
</li>
<li>
<p><strong>Preview Infrastructure Changes</strong><br />
Run <code>terraform plan</code> and check you don’t accidentally blow up your infrastructure. Also use the <code>-detailed-exitcode</code> flag so you can adjust your CI/CD builds based on whether or not configuration drift was detected.</p>
</li>
<li>
<p><strong>Clean Up Infrastructure</strong><br />
Run <code>terraform destroy</code> and easily remove any infrastructure, great for clean up after an experiment, or for starting over if something breaks beyond repair. This works because Terraform keeps a record of your infrastructure in a state file.</p>
</li>
<li>
<p><strong>Code Re-use with Modules</strong><br />
This is so easy that it’s fun to write modules. The DSL is easy to understand and I can have local and hosted modules, either in git or a Terraform Registry (public or private). This is the deciding factor and most important Terraform advantage over its competitors. See details in last section of this article.</p>
</li>
</ul>
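<p>The <code>-detailed-exitcode</code> flag mentioned above can be wired into a build script. A minimal sketch, using the exit codes documented by Terraform (0 = no changes, 1 = error, 2 = changes present):</p>
<pre><code class="language-bash"># Sketch: branch CI/CD behavior on drift detection
terraform plan -detailed-exitcode -out deployment.tfplan
case $? in
  0) echo "No changes - infrastructure matches configuration" ;;
  2) echo "Changes detected - continue to apply stage" ;;
  *) echo "terraform plan failed" && exit 1 ;;
esac
</code></pre>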
<h3 id="hashicorp-language----terraforms-dsl">HashiCorp Language - Terraform’s DSL</h3>
<p>Ok, let’s look at the main reason I chose Terraform: the HashiCorp Configuration Language (HCL), its human-friendly DSL. This is an <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_account">example</a> from the Terraform documentation:</p>
<pre><code class="language-hcl">resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "storageaccountname"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "GRS"

  tags = {
    environment = "staging"
  }
}
</code></pre>
<p>It’s like reading English. I LOVE it.</p>
<h4 id="disadvantages-vs-arm">(Dis)advantages vs ARM</h4>
<p>These are the most common arguments I hear against Terraform when compared to ARM:</p>
<ul>
<li>
<p><strong>Not every Azure Resource exists outside ARM</strong> <br />
There isn’t a Terraform Provider for every ARM type. Or even if there is, e.g. <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/policy_definition">Azure Policy</a>, you’re still just writing ARM JSON <em>inside</em> another language.</p>
</li>
<li>
<p><strong>State File in plain text 🧐</strong><br />
If you create resources with credentials, e.g. a database or create service principals, these secrets are stored in <em>plain text</em> in your Terraform state file.</p>
</li>
</ul>
<p>State files as plain text scare many people. Personally I am less concerned and accept this trade-off because I have confidence in my code quality, CI/CD governance, and security practices, e.g. short-lived tokens and scoped permissions.</p>
<p>If your security team cannot live with this, <strong>then delete the state file after the resources are created</strong>. No file, no problem 🤷‍♀️ Some tasks, like creating scoped service principals at scale, are so much easier with Terraform because it can talk to both the ARM and the Azure Active Directory APIs. Create the credentials, immediately throw them in Key Vault and delete the state file afterwards. I’m pragmatic.</p>
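<p>That pragmatic workflow could look roughly like this. A sketch only; the vault name, secret name and output name are illustrative assumptions, not from a real project:</p>
<pre><code class="language-bash"># Sketch: create credentials, store them in Key Vault, then delete local state
terraform apply -auto-approve

# Copy the generated secret into Key Vault (example names)
az keyvault secret set \
  --vault-name my-keyvault \
  --name sp-client-secret \
  --value "$(terraform output -raw client_secret)"

# No file, no problem
rm terraform.tfstate
</code></pre>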
<h3 id="terraform-in-typescript-and-python---new-since-2020">Terraform in TypeScript and Python - New since 2020</h3>
<p>In July 2020 HashiCorp introduced <a href="https://www.hashicorp.com/blog/cdk-for-terraform-enabling-python-and-typescript-support">Cloud Development Kit (CDK) for Terraform</a>, which lets you write IaC in code native languages like TypeScript and Python.</p>
<p>This is an <a href="https://github.com/hashicorp/terraform-cdk/blob/master/examples/typescript/azure/main.ts">example</a> from their GitHub repo:</p>
<pre><code class="language-javascript">import { Construct } from 'constructs';
import { App, TerraformStack } from 'cdktf';
import { AzurermProvider, VirtualNetwork } from './.gen/providers/azurerm'
class MyStack extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    new AzurermProvider(this, 'AzureRm', {
      features: [{}]
    })

    new VirtualNetwork(this, 'TfVnet', {
      location: 'uksouth',
      addressSpace: ['10.0.0.0/24'],
      name: 'TerraformVNet',
      resourceGroupName: '<YOUR_RESOURCE_GROUP_NAME>'
    })
  }
}

const app = new App();
new MyStack(app, 'typescript-az');
app.synth();
</code></pre>
<p>Because it’s TypeScript, it’s very familiar to JavaScript engineers like myself.</p>
<p>But <strong>I personally prefer HashiCorp Configuration Language (HCL) because it is meant for humans</strong>. As a human it is much easier for me to read and scan. It’s like <em>HCL speaks to me</em>, meeting me halfway. Even though I know JavaScript, I still have to read the code entirely.</p>
<p>That is my personal preference. Maybe JavaScript speaks more to you 🤓</p>
<h2 id="pulumi">Pulumi</h2>
<p>And finally we have Pulumi, the new kid on the IaC block who introduced the concept of code-native IaC. Pulumi’s largest value proposition is that engineers don’t have to learn a new programming language.</p>
<p>And looking at this Pulumi <a href="https://www.pulumi.com/docs/reference/pkg/azure/storage/account/">example from their documentation</a>, it looks much cleaner than the CDK for Terraform:</p>
<pre><code class="language-javascript">import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure";
const exampleResourceGroup = new azure.core.ResourceGroup("exampleResourceGroup", {location: "West Europe"});
const exampleAccount = new azure.storage.Account("exampleAccount", {
  resourceGroupName: exampleResourceGroup.name,
  location: exampleResourceGroup.location,
  accountTier: "Standard",
  accountReplicationType: "GRS",
  tags: {
    environment: "staging",
  },
});
</code></pre>
<p>It probably looks cleaner because Pulumi has been around longer and has had ample time to fine-tune its abstraction to make it as close to a friendly DSL as possible. And <strong>this kind of friendly abstraction layer is an art form</strong>. So kudos to Pulumi for achieving this 👌</p>
<h3 id="encrypted-state-file">Encrypted State File</h3>
<p>Like Terraform, Pulumi also uses a state file to keep track of your infrastructure, which helps it do configuration drift detection and clean up resources.</p>
<p>Unlike Terraform, however, Pulumi’s state file is <a href="https://www.pulumi.com/docs/intro/concepts/state/"><em>encrypted</em></a> which is more secure.</p>
<h3 id="give-pulumi-a-chance">Give Pulumi a Chance</h3>
<p>Sorry I am not covering Pulumi further. I don’t use it so I am not going to pretend to be an expert. I did some research because one of my YouTube subscribers asked me to do this comparison. This does not mean I do not recommend Pulumi.</p>
<p>If you are still deciding which IaC technology is right for you, you should also consider Pulumi, especially if you want to write IaC in a code-native programming language like JavaScript, Python, etc.</p>
<h2 id="code-re-use">Code Re-Use</h2>
<p>So now you have had an introduction to the “flavors” of Infrastructure as Code. You may even have a favorite. We can imagine ourselves writing a bit of code. Now let’s imagine scaling that IaC to many environments and applications. How can we leverage code re-use?</p>
<h4 id="arm-template-links">ARM Template Links</h4>
<p>If you want to create a template for re-use you need to <a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/linked-templates#linked-template">send a URI</a> to the main template. It is not possible to pass a local file. Even if you can send a protected link, you still have to publish it, which makes development and iteration of templates painfully slow.</p>
<p>This is what a <code>templateLink</code> looks like:</p>
<pre><code class="language-json">"resources": [
  {
    "type": "Microsoft.Resources/deployments",
    "apiVersion": "2019-10-01",
    "name": "linkedTemplate",
    "properties": {
      "mode": "Incremental",
      "templateLink": { // Painful 😖
        "uri": "https://mystorageaccount.blob.core.windows.net/AzureTemplates/newStorageAccount.json",
        "contentVersion": "1.0.0.0"
      },
      "parametersLink": { // Painful 😖
        "uri": "https://mystorageaccount.blob.core.windows.net/AzureTemplates/newStorageAccount.parameters.json",
        "contentVersion": "1.0.0.0"
      }
    }
  }
]
</code></pre>
<p>And don’t forget to append a SAS token to the URI to access the JSON file… now it’s clear why I don’t use ARM, right?</p>
<h4 id="terraform-modules">Terraform Modules</h4>
<p>As an engineer I need to be able to work with local code when I am initially experimenting or for quick debugging. In Terraform, it’s really easy to create <a href="https://www.terraform.io/docs/language/modules/develop/index.html">modules</a>, which can be local or published to an external <a href="https://registry.terraform.io/">registry</a>.</p>
<pre><code class="language-hcl"># Custom Module example
module "dev_cluster" {
  source              = "./../aks-cluster"
  name                = "dev-cluster"
  vm_size             = "Standard_D2s_v3" # ca. 68 EUR/mo.
  ssh_public_key      = "~/.ssh/id_rsa.pub"
  vnet_address_space  = ["10.100.0.0/25"]
  aks_subnet_prefixes = ["10.100.0.0/28"]
}
</code></pre>
<p>From the example it is clear how I can re-use infrastructure modules to easily create different deployment environments that vary slightly. For example, I can use the same custom <code>aks-cluster</code> module to create a cluster for production and choose more expensive Virtual Machines.</p>
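<p>A production variant might then look like this. A sketch, re-using the same hypothetical <code>aks-cluster</code> module from the example above:</p>
<pre><code class="language-hcl"># Same module, different environment and VM size
module "prod_cluster" {
  source              = "./../aks-cluster"
  name                = "prod-cluster"
  vm_size             = "Standard_D8s_v3" # larger VMs for production
  ssh_public_key      = "~/.ssh/id_rsa.pub"
  vnet_address_space  = ["10.200.0.0/25"]
  aks_subnet_prefixes = ["10.200.0.0/28"]
}
</code></pre>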
<p>You can also publish your modules to the <a href="https://registry.terraform.io/">public terraform registry</a> or a <a href="https://www.terraform.io/docs/cloud/registry/index.html">private registry</a> in Terraform Cloud.</p>
<h4 id="pulumi-packages">Pulumi Packages</h4>
<p>Because Pulumi uses code-native programming languages, you leverage the language’s own code re-use techniques. For example, in JavaScript you can create packages and publish them to a registry as node modules.</p>
<p>This is a piece of example code from a <a href="https://www.pulumi.com/blog/creating-and-reusing-cloud-components-using-package-managers/">Pulumi blog article</a> that describes re-use in detail:</p>
<pre><code class="language-javascript">/**
 * Static website using Amazon S3, CloudFront, and Route53.
 */
export declare class StaticWebsite extends pulumi.ComponentResource {
  readonly contentBucket: aws.s3.Bucket;
  readonly logsBucket: aws.s3.Bucket;
  readonly cdn: aws.cloudfront.Distribution;
  readonly aRecord?: aws.route53.Record;

  constructor(name: string, contentArgs: ContentArgs,
              domainArgs?: DomainArgs, opts?: pulumi.ResourceOptions);
}
</code></pre>
<p>Then you could use it like this:</p>
<pre><code class="language-javascript">// If you have published it to an NPM registry
import { StaticWebsite } from "static-website-aws";

// OR reference a local file
import { StaticWebsite } from "./static-website-aws";

// Then
const website = new StaticWebsite("browserhack", {
  pathToContent: "./browserhack",
  custom404Path: "/404.html",
});
</code></pre>
<h2 id="which-iac-makes-you-most-happy">Which IaC makes you most happy?</h2>
<p>So now you’ve seen how programming Infrastructure as Code in ARM Templates, Terraform and Pulumi compare to each other.</p>
<p>You know my opinions. Which one is your favorite? I’d love to know, especially if you are using Pulumi in production. Let me know via <a href="https://twitter.com/jng5">@jng5</a> on Twitter or on <a href="https://www.youtube.com/watch?v=KHvVWdqvAvI">YouTube</a>.</p>
<!-- ### Further Reading
- ARM Templates
- [Microsoft Docs - ARM Template Best Practices](https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/template-best-practices)
- [Project Bicep, an ARM DSL](https://github.com/Azure/bicep)
- Terraform
- [State](https://www.terraform.io/docs/language/state/index.html)
- [CDK for Terraform: Enabling Python & TypeScript Support](https://www.hashicorp.com/blog/cdk-for-terraform-enabling-python-and-typescript-support)
- Pulumi
- [Pulumi vs Terraform](https://www.pulumi.com/docs/intro/vs/terraform/)
- [State and Backends](https://www.pulumi.com/docs/intro/concepts/state/)
- [Inter-Stack Dependencies](https://www.pulumi.com/docs/intro/concepts/organizing-stacks-projects/#inter-stack-dependencies) -->
</div>
Terraform on Azure Pipelines Best Practices | http://julie.io/writing/terraform-on-azure-pipelines-best-practices/ | published 2021-01-14, updated 2024-01-20 | Julie Ng
<p class="lead">Azure Pipelines and Terraform make it easy to get started deploying infrastructure from templates. But how do you go from sample code to real life implementation, integrating git workflows with deployments and scaling across multiple teams? Here are 5 Best Practices to get you started on the right foot.</p>
<p>As an engineer in the Azure Customer Experience (CXP) organization, I advise customers with best practice guidance and technical deep dives for specific use cases. This article is based both on recurring themes with customers as well as my previous role as an <a href="/who/resume">Enterprise Architect</a> at Allianz Germany when we started our cloud migration in 2016.</p>
<h4 id="five-best-practices">Five Best Practices</h4>
<ol>
<li><a href="#tip-1---use-yaml-pipelines-not-ui">Use YAML Pipelines, not UI</a></li>
<li><a href="#tip-2---use-the-command-line-not-yaml-tasks">Use the Command Line, not YAML Tasks</a></li>
<li><a href="#tip-3---use-terraform-partial-configuration">Use Terraform Partial Configuration</a></li>
<li><a href="#tip-4---authenticate-with-service-principal-credentials-stored-in-azure-key-vault">Authenticate with Service Principal Credentials stored in Azure Key Vault</a></li>
<li><a href="#tip-5-create-a-custom-role-for-terraform">Create a Custom Role for Terraform</a></li>
</ol>
<p><em>TL;DR: Watch this 5-minute summary instead:</em></p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/UaehcmoMAFc" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<h2 id="tip-1---use-yaml-pipelines-not-ui">Tip #1 - Use YAML Pipelines, not UI</h2>
<p>The Azure DevOps service has its roots in <a href="https://docs.microsoft.com/en-us/azure/devops/server/tfs-is-now-azure-devops-server?view=azure-devops-2020">Visual Studio Team Foundation Server</a> and as such it carries legacy features, including Classic Pipelines. If you’re creating new pipelines, do not start with classic pipelines. If you have classic pipelines, plan on migrating them to YAML. Industry best practice is to author <strong>Pipelines as Code</strong> and in Azure Pipelines, that means <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&tabs=schema%2Cparameter-schema">YAML Pipelines</a>.</p>
<p>If you use Classic Pipelines, do not panic. They will be around for a while. But as you can see from <a href="https://docs.microsoft.com/en-us/azure/devops/release-notes/features-timeline">public features timeline</a> and <a href="https://dev.azure.com/mseng/AzureDevOpsRoadmap/_workitems/recentlyupdated">public road map</a>, Microsoft is investing more in YAML pipelines. To be more future proof, choose YAML pipelines.</p>
<h2 id="tip-2---use-the-command-line-not-yaml-tasks">Tip #2 - Use the Command Line, not YAML Tasks</h2>
<p>I have a love-hate relationship with <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/process/tasks?view=azure-devops&tabs=yaml">Pipeline Tasks</a>. As an abstraction they lower the barrier to entry. They make tasks platform independent (Windows vs. Linux) and pass return codes so you don’t have to handle <code>stderr</code> and <code>stdout</code> by hand. See the <a href="https://github.com/microsoft/azure-pipelines-tasks">source repo on GitHub</a> for other advantages.</p>
<p>But as the <a href="https://github.com/microsoft/azure-pipelines-tasks">README</a> itself says:</p>
<blockquote>
<p>If you need custom functionality in your build/release, it is usually simpler to use the existing script running tasks such as the PowerShell or Bash tasks.</p>
</blockquote>
<p>And indeed, I find it <strong>simpler</strong> to use plain old CLI commands in Bash. Over time, as you iterate and create tailored pipelines beyond the “Hello World” examples, you may also find that tasks become yet another layer to debug. For example, I used the <a href="https://github.com/julie-ng/azure-nodejs-demo/blob/main/azure-pipelines.yml#L201">AzCopy task</a> only to wait a few minutes for the pipeline to fail because it’s <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/deploy/azure-file-copy?view=azure-devops">Windows only</a>.</p>
<h4 id="iterate-faster">Iterate Faster</h4>
<p>If I use the command line, I can figure out exactly which <code>-var</code> and other options I need to pass to <code>terraform</code> to achieve the results I want <em>from my local machine</em> without having to wait minutes for each pipeline job to run to know if it worked or not. Once I am confident in my CLI commands, I can put those in my YAML pipeline.</p>
<h4 id="master-the-technology-not-a-task">Master the Technology not a Task</h4>
<p>In general I recommend every engineer learn how to use a technology from the command line. Do not learn how to use the git extension in your code editor. If you learn something on the command line, be it <a href="https://git-scm.com/">git</a> or <a href="https://www.terraform.io/">terraform</a>, you learn <em>how it works</em>. Debugging will be far less frustrating as you can skip an abstraction layer (the YAML task) that does not necessarily make your life easier.</p>
<p>For example, I prefer to skip the verbose format found in this example from the <a href="https://docs.microsoft.com/en-us/azure/developer/terraform/best-practices-integration-testing">Azure documentation</a>:</p>
<pre><code class="language-yaml"># Verbose 😑
- task: charleszipp.azure-pipelines-tasks-terraform.azure-pipelines-tasks-terraform-cli.TerraformCLI@0
  displayName: 'Run terraform plan'
  inputs:
    command: plan
    workingDirectory: $(terraformWorkingDirectory)
    environmentServiceName: $(serviceConnection)
    commandOptions: -var location=$(azureLocation)
</code></pre>
<p>You can do the same using Bash and just pass the flags, e.g. <code>-var</code> or <code>-out</code> as is.</p>
<pre><code class="language-yaml"># Less noise 👌
- bash: terraform plan -out deployment.tfplan
  displayName: Terraform Plan (ignores drift)
</code></pre>
<p>Because I do not use tasks, I never need to look up in more documentation what <code>environmentServiceName</code> and other attributes do and expect. I only ever need to know Terraform, which lets me focus on <em>my</em> code instead of debugging a dependency - even if it’s provided by Microsoft.</p>
<h4 id="do-not-install-terraform---keep-up-with-latest">Do not install Terraform - keep up with “latest”</h4>
<p>There are many Azure Pipeline samples out there with “installer” tasks, including official examples. While dependency versioning is important, I find Terraform to be one of the more stable technologies that rarely has breaking changes. Before you lock yourself down to a version, consider always running with the latest version. In general it’s easier to make incremental changes and fixes than to have giant refactors later that block feature development.</p>
<p>You can see which version is installed on the Microsoft hosted build agents on GitHub, e.g. <a href="https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu1804-README.md">Ubuntu 18.04</a>. Note these build agents are used both by Azure Pipelines <strong>and</strong> <a href="https://github.com/features/actions">GitHub Actions</a>.</p>
<h4 id="cli-is-vendor-agnostic">CLI is vendor agnostic</h4>
<p>This preference for CLI mastery over YAML tasks is not Terraform specific. If you browse through <a href="http://github.com/julie-ng/">my various demos on GitHub</a>, I usually prefer <a href="https://www.docker.com/">Docker</a> and <a href="https://nodejs.org/en/">Node.js</a> on the command line over the equivalent YAML tasks.</p>
<p>The industry is fast-paced. Using the CLI also makes your migration path to new vendors easier. If in the future, when <a href="https://github.com/features/actions">GitHub Actions</a> have matured and you want to migrate from Azure Pipelines, you would not need to migrate the YAML task abstraction layer. Use the CLI and make your future life easier.</p>
<h4 id="but-then-how-do-i-authenticate-to-azure">But Then How do I Authenticate to Azure?</h4>
<p>That is a common question I get from customers. Keep reading. This is in the last section of this article which also discusses secret management in pipelines.</p>
<h2 id="tip-3---use-terraform-partial-configuration">Tip #3 - Use Terraform Partial Configuration</h2>
<p>This topic deserves its own article, but I will mention the most important points here. You will need a <em>remote</em> state file when collaborating with other engineers or deploying from a headless build agent.</p>
<h4 id="start-with-local-state">Start with Local State</h4>
<p>If you don’t know how your infrastructure <em>should</em> look, experiment locally, i.e. without a remote backend, so you avoid CI/CD wait times of several minutes per run.</p>
<p>As you try things out, you will probably break things. At this phase, instead of trying to fix it, I just tear everything down, do a <code>rm -rf .terraform</code> and start over.</p>
<p>Once your infrastructure architecture is stable, proceed to create a remote state file.</p>
<h4 id="create-a-storage-account-for-your-state-file">Create a Storage Account for your State File</h4>
<p>Terraform needs an Azure Blob Storage account. ProTip - create the Storage Account by hand using the Azure CLI:</p>
<pre><code class="language-bash">$ az storage account create \
    --name mystorageaccountname \
    --resource-group myresourcegroupname \
    --kind StorageV2 \
    --sku Standard_LRS \
    --https-only true \
    --allow-blob-public-access false
</code></pre>
<p>Because Terraform state files store everything, including secrets, in clear text, take extra precaution in securing yours. Confirm that you have <strong>disabled public blob access</strong>.</p>
<p>Please do not rely on a pipeline task to create the account for you! There is a task that does this, but the storage account is configured to allow public access to blob files by default. The individual state files themselves are secured. But the <em>defaults are not secure</em>, which is a security risk waiting to happen.</p>
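<p>You can sanity-check the setting afterwards with the Azure CLI. A sketch; the account name is a placeholder:</p>
<pre><code class="language-bash"># Should print "false" when public blob access is disabled
$ az storage account show \
  --name mystorageaccountname \
  --query allowBlobPublicAccess
</code></pre>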
<h4 id="do-not-use-default-configuration">Do not use Default Configuration</h4>
<p>When using a remote backend, you need to tell Terraform where the state file is. Example configurations from the <a href="https://www.terraform.io/docs/backends/types/azurerm.html">official documentation</a> look like this:</p>
<pre><code class="language-hcl"># Don't do this
terraform {
  backend "azurerm" {
    resource_group_name  = "StorageAccount-ResourceGroup"
    storage_account_name = "abcd1234"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"

    # Definitely don't do this!
    access_key = "…"
  }
}
</code></pre>
<h4 id="use-partial-configuration">Use Partial Configuration</h4>
<p>Further in the documentation Terraform recommends moving out those properties and using <a href="https://www.terraform.io/docs/backends/config.html#partial-configuration">Partial Configuration</a>:</p>
<pre><code class="language-hcl"># This is better
terraform {
  backend "azurerm" {
  }
}
</code></pre>
<h4 id="create-and-ignore-the-backend-configuration-file">Create and Ignore the Backend Configuration File</h4>
<p>Instead of using Azure Storage Account Access Keys, I use short-lived <a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview">Shared Access Signature (SAS) Tokens</a>. So I create a local <code>azure.conf</code> file that looks like this:</p>
<pre><code class="language-hcl"># azure.conf, must be in .gitignore
storage_account_name="azurestorageaccountname"
container_name="storagecontainername"
key="project.tfstate"
sas_token="?sv=2019-12-12…"
</code></pre>
<p>Triple check that your <code>azure.conf</code> is added to the <code>.gitignore</code> file so that it is not checked into your code repository.</p>
<h4 id="its-ok-to-use-a-file-in-local-development">It’s OK to use a File in Local Development</h4>
<p>On my local machine, I initialize Terraform by passing the whole configuration file:</p>
<pre><code class="language-bash">$ terraform init -backend-config=azure.conf
</code></pre>
<p>Side note: one of the reasons I use SAS tokens is that I usually only need to work with the remote state file in a project’s initial phase. Instead of leaving an access key lying around, I just have an expired SAS token on my local machine.</p>
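<p>For reference, a short-lived SAS token can be generated with the Azure CLI. A sketch with placeholder names; adjust the permissions and expiry to your needs:</p>
<pre><code class="language-bash"># Generate a SAS token for the state container, valid until the given date
$ az storage container generate-sas \
  --account-name mystorageaccountname \
  --name storagecontainername \
  --permissions acdlrw \
  --expiry 2021-02-01T00:00Z \
  --https-only \
  --output tsv
</code></pre>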
<h4 id="your-configuration-should-not-be-a-tfvars-file">Your configuration should NOT be a .tfvars File</h4>
<p>Files with a <code>.tfvars</code> extension can be automatically loaded by Terraform, which is an accident waiting to happen. This is how people unintentionally check credentials into git. Don’t be that person or company. Add a little bit of friction and use the <code>-backend-config=azure.conf</code> CLI option.</p>
<p>You can also give the file a <code>.hcl</code> extension for your editor to do syntax highlighting. I use <code>.conf</code> as a convention to signal a warning that this file may contain sensitive information and should be protected.</p>
<h4 id="use-key-value-pairs-in-cicd-builds">Use Key Value Pairs in CI/CD Builds</h4>
<p>Personally I do not use <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/library/secure-files?view=azure-devops">Secure Files</a> in Azure Pipelines because I don’t want my credentials in <em>yet another place I have to find and debug</em>. To solve the first problem, I use Key Vault (keep reading).</p>
<p>To solve the second problem I pass the configuration as individual variables to the <code>terraform init</code> command:</p>
<pre><code class="language-bash">$ terraform init \
    -backend-config="storage_account_name=$TF_STATE_BLOB_ACCOUNT_NAME" \
    -backend-config="container_name=$TF_STATE_BLOB_CONTAINER_NAME" \
    -backend-config="key=$TF_STATE_BLOB_FILE" \
    -backend-config="sas_token=$TF_STATE_BLOB_SAS_TOKEN"
</code></pre>
<p>If you are not using SAS Tokens, you can pass the Storage Account Access Key with <code>-backend-config="access_key=…"</code></p>
<p>By using key value pairs, I am being explicit, forcing myself to do sanity checks at every step and increasing traceability. Your future self will thank you. Note also that my variables are named with the <code>TF_</code> prefix to help with debugging.</p>
<p>So the complete step in YAML looks like this:</p>
<pre><code class="language-yaml"># Load secrets from Key Vault
variables:
  - group: e2e-gov-demo-kv

# Initialize with explicitly mapped secrets
steps:
  - bash: |
      terraform init \
        -backend-config="storage_account_name=$TF_STATE_BLOB_ACCOUNT_NAME" \
        -backend-config="container_name=$TF_STATE_BLOB_CONTAINER_NAME" \
        -backend-config="key=$TF_STATE_BLOB_FILE" \
        -backend-config="sas_token=$TF_STATE_BLOB_SAS_TOKEN"
    displayName: Terraform Init
    env:
      TF_STATE_BLOB_ACCOUNT_NAME: $(kv-tf-state-blob-account)
      TF_STATE_BLOB_CONTAINER_NAME: $(kv-tf-state-blob-container)
      TF_STATE_BLOB_FILE: $(kv-tf-state-blob-file)
      TF_STATE_BLOB_SAS_TOKEN: $(kv-tf-state-sas-token)
</code></pre>
<p>Continue reading to learn how the Key Vault integration works. We will also use this strategy to authenticate to Azure to manage our infrastructure.</p>
<h2 id="tip-4---authenticate-with-service-principal-credentials-stored-in-azure-key-vault">Tip #4 - Authenticate with Service Principal Credentials stored in Azure Key Vault</h2>
<p>We often celebrate when we finally have something working on our local machine. Unfortunately it may be too soon to party. Moving those same steps to automation pipelines requires more effort and involves concepts that are sometimes difficult to understand.</p>
<h4 id="why-does-az-login-not-work-in-cicd">Why does <code>az login</code> not work in CI/CD?</h4>
<p>In short, it does not work because a build agent is headless. It is not a human. It cannot interact with Terraform (or Azure for that matter) in an interactive way. Some customers try to authenticate via the CLI and ask me how to get the headless agent past Multi-factor Authentication (MFA) that their organization has in place. That is exactly why we will not use the Azure CLI to login. As the <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/azure_cli">Terraform Documentation</a> explains:</p>
<blockquote>
<p>We recommend using either a Service Principal or Managed Service Identity when running Terraform non-interactively (such as when running Terraform in a CI server) - and authenticating using the Azure CLI when running Terraform locally.</p>
</blockquote>
<p>So we will authenticate to the Azure Resource Manager API by setting our service principal’s <strong>client secret</strong> as environment variables:</p>
<pre><code class="language-yaml">- bash: terraform apply -auto-approve deployment.tfplan
  displayName: Terraform Apply
  env:
    ARM_SUBSCRIPTION_ID: $(kv-arm-subscription-id)
    ARM_CLIENT_ID: $(kv-arm-client-id)
    ARM_CLIENT_SECRET: $(kv-arm-client-secret)
    ARM_TENANT_ID: $(kv-arm-tenant-id)
</code></pre>
<p>The names of the environment variables, e.g. <code>ARM_CLIENT_ID</code>, are documented in the <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/service_principal_client_secret#configuring-the-service-principal-in-terraform">Terraform Documentation</a>. Some of you might be wondering: are environment variables secure? Yes. The official Azure CLI task does the same thing, as you can see on <a href="https://github.com/microsoft/azure-pipelines-tasks/blob/master/Tasks/AzureCLIV2/azureclitask.ts#L43">line 43</a> of the task source code.</p>
<p>To be clear we authenticate headless build agents by setting client IDs and secrets as environment variables, which is common practice. The best practice part involves <em>securing</em> these secrets.</p>
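<p>For completeness, here is a sketch of how such a service principal and its Key Vault secrets might be created with the Azure CLI. All names (<code>sp-terraform-demo</code>, <code>e2e-gov-demo-kv</code>) and the environment variables holding the credentials are placeholders; adjust the role and scope to your own requirements. This is illustrative and requires an authenticated Azure session, so treat it as a starting point rather than a recipe:</p>

```shell
# Sketch: create a service principal and store its credentials in Key Vault.
# All names below are placeholders.

# Create the service principal scoped to one subscription
az ad sp create-for-rbac \
  --name sp-terraform-demo \
  --role Contributor \
  --scopes "/subscriptions/$SUBSCRIPTION_ID"

# Store the resulting credentials as Key Vault secrets,
# matching the kv-* secret names used in the pipeline above
az keyvault secret set --vault-name e2e-gov-demo-kv \
  --name kv-arm-client-id --value "$CLIENT_ID"
az keyvault secret set --vault-name e2e-gov-demo-kv \
  --name kv-arm-client-secret --value "$CLIENT_SECRET"
```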
<h4 id="double-check-you-are-using-pipeline-secrets">Double Check You are Using Pipeline Secrets</h4>
<p>In Azure Pipelines, however, having credentials in your environment is only secure if you mark your pipeline variables as secrets, which ensures:</p>
<ul>
<li>the variable is encrypted at rest</li>
<li>Azure Pipelines will mask values with <code>***</code> (on a best-effort basis)</li>
</ul>
<figure class="figure-center">
<img src="/assets/images/2021/az-pipelines-secrets.png" alt="Use Secrets in Azure Pipelines" class="has-border" />
<figcaption>
Look for the lock icon to ensure you've marked your variables as secrets
</figcaption>
</figure>
<p>Note that once a variable is saved as a secret, switching it back to plain text will not reveal the value. Instead, you have to reset it.</p>
<p>The caveat to using secrets is that you have to <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables"><em>explicitly</em> map every secret to an environment variable</a>, <em>at every pipeline step</em>. It may be tedious, but it is intentional and makes the security implications clear. It is also like performing a small security review every time you deploy. These reviews have the same purpose as the checklists that have been <a href="https://www.hsph.harvard.edu/news/magazine/fall08checklist">scientifically shown to save lives</a>. Be explicit to be secure.</p>
<h4 id="go-further---key-vault-integration">Go Further - Key Vault Integration</h4>
<p>Ensuring you are using Pipeline Secrets may be good enough. If you want to go a step further, I recommend <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables">integrating Key Vault via secret variables</a> - not a YAML task.</p>
<figure class="figure-center">
<img src="/assets/images/2021/az-pipelines-kv-1.png" alt="Use Secrets in Azure Pipelines" class="has-border" />
<figcaption>
Use the "Link secrets…" toggle to integrate Key Vault.
</figcaption>
</figure>
<p>Note “Azure subscription” here refers to a service connection. I use the name <code>msdn-sub-reader-sp-e2e-governance-demo</code> to indicate that the service principal under the hood only has <strong>read-only</strong> access to my Azure Resources.</p>
<p>These are reasons large companies and enterprises may choose this route:</p>
<ul>
<li>
<p><strong>Re-use secrets</strong> across Azure DevOps projects <em>and</em> Azure DevOps organizations. You can only share Service Connections across projects.</p>
</li>
<li>
<p><strong>Stronger security</strong> with Azure Key Vault. Together with the proper service principal permissions and Key Vault access policy, it becomes impossible to change or delete a secret from Azure DevOps.</p>
</li>
<li>
<p><strong>Scalable secret rotation</strong>. I prefer short-lived tokens over long-lived credentials. Because Azure Pipelines fetches secrets at start of build run-time, they are always up to date. If I regularly rotate credentials, I only need to change them in 1 place: Key Vault.</p>
</li>
<li>
<p><strong>Reduced attack surface</strong>. If I put the credential in Key Vault, the client secret to my service principal is stored <em>only</em> in 2 places: A) Azure Active Directory where it lives and B) Azure Key Vault.</p>
<p>If I use a Service Connection, I have increased my attack surface to 3 locations. Putting on my former Enterprise Architect hat… I trust Azure DevOps as a managed service to guard my secrets. However, as an organization we can accidentally compromise them when someone (mis)configures the permissions.</p>
</li>
</ul>
<p>ProTip: the variables above are all prefixed with <code>kv-</code>, a naming convention I use to indicate that those values are stored in Key Vault.</p>
<h2 id="tip-5-create-a-custom-role-for-terraform">Tip #5 - Create a Custom Role for Terraform</h2>
<p>Security and <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/best-practices">RBAC best practice</a> is to grant <em>only as much access as necessary</em> to minimize risk. So which Azure role do we assign the Service Principal used by Terraform? <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#owner">Owner</a> or <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#contributor">Contributor</a>?</p>
<p>Neither. Because we are deploying infrastructure, we will probably also need to set permissions, for example create a <a href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/key_vault_access_policy">Key Vault Access Policy</a>, which requires elevated permissions. To see which permissions Contributors lack we can run this Azure CLI command:</p>
<pre><code class="language-bash">az role definition list \
--name "Contributor" \
--output json \
--query '[].{actions:permissions[0].actions, notActions:permissions[0].notActions}'
</code></pre>
<p>which will output the following:</p>
<pre><code class="language-json">[
  {
    "actions": [
      "*"
    ],
    "notActions": [
      "Microsoft.Authorization/*/Delete",
      "Microsoft.Authorization/*/Write",
      "Microsoft.Authorization/elevateAccess/Action",
      "Microsoft.Blueprint/blueprintAssignments/write",
      "Microsoft.Blueprint/blueprintAssignments/delete"
    ]
  }
]
</code></pre>
<p>To create a Key Vault Access Policy, our service principal will need <code>"Microsoft.Authorization/*/Write"</code> permissions. The easiest solution is to give the service principal the Owner role. But this is the equivalent of God mode.</p>
<h4 id="consequences-of-delete">Consequences of Delete</h4>
<p>The difference between write and delete permissions is subtle but important, and not just for large enterprises but also for compliant industries. So if you’re a small fintech startup, this applies to you too. Some data cannot be deleted by law, e.g. financial data needed for tax audits. Because of the severity and legal consequences of losing such data, it is a common cloud practice to apply <a href="https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources">management locks</a> on a resource to prevent it from being deleted.</p>
<p>We still want Terraform to create and manage our infrastructure, so we grant it <code>Write</code> permissions. But we will not grant the <code>Delete</code> permissions because:</p>
<ul>
<li>
<p>Automation is powerful. And with great power comes great responsibility, which we don’t want to grant a headless (and therefore brainless) build agent.</p>
</li>
<li>
<p>It’s important to understand that git (even with signed commits) gives <em>technical</em> traceability, but in your organization that might not satisfy requirements for <em>legal</em> audit-ability.</p>
</li>
</ul>
<p>So even if you have secured your workflow with pull requests and protected branches, it may not be enough. Therefore, for auditability, we move the <code>Delete</code> action out of the git layer and into the cloud management layer, i.e. Azure, using management locks.</p>
<p>So <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/custom-roles">create a custom role</a> and make sure you have the following <code>notActions</code>:</p>
<pre><code class="language-json">{
  "notActions": [
    "Microsoft.Authorization/*/Delete"
  ]
}
</code></pre>
<p>Note that the <code>notActions</code> above do not include the <a href="https://docs.microsoft.com/en-us/azure/governance/blueprints/overview">Azure Blueprints</a> actions that the Contributor role excludes. Use the same reasoning as above to determine whether your use case needs that access and when to restrict it.</p>
<h2 id="summary">Summary</h2>
<p>In this long guide we covered a few general Azure Pipelines best practices: use Pipelines as Code (YAML) and use the command line, which helps you master Terraform and any other technology. We also walked through how to properly secure your state file and authenticate with Azure, covering common gotchas. Finally, we covered two more advanced topics: Key Vault integration and creating a custom role for Terraform.</p>
<p><strong>If there is too much security in this article for you, that’s okay.</strong> Do not implement every practice at the same time. Introduce them one at a time, and over time, think months, security best practices will become second nature.</p>
<p>This article focused specifically on Best Practices when using Azure Pipelines. Stay tuned for another article on generic best practices, where I explain how to use git workflows and manage infrastructure across environments.</p>
<h1 id="creating-monorepo-pipelines-in-azure-devops">Creating Monorepo Pipelines in Azure DevOps</h1>
<p><em>Julie Ng · published 2020-03-25 · updated 2024-01-20 · <a href="http://julie.io/writing/monorepo-pipelines-in-azure-devops/">julie.io/writing/monorepo-pipelines-in-azure-devops</a></em></p>
<div class="wrap">
<p class="lead">Although uncommon, there are valid reasons to have a monorepo - a single git repository for multiple projects, for example migration projects. Until yesterday, I thought this was not possible in Azure DevOps.</p>
</div>
<div class="wrap">
<p>A colleague informed me it’s possible to name the file something <em>other</em> than <code>/azure-pipelines.yml</code>. From there I figured out how to create multiple Azure DevOps YAML pipelines in a monorepo.</p>
<p>In this tutorial, you will learn how to:</p>
<ul>
<li>setup a root pipeline</li>
<li>setup 2 pipelines in subfolders and triggered by changes <em>in those folders.</em></li>
<li>rename pipelines in DevOps UI</li>
<li>use triggers</li>
<li>change working directories</li>
</ul>
<p>The full example is available here<br />
<a href="https://github.com/julie-ng/azure-devops-monorepo">https://github.com/julie-ng/azure-devops-monorepo →</a></p>
</div>
<p class="article-image"><img src="/assets/images/2020/devops-monorepo-goal.png" alt="" /></p>
<p class="article-photo-source">Note: these commits were pushed separately to generate distinct “Last run”s.</p>
<div class="wrap">
<h2 id="i-project-structure-and-yaml-files">I. Project Structure and YAML files</h2>
<p>Let’s imagine we have the following setup:</p>
<pre class="lang-tree"><code>.
├── README.md
├── azure-pipelines.yml
├── service-a
│   ├── azure-pipelines-a.yml
│   └── …
└── service-b
    ├── azure-pipelines-b.yml
    └── …
</code></pre>
<p>In a standard Azure DevOps project, you have a single <code>azure-pipelines.yml</code> file in your project root folder. In our project, we will have 3 different pipeline files:</p>
<ul>
<li>azure-pipelines.yml</li>
<li>service-a/azure-pipelines-a.yml</li>
<li>service-b/azure-pipelines-b.yml</li>
</ul>
<p>These files will be very similar to your standard YAML pipelines, with two small exceptions: triggers and working directories. We’ll cover those later. First we will add the pipelines.</p>
<h3 id="step-1---add-the-pipelines">Step 1 - Add the Pipelines</h3>
<p>When you create a new DevOps pipeline, select the repository and on the “Configure your pipeline” page, select <strong>“Existing Azure Pipelines YAML file”</strong>, which will open up this overlay on the right:</p>
<p class="article-image"><img src="/assets/images/2020/devops-yaml-file-path.png" alt="Choose existing YAML path" style="max-width:450px" /></p>
<p>You want to go through this process 3 times, each time selecting a different YAML file. In the image above, I have chosen <code>/a/azure-pipelines.yml</code>, which was the original filename before I renamed it later.</p>
<h3 id="step-2---rename-your-pipelines">Step 2 - Rename your pipelines</h3>
<p>By default, Azure DevOps names your pipelines per GitHub user/org and repository name, so you will end up with 3 pipelines named similar to this:</p>
<ul>
<li>julie-ng.azure-devops-monorepo</li>
<li>julie-ng.azure-devops-monorepo (1)</li>
<li>julie-ng.azure-devops-monorepo (2)</li>
</ul>
<p>Not very helpful. Find the <img src="/assets/images/2020/devops-three-dots.png" alt="3 Dots More Options Button" class="inline" /> more options button and select <strong>“Rename/move”</strong>.</p>
<p class="article-image"><img src="/assets/images/2020/devops-rename-pipeline.png" alt="Rename your pipeline" style="max-width:250px" /></p>
<p>I’ve chosen the following names:</p>
<ul>
<li>azure-devops-monorepo (root)</li>
<li>azure-devops-monorepo (Service A)</li>
<li>azure-devops-monorepo (Service B)</li>
</ul>
<h2 id="ii-triggers-and-how-this-works">II. Triggers and how this works</h2>
<p>Normally a pipeline runs when <em>anything</em> in the repository changes. These three pipelines are defined so they only build when their respective files change. We accomplish this with <em>trigger path</em> definitions.</p>
<h4 id="root-project-must-exclude-paths">Root Project must <code>exclude</code> paths</h4>
<p>Note the root project <em>excludes</em> our subdirectories. This means that a change to <code>service-a/readme.md</code> will not trigger our root pipeline.</p>
<pre class="lang-yaml"><code>trigger:
  paths:
    exclude: # Exclude!
      - 'service-a/*'
      - 'service-b/*'
</code></pre>
<h4 id="sub-projects-must-include-paths">Sub-projects must <code>include</code> paths</h4>
<p>We have two sub-projects with their own pipelines. We have to adjust each appropriately so it only runs when the sub-project’s code changes:</p>
<pre class="lang-yaml"><code>trigger:
  paths:
    include: # Include!
      - 'service-a/*' # or 'service-b/*'
</code></pre>
<p>Now you have your multi-pipeline monorepo setup! But you are not finished. There are reasons why the monorepo setup is not common. While it is acceptable to choose this path, you should understand the disadvantages and caveats.</p>
<h2 id="iii-caveats">III. Caveats</h2>
<h3 id="be-aware-of-other-triggers">Be Aware of Other Triggers</h3>
<p>There are many reasons for a pipeline to be built. In fact, the <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/build/triggers?view=azure-devops&tabs=yaml">official docs</a> name <em>four</em> different types of events that can trigger build pipelines:</p>
<table class="has-border">
<thead>
<tr>
<th style="text-align: left">Trigger</th>
<th style="text-align: left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">CI triggers</td>
<td style="text-align: left">Git Push</td>
</tr>
<tr>
<td style="text-align: left">PR triggers</td>
<td style="text-align: left">Pull Requests</td>
</tr>
<tr>
<td style="text-align: left">Scheduled triggers</td>
<td style="text-align: left">Schedules defined in Cron format</td>
</tr>
<tr>
<td style="text-align: left">Pipeline triggers</td>
<td style="text-align: left">Pipelines can call each other</td>
</tr>
<tr>
<td style="text-align: left">Manual</td>
<td style="text-align: left">A human clicks a button</td>
</tr>
</tbody>
</table>
<p>I add manual runs to make it 5. Although we are limiting path triggers to our subfolder, the <em>when</em> is partially determined by external factors. This means that a pull request that conceptually only affects service B may unintentionally trigger a build of service A.</p>
<p>If you are used to committing and pushing incomplete changes, <strong>you may have an unusual number of broken builds</strong>. A common symptom is seeing multiple commits in a row with messages like “update…”. This danger also applies to the separate-repo use case, but in a monorepo it is made worse. The danger here is that a developer or team gets used to red or broken builds and stops reacting to them. So it’s important to be disciplined <em>across your entire team</em> when committing and pushing your changes.</p>
<p>I haven’t tried it, but theoretically you should be able to set up separate schedules for the pipelines.</p>
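<p>As a sketch, and with the caveat that I have not tested this in a monorepo, such a schedule might look like the fragment below in one of the sub-project pipeline files. The cron expression, display name, and branch name are illustrative:</p>

```yaml
# Sketch (untested): a nightly schedule for one sub-project pipeline.
schedules:
  - cron: '0 3 * * *'       # 03:00 UTC every day
    displayName: Nightly build of Service A
    branches:
      include:
        - master
    always: false           # only run if there were changes since the last run
```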
<h3 id="building-a-or-b-or-both">Building A or B or both?</h3>
<p>Let’s say you have a commit history that looks like this:</p>
<pre class="lang-text"><code>0883cf8 b: change number 3 (47 minutes ago) <=== git push
2896d9c a: change number 5 (49 minutes ago)
3fa6757 root: add newlines to readme (49 minutes ago)
</code></pre>
<p>First off, <em>both</em> pipeline A and pipeline B will run <em>and</em> they will run with the files from the working tree at <code>0883cf8</code>.</p>
<p>In this example, a developer first made changes to service A and then later to service B. Because the changes were pushed together, the <code>azure-pipelines-a.yml</code> pipeline runs with files <strong>not</strong> from <code>2896d9c</code> but from the future 🤯.</p>
<p>This means if you <em>actually</em> have dependencies outside of that include path in the <code>triggers:</code> property, you may experience unexpected build results. It seems unlikely in our example. But what if you had such a project structure?</p>
<pre class="lang-tree"><code>.
├── service-a
│   ├── pipeline-a.yml
│   └── …
├── service-b
│   ├── pipeline-b.yml
│   └── …
└── common-components
    ├── pipeline-c.yml
    └── …
</code></pre>
<p>Then you would be more concerned. This is a trade-off that comes with monorepos. Builds may be accidentally triggered and you should prepare for that. If you’re working in teams, make sure it’s very transparent what everyone is working on.</p>
<h3 id="keep-your-working-directory-in-mind">Keep your Working Directory in Mind</h3>
<p>To illustrate this caveat, service B is a Node.js project. Although our YAML file for service B sits in the correct subfolder, the working directory will still be the root. If you try to run <code>npm install</code> without changing directories, it will fail because there is no <code>package.json</code> in the root.</p>
<p>We can change this by using the <code>workingDirectory</code> key in the YAML:</p>
<pre class="lang-yaml"><code>- script: npm install
workingDirectory: service-b/
</code></pre>
<p>Unfortunately <code>workingDirectory</code> is only available under <code>steps:</code>, which means you cannot set it once on the whole pipeline, but rather for every task, script, etc. You can make this less painful by using a variable <a href="https://github.com/julie-ng/azure-devops-monorepo/blob/master/service-b/azure-pipelines-b.yml#L4">like in my code sample</a>. See the <a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&tabs=schema%2Cparameter-schema#script">official docs: YAML Reference</a> for details and further limitations in the YAML syntax.</p>
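<p>Putting the pieces together, a minimal sketch of <code>service-b/azure-pipelines-b.yml</code> might look like this. The variable name <code>workingDir</code> and the npm steps are illustrative, not taken verbatim from my sample repository:</p>

```yaml
# Sketch: minimal sub-project pipeline combining the path trigger
# with a working-directory variable (names are illustrative).
trigger:
  paths:
    include:
      - 'service-b/*'

variables:
  workingDir: service-b/

steps:
  - script: npm install
    workingDirectory: $(workingDir)
    displayName: Install Dependencies

  - script: npm test
    workingDirectory: $(workingDir)
    displayName: Run Tests
```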
<h2 id="iv-conclusion">IV. Conclusion</h2>
<p>If you have good reason to use a monorepo and want to set up multiple Azure DevOps pipelines, you can. But remember that you lose some sense of control over <em>when</em> and <em>what</em> you are building in your CI pipeline. So if you march down this path, over-communicate within your team, keep your commits squeaky clean, and carry on.</p>
</div>