Exploring a simple deployment pipeline with Terraform

This is a set of working notes on terraform. The goal here was to take an existing maxscale cluster and be able to add/remove mariadb instances running in docker containers. The focus was to explore how to build a system around terraform, and to get an idea of what should, and what should not, be done within it.

The Problem

Assuming we already have a basic set-up, we might want to add or remove servers:

root@linuxpc:~# maxctrl list servers
┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────┬─────────────────┐
│ Server  │ Address      │ Port │ Connections │ State           │ GTID      │ Monitor         │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server1 │ 192.168.2.37 │ 3306 │ 0           │ Master, Running │ 0-1-61043 │ MariaDB-Monitor │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server2 │ 192.168.2.99 │ 3306 │ 0           │ Slave, Running  │ 0-1-61043 │ MariaDB-Monitor │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server3 │ 192.168.2.98 │ 3306 │ 0           │ Slave, Running  │ 0-1-61043 │ MariaDB-Monitor │
└─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────┴─────────────────┘

We want to make sure that terraform itself will not try to interfere with these existing servers.

This means that we will need a way to tell terraform about “where to start” based on the output of maxscale, and create the server variables dynamically for terraform.

Operational Workflow Plan

  1. Have a simple command to trigger the workflow
  2. Discover existing servers
  3. Compute a boundary, to determine the highest existing server number
  4. Generate a locals.tf in case we plan to add new servers
  5. Offer an option for terraform apply to add new servers, and an option for terraform destroy to remove them
  6. In either case, update the maxscale config
  7. Running terraform apply twice with different params should not destroy any resources; this requires us to identify created resources/servers with batching
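Step 3 deserves a quick illustration; a toy shell sketch of the boundary computation (not project code, just the idea):

```shell
# Toy illustration of the boundary: one past the highest existing server number.
existing="server1 server2 server3"
max=0
for s in $existing; do
    id=${s#server}                      # strip the "server" prefix
    if [ "$id" -gt "$max" ]; then
        max=$id
    fi
done
boundary=$((max + 1))
echo "$boundary"                        # server4 is the first name we may create
```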

The intended layout of the project is:

project/
├── Makefile
├── config.mk
├── scripts/
│    ├── new_batch.sh
│    ├── deploy_to_new_batch.sh
│    ├── destroy_in_batch.sh
│    ├── delete_batch_dir.sh
│    └── list_batches.sh
├── batches/ (generated)
└── terraform-template/
     ├── awk_scripts/
     │    ├── service_discovery.awk
     │    ├── port_check.awk
     │    ├── create_maxscale.awk
     │    └── destroy_maxscale.awk
     ├── docker/
     │    ├── Dockerfile
     │    ├── entrypoint.sh
     │    └── my.cnf
     ├── dump/
     │    └── *.sql (generated)
     ├── create_maxscale.sh
     ├── destroy_maxscale.sh
     ├── service_discovery.sh
     ├── init.sh
     ├── Makefile
     ├── config.mk
     ├── boundary.mk (generated)
     ├── boundary.txt (generated)
     ├── created_servers.txt (generated)
     ├── main.tf
     ├── locals.tf (generated)
     └── misc files generated by terraform like the terraform state file

By batching, what is meant is that the top-level Makefile acts as an orchestrator to allow running terraform in distinct directories located under batches/. This way, it is possible to create resources, and then use terraform again to create new resources, without terraform wanting to delete the existing resources, as they are in separate state files, managed separately.

Before we talk further about batching, it is good to walk through what happens during a single batch/simple usage of terraform.

Workflow of a single batch

Since the tasks have a lot of distinct moving parts to support terraform, such as taking and copying database dumps, copying dockerfiles, creating docker images, inspecting maxscale, modifying maxscale configurations, and restarting maxscale, it is imperative to have a way to easily tie commands and scripts together and orchestrate their execution. A Makefile is used for this orchestration. It’s mostly helpful in avoiding the creation of some overcomplicated bash spaghetti.

The other advantage is that we can keep all the important variables across the project easily configurable within config.mk. There are only a couple:

root@linuxpc:/opt/terraform/project# cat config.mk.example
ROOT_SRC=$(CURDIR)
TARGET_SRV="{{IP OF SERVER WHERE DOCKER CONTAINERS SHOULD BE CREATED}}"
REMOTE_DOCKER_DIR={{/path/to/directory/with/dockerfile}}
REMOTE_DOCKER_SCRIPTS_DIR=$(REMOTE_DOCKER_DIR)/docker-entrypoint-initdb.d
DOCKER_BUILD="{{docker_image:tag}}"

We will go through explaining each of the steps one by one, and loop back to the orchestration at the end.

Deploying MariaDB with Docker

For the sake of simplicity, the deployment of extra MariaDB servers will be done by spinning up docker containers on server2/192.168.2.99. To make things easier to organize, I’ve put everything that is strictly docker related into a docker directory.

The Dockerfile can be quite barebones, we just need some starting point and install mariadb:

FROM debian:12-slim

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      mariadb-server mariadb-client \
      ca-certificates tzdata \
 && rm -rf /var/lib/apt/lists/*

COPY docker-entrypoint-initdb.d/ /docker-entrypoint-initdb.d/

COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

VOLUME ["/var/lib/mysql"]

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["mariadbd"]

In the entrypoint.sh script, we will check if this is the first time the docker container is about to be started, and if it is, we will import some sql files, like the dump from master, set up replication with gtid position from master, and apply a new my.cnf before startup, to allocate a unique server-id to the mariadb in the container:

#!/bin/bash

MARIADB_PORT=${MARIADB_PORT:-3311}
MARIADB_SERVER_ID=${MARIADB_SERVER_ID:-4}
DATADIR="/var/lib/mysql"
RUNDIR="/run/mysqld"
LOCK_FILE=/var/lib/mysql/setup.lock

if ! [ -f "$LOCK_FILE" ]; then
#Don't do all this in case of an accidental docker restart
    sed -E -i "s/(server-id=)[0-9]+/\1${MARIADB_SERVER_ID}/" /docker-entrypoint-initdb.d/my.cnf
    cat /docker-entrypoint-initdb.d/my.cnf > /etc/mysql/my.cnf
    echo "lock" > $LOCK_FILE

    mkdir -p "$RUNDIR"
    chown -R mysql:mysql "$RUNDIR"
    chown -R mysql:mysql "$DATADIR"

    if [ ! -d "${DATADIR}/mysql" ]; then
        echo "Initializing MariaDB data directory..."
        mariadb-install-db --user=mysql --datadir="$DATADIR" >/dev/null
    fi

    #Initialize DB here for importing SQL files, don't allow connections
    echo "Starting temporary MariaDB (no networking) for init..."
    mariadbd --user=mysql --datadir="$DATADIR" --skip-networking --socket=/tmp/mysqld.sock &
    pid="$!"

    #Wait until mariadb server finished initializing before trying to import anything
    for i in {1..60}; do
        if mariadb-admin --protocol=socket --socket=/tmp/mysqld.sock ping >/dev/null 2>&1; then
          break
        fi
        sleep 0.5
    done

    echo "Running init scripts in /docker-entrypoint-initdb.d ..."
    shopt -s nullglob
    for f in /docker-entrypoint-initdb.d/*; do
        case "$f" in
            *.sql)
                echo "    -> importing $f"
                mariadb --protocol=socket --socket=/tmp/mysqld.sock < "$f"
            ;;
            *.sql.gz)
                echo "    -> importing $f"
                gunzip -c "$f" | mariadb --protocol=socket --socket=/tmp/mysqld.sock
            ;;
            *.sh)
                echo "    -> running $f"
                bash "$f"
            ;;
            *)
                echo "    -> ignoring $f"
            ;;
        esac
    done

    echo "Shutting down temporary MariaDB..."
    mariadb-admin --protocol=socket --socket=/tmp/mysqld.sock shutdown
    wait "$pid" || true

fi

echo "Starting MariaDB on port ${MARIADB_PORT}..."
exec "$@" --user=mysql --datadir="$DATADIR" --bind-address=0.0.0.0 --port="$MARIADB_PORT"

The most important aspect here is to create a lock file the first time this script runs, so that if the docker container is accidentally restarted, we don’t try to load the dump again.

Given that terraform isn’t the best tool to manage the transport and creation of mariadb dumps, and docker images, we’ll also need a small initializer script that can take a dump from master, record the gtid position, and build the docker image:

#!/bin/bash

TARGET_SRV=$1
ROOT_SRC=$2
REMOTE_DOCKER_DIR=$3
REMOTE_DOCKER_SCRIPTS_DIR=$4
DOCKER_BUILD=$5
DUMP_DIR=$ROOT_SRC/dump
DOCKER_DIR=$ROOT_SRC/docker
mkdir -p $DUMP_DIR
mkdir -p $DOCKER_DIR

if [[ $(ps aux | grep -v grep | grep -c mariadbd) -lt 1 ]]; then
    echo "Can not find active mariadb instance"
    exit 1
fi

sed -E -i "s/(gtid_slave_pos=)'[0-9-]+'/\1'%SLAVE%'/" $DUMP_DIR/01_queries.sql

mysqldump --all-databases --master-data=2 --single-transaction -u root > $DUMP_DIR/00_dump.sql
mysql -u root --skip-column-names -e "SELECT @@gtid_binlog_pos;" > $DUMP_DIR/gtid_pos.txt
GTID=$(cat $DUMP_DIR/gtid_pos.txt)
sed -E -i "s/%SLAVE%/${GTID}/" $DUMP_DIR/01_queries.sql
# The dump captures a consistent snapshot at a point in time, and the GTID position captured alongside it marks exactly where that snapshot ends. Replication can then pick up from this marker.

scp $DUMP_DIR/00_dump.sql root@$TARGET_SRV:$REMOTE_DOCKER_SCRIPTS_DIR
scp $DUMP_DIR/01_queries.sql root@$TARGET_SRV:$REMOTE_DOCKER_SCRIPTS_DIR

scp $DOCKER_DIR/Dockerfile root@$TARGET_SRV:$REMOTE_DOCKER_DIR
scp $DOCKER_DIR/entrypoint.sh root@$TARGET_SRV:$REMOTE_DOCKER_DIR

scp $DOCKER_DIR/my.cnf root@$TARGET_SRV:$REMOTE_DOCKER_SCRIPTS_DIR

ssh root@$TARGET_SRV "docker build -t $DOCKER_BUILD $REMOTE_DOCKER_DIR"

The --master-data=2 flag records the binary log file name and position as a comment in the dump; it is needed when setting up the replication on the docker container’s mariadb.

The other important thing here is to record a gtid position, so that the replication knows where to catch up from, once the dump is loaded into the replica. I do not have docker installed on my host PC, so this is why I copy over the docker files and build the images on the VM instead.
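The 01_queries.sql file itself is never shown in these notes. A minimal sketch of what it might contain, assuming a hypothetical repl user and password, with the %SLAVE% placeholder that init.sh substitutes:

```shell
# Hypothetical sketch of 01_queries.sql; init.sh swaps %SLAVE% for the GTID
# captured from the master. The repl user/password are placeholder assumptions.
cat > 01_queries.sql <<'EOF'
SET GLOBAL gtid_slave_pos='%SLAVE%';
CHANGE MASTER TO
  MASTER_HOST='192.168.2.37',
  MASTER_USER='repl',
  MASTER_PASSWORD='replpass',
  MASTER_USE_GTID=slave_pos;
START SLAVE;
EOF
```

Note that the single-quoted gtid_slave_pos value matches the sed pattern in init.sh that resets the placeholder between runs.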

Important Caveat

This is a terribly slow way of doing things in practice. For a reasonably big dataset, both the dump and the loading of the dump would take a considerable amount of time. But for the sake of learning terraform, this is OK. In practice, it would be better to work with e.g. ZFS snapshots on VMs.

The main.tf terraform file

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0"
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2.0"
    }
  }
}

variable "target_ip" { type = string }
variable "docker_image" { type = string }
variable "batch_id" {type = string}

provider "docker" {
  host = "ssh://root@${var.target_ip}"
}

resource "docker_network" "lab" {
  name = "tf-mariadb-lab-net${var.batch_id}"
}

resource "docker_container" "server" {
  for_each = local.servers

  name     = each.key
  image    = "${var.docker_image}"
  hostname = each.key

  env = [
    "MARIADB_PORT=${each.value.port}",
    "MARIADB_SERVER_ID=${each.value.server_id}",
  ]

  ports {
    internal = each.value.port
    external = each.value.port
  }

  networks_advanced {
    name = docker_network.lab.name
  }

  labels {
    label = "deployed_by"
    value = "terraform"
  }
}

The task here is simple: set up a docker network on the target server, where each docker container gets a unique network port (remembering that published container ports all share the host’s port space), as well as a unique IP address, so that they are reachable externally.

Note that local.servers is not present here. This is because the number of servers that could be added or removed is not constant, so this block needs to be generated on the fly, before terraform itself runs. It can live in a separate locals.tf file, as terraform loads all .tf files in the working directory together anyway.

If we want to add two new servers, locals.tf should look like this:

locals {
  servers = {
    "server4" = { port = 3314, server_id = 4 },
    "server5" = { port = 3315, server_id = 5 }
  }
}

Service Discovery

The first problem that we need to tackle is service discovery. Before running any terraform commands, we need to make sure that we actually know what’s running. If we’re adding new servers, we need to make sure that terraform will ignore any server that already exists. If we’re destroying servers, we will assume that working “backwards” is good enough (i.e. destroy only terraform managed resources, in the reverse of the order they were created).

The simplest way to do this is to parse the output of maxctrl list servers. It outputs a table, as shown in the introduction, so we can parse it by piping it into an awk script (service_discovery.awk):

#!/bin/awk -f

BEGIN{
    FS="│";
    max_id = 0;
    if (port == ""){port = 3313;}
    if (number == ""){number = 2;}
}
{
    if($2 ~ /server[0-9]+/){
        for (i=1; i<=NF; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", $i);
        }
        match($2, /([0-9]+)$/, current_id);
        if (current_id[1] > max_id) {
             max_id = current_id[1];
        }
    }
}
END{
    boundary = max_id+1;
    print boundary;
    print "locals {";
    print "  servers = {";
    for(i = boundary; i < boundary+number; i++) {
        port++;
        print "    \"server"i"\" = { port = "port", server_id = "i" },";
    }
    print "  }"
    print "}";
}

Here we are doing two things: computing the boundary (one past the highest existing server number), and emitting a locals block with definitions for the new servers.

The output is not perfect, as it leaves a trailing , after the final server definition, but this can be piped into a simple sed expression sed -e 'H;1h;$!d;x;s/\(.*\),/\1/' to remove that final comma. Quick breakdown of this one-liner:

#sed -e
H; #append the current line from pattern space to hold space
1h; #on line one, copy the line into the hold space, this is just because H otherwise leaves a blank line on line one
$!d; # for every line that is not the last line, delete the pattern space and start the next cycle; basically, don't print anything until the very end, accumulating the whole file in the hold space
x; # once the last line is reached, swap everything from hold space back into the pattern space
s/\(.*\),/\1/ #now perform the substitution: greedily capture everything up to the last comma into a capture group, then replace with just the capture group, dropping that comma
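To see the one-liner in action on a toy two-line input:

```shell
# Only the comma on the final line is removed; earlier commas survive.
trimmed=$(printf 'server4,\nserver5,\n' | sed -e 'H;1h;$!d;x;s/\(.*\),/\1/')
echo "$trimmed"
```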

With that out of the way, we can put the awk script into an awk_scripts folder, and build the service discovery script (service_discovery.sh):

#!/bin/bash

NUMBER=$1
PORT=$2
OPERATION=$3
ROOT_SRC=$4
TMP=$(mktemp)
PORT_CHECK=$(maxctrl list servers | awk -f $ROOT_SRC/awk_scripts/port_check.awk)
DESIRED_PORT=$(echo "$PORT + 1" | bc)

if [[ $DESIRED_PORT -le $PORT_CHECK  && $OPERATION =~ "APPLY" ]]; then
    echo "Desired port of $DESIRED_PORT is unavailable, failed check against $PORT_CHECK, try setting a higher port!"
    exit 1
fi

maxctrl list servers | awk -v number=$NUMBER -v port=$PORT -f $ROOT_SRC/awk_scripts/service_discovery.awk | sed -e 'H;1h;$!d;x;s/\(.*\),/\1/' > $TMP

if ! head -n1 "$TMP" | grep -Eq '^[0-9]+$'; then
    echo "Failed to derive boundary from maxctrl output" >&2
    exit 1
fi

cat $ROOT_SRC/boundary.txt > $ROOT_SRC/boundary.txt.last
awk '(NR==1){print $0}' $TMP > $ROOT_SRC/boundary.txt
sed -i '1d' $TMP

cat $ROOT_SRC/locals.tf > $ROOT_SRC/locals.tf.last
cat $TMP > $ROOT_SRC/locals.tf

rm $TMP

Here we tie it all together: we invoke maxctrl, pipe it into awk and sed as described above, and quit the script if the boundary value is nonsense. If that check passes, we save the boundary value to another file, then strip it off the temp file, so that we are left with a locals.tf containing the appropriate server definitions. The boundary.txt file will be turned into a boundary.mk to be usable in the Makefile.

Essentially, in the orchestration phase, we need to make sure this script always runs before terraform itself, so that both the boundary value and the locals.tf file are fresh.
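The boundary.txt to boundary.mk conversion mentioned above is a one-liner; sketched in isolation:

```shell
# service_discovery writes the boundary to boundary.txt; the Makefile then
# wraps it into an includable boundary.mk assignment.
echo 4 > boundary.txt
echo "BOUNDARY=$(cat boundary.txt)" > boundary.mk
cat boundary.mk
```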

When deploying new resources, we also need a check that we aren’t trying to use ports that are already busy (port_check.awk):

#!/bin/awk -f

BEGIN{
    FS="│";
    max_port = 0;
}
{
    if($4 ~ /[0-9]+/) {
        match($4, /[0-9]+/, port);
        if (port[0] > max_port) {
             max_port = port[0];
        }
    }
}
END{
    print max_port;
}
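Boiled down, the script scans the Port column of the maxctrl table and keeps the maximum. A portable toy version of the same idea (using gsub instead of gawk's three-argument match):

```shell
# Extract the Port column (field 4 when splitting on │) and keep the maximum.
maxport=$(printf '│ server1 │ 192.168.2.37 │ 3306 │\n│ server4 │ 192.168.2.99 │ 3313 │\n' \
  | awk -F'│' '{ if ($4 ~ /[0-9]/) { gsub(/[^0-9]/, "", $4); if ($4+0 > max) max = $4+0 } } END { print max+0 }')
echo "$maxport"
```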

Maxscale

Whenever we create or destroy a mariadb container, we need to update the awareness of the situation for maxscale accordingly. This means we will need one script to update /etc/maxscale.cnf when we add servers, and another script for when we destroy them.

The main assumption I have made here is that /etc/maxscale.cnf keeps its default layout that comes with the package manager. So there’s a big block of comments between each of the sections.

There are three major parts that we want to edit:

############################################################################
# Server definitions                                                       #
#                                                                          #
# Set the address of the server to the network address of a MariaDB server.#
############################################################################

[server1]
type=server
address=192.168.2.37
port=3306
protocol=MariaDBBackend

[server2]
type=server
address=192.168.2.99
port=3306
protocol=MariaDBBackend

[server3]
type=server
address=192.168.2.98
port=3306
protocol=MariaDBBackend

We want our create script to add more blocks like these for each new server, and the delete script to remove such blocks, while never touching these original ones.

servers=server1,server2,server3

We just want to add/remove servers on this line depending on whether we create or destroy servers. Whenever we add a server, we want maxscale to monitor it, and whenever we destroy one, we want maxscale to stop monitoring it.

servers=server2,server3

All the new servers would be replicas, so they’d just need to be added to the service definition of the read-only listener when creating them, and removed from here when destroying them.
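These servers= line edits can be sketched in isolation. The real script builds the match pattern and replacement dynamically, but the mechanics are just awk's sub() with & standing for the matched text:

```shell
# Append new servers after the existing tail of the list, the way
# create_maxscale.awk does; "&" in the replacement is the matched text itself.
line=$(echo 'servers=server1,server2,server3' \
  | awk -v repl=",server4,server5" '{ sub(/server2,server3/, "&" repl); print }')
echo "$line"
```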

Create

Since this is just some basic text manipulation, awk is an excellent tool to achieve the above goals (create_maxscale.awk):

#!/bin/awk -f

BEGIN{
    found = 0;
    if(boundary == "") {boundary = 4;}
    if(number == "") {number = 2;}
    if(port == "") {port = 3313;}
    if(address == "") {address="192.168.2.99";}
    pat="\\[server"(boundary-1)"\\]";
    pat2="";
    repl="";
    for(i=boundary-2; i<boundary; i++) {
        pat2=pat2"server"i",";
    }
    pat2=substr(pat2, 1, length(pat2) - 1);
    for(i=boundary; i<boundary+number; i++) {
        repl=repl",server"i;
    }
}
{
    if($0 ~ pat2) {
        sub(pat2, pat2 repl, $0);
        print $0;
    } else {
        print $0;
    }
    if($0 ~ pat) {
        found = 1;
    }
    if((found == 1) && ($0 ~ /protocol=MariaDBBackend/)) {
        for(i = boundary; i < (boundary+number); i++) {
            port++;
            print "\n[server"i"]";
            print "type=server";
            print "address="address;
            print "port="port;
            print "protocol=MariaDBBackend\n";
        }
        found = 0;
        next;
    }
}
END{
}

This handles creating the server blocks and updating the server list references. We can call it from a bash script (create_maxscale.sh), and restart maxscale after the modification:

#!/bin/bash

BOUNDARY=$1
NUMBER=$2
PORT=$3
ROOT_SRC=$4
TARGET_SRV=$5
TMP=$(mktemp)
CREATED_SERVERS_FILE=$ROOT_SRC/created_servers.txt

if [[ $BOUNDARY -lt 4 ]]; then
    exit 1
fi

CREATED_SERVERS_VAR=""
ENDPOINT=$(echo "$BOUNDARY + $NUMBER" | bc)
for ((i = $BOUNDARY; i < $ENDPOINT; i++  )); do
    CREATED_SERVERS_VAR="${CREATED_SERVERS_VAR}server${i},"
done

echo "$CREATED_SERVERS_VAR" | sed 's/,$//' > $CREATED_SERVERS_FILE

awk -v boundary=$BOUNDARY -v number=$NUMBER -v port=$PORT -v address=$TARGET_SRV -f $ROOT_SRC/awk_scripts/create_maxscale.awk /etc/maxscale.cnf > $TMP
cat /etc/maxscale.cnf > /etc/maxscale.cnf.last
cat $TMP > /etc/maxscale.cnf

systemctl restart maxscale

rm $TMP

We are outputting a created_servers.txt to make deletion logic easier (also useful for the batching mentioned at the beginning). Essentially, we want to have a record of what resources we are creating, so that we can reference it when destroying the resources. This avoids having to rely on boundary calculations for deletions.
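Run in isolation, the bookkeeping loop produces exactly the comma-separated list that later drives deletion:

```shell
# What create_maxscale.sh writes to created_servers.txt for BOUNDARY=4, NUMBER=2.
BOUNDARY=4
NUMBER=2
CREATED=""
for ((i = BOUNDARY; i < BOUNDARY + NUMBER; i++)); do
    CREATED="${CREATED}server${i},"
done
CREATED="${CREATED%,}"                  # drop the trailing comma
echo "$CREATED"
```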

Destroy

Similarly, awk can take care of removing things from maxscale config (destroy_maxscale.awk):

#!/bin/awk -f

BEGIN{
    if(number == "") {exit 1;}
    if(boundary == "") {exit 1;}
    startpos = boundary - number;
    endpos = boundary - 1;
    skip = 0;
    blank_seen = 0;

    for (i = startpos; i <= endpos; i++) {
        managed["[server"i"]"] = 1;
    }

}
{
    if(skip > 0) {
        skip--;
        next;
    }
    if($0 in managed) {
        skip = 4;
        next;
    }

    if ($0 ~ /^[[:space:]]*$/) {
        if (blank_seen) {
            next;
        }
        blank_seen = 1;
        print $0;
        next;
    }

    blank_seen = 0;
    print $0;
}

Basically, the script receives the boundary value (what would be the next server id) and the number of servers to remove, computes which server ids are to be deleted, identifies the relevant [serverX] lines, and deletes each one together with the next 4 lines. Here we also take care of any stray newlines and collapse them. The server list references are updated in the bash caller (destroy_maxscale.sh):

#!/bin/bash

ROOT_SRC=$1

if [[ ! -s "$ROOT_SRC/created_servers.txt" ]]; then
    echo "No created servers file found, aborting"
    exit 1
fi

MANAGED_SERVERS=$(awk '(NR==1){print $0}' $ROOT_SRC/created_servers.txt)
echo "DELETING SERVERS ${MANAGED_SERVERS} FROM maxscale.cnf"
LAST_SERVER_ID=$(echo $MANAGED_SERVERS | grep -Eo 'server[0-9]+$' | sed 's/server//')
NEXT_SERVER_ID=$(echo "$LAST_SERVER_ID + 1" | bc)
NUMBER=$(awk 'BEGIN{FS=","}{if(NR==1){print NF}}' $ROOT_SRC/created_servers.txt)

if [[ $LAST_SERVER_ID -lt 4 ]]; then
    exit 1
fi

TMP=$(mktemp)

awk -v boundary=$NEXT_SERVER_ID -v number=$NUMBER -f $ROOT_SRC/awk_scripts/destroy_maxscale.awk /etc/maxscale.cnf > $TMP
cat /etc/maxscale.cnf > /etc/maxscale.cnf.last
cat $TMP > /etc/maxscale.cnf
sed -E -i "s/,$MANAGED_SERVERS//g" /etc/maxscale.cnf

systemctl restart maxscale

rm $TMP

Here we can simply reuse the created_servers.txt file from the creation/apply stage, to easily determine what we should be removing from service.
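The derivation of the awk parameters from created_servers.txt can be traced with a toy value; with boundary=6 and number=2, destroy_maxscale.awk computes startpos=4 and endpos=5, i.e. it removes server4 and server5:

```shell
# Re-derive NEXT (the boundary) and NUMBER from a created-servers list.
MANAGED="server4,server5"
LAST=$(echo "$MANAGED" | grep -Eo 'server[0-9]+$' | sed 's/server//')
NEXT=$((LAST + 1))
NUMBER=$(echo "$MANAGED" | awk -F, '{print NF}')
echo "$NEXT $NUMBER"
```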

The Orchestration

Now that all the elements are in place, we can put all of this together into a Makefile. As established earlier, make helps us avoid writing a complex bash script to act as the orchestration and entry point of the project. We can ensure that all distinct elements (e.g. bash, awk, terraform) of the project rely on the same set of initial variables, making the project more portable and configurable.

This way, we can also set up simple commands to operate the project with, e.g. make OPERATION=APPLY/DESTROY without having to think about chaining commands manually based on exit codes etc.

-include config.mk
-include boundary.mk
OPERATION ?=
VALID_OPERATIONS := APPLY DESTROY
NUMBER ?=
PORT ?=
BATCH_ID ?=

ifeq ($(NUMBER),)
        NUMBER = 2
endif

ifeq ($(PORT),)
        PORT = 3313
endif

ifeq ($(BATCH_ID),)
        BATCH_ID = 0
endif

ifneq ($(OPERATION),APPLY)
    ifneq ($(OPERATION),DESTROY)
        $(error Invalid value for OPERATION: "$(OPERATION)". Must be "APPLY" or "DESTROY")
    endif
endif

IS_MAXSCALE_ACTIVE := $(shell systemctl is-active maxscale)

ifneq ($(IS_MAXSCALE_ACTIVE),active)
$(error Maxscale service is not running, can not do service discovery)
endif

ifeq ($(OPERATION),APPLY)
        TARGETS := build apply
else ifeq ($(OPERATION),DESTROY)
        TARGETS := destroy
endif

operate: service_discovery
        $(MAKE) execute_operation
#we need a dynamically generated value from a file, so we must call make from make to refresh state

execute_operation:
        $(MAKE) $(TARGETS)

#checks maxctrl, tells what's currently running gives the number of current servers, and builds a locals.tf file
service_discovery:
        rm -f boundary.mk
        bash service_discovery.sh $(NUMBER) $(PORT) $(OPERATION) $(ROOT_SRC)
        @echo "BOUNDARY=$$(cat boundary.txt)" > boundary.mk

#builds and transports the docker images to the target server, along with a fresh mysqldump
build:
        bash init.sh $(TARGET_SRV) $(ROOT_SRC) $(REMOTE_DOCKER_DIR) $(REMOTE_DOCKER_SCRIPTS_DIR) $(DOCKER_BUILD)

#runs the new docker containers, updates maxscale
apply:
        terraform init
        terraform apply -var="target_ip=$(TARGET_SRV)" -var="docker_image=$(DOCKER_BUILD)" -var="batch_id=$(BATCH_ID)"
        bash create_maxscale.sh $(BOUNDARY) $(NUMBER) $(PORT) $(ROOT_SRC) $(TARGET_SRV)

#destroys the docker containers, updates maxscale
destroy:
        bash destroy_maxscale.sh $(ROOT_SRC)
        terraform destroy -var="target_ip=$(TARGET_SRV)" -var="docker_image=$(DOCKER_BUILD)"

.PHONY: service_discovery execute_operation operate build apply destroy

The main key takeaways:

  1. service_discovery runs before anything else, so the boundary value and locals.tf are always fresh before terraform runs
  2. boundary.mk is generated during the run itself, so operate has to re-invoke make to pick up the new BOUNDARY value
  3. OPERATION is validated up front, and the run aborts early if maxscale is not active, since service discovery depends on it
  4. APPLY expands to the build and apply targets, while DESTROY only needs the destroy target

Batching, aka the top-level orchestration

Now that the lifecycle of a single terraform apply/destroy is clear, we can look at top-level orchestration. The goal is that we should be able to run terraform apply multiple times, and it should not destroy resources created in the past. It should append. We should be able to destroy specific resources, and leave the rest unaffected.

There are five things we might want to do on a top-level:

  1. Establish a new batch
  2. Deploy resources into a batch
  3. Destroy resources in a batch
  4. Delete a batch directory structure
  5. List the deployed batches

It makes sense to keep each of these things as a script, and tie operations together with a Makefile.

Establishing a new batch

Basically, we just want the system to be ready for terraform commands to come in, but without actually executing any terraform yet (new_batch.sh):

#!/bin/bash

BATCH_NAME=$1
THIS_BATCH=$2
BATCH_DIR=$3
TEMPLATE_DIR=$4

echo "Creating batch: ${BATCH_NAME}"
if [ -d "${THIS_BATCH}" ]; then
        echo "Error: Batch ${BATCH_NAME} already exists"
        exit 1
fi

mkdir -p "$BATCH_DIR" && mkdir -p "$THIS_BATCH"

cp -r ${TEMPLATE_DIR} ${THIS_BATCH}
cd ${THIS_BATCH}/terraform-template && rm -f terraform.tfstate* .terraform.lock.hcl boundary.txt boundary.mk boundary.txt.last locals.tf locals.tf.last created_servers.txt dump/00_dump.sql dump/gtid_pos.txt
cd ${THIS_BATCH}/terraform-template && rm -rf .terraform/
echo "Batch created @ ${THIS_BATCH}"

Takes the terraform-template/ directory, creates a copy with the desired batch name, and deletes any files and directories that are generated, so that the batch will start with a clean slate. At this point, it is possible to direct terraform commands at this directory and expect them to work.

Deploying resources into a batch

If we have created a batch, then this is where we want to be able to cd into the batch directory, and execute a make OPERATION=APPLY on this batch. This means that terraform will manage only the resources belonging to this batch, and it will not attempt to manage/destroy/etc resources belonging to other batches. A clear separation of concerns (deploy_to_new_batch.sh).

#!/bin/bash

BATCH_NAME=$1
THIS_BATCH=$2
WORKING_DIR=$3
NUMBER=$4
PORT=$5
TEMPLATE_DIR=$6
BATCH_REGISTRY=$7

if [ ! -d "${THIS_BATCH}" ]; then
        echo "Error: Batch ${BATCH_NAME} not found"
        exit 1
fi

if [ ! -d "${WORKING_DIR}" ]; then
    echo "Error: Batch ${BATCH_NAME} has no template dir!"
    exit 1
fi

echo "Deploying batch: ${BATCH_NAME} with ${NUMBER} servers"
cd ${WORKING_DIR} && make OPERATION=APPLY NUMBER=${NUMBER} PORT=${PORT} BATCH_ID=${BATCH_NAME}
echo "${BATCH_NAME}: ${NUMBER} servers" >> ${BATCH_REGISTRY}
echo "Batch deployed successfully!"

We’ll also keep a top-level registry of batches, so that we can have a simple bird’s eye view of what’s been deployed per batch.

Destroying resources in a batch

Basically, if we create resources in a batch, we should be able to destroy them too (destroy_in_batch.sh):

#!/bin/bash

BATCH_NAME=$1
THIS_BATCH=$2
WORKING_DIR=$3
NUMBER=$4
TEMPLATE_DIR=$5
BATCH_REGISTRY=$6

if [ ! -d "${THIS_BATCH}" ]; then
        echo "Error: Batch ${BATCH_NAME} not found"
        exit 1
fi

if [ ! -d "${WORKING_DIR}" ]; then
    echo "Error: Batch ${BATCH_NAME} has no template dir!"
    exit 1
fi

echo "Destroying batch: ${BATCH_NAME}"
cd ${WORKING_DIR} && make OPERATION=DESTROY NUMBER=${NUMBER}
sed -i "/^${BATCH_NAME}:/d" ${BATCH_REGISTRY}
echo "Batch destroyed successfully!"

Once again, good to update the top level registry.

Deleting a batch structure

Self-explanatory. This script deletes the batch structure, but not the resources (delete_batch_dir.sh):

#!/bin/bash

BATCH_NAME=$1
THIS_BATCH=$2
TEMPLATE_DIR=$3
BATCH_REGISTRY=$4

if [ ! -d "${THIS_BATCH}" ]; then
        echo "Error: Batch ${BATCH_NAME} not found!"
        exit 1;
fi
echo "Removing batch directory: ${THIS_BATCH}"
read -p "Are you sure? [y/N] " CONFIRM;
if [ "$CONFIRM" = "y" ] || [ "$CONFIRM" = "Y" ]; then
        rm -rf ${THIS_BATCH};
        sed -i "/^${BATCH_NAME}:/d" ${BATCH_REGISTRY}
        echo "Batch removed!";
else
        echo "Cancelled";
fi

Once again, good to keep the top level registry informed.

Displaying a simple top level registry (list)

If we do not do this, we rely purely on maxscale or /etc/maxscale.cnf to see this information (list_batches.sh):

#!/bin/bash

BATCH_NAME=$1
THIS_BATCH=$2
BATCH_DIR=$3
TEMPLATE_DIR=$4
BATCH_REGISTRY=$5

echo "[=== Deployed Batches ===]"
cat $BATCH_REGISTRY | awk -v dir="$BATCH_DIR/" 'BEGIN{FS=":"}{print "echo "$1" && cat "dir$1"/terraform-template/created_servers.txt"}' | bash

The output is very simple:

root@linuxpc:/opt/terraform/part2# make list
[=== Deployed Batches ===]
batch-1
server4,server5
batch-2
server6,server7

Makefile

Given that we’ve offloaded the logic to bash, the makefile that holds all this together becomes very simple, and largely self-explanatory:

-include config.mk
BATCH_NAME ?= batch-$(shell date +%Y%m%d-%H%M%S)
THIS_BATCH := $(BATCH_DIR)/$(BATCH_NAME)
WORKING_DIR := $(THIS_BATCH)/terraform-template
NUMBER ?= 2
PORT ?= 3313

help:
        @echo "Batch Deployment Orchestrator"
        @echo "
        @echo "Targets:"
        @echo "  make new [BATCH_NAME=name]              - Create new batch"
        @echo "  make deploy BATCH_NAME=name [NUMBER=2]  - Deploy batch"
        @echo "  make destroy BATCH_NAME=name [NUMBER=2] - Destroy batch"
        @echo "  make list                               - List all batches"
        @echo "  make clean BATCH_NAME=name              - Delete batch directory"
        @echo "
        @echo "Examples:"
        @echo "  make new BATCH_NAME=batch-1"
        @echo "  make deploy BATCH_NAME=batch-1"
        @echo "  make destroy BATCH_NAME=batch-1"

new:
        @bash $(SCRIPTS_DIR)/new_batch.sh $(BATCH_NAME) $(THIS_BATCH) $(BATCH_DIR) $(TEMPLATE_DIR)

deploy:
        @bash $(SCRIPTS_DIR)/deploy_to_new_batch.sh $(BATCH_NAME) $(THIS_BATCH) $(WORKING_DIR) $(NUMBER) $(PORT) $(TEMPLATE_DIR) $(BATCH_REGISTRY)

destroy:
        @bash $(SCRIPTS_DIR)/destroy_in_batch.sh $(BATCH_NAME) $(THIS_BATCH) $(WORKING_DIR) $(NUMBER) $(TEMPLATE_DIR) $(BATCH_REGISTRY)

list:
        @bash $(SCRIPTS_DIR)/list_batches.sh $(BATCH_NAME) $(THIS_BATCH) $(BATCH_DIR) $(TEMPLATE_DIR) $(BATCH_REGISTRY)

clean:
        @bash $(SCRIPTS_DIR)/delete_batch_dir.sh $(BATCH_NAME) $(THIS_BATCH) $(TEMPLATE_DIR) $(BATCH_REGISTRY)

.PHONY: new deploy destroy list clean help

.DEFAULT_GOAL := help

We just need to add targets that map to our bash scripts and push through the relevant variables. The config file just keeps a few variables that I felt might be useful to toggle (config.mk):

ROOT_SRC=$(CURDIR)
TEMPLATE_DIR=$(ROOT_SRC)/terraform-template
BATCH_DIR=$(ROOT_SRC)/batches
BATCH_REGISTRY=$(BATCH_DIR)/batch_registry.txt
SCRIPTS_DIR=$(ROOT_SRC)/scripts

And that brings us to the end here. It is now possible to keep deploying new resources of the same type in a way such that terraform will not try to destroy existing resources.

Conclusion

This exercise helps to see that while terraform is an amazing tool for resource provisioning, it is not intended as a tool to implement and configure application deployments, nor is it a good orchestrator; in practice it is used in tandem with other tools to achieve end-to-end deployments.