Fully automated zero downtime deployments with Floating IPs on DigitalOcean Droplets

In this tutorial I will show you how I have implemented a zero downtime deployment (blue-green like) of a small web application which is running on a DigitalOcean Droplet. The cool thing is, that this deployment is fully automated. After pushing a code change to the web application code a CI/CD-Pipeline will be executed which does the following tasks:

  1. build a new version of the web application
  2. put that web application into a docker image
  3. push the docker image to an external docker registry
  4. create the infrastructure which consists of :
    1. two droplets (on one we will have the updated version of our web application and on the other we will have the pre-update version as a fallback)
    2. one droplet which serves as the load balancer
    3. a floating IP-Address which will be used in the load balancer’s configuration to route user requests to the active droplet
    4. a domain (and a dns record) to tie the domain to the load balancer
  5. pull the latest version of the docker image of our web application from the docker registry
  6. start a container with the newest image
  7. wait until the application is up and running
  8. link the floating IP address to the droplet with the updated version of our web application container which thereby becomes our new production droplet

If we put the infrastructure into a diagram it will look like this:

floatingip (1)

You can find the complete source code here.

Tech-Stack

  • Web application: a small „Hello World“-Vaadin-Application (with Spring Boot) which shows the current time when clicking a button.
  • Repository/Docker Registry/CI/CD/Pipeline: gitlab.com
  • Infrastructure/Cloud provider: DigitalOcean
  • Infrastructure as code: Terraform (with an S3 bucket hosted in a DigitalOcean Space as the backend so that terraform knows which resources were already created during previous pipeline run. In fact: The infrastructure will be build a single time on the first pipeline run and from the second pipeline run there is no need to recreate everything – the only exception is when we modify the specs of our resources. Not only is there no need: We don’t WANT everything to be recreated because we don’t want any downtime).
  • Configuration management: ansible (we will use a small ansible playbook to copy a script to our worker droplets and execute it on those. This script will do nothing more than pulling the newest docker image and stop the currently running container and replace it with a container of the pulled image)

Create the web application

So, let us start with the web application. We will take a starter pack from https://vaadin.com/start/latest/project-base-spring which is a Spring Boot application (maven project) consisting of only a single button that displays the time when clicked.

starter

Put the web application into a container (image)

As a next step we will put that web application into a container. Therefore let’s create a Dockerfile (at the top level of the vaadin project) which builds an image that takes the generated jar-File and executes it – it’s pretty straight forward:

FROM openjdk:11-jdk-oracle
RUN useradd --no-log-init -r codinghaus
USER codinghaus
WORKDIR /home/codinghaus
ADD target/my-starter-project.jar .
EXPOSE 8080
CMD java -jar /home/codinghaus/my-starter-project.jar

Before building the image we have to to a maven clean build so that the jar-file is generated (my-starter-project.jar). After a clean build we can generate the image. Navigate to the project and execute:

docker build . -t codinghaus/webgui:1

This will create an image called codinghaus/webgui with the tag 1 (for version 1).

webgui1

Before we start to build our pipeline and automate everything, let us test if the container works:

docker run -d -p 8080:8080 codinghaus/webgui:1

running

After executing that command the container should run in the background and when calling http://localhost:8080/ the web application should be running.

Okay cool, we have a containerized web application now. Time to get it into the cloud.

Create the infrastructure (as code)

So, we want the app to be deployed into the cloud so that it is accessible for other users. The cloud provider of our choice is DigitalOcean. The resources that we want to have:

  • 1 Domain (www.gotcha-app.de)
  • 2 Droplets running our app (one droplet will serve as our production system where the latest version of our app is running. The other droplet will contain the previous version of our app – if an update of our app fails horrible we will be able to easily switch back to the previous version)
  • 1 Droplet which serves as a load balancer
  • 1 Floating IP which will be used in our load balancer configuration

The point of that setup is: We will have a domain http://www.gotcha-app.de. When a user enters http://www.gotcha-app.de the request will be redirected to the load balancer. The load balancer will take the request and forward it to the floating IP.  A DigitalOcean Floating IP is static and we are able to bind one droplet (its ip) to that static IP. This is great because we can configure our load balancer to forward all requests to that static IP, but on the other hand we are able to dynamically switch the target behind that IP so later we can decide if requests to that IP will be forwarded to our droplet with the most current version of our web app running or to the droplet with the previous version.

Think about it: Later when we have a pipeline, which is executed when we updated the code of our web application, we will deploy the newest version of our web app to one of our droplets. During this time all user requests will be forwarded to the droplet running the pre-update version. The users won’t recognize that we are deploying an update to the second  droplet. When the update is finished on the second droplet, we can then tell the Floating IP: „Hey, update finished. Stop forwarding user requests to the droplet running the old version and instead forward them to the droplet running the newest version“. As we are only changing the target behind the static Floating IP we don’t have to touch our load balancer’s configuration and so we don’t have to restart the load balancer. The users won’t recognize that their requests are now forwarded to another droplet.

But now it’s time for the infrastructure code. We use terraform to describe our infrastructure as code. There is a great DigitalOcean-Provider for terraform so we can describe everything we want to be created on DigitalOcean as code.

I will keep explanations on that code very short. There are enough sources on the internet if you are interested in learning terraform. Just a few words on how terraform works in general: What you will need (if you try this example yourself) is an account on DigitalOcean where you can create a token. With that token terraform will be able to create resources like droplets, domains, etc. on your account. It therefore will call API-endpoints from DigitalOcean. You will also need to put a ssh key to your DO-Account so that terraform will be able to connect to the created droplets.

First, let us create an .infrastructure-folder to our project which will contain the complete code describing our wanted infrastructure (in terraform-syntax):

projectstructure

The digitalocean-folder contains everything that should be created on DigitalOcean.

  • domain.tf: contains the domain to create
  • droplets.tf: contains both worker droplets to create
  • floatingip.tf: contains the floating IP to create
  • loadbalancer: contains the droplet to create which will server as a load balancer
  • provider: contains our DigitalOcean-Account-Credentials (our token)
  • vars: defines all variables we need
  • backend: here we define the space where we want terraform to save the current state of our infrastructure

This is really important as a backend serves as a location where terraform puts the current state of the infrastructure that it already created. We want those informations to be saved remotely (and not on our repository) and during each run of our pipeline we want terraform to check the current state so that our infrastructure won’t be created again and again with each pipeline run. Just think of what our idempotent CI/CD-Pipeline should do: Create the infrastructure if not already there, but if already there: only do the updates.

Info: During the creation of that blog post the digitalocean provider for terraform doesn’t support creating spaces programatically with terraform (but it will be possible with probably the next release – see https://github.com/terraform-providers/terraform-provider-digitalocean/pull/77). So – sadly – we have to create that space from the DigitalOcean-GUI but then at least we can use the manually created space in our terraform code and use it as a backend.

Let us have a look at the single files:

vars.tf

This file contains variables which in general contain credential stuff like keys/tokens/passwords that we don’t want to have hardcoded in our code (and in our repository). It also contains some configuration stuff like the number of worker droplets we want to be created, the size of each droplet, the location where to create the droplets, …. . Later when creating our pipeline you will see, that we will fill those variables using environment variables.

variable "DO_TOKEN" {}
variable "DO_SPACES_ACCESS_ID" {}
variable "DO_SPACES_SECRET_KEY" {}

variable "DO_PUBKEY_PLAIN" {}
variable "DO_KEYFINGERPRINT" {}
variable "DO_REGION" {}
variable "DO_SIZE" {}
variable "DO_WORKERCOUNT" {}

variable "DOCKER_REGISTRY_URL" {}
variable "DOCKER_REGISTRY_USERNAME" {}
variable "DOCKER_REGISTRY_PASSWORD" {}

variable "TAGS" {
  type = "list"
  default = ["PROD", "FALLBACK"]
}

provider.tf

In this file we tell terraform which provider to choose, so it knows what API-Endpoints to use to create/modify resources on DigitalOcean and what credentials to use to find and login to our account.

provider "digitalocean" {
  token = "${var.DO_TOKEN}"
}

droplets.tf

This is the first „real“ resource file. Here we describe the two droplets on which we will have our application running later. What we already do here: After creating the resource we connect to it (via ssh) and install docker there so we don’t have to do that later. The only purpose of those droplets is to run our container and therefore a docker installation is required – and why not already do this during the creating process? In the end we already login to our docker registry (we will cover that later during this blog post).

The last block („lifecycle“) is pretty important because we tell terraform here that it should not recognize if a droplet’s tag has changed. If we would not do this: Think of a second pipeline run. We don’t want terraform to do anything here, we just want to create a new image and update the running container. Without that block terraform would recognize (on the second pipeline run, during the deploy-infrastructure stage) that our two worker droplets have changed (their tags) and terraform would reset the tags again. But we want to handle the droplet’s tags without terraform – during our update stage.

resource "digitalocean_droplet" "droplets" {
  image = "ubuntu-16-04-x64"
  name = "${format("droplet%02d", count.index + 1)}"
  count = "${var.DO_WORKERCOUNT}"
  region = "${var.DO_REGION}"
  size = "${var.DO_SIZE}"
  tags = ["${element(var.TAGS, count.index)}"]
  private_networking = true
  ssh_keys = [
    "${var.DO_KEYFINGERPRINT}"
  ]
  connection {
    user = "root"
    type = "ssh"
    private_key = "${file("~/.ssh/id_rsa")}"
    timeout = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "sleep 10",
      "apt-get update",
      "apt-get install apt-transport-https ca-certificates curl software-properties-common -y",
      "curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -",
      "add-apt-repository \"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\"",
      "apt-get update",
      "apt-get install docker-ce -y",
      "usermod -aG docker `whoami`",
      "docker login ${var.DOCKER_REGISTRY_URL} --username ${var.DOCKER_REGISTRY_USERNAME} --password ${var.DOCKER_REGISTRY_PASSWORD}"
    ]
  }

  lifecycle {
    ignore_changes = ["tags"]
  }
}

floatingip.tf

The floating IP will be created after the droplets (have a look at the depends_on-attribute). During the creation process (which will probably only be executed once during our very first pipeline run) we will set the first of our two droplets as the target behind the floating IP (if you think about it: The target doesn’t matter during creation as both droplets are empty/plain and have no web application container running).

Pay attention to the lifecycle block again. Without it terraform would (during the second pipeline run) reattach droplet01 to the floating ip. We don’t want terraform to do anything after creating the infrastructure because we will handle that ourselves during the update stage).

resource "digitalocean_floating_ip" "floatingip" {
  droplet_id = "${element(digitalocean_droplet.droplets.*.id, 0)}"
  region     = "${element(digitalocean_droplet.droplets.*.region, 0)}"
  depends_on = ["digitalocean_droplet.droplets"]

  lifecycle {
    ignore_changes = ["droplet_id"]
  }
}

loadbalancer.tf

The load balancer droplet will be created after the floating IP. This is because we want to use the IP address of the floating IP in our load balancer config, so it must be already existent when creating/configuring the load balancer. As you can see we will use a HAProxy here and modify the haproxy.cfg to route all incoming requests to the IP of the floating IP and do a restart of the HAProxy after that.

resource "digitalocean_droplet" "loadbalancer" {
  image = "ubuntu-16-04-x64"
  name = "loadbalancer"
  region = "${var.DO_REGION}"
  size = "${var.DO_SIZE}"
  private_networking = true
  ssh_keys = [
    "${var.DO_KEYFINGERPRINT}"
  ]
  depends_on = ["digitalocean_floating_ip.floatingip"]

  connection {
    user = "root"
    type = "ssh"
    private_key = "${file("~/.ssh/id_rsa")}"
    timeout = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "apt-get update",
      "apt-get install jq -y",
      "apt-get update",
      "apt-get install haproxy -y",
      "printf \"\toption forwardfor\" >> /etc/haproxy/haproxy.cfg",
      "printf \"\n\nfrontend http\n\tbind ${self.ipv4_address}:80\n\tdefault_backend web-backend\n\" >> /etc/haproxy/haproxy.cfg",
      "printf \"\nbackend web-backend\n\tserver floatingIP ${digitalocean_floating_ip.floatingip.ip_address}:8080 check\" >> /etc/haproxy/haproxy.cfg",
      "/etc/init.d/haproxy restart"
    ]
  }
}

domain.tf

When the load balancer creation has finished, we are able to create (let terraform create) a domain on DigitalOcean. We will tie requests to that domain to the IP of the created load balancer droplet.

resource "digitalocean_domain" "domain-www" {
  name       = "www.gotcha-app.de"
  ip_address = "${digitalocean_droplet.loadbalancer.ipv4_address}"
  depends_on = ["digitalocean_droplet.loadbalancer"]
}

backend.tf

This last file is pretty important. It describes our terraform backend. A terraform backend is a remote location, where terraform saves the state of the created infrastructure. Without a backend terraform would recreate all described resources on each pipeline run because it doesn’t know that it already was created. But of course we want terraform to be idempotent and only create the resources if they don’t exist already. By default terraform saves those information locally in the same folder where it is executed (.infrastructure/digitalocean) but we don’t want it in our code and we don’t want it in our repository. Instead we will use a DigitalOcean Space (which is the same as  AWS S3) for that (I read through https://medium.com/@jmarhee/digitalocean-spaces-as-a-terraform-backend-b761ae426086 to understand how to do that and can absolutely recommend that blog post)

terraform {
  backend "s3" {
    endpoint = "ams3.digitaloceanspaces.com"
    region = "us-west-1"
    key = "terraform-state"
    skip_requesting_account_id = true
    skip_credentials_validation = true
    skip_get_ec2_platforms = true
    skip_metadata_api_check = true
  }
}

Thanks to that code we can then use the following command, which ensures that terraform will always load the current state of our infrastructure before executing any command:

terraform init \
-backend-config="access_key=$TF_VAR_DO_SPACES_ACCESS_ID" \
-backend-config="secret_key=$TF_VAR_DO_SPACES_SECRET_KEY" \
-backend-config="bucket=$TF_VAR_DO_SPACES_BUCKET_NAME"

(taken from the mentioned blog post from @jmarhee, thanks!)

Nice, our infrastructure code seems complete. After creating a space manually on DigitalOcean, we can (for testing purposes) try out if everything works by first initiating the terraform backend:

backendinit

and then let terraform do its work. First we will run a „terraform plan“, to get an overview of what terraform is about to create:

terraform plan -var="DO_TOKEN=<value here>" -var="DO_SPACES_ACCESS_ID=<value here>" -var="DO_SPACES_SECRET_KEY=<value here>" -var="DO_PUBKEY_PLAIN=<value here>" -var="DO_KEYFINGERPRINT=<value here>" -var="DO_REGION=fra1" -var="DO_SIZE=s-1vcpu-1gb" -var="DO_WORKERCOUNT=2"

Seems good, so now let us terraform to do the magic:

terraform apply -var="DO_TOKEN=<value here>" -var="DO_SPACES_ACCESS_ID=<value here>" -var="DO_SPACES_SECRET_KEY=<value here>" -var="DO_PUBKEY_PLAIN=<value here>" -var="DO_KEYFINGERPRINT=<value here>" -var="DO_REGION=fra1" -var="DO_SIZE=s-1vcpu-1gb" -var="DO_WORKERCOUNT=2"

 

This will take some minutes. The result will then look like:

terraformfinished

And if you have a look into the DigitalOcean-GUI you can see that all three droplets, the floating IP and the domain have been created.

droplets

floatingips

domains

If you have a further look into the bucket in your space, you will see that terraform has automatically uploaded a file there, which contains all informations on the current state of the created infrastructure. This ensures that if you now run a second terraform apply nothing will happen as all resources are already created (if you want to clear your infrastructure / remove all resources you can do a terraform destroy).

space

So let us just try a second terraform apply and ensure that nothing happens:

terraformapply22

Perfect! Now that the automated infrastructure creation works, it is the right time to put the project into a repository and create a pipeline!

Create the CI/CD-Pipeline

We will host our project in a gitlab repository. Why gitlab? It offers everything we want.

  • We will need a docker registry where our web applications image versions are pushed to / pulled from and gitlab has an integrated docker registry that we can use for that (no need to host an own registry on an additional server).
  • We need a system, where our pipeline is executed (the pipeline will consist of different steps like „build the application“, „push the new image to the registry“, „build the infrastructure“, …). Gitlab offers non-cost shared runners which we can use for that. If we create a pipeline file in our repository the pipeline will automatically be triggered and executed on those shared runners (read on).
  • I mentioned earlier that we will keep all the stuff we don’t want hardcoded in our repository (credentials, configuration, …) in environment variables: In Gitlab you are able to create environment variables (key/value pairs) that are available during the pipeline run on the shared runners automatically – great!!

I will skip the creation of a repository here and presuppose that you have done that already / will do that on your own.

Now that we have a repository on gitlab, let us first create the environment variables which are needed for terraform. As you could see above when executing the terraform commands we appended lots of -var=“KEY=VALUE“ so that terraform has values for all variables defined in vars.tf. An alternative to that appending approach is to have environment variables which have the same names as the defined variables in vars.tf but with a prepended TF_VAR_. So what we have to do is create those environment variables in gitlab (Settings –> CI/CD –> Environment Variables). It should look like this in the end:

envvars2

Awesome, now that everything is prepared let us finally define the pipeline. This is done by putting a file called „.gitlab-ci.yml“ at the root of our repository. In this file we define what the pipeline should do and when. We will define four stages:

  1. build – where the web application is built and packed
  2. push – where the web application is put into a docker image and pushed into our docker registry
  3. deploy-infrastructure – where all our DigitalOcean-resources are created by terraform if needed (what we did manually / by typing the terraform commands manually in the previous chapter)
  4. update – where we take one of the worker droplets, pull the newest image of our web application image, start a container from that newest image and then modify the Floating IP to point to that updated droplet (running the container with the updated image).

Let us have a look at each single stage and what exactly is done there. You will find the full file in the repository here.

build-stage

build:
  stage: build
  script:
    - mvn clean install
  artifacts:
    paths:
      - target/my-starter-project.jar
  only:
    - master

As we have a maven project in our repository we will use the maven base image for our pipeline (see full file in the example repository) so we can simply run a

mvn clean install

here as the only command in that stage. The „artifacts“ block ensures that the built jar file will be stored for the whole pipeline run so that other stages have access to that file (as we need it to build the docker image etc..). The „only“ block is pretty self explaining. It says that this stage should only be executed when something is pushed to the master branch of the repository.

push-stage

push:
  stage: push
  image: docker:latest
  before_script:
    - docker login $TF_VAR_DOCKER_REGISTRY_URL --username $TF_VAR_DOCKER_REGISTRY_USERNAME --password $TF_VAR_DOCKER_REGISTRY_PASSWORD
  script:
    - until VERSION=`docker run -v $(pwd):/app -w /app maven:3.6.0-jdk-11 mvn org.apache.maven.plugins:maven-help-plugin:3.1.1:evaluate -Dexpression=project.version -q -DforceStdout`; do echo "mvn command timed out...trying again..."; sleep 2; done
    - docker build --tag=$TF_VAR_DOCKER_REGISTRY_URL/$TF_VAR_DOCKER_REGISTRY_USERNAME/$DOCKER_REGISTRY_REPO_NAME/webgui:$VERSION .
    - docker push $TF_VAR_DOCKER_REGISTRY_URL/$TF_VAR_DOCKER_REGISTRY_USERNAME/$DOCKER_REGISTRY_REPO_NAME/webgui:$VERSION
  only:
    - master

The first thing we do here is to use the docker image for that stage instead of the maven image as we will mainly run docker commands here to build our image and push it to our gitlab docker registry. As you can see the first step (in the „before_script“ block) is to log into the docker registry. If you use the gitlab docker registry the URL is registry.gitlab.com and you can log into it with your normal gitlab account credentials. As we don’t wont these hardcoded in our repository we will instead use environment variables here again which we have to add in the CI/CD-Settings in our gitlab project as we did it with the other credential information for our infrastructure code (see above chapters).

envvariables2

Afterwards the first command in the „script“ block is for extracting the version number from our pom.xml to create a docker image tagged with the same version so there is always an image matching the version from our pom.xml (which is 1.0-SNAPSHOT for our current version).  I put that command into a loop because it occasionally  fails (I think because it lasts very long and so it sometimes runs into a timeout – which is bad as our pipeline fails if any command in the pipeline fails). Then we build the image, tag it and push it to the docker registry. No magic here.

deploy-infrastructure-stage

Time to put the terraform commands into our pipeline:

deploy-infrastructure:
  stage: deploy-infrastructure
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  before_script:
    - mkdir -p ~/.ssh
    - echo "$PRIVKEY_PLAIN" | tr -d '\r' > ~/.ssh/id_rsa
  script:
    - cd .infrastructure/digitalocean
    - terraform init -backend-config="access_key=$TF_VAR_DO_SPACES_ACCESS_ID" -backend-config="secret_key=$TF_VAR_DO_SPACES_SECRET_KEY" -backend-config="bucket=$TF_VAR_DO_SPACES_BUCKET_NAME"
    - terraform plan
    - until terraform apply -auto-approve; do echo "Error while using DO-API..trying again..."; sleep 2; done
  only:
  - master

based on the hashicorp/terraform:light-image we will step into our digitalocean folder including our infrastructure code, init our terraform backend so terraform knows about the current state of our infrastructure (if there were previous pipeline runs) and then plan and apply the code. I put the „terraform apply“ command into a loop as sometimes there occur errors when talking to the DigitalOcean-API. Without the loop the whole pipeline would fail in that case but thanks to the loop it will be retried. Of course this can result in an endless loop if there are „real“ errors and not only „temporary“ ones but most of the times my pipeline runs fail are related to temporary api errors so in most cases that loop is more helpful than harming.

the before_script-block copies our private ssh key onto the gitlab runner so that the remote_exec-blocks of our terraform files are able to ssh connect to our created resources/droplets.

update-stage

update:
  stage: update
  before_script:
    - echo "deb http://ppa.launchpad.net/ansible/ansible/ubuntu trusty main" >> /etc/apt/sources.list
    - apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 93C4A3FD7BB9C367
    - apt-get update
    - apt-get install wget software-properties-common -y
    - apt-get install ansible -y
    - apt-get install jq -y
  script:
    - mkdir -p /root/.ssh
    - echo "$PRIVKEY_PLAIN" | tr -d '\r' > ~/.ssh/id_rsa
    - chmod -R 700 ~/.ssh
    - VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
    - "FALLBACK_DROPLET_ID=$(curl -sX GET https://api.digitalocean.com/v2/droplets?tag_name=FALLBACK -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" | jq -c '.droplets[0]'.id)"
    - "FALLBACK_DROPLET_IP=$(curl -sX GET https://api.digitalocean.com/v2/droplets?tag_name=FALLBACK -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" | jq -c '.droplets[0].networks.v4[0]'.ip_address)"
    - "PROD_DROPLET_ID=$(curl -sX GET https://api.digitalocean.com/v2/droplets?tag_name=PROD -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" | jq -c '.droplets[0]'.id)"
    - "PROD_DROPLET_IP=$(curl -sX GET https://api.digitalocean.com/v2/droplets?tag_name=PROD -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" | jq -c '.droplets[0].networks.v4[0]'.ip_address)"
    - "FLOATING_IP=$(curl -sX GET https://api.digitalocean.com/v2/floating_ips -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" | jq -c '.floating_ips[0]'.ip)"
    - FALLBACK_DROPLET_IP="${FALLBACK_DROPLET_IP%\"}" # cut off leading "
    - FALLBACK_DROPLET_IP="${FALLBACK_DROPLET_IP#\"}" # cut off trailing "
    - PROD_DROPLET_IP="${PROD_DROPLET_IP%\"}" # cut off leading "
    - PROD_DROPLET_IP="${PROD_DROPLET_IP#\"}" # cut off trailing "
    - FLOATING_IP="${FLOATING_IP%\"}" # cut off leading "
    - FLOATING_IP="${FLOATING_IP#\"}" # cut off trailing "
    - echo $FALLBACK_DROPLET_IP > /etc/ansible/hosts
    - sed -i -- 's/#host_key_checking/host_key_checking/g' /etc/ansible/ansible.cfg
    - ansible-playbook .infrastructure/digitalocean/conf/updateWebgui-playbook.yml -e "registry_url=$TF_VAR_DOCKER_REGISTRY_URL username=$TF_VAR_DOCKER_REGISTRY_USERNAME repository=$DOCKER_REGISTRY_REPO_NAME version=$VERSION"
    - "curl -X DELETE -H \"Content-Type: application/json\" -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" -d '{\"resources\":[{\"resource_id\":\"'$PROD_DROPLET_ID'\",\"resource_type\":\"droplet\"}]}' \"https://api.digitalocean.com/v2/tags/PROD/resources\""
    - "curl -X DELETE -H \"Content-Type: application/json\" -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" -d '{\"resources\":[{\"resource_id\":\"'$FALLBACK_DROPLET_ID'\",\"resource_type\":\"droplet\"}]}' \"https://api.digitalocean.com/v2/tags/FALLBACK/resources\""
    - "curl -X POST -H \"Content-Type: application/json\" -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" -d '{\"resources\":[{\"resource_id\":\"'$PROD_DROPLET_ID'\",\"resource_type\":\"droplet\"}]}' \"https://api.digitalocean.com/v2/tags/FALLBACK/resources\""
    - "curl -X POST -H \"Content-Type: application/json\" -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" -d '{\"resources\":[{\"resource_id\":\"'$FALLBACK_DROPLET_ID'\",\"resource_type\":\"droplet\"}]}' \"https://api.digitalocean.com/v2/tags/PROD/resources\""
    - "curl -X POST -H \"Content-Type: application/json\" -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" -d '{\"type\":\"assign\",\"droplet_id\":\"'$FALLBACK_DROPLET_ID'\"}' \"https://api.digitalocean.com/v2/floating_ips/$FLOATING_IP/actions\""
  only:
    - master

Our aims for that stage are:

  1. Pull the most current image from the docker registry on the current FALLBACK-droplet
  2. remove the currently running container (with the outdated version) on our FALLBACK-droplet
    1. Yes, through our very first pipeline run there will be no container running
  3. start a new container with the updated version on the FALLBACK-droplet
  4. retag the current FALLBACK-droplet (from FALLBACK to PROD)
  5. retag the current PROD-droplet (from PROD to FALLBACK)
  6. update the floating IP to point to the new PROD-droplet

So first we are reading our droplet’s IDs and IPs (via the DigitalOcean-API and jq). Then we are using ansible to call a playbook which copies a script to the droplet that we want to update – which pulls the image, removes the current container and starts a new one.

After that we are updating (switching) the tags on our droplets again with the help of the DigitalOcean-API (because our FALLBACK-droplet becomes the PROD-droplet during the update and then the previous PROD-droplet becomes the FALLBACK-droplet which will be updated when the next pipeline is running).

At the end we update the target behind our floating IP again by using the DigitalOcean-API.

The playbook looks like the following:

- name: Transfer and execute a script.
  hosts: all
  remote_user: root
  vars:
    ansible_python_interpreter: /usr/bin/python3
    registry_url: "{{ registry_url }}"
    username: "{{ username }}"
    repository: "{{ repository }}"
    version: "{{ version }}"
  tasks:
      - name: Copy and Execute the script
        script: updateWebgui.sh {{ registry_url }} {{ username }} {{ repository }} {{ version }}

and the mini script (updateWebgui.sh) looks like:

#!/bin/bash
docker pull $1/$2/$3/webgui:$4
docker rm -f webgui || true
docker run -d -p 8080:8080 --name=webgui $1/$2/$3/webgui:$4

and both are located in the .infrastructure/digitalocean folder (subfolder conf):

infra

First pipeline run

So – we are done – time to let the magic begin. On our initial push the deploy-infrastructure stage will create our infrastructure (which takes some time). Usually the initial pipeline run takes about 10 minutes.

pipelineruns

(The three pipeline runs you see in the screenshot were all „first“ / initial runs as I always destroyed the infrastructure after each run)

When the pipeline has finished we have a complete setup and are able to open a browser, enter http://www.gotcha-app.de and enjoy the first version of our web application.

version10gui

Additionally let us ssh into our PROD-droplet and ensure that the right container is running:

webgui10container

Also we can see that during the pipeline run an image of our web application was pushed to the gitlab container registry:

registry

I won’t paste screenshots of all the created resources here again, as I did that already in a previous chapter when we executed the terraform commands by hand. But here is just one screenshot of our droplets in the state right after the infrastructure creation (droplet01 = PROD, droplet02 = FALLBACK, floating IP tied to droplet01) and then right after the update stage (droplet01 switched to FALLBACK, droplet 02 switched to PROD and the floating IP now tied to droplet02).

beforeUpdate

afterupdate

second pipeline run

Now that everything is up and running, let us modify our web application. We will enhance the text displayed when the button is clicked (MainView.java) and update the version in our pom.xml to 1.1-SNAPSHOT (for the complete files please refer to the example repository containing the complete code).

MainView.java

public MainView(@Autowired MessageBean bean) {
    Button button = new Button("Click me",
            e -> Notification.show(bean.getMessage() 
                    + " YAY THIS IS V1.1-SNAPSHOT!!!"));
    add(button);
}

pom.xml

<version>1.1-SNAPSHOT</version>

Cool, let us push that and wait for our pipeline to do its work.

pipeline2

As you can see this time the pipeline only needed ~7 minutes. We saved time because the whole infrastructure was already existing and there was no need for terraform to do anything.

terraformnoneed

As you can see the droplet tags were switched again. This time we updated droplet01 and marked that one as PROD:

prod

And after the pipeline run we are ready to retry our web application:

version2

Let us have a look at the PROD-droplet again to check that the correct container is running:

11ondroplet

If we had a big bug in our new version we could now just tie the floating IP to the FALLBACK-Droplet and all requests would be forwarded to the pre-update version of our web application.

If you want to try it out yourself and you don’t have a DigitalOcean account yet feel free to use my referral link which gives you 100$ credit to play around a bit 🙂

The full code is available on https://gitlab.com/mebbinghaus/codinghaus_20190302_dofloatingip

If you have any questions or feedback feel free to leave a comment or contact me via the form or email. Thank you for reading.

Use DigitalOcean Volumes to backup your droplet’s data (by example)

In this tutorial we will create an automated pipeline which creates three droplets on DigitalOcean. We will have a script running on each droplet, which will create important files constantly. To backup those file regularly we will extend our infrastructure code to not only create three droplets, but also one DigitalOcean Volume for each droplet. Volumes offer block storage which can simply be mounted to our droplets. Perfect to keep our backups.

When the droplets are destroyed/recreated (when changing the droplet’s infrastructure code) the worker scripts running on our droplets will look for backups in the mounted DigitalOcean Volume so that it can continue from the last backups state.

To let terraform (the tool that will automatically create/manage our infrastructure on AWS and DigitalOcean as we described it in our infrastructure code) always know about the current state of all our infrastructure resources (droplets and volumes) we will use an AWS S3 bucket as a terraform backend to save the terraform.tfstate.

What you need if you want to try out this example for yourself is an account for Amazon AWS (access key and secret key) and a DigitalOcean account. AWS offers a free tier where you are allowed to create S3 buckets without costs as long as you don’t upload more than 5GB. If you don’t have a DigitalOcean account yet, feel free to use my referral link and get 100$ credit for free to try things out without costs:  Create account

So let us directly jump into action. The steps we will do are the following:

  1. Create the infrastructure code which will create our droplets and volumes and mount the volumes onto the corresponding droplets.
  2. create the worker script which will run on each droplet and create the files we want to backup.
  3. create a cronjob which will do the backups in regular intervals
  4. create the gitlab-ci.yml which will contain the code describing what our pipeline should do (execute terraform to build our infrastructure).

Here is an overview of the complete project (you can find the project here on gitlab):

project

Let us start:

Create the infrastructure code

The infrastructure code is split into two folders. First let us have a look at ’setup_backend‘:

provider "aws" {
  region = "${var.AWS_REGION}"
  access_key = "${var.AWS_ACCESSKEY}"
  secret_key = "${var.AWS_SECRETKEY}"
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "${var.AWS_BUCKET_NAME}"

  versioning {
    enabled = true
  }

  lifecycle {
    prevent_destroy = true
  }
}

It only contains the S3 bucket resource. So the only purpose is to create that bucket. In our pipeline code, we will add an if statement where we will check if the S3 bucket already exists. And only if it doesn’t exist we will tell terraform to create that resource.

You can find the if-statement in the code snippet from our gitlab-ci.yml below. It looks like:

if aws s3api head-bucket --bucket "de.codinghaus.s3" 2>/dev/null ; then echo "Skipping Backend-Creation, S3-Bucket already existing!"; else cd setup_backend && terraform init && terraform plan && terraform apply -auto-approve && cd ..; fi

If you wonder where the variables come from, have a look at vars.tf:

variable "AWS_REGION" {}
variable "AWS_TF_STATEFILE" {}
variable "AWS_BUCKET_NAME" {}
variable "AWS_ACCESSKEY" {}
variable "AWS_SECRETKEY" {}

Yes, they are empty. We do not want our AWS-Keys to appear in our source code. So what we do is the following: We use gitlab CI/CD environment variables which can be found at „Settings“ –> „CI/CD“ –> „Variables“. There we can add environment variables which are available on the pipeline runners, where our pipeline code is executed. terraform will now recognize, that we defined variables in vars.tf. Then it will try to find values for those variables. As we didn’t set values at the definition, terraform will next search for environment variables in the form of TF_VAR_. So e.g. for AWS_REGION terraform will look for an environment variable TF_VAR_AWS_REGION. As terraform is executed on the gitlab runner, we only have to define the needed gitlab environment variables and terraform will find them including their values:

environmentvariables

‚resources‘ is the main folder which contains the code for three droplets, three volumes and the attachments from droplet to volume. It also contains the definition of the backend resource (in our case the S3 bucket).

First let us have a look at the droplet resources in workers.tf:

provider "digitalocean" {
  token = "${var.DO_TOKEN}"
}

/* here we tell terraform to create three droplets (as we defined
the gitlab environment variable TF_VAR_DO_WORKERCOUNT = 3). The names
will be worker0X (worker01, worker02 and worker03).*/
resource "digitalocean_droplet" "worker" {
  image = "ubuntu-16-04-x64"
  name = "${format("worker%02d", count.index + 1)}"
  count = "${var.DO_WORKERCOUNT}"
  region = "${var.DO_REGION}"
  size = "${var.DO_SIZE}"
  private_networking = true
  ssh_keys = [
    "${var.DO_KEYFINGERPRINT}"
  ]
  connection {
    user = "root"
    type = "ssh"
    private_key = "${file("~/.ssh/id_rsa")}"
    timeout = "2m"
  }

/* now we will copy the workerscript and the the backup /cronjob stuff
onto the droplet)
  provisioner "file" {
    source = "../../scripts/workerscript.sh"
    destination = "/workerscript.sh"
  }

  provisioner "file" {
    source = "../../scripts/backup_to_volume.sh"
    destination = "/etc/backup_to_volume.sh"
  }

  provisioner "file" {
    source = "../../scripts/backup_crontab"
    destination = "/etc/cron.d/backup_crontab"
  }

/* as the last step during the droplet creation, we give all scripts
the execute flag, install zip which is needed the create the
backups in zipped form and run the workerscript (see below). /*
provisioner "remote-exec" {
  inline = [
    "sleep 10",

    "chmod +x /workerscript.sh",
    "chmod +x /etc/backup_to_volume.sh",
    "chmod +x /etc/cron.d/backup_crontab",

    "apt-get install zip -y",

    "nohup bash /workerscript.sh &",
    "sleep 2"
  ]
}
}

Now, that the droplets are created, let us have a look at the code describing our volumes:

/* the first resource block describes our volumes. It will be executed
after the droplet creation has finished (see the depends_on attribute).
the name of each volume will be worker0X-backup. */
resource "digitalocean_volume" "worker-backup-volume" {
  count = "${var.DO_WORKERCOUNT}"
  region = "${var.DO_REGION}"
  name = "${format("worker%02d", count.index + 1)}-backup"
  size = "${var.DO_VOLUME_SIZE}"
  initial_filesystem_type = "${var.DO_VOLUME_FS_TYPE}"
  depends_on = ["digitalocean_droplet.worker"]

/* this ensures that terraform will never try to destroy/recreate
our volumes (which contain our importand backups!) /*
  lifecycle {
    prevent_destroy = true
  }
}

/* when the droplets and the volumes exist, it is time to couple
each volume to each droplet. Therefore we can use the
digitalocean_volume_attachment resource type. */
resource "digitalocean_volume_attachment" "worker-backup-volume-attachments" {
  count = "${var.DO_WORKERCOUNT}"
  droplet_id = "${element(digitalocean_droplet.worker.*.id, count.index)}"
  volume_id  = "${element(digitalocean_volume.worker-backup-volume.*.id, count.index)}"
  depends_on = ["digitalocean_volume.worker-backup-volume"]
}

At last, let us have a look at backend.tf

provider "aws" {
  region = "${var.AWS_REGION}"
  access_key = "${var.AWS_ACCESSKEY}"
  secret_key = "${var.AWS_SECRETKEY}"
}

/* this tells terraform where to look for the current state of
our infrastructure (in the form of a terraform.tfstate file).
We are not able to use variable references in the backend definition.
Therefore we have the values hard coded here. But still .. we don't
want sensitive data (aws keys) in the code here. So we will once again
use gitlab environment variables here. We will run the following in
our gitlab pipeline script:
terraform init -backend-config="access_key=$TF_VAR_AWS_ACCESSKEY" -backend-config="secret_key=$TF_VAR_AWS_SECRETKEY"
which will contain the two keys.
*/
terraform {
  backend "s3" {
    bucket = "de.codinghaus.s3"
    key = "dovolumetutorial_terraform.tfstate"
    region = "eu-central-1"
    access_key = ""
    secret_key = ""
  }
}

Create the worker script

#!/bin/bash
mkdir /workdir
touch workerscript.log
# wait until backup volume is mounted
while [ ! -d /mnt/$HOSTNAME\_backup ]
do
    echo "waiting for DO-Volume to be mounted...." >> workerscript.log
    sleep 10
done
echo "DO-Volume is now mounted!" >> workerscript.log
# restore backup from volume to droplet if existing
newestBackup=$(ls -Frt /mnt/$HOSTNAME\_backup | grep "[^/]$" | tail -n 1)
if [ -z "$newestBackup" ]; then
echo "No backup found on DO-Volume!" >> workerscript.log
else
cp /mnt/$HOSTNAME\_backup/$newestBackup /workdir
unzip /workdir/$newestBackup -d /workdir
rm -rf /workdir/$newestBackup
echo "Found backup ($newestBackup) on DO-Volume! Copied and unzipped it into working directory!" >> workerscript.log
fi
newestFile=$(ls -Frt /workdir | grep "[^/]$" | tail -n 1)
counter=0
if [ -z "$newestFile" ]; then
echo "No previous file found. Starting with 1!" >> workerscript.log
counter=1
else
echo "Found file to start with! ($newestFile)" >> workerscript.log
((counter+=$newestFile))
((counter+=1))
fi
while [ 1 ]; do
sleep 5
fallocate -l 1M /workdir/$counter
echo "Created file: $counter" >> workerscript.log
((counter+=1))
done

When the script is started (which is the case when our droplets were (re-)created) it checks whether the volume is already mounted and if not waits until it is. This is necessary because the script is executed at the end of the remote-exec block of the worker resource (which are the droplets). The volumes are created AFTER the the worker droplets. So the script will be executed before the volumes are created and therefore before they are mounted onto the droplets.

When the volume is mounted, the script checks if there are backups in the volume mount directory. If so the newest backup is copied to the /workdir and unzipped. The file creation then continues from the backup’s last file. If there is no backup the script will start to create file 0. Whether there was a backup or not, the script will then – in the main loop – create a file every five seconds.

Create a cronjob

Well, this is pretty straight-forward. We need a crontab which is copied to /etc/cron.d (see infrastructure code / worker.tf):

*/5 * * * * root /etc/backup_to_volume.sh

and the script doing the backup:

#!/bin/bash
date=$(date +%s)
timestamp=$(date +"%Y_%m_%d_%H_%M")
if [ ! -d /mnt/$HOSTNAME\_backup ]; then
    echo "----- ZIPPING FILES AND COPYING ZIP TO DO-VOLUME (1/2) -----"
    cd /workdir
    zip -r $timestamp-backup.zip ./*
    mv $timestamp-backup.zip /mnt/$HOSTNAME\_backup

    echo "----- DELETING OUTDATED BACKUPS (2/2) -----"
    cd /mnt/$HOSTNAME\_backup
    ls -tp | grep -v '/$' | tail -n +6 | xargs -I {} rm -- {} #https://stackoverflow.com/questions/25785/delete-all-but-the-most-recent-x-files-in-bash
fi

As you can see the script is divided into two parts.

  1. First we create a zip-file containing all files our worker script created into /workdir. That zip file is then uploaded to the volume (the cool thing is that it seems as if we are just moving the zip file into another directory because we mounted our volume into that directory).
  2. In the second part we delete the most outdated backups (only keeping the 5 newest)

Create the gitlab-ci.yml

stages:
  - deploy-infrastructure

deploy-infrastructure:
  stage: deploy-infrastructure
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  before_script:
    - apk add --no-cache python3
    - apk add --no-cache curl
    - apk add --no-cache bash
    - mkdir -p ~/.ssh
    - echo "$TF_VAR_DO_PRIVKEY_PLAIN" | tr -d '\r' > ~/.ssh/id_rsa
    - chmod -R 700 ~/.ssh
    - curl -O https://bootstrap.pypa.io/get-pip.py
    - echo "export PATH=~/.local/bin:$PATH" >> ~/.bash_profile
    - python3 get-pip.py --user
    - source ~/.bash_profile
    - pip install awscli --upgrade --user
    - aws configure set aws_access_key_id $TF_VAR_AWS_ACCESSKEY
    - aws configure set aws_secret_access_key $TF_VAR_AWS_SECRETKEY
  script:
    - cd .infrastructure
    - if aws s3api head-bucket --bucket "de.codinghaus.s3" 2>/dev/null ; then echo "Skipping Backend-Creation, S3-Bucket already existing!"; else cd setup_backend && terraform init && terraform plan && terraform apply -auto-approve && cd ..; fi
    - cd resources
    - terraform init -backend-config="access_key=$TF_VAR_AWS_ACCESSKEY" -backend-config="secret_key=$TF_VAR_AWS_SECRETKEY"
    - terraform plan
    - terraform apply -auto-approve
  only:
    - master

We only have one stage here. In the before_script-block we first add our private key to ~/.ssh/id_rsa to be able to connect to the droplets via ssh (from the gitlab runner). After that we install and configure awscli which we need to check if the S3 bucket is already existing or if it has to be created.

After checking (and creating) the S3 bucket we run the three terraform commands: init, plan and apply which will create our infrastructure in the first run, and then recreate (or not) in all future runs. During the init step (with our backend-config given) terraform will look at the terraform.tfstate file in the S3 bucket so it knows what the current state of our infrastructure is and if there is a need to (re-)create resources or not.

Now that we have everything we need, we have to do one more thing: Create the environment variables in gitlab. You can find/add them under „Settings“ –> „CI/CD“ –> „Variables“.

If we push our code the first time the pipeline will start and terraform will create an AWS S3 bucket and then create our infrastructure (left screenshot). From the second pipeline run on there is no need to create the S3 bucket as it already exists. Our pipeline script will recognize this and terraform will initialize the backend to know if our already existing resources need to be recreated (right screenshot).

After the pipeline finishes, let us have a look at the DigitalOcean-GUI and check that everything is there:

Now we will let the infrastructure run some time and see how the backups are created on the volumes.

When we connect to our droplet via ssh and have a look at the workerscript log we will see:

scriptWithoutBackup

After a couple of minutes let us see what’s inside our backup directory. We will find some uploaded backups now:

backupcontent

Okay, as it seems everything works. Now, we want to see if the backup mechanism works. Oh, have a look at our worker droplet’s image in the workers.tf.. it is ubuntu 16.04:

resource "digitalocean_droplet" "worker" {
  image = "ubuntu-16-04-x64"
....

A pretty old ubuntu version! We now want to update that to ubuntu 18.04. So we will change workers.tf to:

resource "digitalocean_droplet" "worker" {
  image = "ubuntu-18-04-x64"
....

Then we will push that change. During the triggered pipeline run, terraform will now recognize that the image for the worker droplets was changed. It will decide, that it has to destroy the three droplets and recreate them. The volumes will stay untouched: Nothing changed here and they are marked as undestroyable anyways. But the attachments for the volumes have to change, because the droplet ids will change. No problem: Terraform will automatically destroy the attachments and recreate them.

recreate2

When the triggered pipeline run is finished after the push we only have one problem:

Yes, the droplets were recreated and the attachments for their volumes, too. But (in contrast to the initial creation) the volumes are not automatically mounted onto our droplets (this was a new insight for me when writing this tutorial, I assumed it would be mounted again automatically). The result: Our workerscript will stay forever in the loop waiting for volume to be mounted:

# wait until backup volume is mounted
while [ ! -d /mnt/$HOSTNAME\_backup ]
do
    echo "waiting for DO-Volume to be mounted...." >> workerscript.log
    sleep 10
done

Well..unexpected.. but let us fix this in a simple (naive) way: We have two cases. The initial creation of our infrastructure (including the automatical mount of our volumes onto the droplets), and recreation of our droplets (not including the automatical mount of our volumes). We will extend the loop and assume, that if after two minutes no volume is mounted, we are in the latter case. And so we will then try to mount the volume manually:

# wait until backup volume is mounted
loopCount=0
while [ ! -d /mnt/$HOSTNAME\_backup ]
do
    echo "waiting for DO-Volume to be mounted...." >> workerscript.log
    sleep 10
    ((loopCount+=10))
    if (( loopCount > 120 )); then
        echo "Volume not mounted after two minutes, trying manual mount..."  >> workerscript.log
        mkdir -p /mnt/$HOSTNAME\_backup; mount -o discard,defaults /dev/disk/by-id/scsi-0DO_Volume_$HOSTNAME-backup /mnt/$HOSTNAME\_backup; echo /dev/disk/by-id/scsi-0DO_Volume_$HOSTNAME-backup /mnt/$HOSTNAME\_backup ext4 defaults,nofail,discard 0 0 | sudo tee -a /etc/fstab
    fi
done

The one liner for the manual mount was copied from the DigitalOcean-GUI (see screenshot below). I just replaced the hard coded hostname with the $HOSTNAME environment variable.

do

With that change in our waiting loop, the result of a second run of the pipeline (including the recreation of the droplets / attachments) looks like the following:

result

As you can see, the backups from the volume are now found. The newest is taken and our worker script will continue from the state from the newest backup.

Yay!

As already mentioned you can find the full example code at gitlab on https://gitlab.com/mebbinghaus/codinghaus_20181028_dovolumes_backup

If you have questions or feedback, feel free to leave a comment or contact me via twitter or mail.

CI/CD-Pipeline for a java microservice application running on Docker Swarm mode cluster on DigitalOcean with Maven/Docker/Gitlab/Terraform/Ansible

Hey friends,

I finally finished to implement a continuous delivery/deployment pipeline for my java based microservice architectured application which is hosted on Gitlab and running on a swarm mode cluster on DigitalOcean (created/deployed automatically by Terraform). In this article I want to share an overview of the way I got everything running (I won’t go too deep into details because that would make the article too long, but if you are interested in more informations about any part feel free to contact me). This is not meant to be a best practice or the best way to implement it because I don’t know if it is..quite contrary – I’m pretty sure there are smarter ways to do it and it is still work in progress. But maybe you get some inspiration by the way I did it.

My application is a search engine infrastructure (the gui for the search engine is still missing) and consists (as by now) of three microservices (crawler, index and a gui for configuring the crawler) and four jar-files which are created. I will skip the source code of the application and just tell you how the jar files / microservices relate / work together.

The crawler microservice is for scanning the net and collecting everything that was found. It uses Nutch as the crawl engine. Besides Nutch I created an api-service as a jar file, which is also running in the nutch/crawler-container and which is used by the crawler-gui-microservice for communication (configuration/control of the crawler).

The crawler gui is a vaadin 10 frontend application which uses the crawler api to display informations of the crawler and which offers screens for configuring/controlling the crawler.

The last microservice is the index. When the crawler has finished one crawling cycle (which always repeats via a cronjob) it pushes the crawled data to the index-service (based on solr) which makes the data searchable (so the index will be used by the search engine gui microservice which is about to be implemented next).

Info on persistence: I am using GlusterFS to generate one Gluster-Volume for the crawler and one Gluster-Volume for the index. Those volumes are mounted as bind-mounts on every swarm mode cluster node so that the crawled/indexed data are reachable from every cluster node – so it is not important on which node a service is running.

The application code is hosted on Gitlab and I am using the free gitlab ci runners for my CI/CD-Pipeline. The running application itself is hosted/deployed on droplets from DigitalOcean.

Gitlab CI works by defining a gitlab-ci.yml on the top level of a repository and this is what my file looks like:

image: maven:latest

services:
  - docker:dind

cache:
  paths:
    - .m2/repository

variables:
  DOCKER_HOST: tcp://docker:2375
  DOCKER_DRIVER: overlay2
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"

stages:
  - test
  - build
  - release
  - deploy-infrastructure
  - deploy-services

test:
  stage: test
  script:
    - mvn clean test
  only:
    - master

build:
  stage: build
  script:
    - mvn clean install
  artifacts:
    paths:
      - gotcha-crawler/gotcha-crawler-webgui/target/gotcha-crawler-webgui-0.1-SNAPSHOT.jar
      - gotcha-crawler/gotcha-crawler-api/target/gotcha-crawler-api-0.1-SNAPSHOT.jar
  only:
    - master

release:
  stage: release
  image: docker:latest
  before_script:
    - docker login $TF_VAR_DOCKER_REGISTRY_URL --username $TF_VAR_DOCKER_REGISTRY_USERNAME --password $TF_VAR_DOCKER_REGISTRY_PASSWORD
  script:
    - docker build --tag=gotcha-index ./gotcha-index
    - docker tag gotcha-index docker.gotcha-app.de/gotcha/index:latest
    - docker push docker.gotcha-app.de/gotcha/index
    - docker build --tag=gotcha-crawler ./gotcha-crawler
    - docker tag gotcha-crawler  docker.gotcha-app.de/gotcha/crawler:latest
    - docker push docker.gotcha-app.de/gotcha/crawler
    - docker build --tag=gotcha-crawlergui ./gotcha-crawler/gotcha-crawler-webgui
    - docker tag gotcha-crawlergui docker.gotcha-app.de/gotcha/crawlergui:latest
    - docker push docker.gotcha-app.de/gotcha/crawlergui
  only:
    - master

deploy-infrastructure:
  stage: deploy-infrastructure
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  before_script:
    - apk add --no-cache python3
    - apk add --no-cache curl
    - mkdir -p ~/.ssh
    - echo "$TF_VAR_DO_PRIVKEY_PLAIN" | tr -d '\r' > ~/.ssh/id_rsa
    - chmod -R 700 ~/.ssh
    - curl -O https://bootstrap.pypa.io/get-pip.py
    - python3 get-pip.py --user
    - touch ~/terraform.log
    - chmod 777 ~/terraform.log
    - echo "export PATH=~/.local/bin:$PATH" >> ~/.bash_profile
    - echo "export TF_LOG_PATH=~/terraform.log" >> ~/.bash_profile
    - echo "export TF_LOG=TRACE" >> ~/.bash_profile
    - source ~/.bash_profile
    - pip install awscli --upgrade --user
    - aws configure set aws_access_key_id $TF_VAR_AWS_ACCESSKEY
    - aws configure set aws_secret_access_key $TF_VAR_AWS_SECRETKEY
  script:
    - cd .infrastructure
    - if aws s3api head-bucket --bucket "de.gotcha-app.s3" 2>/dev/null ; then echo "Skipping Backend-Creation, S3-Bucket already existing!"; else cd setup_backend && terraform init && terraform plan && terraform apply -auto-approve && cd ..; fi
    - cd live/cluster
    - terraform init -backend-config="access_key=$TF_VAR_AWS_ACCESSKEY" -backend-config="secret_key=$TF_VAR_AWS_SECRETKEY"
    - terraform plan
    - until terraform apply -auto-approve; do echo "Error while using DO-API..trying again..."; sleep 2; done
    - ls -la
    - pwd
  artifacts:
    paths:
      - ~/terraform.log
  only:
    - master

deploy-services:
  stage: deploy-services
  before_script:
    - echo "deb http://ppa.launchpad.net/ansible/ansible/ubuntu trusty main" >> /etc/apt/sources.list
    - apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 93C4A3FD7BB9C367
    - apt-get update
    - apt-get install ansible -y
    - apt-get install jq -y
  script:
    - mkdir -p /root/.ssh
    - echo "$TF_VAR_DO_PRIVKEY_PLAIN" | tr -d '\r' > ~/.ssh/id_rsa
    - chmod -R 700 ~/.ssh
    - "GOTCHA_MASTER_IP=$(curl -sX GET https://api.digitalocean.com/v2/droplets -H \"Authorization: Bearer $TF_VAR_DO_TOKEN\" | jq -c '.droplets[] | select(.name | contains(\"gotchamaster00\")).networks.v4[0]'.ip_address)" # extrahieren der IP-Adresse von gotchamaster00 via DO-API und jq anhand des dropletnamens
    - GOTCHA_MASTER_IP="${GOTCHA_MASTER_IP%\"}"
    - GOTCHA_MASTER_IP="${GOTCHA_MASTER_IP#\"}"
    - echo "$GOTCHA_MASTER_IP"
    - export GOTCHA_MASTER_IP
    - echo $GOTCHA_MASTER_IP > /etc/ansible/hosts
    - scp -o StrictHostKeyChecking=no docker-compose.yml root@$GOTCHA_MASTER_IP:/root/docker-compose.yml
    - ansible all --user=root -a "docker stack deploy --compose-file docker-compose.yml --with-registry-auth gotcha"
  only:
    - master

As you can see, it has 5 stages: test, build, release and deploy-infrustructure and deploy-services and they pretty much do, what their names tell:

The test-stage does a mvn clean test.

The build-stage does a mvn clean install and so generates the (spring boot) jar files which are in fact running in the docker containers containing the microservices.

The release-stage builds docker images (based on Dockerfiles) which contain and run the built jar files and pushes them to a SSL-secured docker registry which I installed on a hetzner cloud machine.

The deploy-infrastructure stage is where my server cluster for the docker swarm mode is created. This is done by creating 6 Droplets on DigitalOcean (the smallest ones for 5$ per month each). After creation some tools are installed on those machines (Docker, Docker Compose, GlusterFS Server/Client (for the volumes). When this stage is finished, I have a ready configured swarm mode cluster – but no services running on the swarm. This last step is done in the last stage.

Info on the cluster creation: The pipeline is (of course) idempotent. The server cluster during the deploy-infrastructure code is only created, if it is not already existent. To achieve this, I am using a „remote backend“ for terraform (in fact an aws S3 bucket). When terraform creates a server on e.g. DigitalOcean, a file called terraform.tfstate is created, which contains the information on which servers were created and what their state is. By using a backend for terraform, I tell terraform to save this file on a S3 bucket on aws. So the first time, the deploy-infrastructure stage is run, terraform will create the droplets and save their states in a file terraform.tfstate in the S3 bucket. Every continuing time the terraform stage is triggered, terraform will look into the file saved in the S3 bucket and will skip the creation as the file says, that they were already created.

The deploy-services stage is where my docker images are pulled from the external registry and deployed onto the (in the former stage) created droplets. For that to work, I am requesting the IP of one of the master (docker swarm) droplets via the DigitalOcean API and extracting the IP from the response, containing all created droplets. Then I am using ansible to execute the docker stack deploy command. This command pulls the needed docker images from the external registry and deploys containers on all worker nodes (as configured in the docker-compose.yml). A good thing about that command is, that it can also be used to deploy the services initially into the swarm, and also to update the services already running on the swarm. The docker-compose.yml looks like the following:

version: "3.6"
services:
  index:
    image: docker.gotcha-app.de/gotcha/index
    deploy:
      mode: global
    ports:
     - "8983:8983"
    volumes:
     - "index-volume:/opt/solr/server/solr/backup"
    secrets:
     - index-username
     - index-password
  crawler:
    image: docker.gotcha-app.de/gotcha/crawler
    deploy:
      mode: replicated
      replicas: 1
    volumes:
     - "crawler-volume:/root/nutch/volume"
    secrets:
     - index-username
     - index-password
     - crawler-api-username
     - crawler-api-password
  crawlergui:
     image: docker.gotcha-app.de/gotcha/crawlergui
     deploy:
       mode: global
     ports:
     - "8082:8082"
     secrets:
     - crawler-api-username
     - crawler-api-password
     - crawler-gui-username
     - crawler-gui-password
secrets:
  index-username:
    external: true
  index-password:
    external: true
  crawler-api-username:
    external: true
  crawler-api-password:
    external: true
  crawler-gui-username:
    external: true
  crawler-gui-password:
    external: true
volumes:
  crawler-volume:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/crawler-volume
  index-volume:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/index-volume

(If you are wondering where the secrets come from: They are creating inside the docker swarm on the deploy-infrastructure stage (during terraform apply) and are using the contents of environment variables which are created / maintained in gitlab. The volumes are also created during the terraform steps in the deploy-infrastructure stage.)

Summary

So what is happening after I push code changes:

The CI/CD-Pipeline of Gitlab starts and one stage is executed after another. If one stage fails, the pipeline fails in general and won’t continue. If everything works well: The jar files are created –>Docker-Images are pushed to my private docker registry –> the latest docker images are pulled from the registry and the running containers on my swarm cluster are updated one after another (or for the very first push: the cluster is created/intialized – which means: The Droplets are created on DigitalOcean and everything that is needed is installed/configured on those droplets (docker / docker compose / GlusterFS-Server/-Client / …)  – all executed automatically by terraform. Then the pushed docker images are pulled and deployed automatically on the created/running cluster. The pipeline duration is about 20 minutes including the server cluster creation, and about 10 minutes if the server cluster is already running.

pipeline

Right now it is already possible to access (for example) the crawler gui by calling <Public-IP-address-of-one-swarm-worker-droplet>:8082/crawlerui. The next steps will be adding a reverse proxy (probably traefik) which then redirects calls to the corresponding services and bind my domain with the swarm cluster nodes.

I like that tech stack a lot and I am looking forward to extend/improve the pipeline and the application. If you have suggestions / comments / questions feel free to leave them – I will appreciate it a lot!

Greetings!