Build Your Own (BYO) Autoscaling Solution using Open Source Tools on Akamai’s cloud computing services (Linode)

Deepak
13 min read · Jun 6, 2023

Autoscaling is the process of dynamically allocating resources to match performance requirements. As the volume of work grows, an application may need additional resources to maintain the desired performance levels and satisfy service-level agreements (SLAs).

In this post I am going to talk about how you can Build Your Own (BYO) autoscaling solution using open source tools on Akamai’s cloud computing services, previously known as Linode. (Disclaimer: at the time of writing this article I am proudly working for Akamai Technologies.) Why Linode? 🤔 True, it could be any other cloud or even on-premises infrastructure, but I am going to use Linode because of its developer friendliness and ease of use. You may also ask: why build an autoscaling solution when cloud providers already have one? Or, more likely, why do we need this in the microservices era (the world of Kubernetes)? Fair enough, let me try to answer some of these:

  1. Not all public cloud providers offer an autoscaling solution. Most hyperscalers do, but how much are we paying for it? Nothing comes for free; you might have heard numerous stories about how costs on hyperscalers spiral and go through the roof. And what about clouds where you do not have autoscaling, or your on-premises infrastructure, where admins are still adding servers manually whenever they see an alert on a dashboard?
  2. Flexibility — you always have to play by the cloud provider’s rules, i.e. autoscaling based on CPU, memory or other predefined parameters. Can you autoscale based on application-specific parameters, maybe thread count?
  3. Kubernetes — not all applications are suited to a microservices architecture, but that doesn’t mean they don’t need autoscaling.

Let’s see what we are gonna build in this post.

Simply put, we want to create an automation engine that scales infrastructure based on the servers’ load profile.

In our solution we are going to have 3 major pillars:

  1. Observability Stack — consisting of Grafana, Prometheus and node-exporter, to monitor our infrastructure
  2. Application Stack — basically the servers/apps that we are gonna monitor using the observability stack
  3. Automation or autoscaling engine — built using a Jenkins pipeline

Architecture diagrams are a great way of visualizing a system before it is built. Here is our simplistic architecture.

High Level Architecture

We are going to use a few open source tools to build our own autoscaling solution, so let’s understand these components in brief.

  1. We have a group of servers under the App Stack — these are the VMs that we would like to monitor and autoscale based on a trigger definition.
  2. Prometheus Stack — an open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach. We are going to use it for monitoring our App Stack. More about Prometheus here. Within the Prometheus stack we have the sub-components below:

a) Node Exporter — a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors. Think of it as an agent that sits on the VMs to be monitored and communicates with the Prometheus server.
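To make that concrete, here is a minimal sketch of how a prometheus.yml scrape job for node-exporter typically looks. This is not taken from the repo’s role; the target IPs are placeholders for your app-stack VMs:

scrape_configs:
  - job_name: "node"
    scrape_interval: 15s
    static_configs:
      - targets:
          - "192.0.2.11:9100"   # placeholder IP; node-exporter listens on 9100 by default
          - "192.0.2.12:9100"   # placeholder IP of a second app-stack VM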

b) Alertmanager — the Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integrations such as email, PagerDuty, OpsGenie, or many other mechanisms thanks to the webhook receiver.
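As an illustration of that webhook receiver, here is a minimal alertmanager.yml sketch. It is not from the repo; the URL is the placeholder Jenkins endpoint we will meet later:

# Minimal Alertmanager config: route everything to a single webhook receiver.
route:
  receiver: "jenkins-webhook"
receivers:
  - name: "jenkins-webhook"
    webhook_configs:
      - url: "http://JENKINS_URL/generic-webhook-trigger/invoke"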

3. Grafana Stack — Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. In our solution we feed the metrics collected by Prometheus into the Grafana server for visualization; Grafana also acts as the trigger mechanism for initiating our Jenkins pipeline to scale VMs (add or remove VMs). Within Grafana we will write alerts that trigger the Jenkins pipeline using webhooks; more on that in a later section.
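If you prefer to wire the Prometheus data source up as code rather than through the UI, Grafana supports provisioning files; a hedged sketch (the file path and URL are my assumptions, not part of this repo):

# Example Grafana provisioning file, e.g. /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: "http://192.0.2.10:9090"   # placeholder: your Prometheus server's address
    isDefault: true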

4. Jenkins — Jenkins is an open source automation server. It helps automate the parts of software development related to building, testing, and deploying, facilitating continuous integration and continuous delivery. We are going to use Jenkins to add and remove servers from our app stack based on alerts raised by Grafana (which in turn relies on the monitoring data collected from the app servers via the Prometheus stack).

Other important parts of our project are:

  1. Akamai’s cloud computing services — Linode is an awesome alternative cloud provider that focuses on Linux workloads. More about Linode here. We are going to use Linode for hosting our end-to-end solution. You can also sign up and get a $100 credit, which would be more than enough if you want to try out this solution.
  2. Ansible — Ansible is a radically simple IT automation system. It handles configuration management, application deployment, cloud provisioning, ad-hoc task execution, network automation, and multi-node orchestration. More about Ansible here. In our project we are going to use Ansible heavily to automate:

a) Provisioning Linodes

b) Deploying Prometheus, Grafana, alert manager and node-exporter

c) Configuring the various parts of our solution.

Please note that I already have a running instance of Jenkins that I am going to reuse for building the pipeline, but if you would like to create one, feel free to do so either by using the one-click app on Linode or by using Ansible.

Note — I am fully aware that in a production environment you would most probably run your app stack behind a load balancer (NodeBalancer in Linode), but for the sake of simplicity I am not using one here. It is very much possible to provision a NodeBalancer using Ansible and attach the provisioned VMs to it; see the sketch below.
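For the curious, a hedged sketch of what that could look like with the same Linode Ansible collection (the label here is an arbitrary example, not something from this repo):

# Sketch: provision a NodeBalancer with the linode.cloud collection
- name: Create a NodeBalancer for the app stack
  linode.cloud.nodebalancer:
    label: app-nb
    region: "{{ region }}"
    api_token: "{{ token }}"
    state: present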

Let’s jump into the real action. You can clone my GitHub repo from here — https://github.com/deepakjd2004/BYO-Autoscaling

Step 1 — I will start by showing my repo structure

I will describe the important files and folders. If you are already familiar with Ansible then you will have easily guessed what each folder contains; if so, feel free to skip this step. Otherwise, please follow along.

a) env — I am going to use virtualenv for running my Ansible code. Virtualenv is a great way to isolate Python environments. You may also run everything without virtualenv.

b) Roles — roles provide a framework for fully independent, or interdependent, collections of variables, tasks, files, templates, and modules. We have to deploy 4 different applications, namely Prometheus, Grafana, node-exporter and Alertmanager, and each application gets its own role. Each role has its own handlers, variables, templates etc. If you would like to know more about how roles work, check here.


c) deployment.yml — this is the main deployment file that contains the details for provisioning Linodes, installing components (Prometheus, node-exporter, Alertmanager and Grafana) and configuring them as needed. Here is deployment.yml:

---
- name: create linodes
  hosts: localhost
  vars_files:
    - vars
    - secret_vars
  vars:
    labels:
      - prometheus
      - grafana
      - node_exporter
  tasks:
    - name: Create Linodes for hosting services
      include_tasks: create_linode.yml
      loop: "{{ labels }}"
      loop_control:
        loop_var: label

- name: install and configure prometheus
  hosts: prometheus
  user: root
  roles:
    - alertmanager
    - prometheus

- name: install and configure node_exporter
  hosts: all
  user: root
  roles:
    - prometheus_node_exporter

- name: install and configure grafana
  hosts: grafana
  user: root
  roles:
    - grafana
The above code is self-explanatory, but let me touch on the important sections:

In the tasks section, I am calling create_linode.yml (the file for provisioning Linodes; we will look at it shortly) to provision the Grafana, Prometheus and node_exporter servers. Once the Linodes are provisioned, I install and configure Prometheus, node_exporter (on all nodes) and Grafana on the servers identified by their labels.

Here is the content of create_linode.yml:

---
- name: Create Linode for hosting {{ label }}
  linode.cloud.instance:
    label: "{{ label }}"
    api_token: "{{ token }}"
    type: "{{ type }}"
    region: "{{ region }}"
    image: "{{ image }}"
    root_pass: "{{ root_pass }}"
    authorized_keys: "{{ ssh_keys }}"
    private_ip: true
    group: "{{ group }}"
    state: present
  register: linode

- name: Get info about an instance by {{ label }}
  linode.cloud.instance_info:
    api_token: "{{ token }}"
    label: "{{ label }}"
  register: info

- name: Set facts
  set_fact:
    node1: "{{ info.instance.label }}"
    node_ip1: "{{ info.instance.ipv4[0] }}"
    node_priv1: "{{ info.instance.ipv4[1] }}"

- name: Update inventory
  blockinfile:
    path: ./hosts
    marker: "# {mark} {{ label }}"
    block: |
      [{{ label }}]
      {{ hostvars[inventory_hostname]['node_ip1'] }}

I am using the Ansible Linode module to provision Linodes and also updating the inventory (hosts) file with the IP addresses so that we can use them for configuring the various components of our solution.
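After a successful run, the hosts file ends up with one managed block per label; a sketch with placeholder IPs (the BEGIN/END markers come from the blockinfile marker setting above):

# BEGIN prometheus
[prometheus]
192.0.2.10
# END prometheus
# BEGIN grafana
[grafana]
192.0.2.11
# END grafana
# BEGIN node_exporter
[node_exporter]
192.0.2.12
# END node_exporter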

Step 2 — Prepare your developer machine by installing the prerequisites.

a) Install python3 and pip if you haven’t already.

b) Install virtualenv if you don’t already have it. Virtualenv can be installed by following this doc. If you would rather not run this project from a virtualenv, that is absolutely fine too; just skip this step.
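If you do go the virtualenv route, creating the environment is a one-liner (assuming python3 is on your PATH; the directory name venv matches the activation command in the next step):

python3 -m venv venv   # create the isolated environment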

c) Activate the virtualenv and install the packages that we are going to use.

source venv/bin/activate
pip install -r requirements.txt

You will find requirements.txt in the project repo here — https://github.com/deepakjd2004/BYO-Autoscaling/blob/main/requirements.txt

The main packages that we are installing here are:

Linode API — for provisioning Linodes

Ansible — for automation
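One assumption worth calling out: the linode.cloud.instance module used in create_linode.yml comes from the Linode Ansible collection. If it is not already present on your machine, it can be installed with ansible-galaxy:

# Install the Linode collection for Ansible (skip if your requirements already pull it in)
ansible-galaxy collection install linode.cloud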

d) During the provisioning of Linodes you will need to supply a root password (we have variablized it in create_linode.yml and pass it via the secret_vars file). But we don’t want to expose our real password in a plain-text file. For hiding our secrets we are going to use Ansible Vault. You can learn about Ansible Vault here.

First set up your vault password in a .vault-pass file in your home directory (again, it can be stored anywhere; you just need to mention its location in the ansible.cfg file). Run the commands below to generate encrypted strings for your secrets using Vault. We are going to use Vault to protect the Linode API token and the root password for the Linode VMs.

ansible-vault encrypt_string 'Pass0w0rd@123' --name 'root_pass'
ansible-vault encrypt_string 'this is linode api token' --name 'token'

You can get a Linode API token using this link.

Once you have executed the above commands, copy the encrypted token and root_pass into the secret_vars file, something like below.
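A sketch of what secret_vars ends up looking like; the ciphertext payloads below are truncated placeholders, not real encrypt_string output:

token: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          6338343036633564383335...
root_pass: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          3362303864333539366436...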

e) Configure your ansible.cfg file — here is mine; it is self-explanatory.

[defaults]
host_key_checking = False
vault_password_file = ./.vault-pass
inventory = ./hosts
interpreter_python = /usr/bin/python3
log_path = debug.log

[inventory]
enable_plugins = linode

f) vars file — this file is used by create_linode.yml for provisioning our VMs. Here is the content, which is also self-explanatory:

# linode vars
ssh_keys: ssh-rsa AAAAB3NzaC1yocal
prefix: app_node
cluster_name: POC
type: g6-nanode-1
region: ap-south
image: linode/debian10
group: app_node
linode_tags: app

Now let’s move to the next step, which is all about deployment.

Step 3 — Deployment

(i) Run the Ansible playbook to deploy the Linode VMs and install/configure Prometheus, Grafana and the other components on the respective VMs.

ansible-playbook deployment.yml -i inventory

You will see the ansible-playbook being executed, with output like below:

ansible-playbook deployment.yml -i inventory

PLAY [create linodes] *************************************************************************************************************************************************

PLAY [prometheus] *****************************************************************************************************************************************************

PLAY [node_exporter] **************************************************************************************************************************************************

PLAY [prometheus] *****************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************
ok: [170.187.xxxx]

TASK [prometheus : config file] ***************************************************************************************************************************************
changed: [170.187.xxxx]

TASK [prometheus : alert config file] *********************************************************************************************************************************
changed: [170.187.xxx]

TASK [prometheus : Copy systemd init file] ****************************************************************************************************************************
changed: [170.187.230.210]

TASK [prometheus : Start prometheus service] **************************************************************************************************************************
changed: [170.187.xx]

TASK [prometheus : Check if prometheus is accessible] *****************************************************************************************************************
ok: [170.187.xxx]

RUNNING HANDLER [prometheus : Reload systemd] *************************************************************************************************************************
changed: [170.187.xxx]

PLAY [grafana] ********************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************
ok: [139.177xxxx]

If the deployment fails at any particular step, you can fix the issue and easily restart from the failed step using the command below:

ansible-playbook deployment.yml -i inventory --start-at-task="<task name from play>"

Once the Ansible playbook executes successfully, you should see that the Linodes are provisioned and Prometheus, Grafana, Alertmanager and node-exporter are all installed. We will verify this in the next step.

Step 4 — Verification — in this step we are going to check whether all components are installed correctly.

a) Check Prometheus — go to http://<public_ip_of_prometheus_server>:9090/targets

If you are able to see all three nodes, i.e. Grafana, Prometheus and node-exporter, then we are good. Please note that we installed node-exporter on all the Linode VMs, so all of them are sending metrics to Prometheus.

b) Check that the correct metrics are being emitted — go to http://<public_ip_of_each_VM>:9100/metrics for each VM. You should see something like below.
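For instance, the memory metrics we will alert on later appear in that output in the standard Prometheus exposition format (the sample values here are made-up placeholders):

# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 1.024937984e+09
# HELP node_memory_Active_bytes Memory information field Active_bytes.
# TYPE node_memory_Active_bytes gauge
node_memory_Active_bytes 2.3456768e+08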

c) Let’s check our Grafana server.

Login into http://<public_ip_of_grafana_vm>:3000/

The username is admin, and for the password you can either use the default or set one in roles/grafana/vars/main.yml. You can also use Vault to hide it.

The login screen will guide you to add data sources and dashboards. Just follow the prompts, add your Prometheus data source and start exploring the data. I also imported the “Node Exporter Full with Node Name” dashboard to visualize the data. Here is a screen grab of my dashboard.

That’s good. We now have a fully functional Prometheus, Grafana, Alertmanager (verify it by going to http://<public_ip_of_prometheus_vm>:9090/alerts) and node-exporter.

Step 5 — Integrate Grafana with the Jenkins pipeline.

a) Let’s first define an alert in Grafana — for this project I want to create an alert that monitors the memory utilization of my VMs and fires when it goes above 75%.

Within Grafana UI → Home → Alerting → Alert Rules

Here is the screenshot of my configuration.

Here is my data source query, pretty basic: node_memory_Active_bytes/node_memory_MemTotal_bytes*100
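In Grafana the 75% threshold lives in the alert rule’s condition, but the same idea can be written as a single self-contained PromQL expression (a sketch, with the threshold folded into the query):

(node_memory_Active_bytes / node_memory_MemTotal_bytes) * 100 > 75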

You can customize the rule as you need; there are many options that I am not going to cover here, otherwise this post would never end.

b) Create a contact point for the webhook — Home → Alerting → Contact points

Select Webhook in the integration block and set the URL to http://JENKINS_URL/generic-webhook-trigger/invoke

Replace JENKINS_URL with your Jenkins instance URL.

For authenticating to your Jenkins instance you will have to pass a username and token. For that, first create a user in Jenkins and then add the username/password here. Within your Jenkins instance go to

Manage Jenkins → Security → Manage Users and create a user.

If you want to test the integration, you can create a dummy pipeline in Jenkins and fire a test alert to see whether the Jenkins pipeline gets triggered or not.
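A minimal dummy pipeline for such a test could look like the sketch below, assuming the Generic Webhook Trigger plugin is installed; the token value and the JSONPath are my own illustrative choices, not something from this repo:

// Dummy pipeline that the Grafana webhook can trigger.
// 'grafana-token' is an arbitrary token you pick; append ?token=grafana-token
// to the invoke URL so that only this job fires.
pipeline {
    agent any
    triggers {
        GenericTrigger(
            // Pull the 'status' field out of Grafana's webhook payload
            genericVariables: [[key: 'ALERT_STATUS', value: '$.status']],
            token: 'grafana-token',
            causeString: 'Triggered by Grafana alert'
        )
    }
    stages {
        stage('Echo alert') {
            steps {
                echo "Grafana alert status: ${env.ALERT_STATUS}"
            }
        }
    }
}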

c) Let’s create our Jenkins pipeline — my Jenkinsfile is quite simple, as below.

Please note that I used linode-cli in the code below for provisioning a Linode within the pipeline, but you could use Ansible within the pipeline as well, to stay consistent in tooling.

def linodeId = ""
def linodeStatus = ""

pipeline {
    agent any

    environment {
        LINODE_CLI_TOKEN = credentials('linode-cli-token')
        ROOT_PASS = credentials('linode-root-pass')
    }

    stages {
        stage('Infrastructure Check') {
            steps {
                sh "linode-cli linodes list"
            }
        }

        stage('Provision') {
            steps {
                script {
                    linodeId = sh(
                        script: "linode-cli linodes create --root_pass $ROOT_PASS --region ap-south --image linode/debian10 --type g6-nanode-1 --json",
                        returnStdout: true
                    ).trim()

                    // Extract the Linode ID from the JSON response
                    linodeId = linodeId.replaceAll('\\[|\\]', '').split(',')[0].split(':')[1].trim()

                    echo "Linode ID: ${linodeId}"

                    // Wait for Linode creation to complete
                    timeout(time: 2, unit: 'MINUTES') {
                        // Loop until the Linode is running or the timeout is reached.
                        // Note: the CLI returns a JSON array, so readJSON(...).status yields
                        // a one-element list whose string form is '[running]' once the VM is up.
                        while (linodeStatus != '[running]') {
                            def linodeInfo = sh(
                                script: "linode-cli linodes view ${linodeId} --json",
                                returnStdout: true
                            ).trim()

                            // Parse the JSON response and extract the status field
                            echo "${linodeInfo}"
                            def statusField = readJSON(text: linodeInfo).status
                            echo "${statusField}"
                            linodeStatus = statusField ? statusField.toString().toLowerCase() : ""
                            echo "${linodeStatus}"

                            if (linodeStatus != '[running]') {
                                sleep time: 10, unit: 'SECONDS'
                            }
                        }
                    }

                    echo 'Linode creation completed'
                }
            }
        }

        stage('Verify') {
            steps {
                script {
                    if (linodeStatus == '[running]') {
                        echo 'Linode is running, verification successful'
                    } else {
                        error 'Linode is not running, verification failed'
                    }
                }
            }
        }

        stage('Deprovision') {
            steps {
                sh "linode-cli linodes delete ${linodeId}"
            }
        }
    }
}

The pipeline code is quite straightforward. It starts by defining a couple of variables that we want to use across multiple stages of the pipeline; next, we reference a few secrets defined in Jenkins through the environment section.

Next, we have 4 stages in the pipeline:

Stage 1 — Infrastructure Check — making sure that we have the Linode CLI installed and working on the Jenkins server.

Also, as we are using the readJSON() function, please make sure that you have the Pipeline Utility Steps plugin (https://plugins.jenkins.io/pipeline-utility-steps/) installed on your Jenkins instance.

Stage 2 — Provision — here we use the Linode CLI to create a new instance and wait until it reaches the running state.

Stage 3 — Verify — basically checking that the newly provisioned Linode VM is up and running.

Stage 4 — Deprovision — you most probably won’t need this, but I didn’t want to spin up tens of VMs during my testing, so I added this stage to delete the VM.

d) Create your pipeline — within the Jenkins UI, create a new “multibranch pipeline” and name it appropriately.

Within Pipeline → Build Triggers, make sure that you check “Generic Webhook Trigger”.

e) Copy/paste the Jenkinsfile created in step (c) into the pipeline definition section.

Step 6 — Run a script to generate load on one of the monitored VMs to raise the alert, then watch your pipeline run and a new VM get provisioned.
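One quick way to push memory usage past our 75% threshold on a Debian VM is the sketch below; stress-ng has to be installed first, and the allocation size and duration are arbitrary choices:

# Install stress-ng, then hold ~80% of RAM for 10 minutes to trip the alert
apt-get update && apt-get install -y stress-ng
stress-ng --vm 1 --vm-bytes 80% --vm-keep --timeout 600s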

What’s next? Actually, a lot. I agree that it will take a lot more to make this an enterprise-class or production-ready solution, but I am sure it can be achieved. The purpose of this demo project is to show how we can scale up and down automatically using open source tools.

If you are reading this line then kudos for reaching the very end of this story and thank you for reading. I hope you learned something today.
