Basic Introduction to Prometheus (Part 3)

Disclaimer: this is a continuation of my previous articles (parts 1 and 2), where I decided to jot down my own discoveries about Prometheus. Familiarity with PHP and awk is helpful for following along, but not essential.

The goal of this article is to investigate how to gather metrics in a situation where servers might be ephemeral. Basically, we’re asking: how do we configure Prometheus to monitor things in a more dynamic way?

A more dynamic environment

In the previous steps, the environment was always consistent: two databases, one master and one replica:

root@linuxpc:~# maxctrl list servers
┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────┬─────────────────┐
│ Server  │ Address      │ Port │ Connections │ State           │ GTID      │ Monitor         │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server1 │ 192.168.2.37 │ 3306 │ 0           │ Master, Running │ 0-1-61043 │ MariaDB-Monitor │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server2 │ 192.168.2.99 │ 3306 │ 0           │ Slave, Running  │ 0-1-61043 │ MariaDB-Monitor │
└─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────┴─────────────────┘

Two services exist: a read-write split and a read-only route:

root@linuxpc:~# maxctrl list services
┌────────────────────┬────────────────┬─────────────┬───────────────────┬─────────┐
│ Service            │ Router         │ Connections │ Total Connections │ Targets │
├────────────────────┼────────────────┼─────────────┼───────────────────┼─────────┤
│ Read-Write-Service │ readwritesplit │ 0           │ 0                 │ server1 │
├────────────────────┼────────────────┼─────────────┼───────────────────┼─────────┤
│ Read-Only-Service  │ readconnroute  │ 0           │ 0                 │ server2 │
└────────────────────┴────────────────┴─────────────┴───────────────────┴─────────┘

But it might happen that additional databases need to be deployed quickly. In production systems, this could happen when there’s an (anticipated) surge in activity, maybe due to some event, like an IPO in a financial environment.

In my case, a script exists that can take the server2 VM and quickly create and deploy an identical copy of it. This means the database is already installed, replication is configured, Apache and PHP are already set up, and a new IP address is allocated; basically, the copy is all ready to plug into MaxScale.

Adding a new server:

root@linuxpc:~# maxctrl list servers
┌─────────┬──────────────┬──────┬─────────────┬─────────────────┬───────────┬─────────────────┐
│ Server  │ Address      │ Port │ Connections │ State           │ GTID      │ Monitor         │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server1 │ 192.168.2.37 │ 3306 │ 0           │ Master, Running │ 0-1-61043 │ MariaDB-Monitor │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server2 │ 192.168.2.99 │ 3306 │ 0           │ Slave, Running  │ 0-1-61043 │ MariaDB-Monitor │
├─────────┼──────────────┼──────┼─────────────┼─────────────────┼───────────┼─────────────────┤
│ server3 │ 192.168.2.98 │ 3306 │ 0           │ Slave, Running  │ 0-1-61043 │ MariaDB-Monitor │
└─────────┴──────────────┴──────┴─────────────┴─────────────────┴───────────┴─────────────────┘
root@linuxpc:~# maxctrl list services
┌────────────────────┬────────────────┬─────────────┬───────────────────┬──────────────────┐
│ Service            │ Router         │ Connections │ Total Connections │ Targets          │
├────────────────────┼────────────────┼─────────────┼───────────────────┼──────────────────┤
│ Read-Write-Service │ readwritesplit │ 0           │ 0                 │ server1          │
├────────────────────┼────────────────┼─────────────┼───────────────────┼──────────────────┤
│ Read-Only-Service  │ readconnroute  │ 0           │ 0                 │ server2, server3 │
└────────────────────┴────────────────┴─────────────┴───────────────────┴──────────────────┘

Now, we would have to go and update prometheus.yml manually to make it aware of server3, which previously did not exist at all. It is not hard to see how this could get rather tedious if we kept adding new servers.

Sidenote on exporters

While it is possible to create one exporter that monitors all the databases, doing so is much more complicated than writing a basic exporter that monitors a single server and dropping it onto each server we want to monitor. This keeps the exporter’s logic much simpler, and Prometheus has no problem sending HTTP requests to as many exporters on different servers as you like.

Therefore, we will keep it simple here: create a simple exporter and distribute it to each DB server.

Service Discovery

If we want to make Prometheus aware of new databases as they are created, we need a way to discover them. A rudimentary way is to just use the output of maxctrl list servers itself: it will always contain all the databases we’re interested in, and it produces a simple text output, so it’s workable. Real systems will have more sophisticated methods, but for the sake of understanding, we will not touch on those here.

What does Prometheus need from us here?

Basically, we need a JSON file containing an array of objects like:

[
  { "targets":["host:port"], "labels":{"server": "some server", "role": "some role"}},
  {...},
  {...} 
]
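For our three-server setup, the generated file would look something like this (addresses and roles taken from the maxctrl listing above; port 80 because the exporter is served by Apache):

```json
[
  {"targets": ["192.168.2.37:80"], "labels": {"server": "server1", "role": "Master"}},
  {"targets": ["192.168.2.99:80"], "labels": {"server": "server2", "role": "Slave"}},
  {"targets": ["192.168.2.98:80"], "labels": {"server": "server3", "role": "Slave"}}
]
```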

This should be saved to a location that Prometheus can access, and then referenced in prometheus.yml, just under scrape_configs:. I have decided to remove everything from the previous two articles, so this is basically added to the bottom of a clean-slate prometheus.yml:

  - job_name: "mariadb_user_from_php"
    scrape_interval: 15s
    metrics_path: "/prometheus_lab/mariadb_user_from_php.php"
    file_sd_configs:
      - files:
          - /opt/prometheus/prometheus-3.9.1.linux-amd64/file_sd/mariadb_servers.json

To recap: once the JSON file exists and is referenced as above, that is all the configuration we need in Prometheus itself. Conveniently, Prometheus watches file_sd files for changes, so updates to the JSON are picked up without a restart.
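Before pointing Prometheus at the file, it’s worth sanity-checking that whatever generates it actually produces valid JSON. A minimal check, assuming python3 is available; the /tmp path and the file contents here are just a hypothetical sample, not the real generated file:

```shell
# Write a sample file_sd file and confirm it parses as JSON.
cat > /tmp/mariadb_servers.json <<'EOF'
[
  {"targets": ["192.168.2.37:80"], "labels": {"server": "server1", "role": "Master"}},
  {"targets": ["192.168.2.99:80"], "labels": {"server": "server2", "role": "Slave"}}
]
EOF
# json.tool exits non-zero on malformed JSON, so the echo only runs on success.
python3 -m json.tool /tmp/mariadb_servers.json > /dev/null && echo "valid JSON"
```

A malformed file here would mean Prometheus silently serves no targets from it, so failing fast in the generator is worthwhile.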

The Generator script

So now that we know precisely what we need, it’s time to script it:

#!/bin/bash

SERVER2="192.168.2.99"
REMOTE_DIR="/opt/prometheus/prometheus-3.9.1.linux-amd64/file_sd"
REMOTE_FILE="${REMOTE_DIR}/mariadb_servers.json"
TMP="$(mktemp)"
TMP_REMOTE="${REMOTE_FILE}.tmp"

# strip the trailing comma from the last JSON object to keep the file valid
maxctrl list servers | awk -f ./maxscale_to_file_sd.awk | sed -e 'H;1h;$!d;x; s/\(.*\),/\1/' > "$TMP"

ssh root@"$SERVER2" "mkdir -p '$REMOTE_DIR'"
scp -q "$TMP" root@"$SERVER2":"$TMP_REMOTE"
ssh root@"$SERVER2" "mv '$TMP_REMOTE' '$REMOTE_FILE'"

rm -f "$TMP"

HOSTS=$(maxctrl list servers | awk 'BEGIN{FS="│";}{if($2 ~ /server[0-9]/){gsub(/ /, "", $3); print $3}}')

for host in $HOSTS; do
      scp -q ./mariadb_user_from_php.php root@"$host":/var/www/html/prometheus_lab/
      ssh root@"$host" "chown www-data:www-data /var/www/html/prometheus_lab/mariadb_user_from_php.php"
      ssh root@"$host" "chmod 644 /var/www/html/prometheus_lab/mariadb_user_from_php.php"
done

The idea here is that we now have a dynamic list of servers, anytime this script is invoked.
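The sed expression deserves a note: it slurps the whole stream into the hold space and, on the last line, deletes the final comma, turning the “one object per line, each with a trailing comma” output into valid JSON. A self-contained illustration with dummy input:

```shell
# H;1h accumulates every line in the hold space, $!d suppresses printing
# until the last line, x swaps the full buffer back into the pattern space,
# and the greedy s/\(.*\),/\1/ deletes only the LAST comma in the buffer.
printf '[\n{"a":1},\n{"b":2},\n]\n' |
sed -e 'H;1h;$!d;x; s/\(.*\),/\1/'
```

This prints the same array, but with the comma after {"b":2} removed.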

The awk script is the part that actually generates the json:

#!/bin/awk -f

BEGIN{
    FS="│";
    print "["
}
{
    if($2 ~ /server/){
        for (i=1; i<=NF; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", $i);  # trim leading/trailing whitespace
        }
        sub(/,.*/, "", $6);  # reduce "Master, Running" to just the role
        print "{\x22targets\x22: [\x22"$3":80\x22],", "\x22labels\x22: {\x22server\x22: \x22"$2"\x22,","\x22role\x22: \x22"$6"\x22}},"
    }
}
END{
    print "]";
}

There are probably better ways to generate JSON, but I wanted to get this over with quickly and focus on Prometheus rather than on fancier ways of generating JSON.
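To see the field handling in isolation, the same parsing logic can be run against a single (hypothetical) maxctrl row. Note that $1 is empty because the line starts with the │ separator, so the server name lands in $2, the address in $3, and the state in $6:

```shell
printf '│ server1 │ 192.168.2.37 │ 3306 │ 0 │ Master, Running │ 0-1-61043 │ MariaDB-Monitor │\n' |
awk 'BEGIN{FS="│"}
$2 ~ /server/ {
    for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)  # trim each field
    sub(/,.*/, "", $6)                                         # "Master, Running" -> "Master"
    printf "{\"targets\": [\"%s:80\"], \"labels\": {\"server\": \"%s\", \"role\": \"%s\"}},\n", $3, $2, $6
}'
```

This emits one JSON object (with its trailing comma, to be stripped later by the sed stage) per server row.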

Here, for simplicity’s sake, I just reused the user connection monitoring script as the exporter:

<?php
header('Content-Type: text/plain; version=0.0.4; charset=utf-8');

$mysqli = new mysqli("127.0.0.1", "devel", "devel", "information_schema");
if ($mysqli->connect_errno) {
    echo "# HELP mariadb_exporter_up 1 if connection succeeded, 0 otherwise\n";
    echo "# TYPE mariadb_exporter_up gauge\n";
    echo "mariadb_exporter_up 0\n";
    exit;
}

echo "# HELP mariadb_exporter_up 1 if connection succeeded, 0 otherwise\n";
echo "# TYPE mariadb_exporter_up gauge\n";
echo "mariadb_exporter_up 1\n\n";

$sql = <<<SQL
SELECT p.USER, s.ATTR_VALUE, p.ID
FROM information_schema.PROCESSLIST p
LEFT JOIN performance_schema.session_connect_attrs s
  ON p.ID = s.PROCESSLIST_ID;
SQL;

$res = $mysqli->query($sql);
if (!$res) {
    echo "# HELP mariadb_exporter_query_ok 1 if query succeeded, 0 otherwise\n";
    echo "# TYPE mariadb_exporter_query_ok gauge\n";
    echo "mariadb_exporter_query_ok 0\n";
    exit;
}

echo "# HELP mariadb_exporter_query_ok 1 if query succeeded, 0 otherwise\n";
echo "# TYPE mariadb_exporter_query_ok gauge\n";
echo "mariadb_exporter_query_ok 1\n\n";

$byId = [];
while ($row = $res->fetch_row()) {
    [$user, $attr, $id] = $row;

    if ($id === null) {
        continue;
    }
    $id = (string)$id;

    if (isset($byId[$id])) {
        continue;
    }

    $user = $user !== null ? trim($user) : '';
    $attr = $attr !== null ? trim($attr) : '';

    $byId[$id] = ["user" => $user, "attr" => $attr];
}

$counts = [];
$total = 0;

foreach ($byId as $entry) {
    $k = $entry["user"];
    $counts[$k] = ($counts[$k] ?? 0) + 1;
    $total++;
}

echo "# HELP mariadb_userphp_connections Active MariaDB connections\n";
echo "# TYPE mariadb_userphp_connections gauge\n";

foreach ($counts as $user => $count) {
    $label = addcslashes($user, "\\\"\n");
    echo "mariadb_userphp_connections{user=\"$label\"} $count\n";
}

echo "\n# HELP mariadb_userphp_connections_total Total active MariaDB connections counted\n";
echo "# TYPE mariadb_userphp_connections_total gauge\n";
echo "mariadb_userphp_connections_total $total\n";
?>
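For reference, a scrape of this exporter on one of the slaves might return something like the following; the user names and counts are taken from the query results shown later in this article, so your numbers will differ:

```
# HELP mariadb_exporter_up 1 if connection succeeded, 0 otherwise
# TYPE mariadb_exporter_up gauge
mariadb_exporter_up 1

# HELP mariadb_exporter_query_ok 1 if query succeeded, 0 otherwise
# TYPE mariadb_exporter_query_ok gauge
mariadb_exporter_query_ok 1

# HELP mariadb_userphp_connections Active MariaDB connections
# TYPE mariadb_userphp_connections gauge
mariadb_userphp_connections{user="devel"} 1
mariadb_userphp_connections{user="maxscaleuser"} 1
mariadb_userphp_connections{user="system user"} 2

# HELP mariadb_userphp_connections_total Total active MariaDB connections counted
# TYPE mariadb_userphp_connections_total gauge
mariadb_userphp_connections_total 4
```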

Once this script is executed and Prometheus is running, we should see that it detects all servers correctly.

Verify

We can verify whether all servers are correctly added as targets via curl. The output is quite meaty, so I’ve grepped it down to the fields we want to inspect:

root@debian-test:/opt/prometheus/prometheus-3.9.1.linux-amd64# curl -s http://localhost:9090/api/v1/targets?state=active | read_response | grep -E "health|instance|global" | sed -E 's/^\s+//'
"instance":"192.168.2.37:80",
"globalUrl":"http://192.168.2.37:80/prometheus_lab/mariadb_user_from_php.php",
"health":"up",
"instance":"192.168.2.99:80",
"globalUrl":"http://192.168.2.99:80/prometheus_lab/mariadb_user_from_php.php",
"health":"up",
"instance":"192.168.2.98:80",
"globalUrl":"http://192.168.2.98:80/prometheus_lab/mariadb_user_from_php.php",
"health":"up",
"instance":"localhost:9090",
"globalUrl":"http://debian-test:9090/metrics",
"health":"up",

We can indeed see that everything is up and running!

Let’s inspect the actual output of the exporter:

root@debian-test:/opt/prometheus/prometheus-3.9.1.linux-amd64# curl -s "http://localhost:9090/api/v1/query?query=mariadb_userphp_connections" | read_response | grep -E 'role|server|"user"|"[0-9]+"' | sed -E 's/^\s+//'
"role":"Slave",
"server":"server2",
"user":"devel"
"1"
"role":"Slave",
"server":"server2",
"user":"maxscaleuser"
"1"
"role":"Slave",
"server":"server2",
"user":"system user"
"2"
"role":"Master",
"server":"server1",
"user":"devel"
"1"
"role":"Master",
"server":"server1",
"user":"repl"
"1"
"role":"Master",
"server":"server1",
"user":"maxscaleuser"
"1"
"role":"Slave",
"server":"server3",
"user":"devel"
"1"
"role":"Slave",
"server":"server3",
"user":"maxscaleuser"
"1"
"role":"Slave",
"server":"server3",
"user":"system user"
"2"

And indeed, Prometheus is aware of all servers at once, through a dynamic configuration.

Conclusion

Prometheus provides a convenient and understandable solution for dynamic service discovery, since all it takes is a JSON file. It is then up to the user to decide how that JSON file should be populated, once again giving the user extreme flexibility. In this example, by deriving the service-discovery file from MaxScale, we effectively made Prometheus follow the control plane automatically. When a server is added to or removed from MaxScale, Prometheus adapts without any manual configuration changes.