Beginner’s guide to database backups with ZFS

Foreword

The goal of this article is to summarize everything needed to go from zero to a working zfs setup that integrates with mariadb. We’ll briefly go through how to create a backup (snapshot) using zfs, and then leverage that backup to restore data.

This guide assumes that you have two machines capable of hosting a modern linux OS (if you want to follow along). I’m using debian here, but very few of the instructions are debian specific - aside from the actual installation of zfs, the rest is distro agnostic.

Why bother with all this?

If you’re a developer and your application regularly messes up your test tables, or even the whole database, zfs snapshots can be a fast and painless way to reset your environment. They are especially handy for getting your test database up and running again without having to wait out a full mysqldump + import cycle.

Install ZFS

In /etc/apt/sources.list add:

deb http://deb.debian.org/debian bookworm-backports main contrib non-free

Or, instead of bookworm, substitute your release’s codename, which you can check with lsb_release -cs.
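If you’d rather not hard-code the codename, you can append the backports line with the codename filled in automatically. A small convenience sketch, assuming the lsb-release package is installed:

codename=$(lsb_release -cs)
echo "deb http://deb.debian.org/debian ${codename}-backports main contrib non-free" | sudo tee -a /etc/apt/sources.list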

sudo apt update
sudo apt install linux-headers-amd64
sudo apt install -t stable-backports zfsutils-linux

Important note: zfs is not part of the linux kernel due to incompatibilities between the CDDL license it uses and the GPL license used by the linux kernel. What this means for you is that zfs is an out-of-tree module, and a kernel update might break the interoperability between the kernel and zfs. Because of this, zfs is best suited for LTS distros, or distros that simply have good zfs support in their package repositories.
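Before creating any pools, it’s worth confirming that the zfs module actually built against your running kernel and loads cleanly. A quick sanity check could look like this (assuming the module was built via dkms, which is how the debian packages ship it):

dkms status
sudo modprobe zfs
zfs version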

Create a datapool

In this blog post, I will use a file based pool to make it easier to follow and reproduce the steps. In practice, pools would be built on actual storage devices, not on files; the file based method is just a risk-free approach for learning and testing.

To create a file-based toy mirror pool, we just need to create some empty files with identical sizes:

cd /tmp
truncate -s 4G 0.raw
truncate -s 4G 1.raw

This means we’ll have two file based “disks” for a 1:1 redundancy.

zpool create -o ashift=12 testpool mirror /tmp/0.raw /tmp/1.raw

Otherwise, using zfs with real disks works in much the same way, the only caveat being that you shouldn’t hand zfs device names like /dev/sdX. The translation between /dev/sdX and the stable identifiers zfs prefers can be found with the following command:

ls -l /dev/disk/by-id

You should see labels like:

root@debian-test:~# ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root  9 May  4 11:45 ata-CT1000BX500SSD1_2341E87DD9DC -> ../../sda
lrwxrwxrwx 1 root root 10 May  4 11:45 ata-CT1000BX500SSD1_2341E87DD9DC-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May  4 11:45 ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X -> ../../sdb
lrwxrwxrwx 1 root root 10 May  4 11:45 ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May  4 11:45 ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X-part2 -> ../../sdb2

So with actual disks, your command would look something like this (here mirroring the two disks from the listing above):

zpool create -o ashift=12 testpool mirror /dev/disk/by-id/ata-CT1000BX500SSD1_2341E87DD9DC /dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X

zfs prefers these stable names because the /dev/sdX names can get reassigned between boots, so this way zfs can be absolutely sure that it’s always going to target the same device or partition.
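If you already have a pool that was created with /dev/sdX names, you don’t need to rebuild it; exporting and re-importing it while pointing zfs at the by-id directory switches it over. A quick sketch, assuming the pool is named testpool and nothing is currently using it:

zpool export testpool
zpool import -d /dev/disk/by-id testpool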

ashift=12?

This value has more to do with hardware: most modern SSDs (and many HDDs) use 4KiB physical sectors internally, even when they report 512-byte logical sectors for compatibility. So ashift=12 instructs zfs that the pool we are about to create should use a 4KiB sector size, keeping writes aligned with what the drive actually does. This also plays nicely with the InnoDB storage engine, whose pages (16KiB by default) are a clean multiple of 4KiB. Finally, it is useful to remark that the ashift parameter is actually an exponent of 2, so ashift=12 translates to 2^12 = 4096 bytes.
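If you want to double-check what sector sizes your drives report, and what ashift the pool actually ended up with, something along these lines works (lsblk is part of util-linux on debian):

lsblk -o NAME,PHY-SEC,LOG-SEC
zpool get ashift testpool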

With all of that out of the way, let’s check the pool we just created:

root@debian-test:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

        NAME            STATE     READ WRITE CKSUM
        testpool        ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            /tmp/0.raw  ONLINE       0     0     0
            /tmp/1.raw  ONLINE       0     0     0

errors: No known data errors

Sidenote: How to “move” mariadb to zfs if it’s not already there?

A simple solution is to use a bind mount. Essentially, a bind mount is like a symbolic link, but on a filesystem level. It is used to create a mount point for an existing filesystem, so that it is visible at a different spot in the namespace. The data is not copied or duplicated in any way.

This is not really something to do in production. But this does provide a simple way to follow along without having to nuke your test DB.

On a clean server, you would simply ask zfs to create a new dataset mounted at the right place, e.g.

zfs create -o mountpoint=/var/lib/mysql production/mysql

Then, install mariadb and populate it. mariadb will initialize directly into the zfs backed /var/lib/mysql and there is no need to move data or use tricks like bind mount.

In my case mariadb is already installed, with its data sitting in /var/lib/mysql, so instead I’ll create a dataset on the pool, copy the data over, and bind mount it back into place:

zfs create testpool/mysql
systemctl stop mariadb
rsync -av /var/lib/mysql/ /testpool/mysql/
chown -R mysql:mysql /testpool/mysql
mount --bind /testpool/mysql /var/lib/mysql
echo "/testpool/mysql /var/lib/mysql none bind 0 0" | tee -a /etc/fstab
systemctl start mariadb
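A quick way to confirm the bind mount is really in place and that mariadb is now writing to the pool (findmnt also ships with util-linux):

findmnt /var/lib/mysql
zfs list testpool/mysql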

Taking the snapshots

Snapshots in zfs are taken per dataset, which is why the mysql data went into its own testpool/mysql dataset above rather than into a plain directory on the pool.

Now we’re at the stage where we can finally take snapshots.

In a production system, you definitely want to automate taking snapshots, so in cron you’d add something like this (note that % characters have to be escaped in crontab entries):

0 * * * * /sbin/zfs snapshot testpool/mysql@hourly-$(date +\%Y\%m\%dT\%H\%M)
0 0 * * * /sbin/zfs list -H -t snapshot -o name -s creation | grep 'testpool/mysql@hourly-' | head -n -24 | xargs -r -n1 /sbin/zfs destroy

Or use something more advanced like systemd timers.
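As a rough sketch of the systemd variant (the unit names and the /usr/local/sbin path are just my own choices here, nothing mandated by zfs or systemd):

cat > /usr/local/sbin/zfs-mysql-snapshot <<'EOF'
#!/bin/sh
# Take a timestamped snapshot of the mysql dataset.
exec /sbin/zfs snapshot "testpool/mysql@hourly-$(date +%Y%m%dT%H%M)"
EOF
chmod +x /usr/local/sbin/zfs-mysql-snapshot

cat > /etc/systemd/system/zfs-mysql-snapshot.service <<'EOF'
[Unit]
Description=Snapshot the testpool/mysql dataset

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/zfs-mysql-snapshot
EOF

cat > /etc/systemd/system/zfs-mysql-snapshot.timer <<'EOF'
[Unit]
Description=Hourly zfs snapshot of testpool/mysql

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now zfs-mysql-snapshot.timer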

Let’s take a snapshot manually right now:

root@debian-test:/testpool/mysql# zfs snapshot testpool/mysql@first
root@debian-test:/testpool/mysql# zfs list -t snapshot
NAME                   USED  AVAIL     REFER  MOUNTPOINT
testpool/mysql@first     0B      -       96K  -

The backup server should have the same version of mariadb installed as your main server(s). Besides that, just install zfs on the backup server and create a pool there as well; this pool will receive the snapshots coming over from the main server.
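If you’re following along with the toy setup, the receiving pool can be the same kind of file-backed mirror as before (same pool name, so the later commands line up):

truncate -s 4G /tmp/0.raw /tmp/1.raw
zpool create -o ashift=12 testpool mirror /tmp/0.raw /tmp/1.raw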

The transfer of snapshots happens with the zfs send and zfs recv commands. zfs send writes to standard output, so you can pipe it, or redirect it straight into a file. Likewise, zfs recv reads from standard input, so it can take its input from a pipe or a file.

In my case, there’s no direct network access between my two test machines, so I must export from zfs into a file, and then reimport on the second test machine:

root@debian-test:/testpool/mysql# zfs send testpool/mysql@first > /tmp/mysql-backup.zfs

And on the receiving end:

root@debian-test2:/# zfs recv testpool/mysql < /tmp/mysql-backup.zfs

Otherwise, if the two machines running zfs have direct network access to each other, it’s of course possible to just send things directly:

root@sender-host:/# zfs send somepool/mysql@some-unique-name | ssh root@target-host zfs recv somepool/mysql
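Once both sides share a common snapshot, later transfers don’t have to ship the whole dataset again: an incremental send only carries the blocks that changed between two snapshots. A sketch, with made-up snapshot names:

zfs snapshot somepool/mysql@newer-name
zfs send -i somepool/mysql@some-unique-name somepool/mysql@newer-name | ssh root@target-host zfs recv somepool/mysql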

Fishing the data out of the backup

For reference, this is the data that existed in the database at the time the snapshot was taken:

MariaDB [test]> SELECT * FROM table1;
+---------+---------+
| column1 | column2 |
+---------+---------+
|       1 | test    |
+---------+---------+
1 row in set (0.000 sec)


MariaDB [(none)]> SELECT @@hostname;
+-------------+
| @@hostname  |
+-------------+
| debian-test |
+-------------+
1 row in set (0.000 sec)

On the backup server, load the backup into the pool (if you haven’t already done so above); zfs recv creates and mounts the dataset for you, so there is no need to create any directories by hand:

zfs recv testpool/mysql < /tmp/mysql-backup.zfs

You should be able to see that everything has been transferred:

root@debian-test2:/testpool/mysql# zfs list -t snapshot
NAME                   USED  AVAIL  REFER  MOUNTPOINT
testpool/mysql@first     0B      -  3.31M  -
root@debian-test2:/testpool/mysql# ls
aria_log.00000001   ddl_recovery-backup.log  ib_logfile0        mysql                    mysql_upgrade_info  test
aria_log_control    ddl_recovery.log         ibtmp1             mysqld-relay-bin.000001  performance_schema
bin-mariadb.000001  ib_buffer_pool           master.info        mysqld-relay-bin.000002  relay-log.info
bin-mariadb.index   ibdata1                  multi-master.info  mysqld-relay-bin.index   sys

Now we can use zfs’s built-in mountpoint handling to expose this backup at the location mariadb expects, and use it as a working copy:

cd ~
systemctl stop mariadb
zfs set mountpoint=/var/lib/mysql testpool/mysql
chown -R mysql:mysql /var/lib/mysql

(If your backup server runs SELinux, also run /sbin/restorecon -rv /var/lib/mysql afterwards; on a default debian install with AppArmor this step is not needed.)

We should now see a usable mariadb data directory in /var/lib/mysql:

root@debian-test2:~# ls /var/lib/mysql/
aria_log.00000001   ddl_recovery-backup.log  ib_logfile0        mysql                    mysql_upgrade_info  test
aria_log_control    ddl_recovery.log         ibtmp1             mysqld-relay-bin.000001  performance_schema
bin-mariadb.000001  ib_buffer_pool           master.info        mysqld-relay-bin.000002  relay-log.info
bin-mariadb.index   ibdata1                  multi-master.info  mysqld-relay-bin.index   sys

Start the db service:

systemctl start mariadb

And check:

MariaDB [(none)]> SELECT * FROM test.table1;
+---------+---------+
| column1 | column2 |
+---------+---------+
|       1 | test    |
+---------+---------+
1 row in set (0.001 sec)

MariaDB [(none)]> SELECT @@hostname;
+--------------+
| @@hostname   |
+--------------+
| debian-test2 |
+--------------+
1 row in set (0.000 sec)

Starting the database can take a good 15 minutes (or more) on a realistic dataset, because mariadb has to run crash recovery on the snapshot, but it’ll still be an order of magnitude faster than loading in a mysqldump! Occasionally, a snapshot might catch mariadb mid-transaction, leaving it in a state that crash recovery can’t resolve on its own.

In that case, you’ll probably fail to start the db, and see something like this in the mysql logs:

[ERROR] Found 1 prepared transactions!

For the sake of restoring some random table, you’ll probably NOT want to do all the work of figuring out which transaction was left hanging and rolling it back by hand. You can force your way into the DB via recovery mode:

mariadbd --user=root --skip-grant-tables --innodb-force-recovery=4

Grab what you need, without having to think too much here.
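For example, to fish out just the test.table1 table from the earlier example while that force-recovery instance is running, something like this should do (the dump can then be loaded into the real server as usual):

mariadb-dump --skip-lock-tables test table1 > /tmp/table1.sql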

Once the required data has been recovered, stop the service and destroy the dataset:

systemctl stop mariadb
zfs destroy -r testpool/mysql

This method can also be used to restore a full database, in which case you use zfs send and receive the same way and simply overwrite everything.
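And if all you want is to reset a broken test database in place, the use case from the introduction, no second machine is needed at all: stop mariadb and roll the dataset back to a snapshot. Note that plain zfs rollback targets the most recent snapshot; rolling back past newer snapshots requires -r, which destroys them.

systemctl stop mariadb
zfs rollback testpool/mysql@first
systemctl start mariadb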

Beyond the basics

This guide merely looked at the tip of the iceberg of features zfs has to offer. There’s a whole host of advanced configuration parameters that we have not touched upon. Doing so would be beyond the scope of this simple guide, but I strongly recommend checking out the Debian Wiki, as well as the official zfs docs. Moreover, actually running mariadb on zfs in a production environment requires some careful consideration, as by default innodb and zfs have overlapping functionality (both maintain their own checksums, caching and crash-safety machinery, for example).