Beginner’s guide to database backups with ZFS
Foreword
The goal of this article is to summarize everything needed to go from zero to a working setup of zfs that integrates with mariadb. We’ll briefly go through how to create a backup (snapshot) using zfs, and then leverage that backup to restore data.
This guide assumes that you have two machines capable of hosting a modern linux OS (if you want to follow along). I’m using debian here, but very few of the instructions are debian specific - aside from the actual installation of zfs, the rest is distro agnostic.
Why bother with all this?
If you’re a developer, and your application regularly messes up your test tables, or even the whole database, zfs snapshots can be a fast and painless way to reset your environment. They are especially handy for getting your test database up and running again quickly, without having to wait out a full mysqldump + import cycle.
Install ZFS
In /etc/apt/sources.list, add:
deb http://deb.debian.org/debian bookworm-backports main contrib non-free
Or instead of bookworm, use whatever your current release codename is; you can check it with lsb_release -cs.
sudo apt update
sudo apt install linux-headers-amd64
sudo apt install -t stable-backports zfsutils-linux
Important note: zfs is not accepted into the linux kernel due to the licensing incompatibility between its CDDL license and the GPL license used by the linux kernel. What this means for you is that zfs is an out-of-tree module, and kernel updates might break the interoperability between the linux kernel and zfs. Because of this, zfs is best suited for LTS distros, or distros that have good support for zfs in their package repositories.
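Before going further, it’s worth verifying that the DKMS module actually built and loads against your running kernel. A quick sanity check (assuming the packages above installed without errors):
# load the kernel module and print the userland / kernel module versions
sudo modprobe zfs
zfs version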
Create a datapool
In this blog post, I will use a file based pool to make it easier to follow and reproduce the steps. In practice, pools would be built on actual storage devices, not files. The file based method is just a risk-free approach for learning and testing.
To create a file-based toy mirror pool, we just need to create some empty files with identical sizes:
cd /tmp
truncate -s 4G 0.raw
truncate -s 4G 1.raw
This means we’ll have two file based “disks” for a 1:1 redundancy.
zpool create -o ashift=12 testpool mirror /tmp/0.raw /tmp/1.raw
Otherwise, using zfs with disks works in a similar way, with the only caveat being that zfs does not identify disks by names like /dev/sdX. A translation between /dev/sdX and what zfs prefers can be found with the following command:
ls -l /dev/disk/by-id
You should see labels like:
root@debian-test:~# ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 9 May 4 11:45 ata-CT1000BX500SSD1_2341E87DD9DC -> ../../sda
lrwxrwxrwx 1 root root 10 May 4 11:45 ata-CT1000BX500SSD1_2341E87DD9DC-part1 -> ../../sda1
lrwxrwxrwx 1 root root 9 May 4 11:45 ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X -> ../../sdb
lrwxrwxrwx 1 root root 10 May 4 11:45 ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 4 11:45 ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X-part2 -> ../../sdb2
So with actual disks, your command would look something like:
zpool create -o ashift=12 testpool mirror /dev/disk/by-id/ata-CT1000BX500SSD1_2341E87DD9DC /dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_S5HGNJ0N629003X
zfs prefers these names, because it is possible for the names /dev/sdX to get reassigned, so this way zfs can be absolutely sure that it’s always going to target the same device or partition.
ashift=12?
This value has more to do with hardware: SSDs are often optimized for writing data in 4KiB blocks, which is also the most common block size across different filesystems. So ashift=12 instructs zfs that the pool we are about to create should use a 4KiB sector size. On the database side, the InnoDB storage engine writes data in pages that are a clean multiple of this (16KiB by default), so the two line up nicely. Finally, it is useful to note that the ashift parameter is actually an exponent of 2, so ashift=12 translates to 2^12 = 4096 bytes.
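If you want to double-check these numbers on your own hardware, the sector sizes reported by the kernel and the ashift the pool ended up with can be inspected like so (purely optional; the output will obviously differ per machine):
# physical vs. logical sector size of your block devices
lsblk -o NAME,PHY-SEC,LOG-SEC
# ashift actually in effect on the pool we just created
zpool get ashift testpool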
With all of that out of the way, let’s check the pool we just created:
root@debian-test:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

        NAME            STATE     READ WRITE CKSUM
        testpool        ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            /tmp/0.raw  ONLINE       0     0     0
            /tmp/1.raw  ONLINE       0     0     0

errors: No known data errors
Sidenote: How to “move” mariadb to zfs if it’s not already there?
A simple solution is to use a bind mount. Essentially, a bind mount is like a symbolic link, but at the filesystem level: it makes an existing directory tree visible at a second spot in the namespace. The data is not copied or duplicated in any way.
This is not really something to do in production. But this does provide a simple way to follow along without having to nuke your test DB.
On a clean server, you would just ask zfs to create a new mountpoint, e.g.
zfs create -o mountpoint=/var/lib/mysql production/mysql
Then, install mariadb and populate it. mariadb will initialize directly into the zfs backed /var/lib/mysql, and there is no need to move data or use tricks like a bind mount. On an existing installation (the case this sidenote covers), the bind mount route looks like this:
cd /testpool
zfs create testpool/mysql   # create a dataset rather than a plain directory, so it can be snapshotted later
systemctl stop mysql
rsync -av /var/lib/mysql/ /testpool/mysql/
chown -R mysql:mysql mysql
sudo mount --bind /testpool/mysql /var/lib/mysql
echo "/testpool/mysql /var/lib/mysql none bind 0 0" | sudo tee -a /etc/fstab
systemctl start mysql
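To confirm that the bind mount is actually in place, findmnt is a handy sanity check:
# should report /var/lib/mysql backed by /testpool/mysql
findmnt /var/lib/mysql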
Taking the snapshots
If you followed the bind mount sidenote above, the testpool/mysql dataset already exists and you can skip this step. Otherwise, create a dataset in zfs for holding the mysql installation:
zfs create testpool/mysql
Now we’re at the stage where we can finally take snapshots.
In a production system, you definitely want to automate taking snapshots, so in cron, you’d add something like this:
0 * * * * /sbin/zfs snapshot testpool/mysql@hourly-$(date +\%Y\%m\%dT\%H\%M)
0 0 * * * /sbin/zfs list -t snapshot -o name -s creation | grep 'testpool/mysql@hourly-' | head -n -24 | xargs -rn1 /sbin/zfs destroy
Or use something more advanced like systemd timers.
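For reference, a minimal systemd timer sketch could look like the following. The unit and script names are just placeholders I made up for this example; adapt paths and schedule to your setup:
# /usr/local/sbin/zfs-snapshot-mysql.sh (hypothetical helper script, remember to chmod +x it)
#!/bin/sh
/sbin/zfs snapshot "testpool/mysql@hourly-$(date +%Y%m%dT%H%M)"

# /etc/systemd/system/zfs-snapshot-mysql.service
[Unit]
Description=Take an hourly zfs snapshot of testpool/mysql

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/zfs-snapshot-mysql.sh

# /etc/systemd/system/zfs-snapshot-mysql.timer
[Unit]
Description=Schedule the zfs snapshot of testpool/mysql

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
Enable it with systemctl daemon-reload && systemctl enable --now zfs-snapshot-mysql.timer. Expiring old snapshots would still need a second script or a ready-made snapshot tool.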
Let’s take a snapshot manually right now:
root@debian-test:/testpool/mysql# zfs snapshot testpool/mysql@first
root@debian-test:/testpool/mysql# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
testpool/mysql@first 0B - 96K -
Next, let’s get this snapshot over to a second machine. The backup server should have the same version of mariadb installed as your main server(s). Besides that, just install zfs on the backup server, and make sure that you create a pool there as well. You will use this pool to receive snapshots from the main server.
The transfer of snapshots happens with the zfs send and zfs recv commands. zfs send writes to standard output, so you can pipe it, or use file redirection to dump it straight into a file. Likewise, zfs recv reads from standard input, so it can take input from a pipe, or a file.
In my case, there’s no direct network access between my two test machines, so I must export from zfs into a file, and then reimport on the second test machine:
root@debian-test:/testpool/mysql# zfs send testpool/mysql@first > /tmp/mysql-backup.zfs
And on the receiving end:
root@debian-test2:/# zfs recv testpool/mysql < /tmp/mysql-backup.zfs
Otherwise, if the two machines both running zfs have direct network access, you can of course just send things directly:
root@sender-host:/# zfs send somepool/mysql@some-unique-name | ssh root@target-host zfs recv somepool/mysql
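Once the first full snapshot has been received, later transfers don’t need to resend everything: zfs send -i ships only the difference between two snapshots. A rough sketch, reusing the hourly naming scheme from the cron example above (the second snapshot name is just a made-up placeholder):
# initial full send (one time only)
zfs send somepool/mysql@first | ssh root@target-host zfs recv somepool/mysql
# afterwards, send only the delta between the previous and the newest snapshot
zfs send -i somepool/mysql@first somepool/mysql@hourly-20250504T1200 | ssh root@target-host zfs recv somepool/mysql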
Fishing the data out of the backup
We’ll establish that the following data existed at the time of creating the snapshot.
MariaDB [test]> SELECT * FROM table1;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1 | test |
+---------+---------+
1 row in set (0.000 sec)
MariaDB [(none)]> SELECT @@hostname;
+-------------+
| @@hostname |
+-------------+
| debian-test |
+-------------+
1 row in set (0.000 sec)
On the backup server, create the directory structure and load the backup:
cd /testpool
mkdir mysql
zfs recv testpool/mysql < /tmp/mysql-backup.zfs
You should be able to see that everything was transferred:
root@debian-test2:/testpool/mysql# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
testpool/mysql@first 0B - 3.31M -
root@debian-test2:/testpool/mysql# ls
aria_log.00000001 ddl_recovery-backup.log ib_logfile0 mysql mysql_upgrade_info test
aria_log_control ddl_recovery.log ibtmp1 mysqld-relay-bin.000001 performance_schema
bin-mariadb.000001 ib_buffer_pool master.info mysqld-relay-bin.000002 relay-log.info
bin-mariadb.index ibdata1 multi-master.info mysqld-relay-bin.index sys
Now we can use zfs’s built-in ability to mount datasets anywhere in the OS filesystem, and use this backup as a working copy:
cd ~
zfs set mountpoint=legacy testpool/mysql
zfs set mountpoint=/var/lib/mysql testpool/mysql
chown -R mysql:mysql /var/lib/mysql
/sbin/restorecon -rv /var/lib/mysql   # only relevant on SELinux-enabled systems; safe to skip on a default Debian install
We should see a usable instance of mariadb in /var/lib/mysql:
root@debian-test2:~# ls /var/lib/mysql/
aria_log.00000001 ddl_recovery-backup.log ib_logfile0 mysql mysql_upgrade_info test
aria_log_control ddl_recovery.log ibtmp1 mysqld-relay-bin.000001 performance_schema
bin-mariadb.000001 ib_buffer_pool master.info mysqld-relay-bin.000002 relay-log.info
bin-mariadb.index ibdata1 multi-master.info mysqld-relay-bin.index sys
Start the db service:
systemctl start mariadb
And check:
MariaDB [(none)]> SELECT * FROM test.table1;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1 | test |
+---------+---------+
1 row in set (0.001 sec)
MariaDB [(none)]> SELECT @@hostname;
+--------------+
| @@hostname |
+--------------+
| debian-test2 |
+--------------+
1 row in set (0.000 sec)
Starting the database will take a good 15 mins (or more) on a realistic dataset, but it’ll still be an order of magnitude faster than trying to load in a mysqldump! Occasionally, a snapshot might catch mariadb mid-transaction, leaving it in an unclean state.
In that case, you’ll probably fail to start the db, and see something like this in the mysql logs:
[ERROR] Found 1 prepared transactions!
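On a systemd-managed install like this one, the quickest way to pull up those log lines is journalctl (the service is named mariadb on Debian):
# show the last 50 log lines from the mariadb service
journalctl -u mariadb -n 50 --no-pager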
For the sake of restoring some random table, you’ll probably NOT want to do all the work of figuring out which transaction failed and rolling it back. You can force your way into the DB via recovery mode:
mariadbd --user=root --skip-grant-tables --innodb-force-recovery=4
Grab what you need, without having to think too much here.
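For example, to pull a single table out of the recovered instance, a plain dump is usually enough. The database and table names below are just the ones from this walkthrough; mariadb-dump is the current name of the tool (mysqldump on older installs):
# dump only the table we care about; skip locks since we are running in forced-recovery mode
mariadb-dump --skip-lock-tables test table1 > /tmp/table1-restored.sql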
Once the required data has been recovered, stop the service and destroy the dataset:
systemctl stop mariadb
zfs destroy -r testpool/mysql
This method can also be used to restore a full database, in which case you just use zfs send and receive to overwrite everything on the target.
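A rough sketch of such a full restore, assuming the production host still has an older copy of the dataset (zfs recv -F rolls the target dataset back so the incoming stream can overwrite it):
# on the production host, stop mariadb so the dataset is not in use
ssh root@debian-test systemctl stop mariadb
# from the backup server, push the snapshot back and force-overwrite the existing dataset
zfs send testpool/mysql@first | ssh root@debian-test zfs recv -F testpool/mysql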
Beyond the basics
This guide merely looked at the tip of the iceberg of features zfs has to offer. There’s a whole host of advanced configuration parameters that we have not touched upon. Doing so would be beyond the scope of this simple guide, but I strongly recommend checking out the Debian Wiki, as well as the zfs official docs. Moreover, actually using zfs with mariadb on a production environment would require some careful consideration, as by default, innodb and zfs have some overlapping functionalities.