Practical patterns for large deletes in MariaDB

Context

Databases constantly evolve, and that evolution often calls for larger clean-ups of data. When replication is involved, simply deleting a large chunk of data in one go can cause serious operational issues in real(istic) set-ups.

This is because a massive single-statement DELETE not only holds locks for a long time, but, if the server is the master of one or more replicas, it will almost certainly cause a lag spike on them, as the replicas get flooded with single-row events.

Basically, even though you only send a single DELETE statement, with row-based replication (which is the most common mode) that statement is split into individual row events that are replayed sequentially on the replica. These events completely occupy its SQL thread and prevent it from keeping up with the master until the backlog clears.
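
As a quick sanity check before planning a large delete, it is worth confirming the replication format on the master and keeping an eye on the replica while the operation runs; both are standard statements:

-- on the master: this article assumes ROW
SHOW GLOBAL VARIABLES LIKE 'binlog_format';

-- on the replica: Seconds_Behind_Master shows how far it is lagging
SHOW SLAVE STATUS\G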

Though this article largely focuses on DELETEs, similar patterns can be applied to UPDATEs and INSERTs as well.

Splitting DELETE operations into chunks

The idea here is simple. Instead of processing the DELETE in one shot, we find a way to break it up into smaller, manageable chunks and issue a DO SLEEP between the chunks. This way the SQL thread on the replica can process each chunk and catch up with the master before the next chunk arrives.

This obviously slows the operation down compared to pushing it through in one shot. However, it not only makes a noticeable difference in replica friendliness, it also makes things much less awkward (and potentially frustrating) if the query fails or is interrupted for whatever reason, since rolling back a DELETE of, say, 1000 rows is a far smaller overhead than rolling back a million.

Precaution

Whenever you run a chunked DELETE or UPDATE like the ones below, make sure that each chunk runs in its own transaction: either leave autocommit=1 on, or, with autocommit=0, adapt the stored procedures in this article by wrapping each chunk in an explicit START TRANSACTION and COMMIT. Otherwise the chunks all pile up into one huge transaction and the whole point of chunking is lost.
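
For reference, with autocommit=0 a single chunk would be wrapped like this (a sketch based on the first example below):

SET autocommit = 0;

START TRANSACTION;
DELETE FROM test_data_1
    WHERE id BETWEEN @id_tmp AND @id_tmp+999
    AND test_date >= (NOW() - INTERVAL 1 DAY);
COMMIT; -- commit every chunk, so the replica only ever sees small transactions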

SQL statements to recreate the test schema and tables are provided at the end of the article.

Simplest case - deleting from a table with an AUTO_INCREMENT PK

In this example, the task is to delete every record created within the previous 24 hours from a table with an AUTO_INCREMENT primary key. It comes from a real situation where I needed a non-disruptive way to get rid of unneeded data: a bug had slipped through the cracks and the developers accidentally filled a production table with a lot of junk data overnight.

We can do it with a simple stored procedure (SP) like this:

USE test;
DROP PROCEDURE IF EXISTS sp_chunk_delete_data;
 
DELIMITER ;;
 
CREATE PROCEDURE sp_chunk_delete_data()
MODIFIES SQL DATA
BEGIN
    -- start at the smallest id that matches the date filter,
    -- so older rows at the beginning of the table don't end the loop immediately
    SET @id_tmp = (SELECT MIN(id) FROM test_data_1 WHERE test_date >= (NOW() - INTERVAL 1 DAY));
    REPEAT
        DO SLEEP(2);
        SELECT @id_tmp;    -- progress output, purely informational
        DELETE FROM test_data_1
            WHERE id BETWEEN @id_tmp AND @id_tmp+999
            AND test_date >= (NOW() - INTERVAL 1 DAY);
        SET @rc = ROW_COUNT();
        SET @id_tmp = @id_tmp + 1000;
    UNTIL @rc = 0
    END REPEAT;
 
END ;;
DELIMITER ;

Simply loop through the ids in blocks of 1000, sleep briefly before each DELETE, and keep going until we run out of rows to delete. This can run into problems if there are gaps in the PK (id column) left behind by previous DELETEs, since a block that happens to contain no matching rows ends the loop early.
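
Running it is then just a matter of calling the procedure from a client session on the master:

CALL test.sp_chunk_delete_data();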

Deleting items from an AUTO_INCREMENT PK with gaps in the PK

In this case, the logic needs to be adjusted so that we don’t quit the first time we hit ROW_COUNT()=0. Instead, each time we reach the end of the range between @id_tmp and the next 1000 existing ids, we check whether we can jump further, and quit only when no further jump is possible.

So here is the adjusted code:

USE test;
DROP PROCEDURE IF EXISTS sp_chunk_delete_data_2;
 
DELIMITER ;;
 
CREATE PROCEDURE sp_chunk_delete_data_2()
MODIFIES SQL DATA
BEGIN
    -- start at the smallest existing id
    SET @id_tmp = (SELECT MIN(id) FROM test_data_2);
    checkLabel: REPEAT
        DO SLEEP(2);
        -- find the id that starts the next chunk: the 1001st existing id at or after @id_tmp
        SET @id_tmp2 = (SELECT id FROM test_data_2 WHERE id >= @id_tmp ORDER BY id LIMIT 1000, 1);
        -- no further jump possible: fewer than 1000 rows remain, handle them after the loop
        IF @id_tmp2 IS NULL THEN
            LEAVE checkLabel;
        END IF;
        DELETE FROM test_data_2
            WHERE id >= @id_tmp
            AND id < @id_tmp2
            AND test_date >= (NOW() - INTERVAL 1 DAY);
        SET @id_tmp = @id_tmp2;
    UNTIL @id_tmp2 IS NULL
    END REPEAT;
    -- last (partial) chunk of fewer than 1000 rows
    DELETE FROM test_data_2
        WHERE id >= @id_tmp
        AND test_date >= (NOW() - INTERVAL 1 DAY);
END;;
DELIMITER ;

Using an index instead of the PK (e.g. maybe there is no simple PK to use)

Sometimes there might not be a convenient PK to use, maybe due to flawed legacy table design, or business requirements resulting in an awkward table.

For our example, we can consider that test_date has an index, but id is something like a UUID instead of an AUTO_INCREMENT field (so the method from the first SP no longer works).

The query can be adjusted to work with the indexed field instead:

USE test;
DROP PROCEDURE IF EXISTS sp_chunk_delete_data;
 
DELIMITER ;;
 
CREATE PROCEDURE sp_chunk_delete_data()
MODIFIES SQL DATA
BEGIN
    REPEAT
        DO SLEEP(2);
        -- the index on test_date keeps each chunk cheap to locate
        DELETE FROM test_data_1
            WHERE test_date >= (NOW() - INTERVAL 1 DAY)
            ORDER BY test_date
            LIMIT 1000;
    UNTIL ROW_COUNT() = 0
    END REPEAT;
 
END ;;
DELIMITER ;

This is not as clean as chunking by the PK, but when the id is non-sequential it might be the best available option. It’s still far lighter on replication than one big DELETE in one shot would be.

Worst case - No index is possible

If there is simply no WHERE criterion that can use an index or resolve reasonably fast, then a chunked stored procedure in the above style is likely to make things even worse, since every chunk has to scan the table again. In that case, the most viable solution is usually a table swap: create an empty copy of the table, copy over (preferably in chunks) only the rows you want to keep, then swap the two tables with an atomic RENAME TABLE and drop the old one.
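
A minimal sketch of the idea, reusing the illustrative test_data_1 table and the date filter from above (in practice the filter is whatever slow criterion you are stuck with):

USE test;
-- 1. create an empty copy of the table
CREATE TABLE test_data_1_new LIKE test_data_1;

-- 2. copy over only the rows to keep (ideally in chunks, as shown earlier)
INSERT INTO test_data_1_new
    SELECT * FROM test_data_1
    WHERE test_date < (NOW() - INTERVAL 1 DAY);

-- 3. swap the tables atomically, then get rid of the old data in one cheap statement
RENAME TABLE test_data_1 TO test_data_1_old, test_data_1_new TO test_data_1;
DROP TABLE test_data_1_old;

Rows written to the original table while the copy runs will not make it into the new table, so this is best done during a quiet period or with writes to the table paused.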

Safety and recovery

The most common storage engine these days is InnoDB, which means that in most cases a good deal of safety is already built in for you: each statement, or each explicit transaction, is atomic.

If a chunked operation fails, or the query is killed midway, the current transaction is rolled back automatically. You can then simply resume the process, and only the failed chunk is retried. If you want to go the extra mile, you can create a small progress-log table that records the last processed id or time range, so a restart knows exactly where to continue.
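
A minimal sketch of such a progress table, assuming the PK-based chunking from the first example (all names are illustrative):

USE test;
CREATE TABLE IF NOT EXISTS chunk_delete_progress (
    job_name VARCHAR(100) NOT NULL PRIMARY KEY,
    last_id INT NOT NULL,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- inside the loop, after each successful chunk:
INSERT INTO chunk_delete_progress (job_name, last_id)
    VALUES ('test_data_1_cleanup', @id_tmp)
    ON DUPLICATE KEY UPDATE last_id = VALUES(last_id);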

Partitioning as a possible alternative

If the table is already partitioned appropriately, partitioning beats all of the above by a long shot: dropping a partition is a single replication event and executes much faster than any DELETE would.

If you’re asked to create a table where the business only needs the last X days of data, or half a year’s worth, or anything else that divides naturally by date, partitioning is a very reasonable approach.

Setting up the table and initial partitions

CREATE TABLE `some_table` (
    `id` varchar(36) NOT NULL,
    `user_id` varchar(36) DEFAULT NULL,
    `date` varchar(36) DEFAULT NULL,
    `interesting_value1` int(11) DEFAULT NULL,
    `interesting_value2` int(11) DEFAULT NULL,
    `partition_key` datetime NOT NULL DEFAULT current_timestamp(),
    PRIMARY KEY (`id`,`partition_key`)
) ENGINE=InnoDB;

can be partitioned by day right away (the partition name and the dates are placeholders for the current day and the day after it):

ALTER TABLE some_table
PARTITION BY RANGE (TO_DAYS(partition_key)) (
    PARTITION pYYYYMMDD VALUES LESS THAN (TO_DAYS('YYYY-MM-DD+1')),
    PARTITION pFuture VALUES LESS THAN MAXVALUE
);
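
For illustration, if the table were set up on 2024-05-01 (an arbitrary example date), the filled-in statement would be:

ALTER TABLE some_table
PARTITION BY RANGE (TO_DAYS(partition_key)) (
    PARTITION p20240501 VALUES LESS THAN (TO_DAYS('2024-05-02')),
    PARTITION pFuture VALUES LESS THAN MAXVALUE
);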

Automating partition maintenance

Time-based partitions can then be maintained by a simple bash script, for example if we want to add a partition for each new day and drop partitions older than 6 months:

#!/bin/bash

DD=$(date +%Y-%m-%d)
TD=$(date -d "$DD + 1 day" +%Y-%m-%d)
# add the schema that holds some_table, e.g.: MARIADB="mysql -u root -A -D my_schema "
MARIADB="mysql -u root -A "

# create a new partition for today
# note: pFuture must still be empty here, since any rows already in it are dropped
# together with the partition (ALTER TABLE ... REORGANIZE PARTITION would preserve them)
echo "ALTER TABLE some_table DROP PARTITION pFuture; ALTER TABLE some_table ADD PARTITION (PARTITION p$(date -d "$DD" +%Y%m%d) VALUES LESS THAN (TO_DAYS('$TD')), PARTITION pFuture VALUES LESS THAN MAXVALUE);" | $MARIADB

# drop partitions older than 6 months
OLD_PART=$(date -d "$DD - 6 months" +%s)

for PART in $(echo "SELECT partition_name FROM information_schema.partitions WHERE table_name = 'some_table' AND partition_method = 'RANGE';" | $MARIADB --skip-column-names | grep -v Future)
do
    PART_DATE="${PART:1}" # drop the p prefix to extract the date portion
    PART_DATE_TEMP=$(date -d "${PART_DATE:0:4}-${PART_DATE:4:2}-${PART_DATE:6:2}" +%s)

    # numeric comparison of the two epoch timestamps
    if [[ "$OLD_PART" -gt "$PART_DATE_TEMP" ]]; then
        echo "ALTER TABLE some_table DROP PARTITION ${PART};" | $MARIADB
    fi
done

This simple script checks all the partition names, translates each name to a timestamp, and drops the partition if that timestamp is older than 6 months, while also adding a new partition for the current day. It can easily be added to crontab, ideally scheduled before the business needs to insert data into the “current” partition.

This kind of partitioning can also help with SELECT queries, in situations where both interesting values are of low cardinality and an index on them would not be very selective. Directly referencing a partition in a SELECT query is possible, so such requests can at least be confined to the relevant day(s) and executed a bit faster that way.
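
For example, a query that only needs a single day of data can be pointed at that day’s partition explicitly (using the same hypothetical partition name as above; the query itself is only illustrative):

SELECT interesting_value1, interesting_value2
    FROM some_table PARTITION (p20240501)
    WHERE user_id = 'some-user-uuid';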

Now that ways of deleting data have been thoroughly examined, it’s probably a good idea to quickly address a few things about populating tables with data in the first place.

Performing INSERT INTO … SELECT in chunks

It commonly happens that (part of) a table needs to be loaded into another table. As with DELETE, performing too many INSERTs at once can cause replica lag spikes. However, INSERT ... SELECT can be even nastier, because the SELECT part will typically hold locks on the rows it reads in the source table for the duration of the statement. It’s not hard to see how inconvenient that becomes in a long-running transaction.

The recipe here is to find a SELECT that lifts a small chunk from the source table, with the caveat that the SELECT itself must execute fast. Then we can apply much the same idea as with the DELETE.

Example of an SP to lift things from one table to another:

USE test;
DROP PROCEDURE IF EXISTS insert_select_chunk_data;
 
DELIMITER ;;
 
CREATE PROCEDURE insert_select_chunk_data()
MODIFIES SQL DATA
BEGIN
    DECLARE i INT DEFAULT 0;
    -- loop bound; COUNT(*) over the whole source table is a safe overestimate
    SET @tmp_max = ((SELECT COUNT(*) FROM test_data_3) + 1);

    REPEAT
        DO SLEEP(2);
        INSERT INTO test_data_4
            SELECT id, test_value, test_date, 0 FROM test_data_3
                WHERE test_date >= (NOW() - INTERVAL 1 DAY)
                ORDER BY id ASC
                LIMIT i, 1000
        ON DUPLICATE KEY UPDATE test_value = VALUES(test_value), test_date = VALUES(test_date), additional_value = 0;
        SET i = i + 1000;
    UNTIL i >= @tmp_max
    END REPEAT;
END;;
DELIMITER ;

Finally, it can happen that data needs to be sourced from external files, such as CSV files.

Loading data from external sources

Inserting data in chunks from a source like a CSV file can’t be done entirely inside the mysql client, because the data file itself has to be split into smaller chunks before any requests are sent through the client.

Why even bother? LOAD DATA is super fast, right?

While LOAD DATA is really fast, even for millions of rows in a single shot, that excellent performance does not carry over to replication. From the point of view of the replication thread, LOAD DATA = INSERT: the replica still has to replay every single row event, so a big LOAD DATA on the master will still easily cause a replication lag spike.

With the understanding that all this splitting business is still worthwhile, there are many ways to achieve it, for example with built-in Linux tools such as split.

A simple example of a split + LOAD DATA

Assuming we have a file called datafile.csv containing 100k rows that look like this:

$ tail -n 5 datafile.csv
9865,7987,6813,4629
1815,5698,6661,2234
5500,7059,6725,1055
8743,4523,2723,2397
5619,2363,5808,6848

To process this file in smaller chunks, we can use the split utility, which will split the content of the file into multiple smaller files.

The first important thing to remember is to split by lines using -l (lines) and not by -b (bytes), since we want to make sure we never split a single line in two, as that would most likely result in errors and garbled data.

So that brings us to step 1: split the data file (here I used files of 1000 lines each, but use your judgement to pick a reasonable size; just don’t try to insert 1M rows all at once):

split -d -l 1000 datafile.csv chunk_ --additional-suffix=.csv

Now create the SQL to execute. In this instance we simply insert each column as it is, so create template.sql:

USE test;
LOAD DATA LOCAL INFILE './%THE_FILE%'
INTO TABLE test_data_22
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(test_value_1,test_value_2,test_value_3,test_value_4);

Then it’s possible to simply loop through the files:

for chunk in chunk_*.csv; do
    # depending on the server and client configuration, --local-infile=1 may be
    # needed for LOAD DATA LOCAL to be allowed
    sed "s/%THE_FILE%/$chunk/g" template.sql | mysql --local-infile=1 -u root test
    sleep 2
done

Clean up the split files:

find . -name 'chunk_*' -delete

Conclusion

I hope that some of these tips can serve as helpful pointers when dealing with queries that move or delete a lot of data. If all else fails, there’s always the option to exclude schemas or tables from replication :-)

Appendix

Queries used to create the tables for the stored procedure examples:

CREATE DATABASE test CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE test;
 
DROP TABLE IF EXISTS test_data_1;
CREATE TABLE test_data_1 (
    id INT NOT NULL AUTO_INCREMENT,
    test_value VARCHAR(100),
    test_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(id)
) ENGINE=INNODB
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
COMMENT="test_1";

DROP TABLE IF EXISTS test_data_2;
CREATE TABLE test_data_2 (
    id INT NOT NULL,
    test_value VARCHAR(100),
    test_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(id)
) ENGINE=INNODB
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
COMMENT="test_2";

DROP TABLE IF EXISTS test_data_3;
CREATE TABLE test_data_3 (
    id INT NOT NULL AUTO_INCREMENT,
    test_value VARCHAR(100),
    test_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY(id)
) ENGINE=INNODB
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
COMMENT="test_3";

DROP TABLE IF EXISTS test_data_4;
CREATE TABLE test_data_4 (
    id INT NOT NULL,
    test_value VARCHAR(100),
    test_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    additional_value VARCHAR(100),
    PRIMARY KEY(id)
) ENGINE=INNODB
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
COMMENT="test_4";

DROP TABLE IF EXISTS test_data_22;
CREATE TABLE test_data_22 (
    id INT NOT NULL AUTO_INCREMENT,
    test_value_1 VARCHAR(100),
    test_value_2 VARCHAR(100),
    test_value_3 VARCHAR(100),
    test_value_4 VARCHAR(100),
    PRIMARY KEY(id)
) ENGINE=INNODB
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
COMMENT="test_22";