MariaDB: What is an Index Condition Pushdown, and how it can affect locking

When it comes to indexing, one of the first observation on its benefits (besides speed), is the number of rows locked during an UPDATE or a DELETE. If we need to look at less rows thanks to an index, then obviously, that translates to less rows needing to be locked, freeing up other parts of the table for concurrent writes.

The Index Condition Pushdown (ICP) is an interesting optimization, that can allow certain queries to lock less rows than what the WHERE clause would normally indicate. Before diving into the ICP, it is good to recall a little bit about lock reductions on “simple” indexes.

Basic lock reduction due to indexing

Let’s consider a basic table icp with a couple of rows of data:

CREATE TABLE `icp` (
   `id` INT(11) NOT NULL,
   `a` INT(11) DEFAULT NULL,
   `b` VARCHAR(255) DEFAULT NULL,
   PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO icp VALUES
(1, 1,'a'),
(2, 2,'b'),
(3, 3,'c'),
(4, 4,'d'),
(5, 5,'f'),
(6, 6,'g');

We can see how many rows the optimizer thinks a query execution will check with the usage of EXPLAIN. Let’s select a single row on the PK column:

(root@linuxpc) [test]> EXPLAIN SELECT * FROM icp WHERE id = 1;
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|    1 | SIMPLE      | icp   | const | PRIMARY       | PRIMARY | 4       | const | 1    |       |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.000 sec)

The query is checking only one row, since the optimizer can use the PK to efficiently locate the needed row.

If we add a slightly more cumbersome WHERE clause, we should see how it will check more rows:

(root@linuxpc) [test]> EXPLAIN SELECT * FROM icp WHERE id < 6 AND id <> 1;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | icp   | range | PRIMARY       | PRIMARY | 4       | NULL | 5    | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.000 sec)

Now it checks 5 rows, and in the case of an UPDATE query, you would find that it’s actually going to lock row with id=1 even though the WHERE clause excludes it. This is because in this case, the optimizer decides that it’s best to check the condition row by row, and the InnoDB storage engine doesn’t know that it shouldn’t lock row with id=1.

We can simulate locks held by UPDATE queries without changing any data by using the SELECT ... FOR UPDATE construct, so we can easily check details without having to go back and forth with changing and rolling back data.

Let’s try this on the unindexed second column:

(root@linuxpc) [test]> EXPLAIN SELECT * FROM icp WHERE a < 6 AND a <> 1 FOR UPDATE;
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | icp   | ALL  | NULL          | NULL | NULL    | NULL | 6    | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.000 sec)

We can see that it’s simply going to scan every row, just as we would expect.

So if we keep this lock open indefinitely, we can actually verify that e.g. row where a=1 is indeed locked.

START TRANSACTION ; SELECT * FROM icp WHERE a < 6 AND a <> 1 FOR UPDATE;

As long as we don’t send a COMMIT; the lock for this simulated UPDATE will be held. So we can calmly open a new terminal window, and try to SELECT something in this table from another session:

(root@linuxpc) [test]> SELECT * FROM icp WHERE id = 1 FOR UPDATE;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Inspecting this with SHOW ENGINE INNODB STATUS, we can decipher that this row was locked:

LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 14432, ACTIVE 6 sec
2 lock struct(s), heap size 1128, 7 row lock(s)
MariaDB thread id 43, OS thread handle 140060968519232, query id 266 localhost root
---TRANSACTION 14433, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1128, 1 row lock(s)
MariaDB thread id 38, OS thread handle 140060968212032, query id 267 localhost root Statistics
SELECT * FROM icp WHERE id = 1 FOR UPDATE
------- TRX HAS BEEN WAITING 3715894 us FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 340 page no 3 n bits 320 index PRIMARY of table `test`.`icp` trx id 14433 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 4; hex 80000001; asc     ;;
 4: len 1; hex 61; asc a;;

------------------

Now that command outputs a large amount of stuff, but the above snippet contains the relevant details. Record lock -> hex 80000001 is the exact clue we needed. hex 80000001 translates to the integer 1 in InnoDB’s storage format, so that proves that row with id=1 was locked, despite WHERE a <> 1 (which corresponds to id=1) in the query.

Basically, what we can see here is that the InnoDB storage engine must lock rows before sending results to the handler for the WHERE clause checks, because it doesn’t know which rows should be discarded.

COMMIT; in terminal window 1, and let’s finally consider an improvement.

The Index Condition Pushdown

If we add an index on column a, MariaDB will have the option to use an optimization called the Index Condition Pushdown (ICP). Though note that for an ICP to trigger, there must be no covering index (the optimizer will prefer that), and that ICP will not trigger on PKs:

(root@linuxpc) [test]> ALTER TABLE icp ADD INDEX idx_a(`a`);
Query OK, 0 rows affected (0.015 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now let’s compare the execution plan:

(root@linuxpc) [test]> EXPLAIN SELECT * FROM icp WHERE a < 6 AND a <> 1 FOR UPDATE;
+------+-------------+-------+-------+---------------+-------+---------+------+------+-----------------------+
| id   | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra                 |
+------+-------------+-------+-------+---------------+-------+---------+------+------+-----------------------+
|    1 | SIMPLE      | icp   | range | idx_a         | idx_a | 5       | NULL | 5    | Using index condition |
+------+-------------+-------+-------+---------------+-------+---------+------+------+-----------------------+
1 row in set (0.000 sec)

Still going to check 5 rows, just like before, but note the Extra: Using index condition. We should see that this way, the ICP kicks in, and the query should no longer lock row where a=1/id=1:

START TRANSACTION ; SELECT * FROM icp WHERE a < 6 AND a <> 1 FOR UPDATE;

In the other terminal window:

(root@linuxpc) [test]> SELECT * FROM icp WHERE id = 1 FOR UPDATE;
+----+------+------+
| id | a    | b    |
+----+------+------+
|  1 |    1 | a    |
+----+------+------+
1 row in set (0.000 sec)

The SELECT result came back instantly, unlike before, where it failed with a lock wait timeout. That is the difference the ICP makes!

If we try this with the column a instead, we should see the locking bite again:

(root@linuxpc) [test]> SELECT * FROM icp WHERE a = 1 FOR UPDATE;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

With the ICP, the first row is seemingly filtered out, so there’s no need to lock the row. Otherwise, with the second query, where no ICP is in play, the row is simply just locked. So the ICP doesn’t change the cardinality estimate, but it does appear to change where the filtering happens.

What does this really mean?

In the situation where ICP is not triggered, the InnoDB storage engine locates the rows in the table, returns them back to the handler (i.e. the part of MariaDB that actually does the optimizing) which then does the WHERE clause checks. This typically means more number of rows will be locked, since there’s no filtering between InnoDB and the handler.

If ICP is triggered, the WHERE clause is evaluated at the storage engine level, so the WHERE clause check is done before passing the result set to the handler. This means less rows actually go through the parsing stage by the handler, resulting in smaller number of locked rows.

Conclusion

Pushing the WHERE clause check to the storage engine can be a good way to improve concurrency, by reducing the number of rows a WHERE clause will need to lock. This can be particularly useful in situations where multiple services need to make changes to a database table. An ICP may be preferential over covering indexes on tables with frequent UPDATEs as an ICP can act to reduce the chance for row locking overlaps.