Secondary Indexes in MariaDB

Context

A secondary index is a construct in MariaDB that allows its storage engines to index tables in various ways. The idea is that users should be able to write queries whose WHERE clauses include multiple columns, while keeping the queries efficient. For example, we might have a table like orders with many columns ranging between (id,order_id,server_id,...order_date) and we may want to get information about orders based on various factors like the order_date, or even find orders based on their order_id AND order_date.

To do this, we would create two indexes:

ALTER TABLE orders ADD INDEX order_date_idx(`order_id`,`order_date`);
ALTER TABLE orders ADD INDEX date_idx(`order_date`);

Why did we need two indexes?

When MariaDB tries to work out what’s the best way to handle indexes for a given query, it works through the available indexes on a left -> right direction. MariaDB will evaluate each index available on the table, compare it from left to right with the fields given in the WHERE clause. MariaDB will then try to pick the index with the best prefix match.

What this means in practice, is that you can use order_date_idx to find orders by order_id as well as by both order_id and order_date. On the other hand, what you do not want to try to do, is to find orders by order_date using order_date_idx!

This is because the order of indexes in composite indexes matters a great deal: the records are sorted by key-order, meaning that the records are primarily sorted by order_id and then by order_date. So if we wanted to use this index to pick out an order_date, we would first need to scan every record in the index to find the relevant records. This is why a second, standalone index covering only order_date is necessary, if there are queries where that is the primary/only field in the WHERE clause.

EXPLAIN?

If you’re not familiar with EXPLAIN in MariaDB, the gist of it is that it shows us how the optimizer plans to execute a query. It provides an overview on what indexes (if any) a query will use, how many rows it expects to scan, and if there’s any special handling of the indexes.

Let’s illustrate some of the above concepts with a few queries (~130k rows):

MariaDB [test]> EXPLAIN SELECT * FROM orders WHERE order_date='2025-11-07';
+------+-------------+--------+------+---------------+----------+---------+-------+------+-------+
| id   | select_type | table  | type | possible_keys | key      | key_len | ref   | rows | Extra |
+------+-------------+--------+------+---------------+----------+---------+-------+------+-------+
|    1 | SIMPLE      | orders | ref  | date_idx      | date_idx | 4       | const | 37   |       |
+------+-------------+--------+------+---------------+----------+---------+-------+------+-------+
1 row in set (0.000 sec)

Here, MariaDB optimizer sees that there’s a dedicated index to find orders by order_date (type=ref is a good indicator, that the optimizer used an index), and it can quickly look up the needed rows, query cost is minimal and the expected execution time would be <<1s.

Now, if we drop this dedicated index, but keep the composite one:

MariaDB [test]> ALTER TABLE orders DROP INDEX date_idx;
Query OK, 0 rows affected (0.016 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> EXPLAIN SELECT * FROM orders WHERE order_date='2025-11-07';
+------+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 158455 | Using where |
+------+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.000 sec)

There is no longer a dedicated index for order_date, even though it’s the second column in the composite index order_date_idx, it’s not usable on its own, and the optimizer falls back to a full table scan (type=ALL). To execute this query, MariaDB would test every row against the WHERE clause, just to find those few rows.

The situation for order_id is straight forward: the left-most index in a composite index should work smoothly for WHERE clauses that hit it:

MariaDB [test]> EXPLAIN SELECT * FROM orders WHERE order_id=9913999103;
+------+-------------+--------+------+----------------+----------------+---------+-------+------+-------+
| id   | select_type | table  | type | possible_keys  | key            | key_len | ref   | rows | Extra |
+------+-------------+--------+------+----------------+----------------+---------+-------+------+-------+
|    1 | SIMPLE      | orders | ref  | order_date_idx | order_date_idx | 8       | const | 1    |       |
+------+-------------+--------+------+----------------+----------------+---------+-------+------+-------+
1 row in set (0.000 sec)

The fact that this is a composite index does not slow down lookups for these types of situations. For the sake of this query, it is functionally equivalent to a standalone index.

I’ll quickly add back the index we just dropped, because there’ll be some more comparisons to be made:

MariaDB [test]> ALTER TABLE orders ADD INDEX date_idx(`order_date`);
Query OK, 0 rows affected (0.138 sec)
Records: 0  Duplicates: 0  Warnings: 0

Digging deeper

Now, let’s also verify how the optimizer deals with WHERE clauses that hit both fields of the composite index, first, let’s check with the most common use case, an AND:

MariaDB [test]> EXPLAIN SELECT * FROM orders WHERE order_id=9913999103 AND order_date='2025-11-07';
+------+-------------+--------+------+-------------------------+----------------+---------+-------------+------+-------+
| id   | select_type | table  | type | possible_keys           | key            | key_len | ref         | rows | Extra |
+------+-------------+--------+------+-------------------------+----------------+---------+-------------+------+-------+
|    1 | SIMPLE      | orders | ref  | order_date_idx,date_idx | order_date_idx | 12      | const,const | 1    |       |
+------+-------------+--------+------+-------------------------+----------------+---------+-------------+------+-------+
1 row in set (0.000 sec)

There’s not much to it: type=ref means that an index is used, and if paired with a low number of rows, it’s very likely that the query execution would be fast. ref=const,const simply tells us that the composite index was used exactly as we expected, the optimizer checked the first two columns of the index order_date_idx.

Now if we try to replace the AND with an OR, we’ll see that the optimizer will do something quite interesting:

MariaDB [test]> EXPLAIN SELECT * FROM orders WHERE order_id=9913999103 OR order_date='2025-11-07';
+------+-------------+--------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------+
| id   | select_type | table  | type        | possible_keys           | key                     | key_len | ref  | rows | Extra                                                  |
+------+-------------+--------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------+
|    1 | SIMPLE      | orders | index_merge | order_date_idx,date_idx | order_date_idx,date_idx | 8,4     | NULL | 38   | Using sort_union(order_date_idx,date_idx); Using where |
+------+-------------+--------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------+
1 row in set (0.000 sec)

What type=index_merge and Extra=Using sort_union means is that MariaDB is going to run two separate queries on each of the two indexes, merge and sort the results, and then return the result set. This can be a reasonably fast, especially on small table sizes, but if we increase the number of rows for our test, we’ll see a significant slow down.

So let’s pump it! I’ve increased the table from having ~130k rows, to having ~2.5M rows, to try and illustrate what happens as we scale this (I will be selecting COUNT() rather than raw results for the sake of brevity):

MariaDB [test]> SELECT COUNT('x') FROM orders WHERE order_id=9913999103 OR order_date='2025-11-07';
+------------+
| COUNT('x') |
+------------+
|        712 |
+------------+
1 row in set (0.373 sec)

We can see that the index merge starts to slow down, but also clearly much faster than a full table scan at 2.5M rows would be.

On very large tables (i.e. much larger than 2.5M), you might find that doing two SELECTs and UNION-ing them together works out to be faster. Actually, that’s the case even here, the difference just isn’t that major at this scale:

MariaDB [test]> SELECT COUNT('x') FROM orders WHERE order_id=9913999103 UNION SELECT COUNT('x') FROM orders WHERE order_date='2025-11-07';
+------------+
| COUNT('x') |
+------------+
|          3 |
|        712 |
+------------+
2 rows in set (0.001 sec)

This is faster, because each SELECT can fully make use of a dedicated index, without the extra overhead that comes with merging intermediate results. (There are now two results, since we’re no longer undergoing the merge process of the index merge, and we’re selecting COUNT() rather than the actual rows).

This also holds true for DELETE queries, if both individual indexes resolve to hits fast, then splitting a DELETE ... FROM ... WHERE column2 = condition1 OR column2 = condition2 to two simple DELETE queries will be more efficient.

On the other hand, if most (~90%) of the rows of order_date were let’s say, either today or yesterday, we would say that the indexed column has a low cardinality, in which case WHERE order_date=... would not be fast due to needing to scan a large chunk of the index, and doing an UNION wouldn’t make a huge difference.

This distinction serves as a reminder as to why blindly adding indexes on every column is counterproductive. Besides the storage overhead, every INSERT, UPDATE, and DELETE will update relevant indexes, meaning that an unusable index will only slow down writes, and provide no benefit. Don’t guess, always check query metrics before adding indexes.

Conclusion

I hope that this article helped you gain a basic understanding of what secondary indexes are, and how they affect query execution plans and performance. If nothing else, I hope it was a good demonstration on why simply just throwing an index that covers all columns isn’t necessarily a great idea.