PostgreSQL EXPLAIN notes (from a MariaDB point of view)

Introduction

This article is a roadmap to build a basic mental model between MariaDB explain output and PostgreSQL output. This article is not intended as a “which is better” comparison, nor is it intended to be a comprehensive list of ecosystem differences.

The key conceptual differences in outputs:

MariaDB:

-> JOIN types: Uses type such as ref, range, ALL to describe access methods, but does not explicitly spell out the join type in most cases (The extra column can be a good indicator, and sometimes a direct answer to this)
-> Costs: The operational cost of each element of the plan is not directly referenced, but a lot can be infered from reading into type, ref and rows and extra columns.
-> Index Usage: Lists possible_keys and ref will indicate the selected key. The columns type, ref and extra can be used to infer the index access type (or if one is used at all)
-> Extra Info: The extra column provides very useful hints, like Using index or Using where, about how the optimizer wants to resolve the query. This is where hidden performance costs are often indicated.

PostgreSQL:

-> JOIN types: More explicit about the algorithm selected for processing the joins, e.g. Hash Join, Nested Loop, Merge Join may directly be named in the output
-> Costs: Provides a cost range (startup…total). So, for example, when executing a hash join, it’ll say how much “effort” it was to produce the hash, and then how much “effort” it was to perform the join itself. (cost=1 ~ the “effort” needed to read 8kB of data from disk sequentially, everything is relative to this baseline)
-> Index Usage: It names index access types explicitly (e.g. Seq Scan ~ Type=ALL, ref=NULL, Using where)
-> Extra Info: N/A, there is no similar space in the output

Nested Loops

Indexed Nested Loop:

Nested Loop
  -> Seq Scan on orders o
  -> Index Scan using customers_pkey on customers c
       Index Cond: (id = o.customer_id)

Indicators:

-> Index Scan/Index Only Scan on inner table
-> Index Cond: shows the condition driving the index

This is O(n * log(m))

The MariaDB analogue to Index Scan is Type=ref/eq_ref whith ref={{some key}}. If the extra is empty, or contains Using index condition, then the index is used to fetch the result, but the table also needs to be visited to get more data.

Index Only Scan, is the same as above, that’s roughly analogous to seeing MariaDB’s Using index in the extra, which means that the index covers all of the columns needed for this query. It means that PostgreSQL in this case can answer the needed columns from the index itself.

Simple Nested Loop:

Nested Loop
  -> Seq Scan on orders o
  -> Seq Scan on customers c
       Filter: (id = o.customer_id)

Indicators:

-> Seq Scan on inner table: full table scan
-> Filter instead of Index Cond

This is O(n * m)

This should be relatively rare in PostgreSQL because for unindexed joins, it can fall back to hash joins.

So if this appears, it indicates either that the tables are tiny, or something has gone severely off-track.

The MariaDB analogue to Seq Scan => Type=ALL, ref=NULL (+ key=NULL), should probably see Using where in the extra. Seeing Filter: ~ seeing Using where in the extra field in MariaDB.

Simple Nested Loop with Materialize:

Nested Loop
  -> Seq Scan on orders o
  -> Materialize
       -> Seq Scan on customers c
            Filter: (id = o.customer_id)

Indicators:

-> Inner scan is wrapped in Materialize, PostgreSQL scans inner table ONCE, caches it in memory

This is still O(n * m), but inner table only scanned once instead of n times

Materialize here means that PostgreSQL caches the inner result set for reuse during the processing of the Nested Loop. In MariaDB a similar process may not be directly reflected in the EXPLAIN.

Sidenote on Simple Nested Loops

When joining two tables, PostgreSQL considers:

    Nested Loop with Index Scan (if inner table has usable index)
        Cost: outer_rows × log(inner_rows)
        Chosen when: Small outer table, indexed inner table

    Hash Join (if no good index, or large result set)
        Cost: outer_rows + inner_rows
        Chosen when: No suitable index, or both tables large

    Merge Join (if both inputs are sorted)
        Cost: outer_rows × log(outer_rows) + inner_rows × log(inner_rows)
        Chosen when: Pre-sorted inputs or sort is cheap

    Nested Loop with Sequential Scan (RARE - usually worst option)
        Cost: outer_rows × inner_rows
        Chosen when: Inner table is TINY (like 5-10 rows) and seq scan is cheaper than index overhead

So that is why an actual Nested Loop with a Seq Scan is a reason to consider looking into the underlying query.

Hash Joins

Basic Hash Join:

Hash Join  (cost=150..2000 rows=5000)
  Hash Cond: (o.product_id = p.id)
  -> Seq Scan on orders o  (cost=0..1000 rows=5000)
  -> Hash  (cost=100..100 rows=1000)
       -> Seq Scan on products p  (cost=0..100 rows=1000)

Indicators:

-> Hash Join operation at top level
-> Hash Cond: shows the join condition
-> One side builds hash table (the Hash line), other side probes the hash table

This is O(n + m), as it scans each table once

Be mindful, that if the startup cost of the hash operation is high, then it means that building the hash table itself is expensive, there may not be enough space in the working memory to one shot it.

The MariaDB analogue in practice is most likely a (Block) Nested Loop, because the direction of thinking in MariaDB is that tables should have good/relevant indexes for JOINs to be performant. As a result, there is generally no direct equivalent of optimizer preferring (or falling back) to hash joins in MariaDB. If you really need a hash join in a MariaDB based system anyway, break up the join, export the data and do a file based join (if your script/application server can handle it… ). This is not a general advice, but it may sometimes be faster or more approachable than trying to fiddle with the optimizer.