Basic differences between MariaDB vs PostgreSQL, from a MariaDB PoV

CONFIG FILES AND CONNECTION BASICS

postgresql.conf (~ my.cnf / mariadb.cnf)

# Linux:
cat /etc/postgresql/15/main/postgresql.conf

Key sections to understand:

shared_buffers = 128MB          # MariaDB: innodb_buffer_pool_size

Shared memory for all connections, RoT ~25% of system RAM

work_mem = 4MB                  # MariaDB: sort_buffer_size (per operation)

Memory for sorts, hash joins, etc. this value is per operation, so 100 connections doing sorts = 100 times work_mem Start conservative on this one, increase if needed

maintenance_work_mem = 64MB     # MariaDB: no direct equivalent

No MariaDB equivalent, this is memory for things like VACUUM, CREATE INDEX, these are relatively rare operations

effective_cache_size = 4GB      # Hint to planner (not actual allocation)

Tells the query planner how much OS cache is available, RoT ~75% of RAM

max_connections = 100           # MariaDB: max_connections

Very different implications in PostgreSQL! (See section on CONNECTIONS)

pg_hba.conf ( ~ mysql.user table)

# Linux:
cat /etc/postgresql/15/main/pg_hba.conf

PostgreSQL manages host-based authentication in a dedicated configuration file, MariaDB manages these rules directly within its internal grant tables.

MariaDB: Every user account is explicitly tied to a host or IP range at the time of creation, use GRANT/REVOKE statements to determine privileges related to database objects. Basically:

CREATE USER 'username'@'192.168.1.100' IDENTIFIED BY 'password';

GRANT ALL PRIVILEGES ON my_schema.* TO 'username'@'192.168.1.100'

PostgreSQL: PostgreSQL separates identity and access. You create a role (user) inside the database, but you must define which hosts that role can connect from in the external pg_hba.conf file. E.g. To allow the equivalent of @’%’ in MariaDB:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    my_database     my_user         0.0.0.0/0               scram-sha-256

Don’t forget to do a SELECT pg_reload_conf(); after updating this file.

Rules are processed from TOP to BOTTOM, first match wins.

Main difference is that, in PostgreSQL you don’t create users with a host, you just create it like this:

-- Create the user with a password
CREATE USER my_user WITH PASSWORD 'secure_password';

-- 1. Grant connection to the specific database
GRANT CONNECT ON DATABASE my_database TO my_user;

-- 2. Grant USAGE on the schema (allows the user to "see" it)
GRANT USAGE ON SCHEMA my_schema TO my_user;

-- 3. Grant privileges on all existing tables in that schema
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA my_schema TO my_user;

-- 4. Grant privileges on all future tables automatically
ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema 
GRANT SELECT, INSERT, UPDATE ON TABLES TO my_user;

The 5 Columns:

CONNECTIONS

PostgreSQL connections work differently from MySQL, new processes are created for each connection.

MariaDB: uses a single process with multiple threads to handle its connections (mysqld), all database operations from query execution to buffer management happen within one unified process space

This makes MariaDB comparatively lightweight since it doesn’t fork new processes for new connections. All threads are in the same memory space and share resources like the buffer pool, query cache, and table metadata, enabling efficient resource utilization, great for concurrency

PostgreSQL: Each connection is completely isolated from the others, meaning higher memory usage since each process needs its own space.

A supervisor process listens for incoming connections on the configured port (default 5432) and acts as the primary coordinator for all database activity, after authentication, uses the unix fork() to create a new process dedicated specifically to that connection, with its own memory space and execution context. The newly forked process handles all queries until this connection is closed/disconnecte, maintaining session state, transaction context, and query execution buffers independently from all other connections.

Though each connection gets an isolated space in memory for general query execution, a shared common space for caching data pages, write ahead logging, lock tables is still needed.

NOTE: PostgreSQL also runs some helper processes for WAL, checkpointing to flush dirty buffers, autovacuuming, and query statistics.

Comparatively to MariaDB:

Connection Pooling notes: this can be done either application side, many languages come with connection pooler libraries (e.g. HikariCP), basically fire up a bunch of connections, keep them alive, and the application manages them for itself.

Or, a proxy can be used, where a middleman catches all incoming requests, and hands out readied connections. E.g. pgBouncer.

The idea is that pgBouncer multiplexes e.g. 500 app connections through 50 DB connections so PostgreSQL only spawns 50 processes leading to much more efficient resource usage.

Application connects to 6432 (pgBouncer) and NOT 5432 (PostgreSQL directly)

Sidenote on Connection Pooling

pgBouncer stats:

-- Connect to pgBouncer admin console
psql -h localhost -p 6432 -U pgbouncer pgbouncer

Then:

-- Show pools
SHOW POOLS;
-- Shows: database, user, active connections, waiting clients

-- Show stats
SHOW STATS;
-- Shows: queries per second, bytes in/out

-- Show clients
SHOW CLIENTS;
-- Shows: connected clients, their state

PostgreSQL side:

-- See actual connections
SELECT count(*) FROM pg_stat_activity;

-- See connection states
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

Alternatively there’s PgPool-II, which is more feature rich, and complex, comparable to MaxScale:

PgPool-II:

MaxScale:

WAL Settings (vs binlogs)

wal_level = replica             # Needed for replication
wal_buffers = 16MB              # MariaDB: innodb_log_buffer_size
checkpoint_timeout = 5min       # How often to checkpoint
max_wal_size = 1GB              # MariaDB: innodb_log_file_size

What is WAL:

PostgreSQL uses Write-Ahead Logging for crash recovery and replication. Changes are written to WAL first, then asynchronously to data files. If the server crashes, it replays the WAL to recover. For replication, replicas can stream WAL changes from the primary.

REPLICATION AND PITR

PostgreSQL:

MariaDB:

TL;DR: WAL is a physical, page-level change stream used for both recovery and replication, whereas binlog is a logical change stream (row or statement based) used for replication, MariaDB uses redo logs (mechanism at different layer, the storange engine) for crash recovery

For quick operational comparisons on setting up REPLICATION, see ANNEX I

STORAGE ENGINES

No pluggable storage engines in PostgreSQL!

MariaDB:

PostgreSQL:

TRANSACTION HANDLING

MySQL/MariaDB:

PostgreSQL:

NOTE ON SERIALIZABLE

MariaDB: Basically REPEATABLE READ + lock all reads, every SELECT gets shared locks

PostgreSQL: Basically tries to put every query through, errors out if a concurrent update was found, no locks on reads like MariaDB

MVCC SIDENOTE

The idea is the same for both MariaDB and PostgreSQL: keep multiple versions of data rows so that readers can access old versions, while writers can create new ones. MVCC is the solution that allows databases to handle concurrent access without locking readers out. Instead of overwriting data directly, the database keeps old versions of rows so that ongoing transactions can still see the data as it existed when they started.

This needs roughly these four things to happen:

That’s where the similarities end. The actual implementation of this is totally different between the two systems.

MySQL/MariaDB (InnoDB):

Uses a file based approach. Old records do not remain in the main table, they are written to a file based log (undo log).

PostgreSQL:

PostgreSQL doesn’t have an undo log, or any other log equivalent to it. Instead, it stores multiple versions of the rows directly in the main table itself. Basically, it keeps all revisions in the same table, with markers indicating which version is current, and which versions are old.

REPEATABLE READ SIDENOTE

MariaDB : Uses gap locks to block any inserts that would create new rows while this transaction is being processed.

SELECT * FROM products WHERE product_id >= 100 AND product_id <= 200 FOR UPDATE;

What this does:

This can block inserts that are unrelated to this query, tables with huge gaps due to frequent deletes will observe this in particular.

PostgreSQL:

Does NOT use this logic at all. No such thing as gap locking. PostgreSQL matches the original WHERE clause of the query. So only the rows that really exist for WHERE product_id >= 100 AND product_id <= 200 will be locked. Nothing else. This is very nice for concurrency

INDEXES

Default behaviour is very different, though the syntax is similar, what happens under the hood is not exactly the same.

Clustered vs Non-Clustered Indexes

MariaDB/MySQL (InnoDB):

Every table MUST have a clustered index:

Example:

CREATE TABLE users (
    id INT PRIMARY KEY,        -- This becomes the clustered index
    email VARCHAR(255),
    created_date DATE,
    INDEX idx_email (email)    -- Secondary index (stores id + email)
);

Physical layout on disk:

Implications:

PostgreSQL:

There are NO clustered indexes:

Example:

CREATE TABLE users (
    id INT PRIMARY KEY,        -- Non-clustered index
    email VARCHAR(255),
    created_date DATE
);

CREATE INDEX idx_email ON users(email);  -- Also non-clustered

Physical layout on disk:

Implications:

The CLUSTER Command (One-Time Operation)

It is possible to ask PostgreSQL to order (cluster) a table, but it won’t stick.

PostgreSQL has CLUSTER command:

-- Physically reorder table by index (one-time)
CLUSTER users USING users_pkey;

-- Or just:
CLUSTER users;  -- Uses most recently clustered index

What this does:

What it DOES NOT do:

After CLUSTER:

INSERT INTO users VALUES (1000, 'new@email.com', '2024-01-01');
-- This row goes at the END of the heap (not in id order)
-- Clustering is now broken

To maintain clustering:

Operational Perspective:

MariaDB:

Best practices:

PostgreSQL:

Different considerations:

But:

TL ; DR:

PostgreSQL:

MariaDB:

OPERATIONAL SPECIFICS

PostgreSQL has some indexing features out of the box that MariaDB does not have:

PostgreSQL:

MariaDB:

Covering Indexes also work differently:

PostgreSQL:

MariaDB:

DATABASE vs SCHEMA

MariaDB:

DATABASE and SCHEMA are synonyms, and every DATABASE/SCHEMA on the same server can be cross-joined, or referenced within a single query. No hard container.

PostgreSQL:

A DATABASE is a hard container, one query can not cros-join across these. DATABASES can contain SCHEMAS which are the containers for TABLES.

QUERY ANALYSIS (EXPLAIN)

Both MariaDB and PostgreSQL have an EXPLAIN statement for looking into the optimizer plan for execution. The outputs between MariaDB and PostgreSQL are very different, as well as there are differences in the optimizer approach for certain types of queries.

For this comparison, a simple three table structure is used, where tables 2 and 3 have a relation parent_id towards table 1.

root@debian-test:/root/tmp# awk -v tables=3 -v digits=8 -v gen_date=1 -v join_mode=1 -f gen_multi_table_with_dates.awk
----- TABLE 1 -----
id,stamp_created,col1,col2,col3
65581476,2023-12-10,80145896,24821729,53160890
27314164,2017-02-03,63534806,10386889,23100733
59139326,2020-02-24,31937496,79548467,82410078
10436107,2019-10-08,92748791,97199300,1144696
2437634,2016-11-13,86843132,94984995,69678641

----- TABLE 2 -----
id,parent_id,col1,col2,col3
31902428,65581476,8423775,37554689,57320794
99580925,2437634,22555129,79131929,32832900
79908125,59139326,20794347,24204127,97918800
25845521,27314164,37929588,79558799,34137298
97333279,27314164,67683785,40012518,15993387

----- TABLE 3 -----
id,parent_id,col1,col2,col3
59391619,27314164,94162607,29144644,89148135
89757852,27314164,61512988,32166739,44081521
35507451,27314164,8542435,35867322,8488144
15136464,59139326,11057072,46909228,9855358
51130852,2437634,26040480,76389303,66981533

This setup mirrors a common star shaped schema. There are no indexes on these tables.

Let’s examine the following query:

SELECT
    t1.id,
    t1.col1,
    t2.col3,
    t3.col1,
    t3.col3
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.parent_id
JOIN table3 t3 ON t1.id = t3.parent_id
WHERE t1.stamp_created BETWEEN '2025-06-01' AND '2025-10-31';

MARIADB

(root@linuxpc) [test]> EXPLAIN SELECT     t1.id,     t1.col1,     t2.col3,     t3.col1,     t3.col3 FROM tbl1 t1 JOIN tbl2 t2 ON t1.id = t2.parent_id JOIN tbl3 t3 ON t1.id = t3.parent_id WHERE t1.stamp_created BETWEEN '2025-06-01' AND '2025-10-31';
+------+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------------------------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref               | rows  | Extra                                           |
+------+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------------------------------------------+
|    1 | SIMPLE      | t2    | ALL    | NULL          | NULL    | NULL    | NULL              | 77012 | Using where                                     |
|    1 | SIMPLE      | t1    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t2.parent_id | 1     | Using where                                     |
|    1 | SIMPLE      | t3    | ALL    | NULL          | NULL    | NULL    | NULL              | 88133 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+--------+---------------+---------+---------+-------------------+-------+-------------------------------------------------+
3 rows in set (0.000 sec)

This EXPLAIN is saying that:

  1. t2 (ALL) - Full table scan of tbl2 (77k rows), Using where (filtering by parent_id: using each t2.parent_id to probe t1 by primary key)

  2. t1 (eq_ref, PRIMARY) - For each t2 row, lookup in t1 by PK, Using where (filtering by stamp_created)

  3. t3 (ALL) - Full table scan of tbl3 (88k rows), Using join buffer (BNL - Block Nested Loop)

Join order: t2, t1, t3.

This type of query often requires optimization in MariaDB, because it does not scale well to bigger table sizes.

PostgreSQL

test=# EXPLAIN SELECT
    t1.id,
    t1.col1,
    t2.col3,
    t3.col1,
    t3.col3
FROM tbl1 t1
JOIN tbl2 t2 ON t1.id = t2.parent_id
JOIN tbl3 t3 ON t1.id = t3.parent_id
WHERE t1.stamp_created BETWEEN '2025-06-01' AND '2025-10-31';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3674.37..5597.07 rows=3647 width=20)
   Hash Cond: (t3.parent_id = t1.id)
   ->  Seq Scan on tbl3 t3  (cost=0.00..1544.98 rows=90998 width=12)
   ->  Hash  (cost=3632.07..3632.07 rows=3384 width=16)
         ->  Hash Join  (cost=2057.31..3632.07 rows=3384 width=16)
               Hash Cond: (t2.parent_id = t1.id)
               ->  Seq Scan on tbl2 t2  (cost=0.00..1370.00 rows=78000 width=8)
               ->  Hash  (cost=2011.52..2011.52 rows=3663 width=8)
                     ->  Seq Scan on tbl1 t1  (cost=0.00..2011.52 rows=3663 width=8)
                           Filter: ((stamp_created >= '2025-06-01'::date) AND (stamp_created <= '2025-10-31'::date))
(10 rows)

Key differences on reading a PostgreSQL explain vs MariaDB explain:

The EXPLAIN format is drastically different:

Format: (cost=startup_cost..total_cost rows=X width=Y)

The EXPLAIN is saying that:

  1. Seq Scan (~ type: ALL) on tbl1, Filter by stamp_created BETWEEN dates, Estimated: 3663 rows (out of ~100k total)

  2. Build Hash of t1 results (3663 rows)

  3. Seq Scan on tbl2 (78k rows), Hash Join with t1 hash (match on parent_id), Estimated: ~3384 rows

  4. Build Hash of join result

  5. Seq Scan on tbl3 (90k rows), Hash Join with previous hash, Estimated final result: ~3647 rows

Join order: t1, t2, t3

Basically, the insight is that MariaDB and PostgreSQL will experience different pain points around query and index optimization.

In MariaDB, a missing index can lead to nested loops, and bad join orders to catastrophically slow queries.

In PostgreSQL, a missing index might not have these catastrophic outcomes, and the pain points are centered around index maintenance (bloat/unnecessary indexes on write heavy tables, insufficient VACUUMing)

Adding indexes for parent_id in tables 2 and 3 would result in a dramatic improvement for MariaDB, while it wouldn’t necessarily do the same over in PostgreSQL.

Long Running Transactions

MariaDB (InnoDB):

Often causes various lock based problems:

PostgreSQL:

Rather than lock based problems, like in MariaDB, the situation is entirely different: VACUUM can not clean up dead rows that are still visible to long running transactions

ANNEX I - REPLICATION SET UP BASICS

MariaDB:

Configuration is mostly owned by the replica server, but do need to make sure binlogging is enabled, and that a replication user on the master exists.

MASTER: make sure the basics are configured:

[mysqld]
server-id=1
gtid_strict_mode=1
log_bin = /var/lib/mysql/bin-mariadb.log
expire_logs_days=8
sync_binlog = 1
binlog_format = row
log_slave_updates = 1

TL;DR - we need binlogs, row format is safest for consistency, need a unique server id, gtid is can be a good idea, if we want things like automated failovers

Then, restart with systemctl restart mariadb, and create a user with GRANTS

(root@linuxpc) [(none)]> CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
Query OK, 0 rows affected (0.011 sec)

(root@linuxpc) [(none)]> GRANT REPLICATION SLAVE ON *.* to 'repl'@'%';
Query OK, 0 rows affected (0.012 sec)

Then take a dump:

root@linuxpc:~# mysqldump --all-databases --master-data=2 --single-transaction -u root > dump.sql

TL;DR –master-data=2 needed if using GTID, as it will include the GTID position

REPLICA:

Make sure we have a unique server id here:

[mysqld]
server_id=2
... 

Import the dump taken from the master as a starting point:

root@debian-test:~# mysql -u root -p < /tmp/dump.sql

Once done, set up the replication:

MariaDB [(none)]> CHANGE MASTER TO
    -> MASTER_HOST='192.168.2.37',
    -> MASTER_PORT=3306,
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='secret',
    -> MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.013 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.022 sec)

Check:

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
...
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
...

Indicates that everything is well.

PostgreSQL

The configuration mostly happens on the master.

MASTER:

First, check into postgresql.conf to set up some basic requirements:

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10

Then, in pg_hba.conf, create a rule that allows the replication user to connect:

host    replication     all             192.168.2.99/24         scram-sha-256

Then, login to the Database, create the user, and reload configs:

postgres=# CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
CREATE ROLE
postgres=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

Don’t forget to check locale, because a mismatch of locales would prevent the replica server from starting up:

postgres=# SHOW lc_collate;
 lc_collate
-------------
 en_US.UTF-8
(1 row)

postgres=# SHOW lc_ctype;
  lc_ctype
-------------
 en_US.UTF-8
(1 row)

REPLICA:

Repeat the SHOW statements, if there’s not an exact match, make sure to fix this (as root) by:

apt install locales
sed -i 's/^# *en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen
locale-gen

Then, it should be possible to stop the database, import the dataset over from the master DB, and set up replication:

root@debian-test:~# sudo -u postgres rm -rf /var/lib/postgresql/15/main/*
root@debian-test:~# sudo -u postgres pg_basebackup -h 192.168.2.37 -D /var/lib/postgresql/15/main -U replicator -P -R -X stream -C -S replica1
could not change directory to "/root": Permission denied
Password:
23043/23043 kB (100%), 1/1 tablespace

Flag breakdown:

Start up the database:

root@debian-test:~# systemctl start postgresql

Check if it’s working:

postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

Can check on the MASTER as well:

postgres=# SELECT * FROM pg_stat_replication;

   pid   | usesysid |  usename   | application_name | client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
---------+----------+------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 2241757 |    16388 | replicator | 15/main          | 192.168.2.99 |                 |       34984 | 2026-05-02 23:05:47.787436+02 |
           | streaming | 0/F000060 | 0/F000060 | 0/F000060 | 0/F000060  |           |           |            |             0 | async
   | 2026-05-02 23:07:27.900017+02
(1 row)

ANNEX II - Common Optimizer Cost Patterns

Postgres always shows costs as a range: (startup..total)

  1. Sequential Scan (large table):
   Seq Scan on users  (cost=0.00..15000.00 rows=500000)
   High total cost = many pages to read
  1. Index Scan (selective):
   Index Scan using idx_email  (cost=0.42..8.44 rows=1)
   Low total cost = few pages needed
  1. Index Only Scan (best case):
   Index Only Scan using idx_email_name  (cost=0.42..4.44 rows=1)
   Even lower cost = no heap access needed
  1. Bitmap Index Scan (medium selectivity):
   Bitmap Index Scan on idx_age  (cost=0.00..500.00 rows=5000)
   -> Bitmap Heap Scan  (cost=500.00..2500.00 rows=5000)
   Medium cost = many matches, but organized
  1. Hash Join (large datasets):
   Hash Join  (cost=1500.00..8000.00 rows=10000)
   High startup (building hash), but efficient
  1. Nested Loop (small datasets):
   Nested Loop  (cost=0.29..100.00 rows=10)
   Low startup, but multiplies for each outer row

Note: PostgreSQL allows its cost model to be configured, e.g. for faster drives, it’s possible to lower the costs associated with random page accesses, to favour indexes)

In postgresql.conf:

- seq_page_cost = 1.0
- random_page_cost = 4.0
- cpu_tuple_cost = 0.01
- cpu_operator_cost = 0.0025