PostgreSQL VACUUM Basics (from a MariaDB Pov)

Introduction

As UPDATE/DELETE queries come in, PostgreSQL stores multiple versions of rows directly in the main table itself. It keeps all revisions there, with markers indicating which version is current, which versions are old.

This means for example, that an UPDATE creates a completely new copy of the entire row in the table, with all rows, not just the deltas. Both the old and the new versions will live there side by side, until a VACUUM process removes the old version.

Why do it

PostgreSQL’s VACUUM command is used to achieve a few things:

Wraparound Sidenote

PostgreSQL uses 32 bit IDs xmin and xmax for its MVCC implementation, to be able to compare trx ids.

If this counter wraps back around to zero, transactions that should be in the past will be marked as if they were to appear in the future, causing data to become invisible.

Visibility Map Sidenote

In MariaDB, an index only points to a row. But in PostgreSQL, because of MVCC, an index does not know if a row is still “alive” or if it was just deleted or updated in another transaction.

So, even if an index contains all the needed data, PostgreSQL still must visit the main table just to check this visibility status.

The visibility map is a tiny file that tracks every page in the table. It uses a single bit to mark whether a page is visible.

So this bit can not be set to 1 for a page that includes “dead” rows.

VACUUM vs VACUUM FULL

When to use VACUUM FULL

AUTOVACUUM!

AUTOVACUUM is per table, and can be configured per table via ALTER TABLE. (No equivalent concept in MariaDB, so this is important!)

ALTER TABLE your_table SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_analyze_scale_factor = 0.005
);

The point is that AUTOVACUUM does not continuously run, but rather looks depends on a treshold as a trigger of execution.

VACUUM Limitations

Even if VACUUM runs, it may not be able to clean up “dead” rows if:

So dead rows can accumulate even if autovacuum is regularly doing its job

Long running transactions

SELECT
    pid,
    usename,
    state,
    now() - xact_start AS txn_duration,
    now() - query_start AS query_duration,
    wait_event_type,
    wait_event,
    query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

Transactions specifically blocking VACUUM

SELECT
    pid,
    usename,
    now() - xact_start AS txn_age,
    backend_xmin,
    query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY backend_xmin;

Tables with dead rows

SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 2) AS dead_pct,
    last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Wraparound danger

SELECT
    datname,
    age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

Table Size Bloat

Check table sizes

SELECT
    relname AS table_name,
    round(pg_total_relation_size(relid) / 1024.0 / 1024.0, 2) AS total_mb
FROM pg_catalog.pg_statio_user_tables
/* WHERE schemaname = 'someschema' */
ORDER BY total_mb DESC;

This includes dead space, so to know how big a table SHOULD be (approximately), we can do some napkin math: