PostgreSQL VACUUM Basics (from a MariaDB Pov)
Introduction
As UPDATE/DELETE queries come in, PostgreSQL stores multiple versions of rows directly in the main table itself. It keeps all revisions there, with markers indicating which version is current, which versions are old.
This means for example, that an UPDATE creates a completely new copy of the entire row in the table, with all rows, not just the deltas. Both the old and the new versions will live there side by side, until a VACUUM process removes the old version.
Why do it
PostgreSQL’s VACUUM command is used to achieve a few things:
-> Recover/reuse disk space occupied by “dead” rows
-> To protect against “wraparound” where old data becomes invisible as the internal
xminandxmaxhit their 32 bit integer limitation-> To update the visibility map, used by indexes to know if rows are “dead” or “alive” - affects index only scans
Wraparound Sidenote
PostgreSQL uses 32 bit IDs xmin and xmax for its MVCC implementation, to be able to compare trx ids.
If this counter wraps back around to zero, transactions that should be in the past will be marked as if they were to appear in the future, causing data to become invisible.
-> it is necessary to vacuum every table in every database at least once every two billion transactions
-> if it is not done, PostgreSQL will eventually force aggressive vacuuming and may refuse writes to prevent data corruption
Visibility Map Sidenote
In MariaDB, an index only points to a row. But in PostgreSQL, because of MVCC, an index does not know if a row is still “alive” or if it was just deleted or updated in another transaction.
So, even if an index contains all the needed data, PostgreSQL still must visit the main table just to check this visibility status.
The visibility map is a tiny file that tracks every page in the table. It uses a single bit to mark whether a page is visible.
-> If the bit is set (1): Every row on that page is visible to all current and future transactions. Postgres can safely take the data straight from the index and skip the table entirely.
-> If the bit is NOT set (0): Postgres can’t be sure. It must visit the table to check the row’s status.
So this bit can not be set to 1 for a page that includes “dead” rows.
VACUUM vs VACUUM FULL
->
VACUUMCan run in parallel with DML operations, but not with DDL operations, does not return disk space back to the OS, the table file remains the same size->
VACUUM FULLlocks the entire table while it is running, as it compacts it by writing a new version of it, will return disk space back to the OS when done, does require extra disk space while running->
TRUNCATEis more impactful than in MariaDB, when deleting the contents of an entire table,TRUNCATEremoves all table contents, including “dead” rows, for MariaDB,TRUNCATEonly becomes very significant when replication is considered-> If a huge chunk of a table is deleted or modified at once, it will create a lot of “dead” rows at once too, so such activities have a bigger impact on disk space than MariaDB, and need to be monitored/taken into account from a disk space perspective (consider rewriting table)
When to use VACUUM FULL
-> large chunk of the table is deleted
-> table is severely bloated
-> Small (max few GB) tables
-> planned maintenance windows
AUTOVACUUM!
AUTOVACUUM is per table, and can be configured per table via ALTER TABLE. (No equivalent concept in MariaDB, so this is important!)
ALTER TABLE your_table SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005
);
The point is that AUTOVACUUM does not continuously run, but rather looks depends on a treshold as a trigger of execution.
->
autovacuum_vacuum_scale_factor = 0.01means triggerVACUUMwhen ~1% of the table has changed (UPDATE/DELETE)->
autovacuum_analyze_scale_factor = 0.005means triggerANALYZEwhen ~0.5% of rows have changed-> small tables are usually OK with default values (default scale factor is 0.2), but large, or high write frequency tables might need lower values
VACUUM Limitations
Even if VACUUM runs, it may not be able to clean up “dead” rows if:
-> long running transactions are still active, it will check that
xmaxis older than the oldest currently running transaction (~ MariaDB: long running transactions blocking from undo log getting purged, though this is more of a disk space problem in MariaDB world than a big perf hit)-> a replica needs the old row versions (if it is set to logical, not generally an issue for physical)
So dead rows can accumulate even if autovacuum is regularly doing its job
Checking for VACUUM related problems
Long running transactions
SELECT
pid,
usename,
state,
now() - xact_start AS txn_duration,
now() - query_start AS query_duration,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
- -> high
txn_duration: these are what we likely want to check
Transactions specifically blocking VACUUM
SELECT
pid,
usename,
now() - xact_start AS txn_age,
backend_xmin,
query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY backend_xmin;
->
backend_xmindefines how farVACUUMcan clean-> check from smaller to bigger
backend_xminvalues
Tables with dead rows
SELECT
relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
- -> check for high
n_dead_tupespecially if coupled withdead_pct
Wraparound danger
SELECT
datname,
age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
- -> this check should be automated, with alerts of increasing severity as ~2B is approached
Table Size Bloat
Check table sizes
SELECT
relname AS table_name,
round(pg_total_relation_size(relid) / 1024.0 / 1024.0, 2) AS total_mb
FROM pg_catalog.pg_statio_user_tables
/* WHERE schemaname = 'someschema' */
ORDER BY total_mb DESC;
This includes dead space, so to know how big a table SHOULD be (approximately), we can do some napkin math:
->
expected table size ≈ (number of rows * table row size) * 1.2 + foreach_index(rows * index[i]_size)(ROT: INT index -> 24-32 bytes per row, UUID index -> 32-40 bytes per row)->
actual table size = pg_relation_size(table)