Database Administration — Fri Apr 17


PostgreSQL - DBA Toolkit - Checkpoint and WAL Health

Fri Apr 17 — Monitor PostgreSQL checkpoint frequency, WAL generation rate, and background writer pressure to catch I/O storms before they hit.
#postgresql #dba #monitoring #performance #wal #checkpoints #tool-kit

The Write Behind the Write

Every change in PostgreSQL goes to WAL first — the Write-Ahead Log.

Checkpoints are how PostgreSQL flushes those changes from memory to the actual data files. Done well, checkpoints happen quietly in the background and nobody notices. Done badly — too frequent, too large, happening all at once — and you get I/O spikes that crush performance across the entire server.

This script reads pg_stat_bgwriter for checkpoint and background writer activity, and pg_stat_wal for WAL generation: how often checkpoints are happening, whether they are triggered on schedule or by pressure, and how much WAL the server is producing.

CheckpointandWALHealth.sql

Reading the Results

Checkpoint Stats (pg_stat_bgwriter)

| Column | What It Tells You |
| --- | --- |
| checkpoints_timed | Checkpoints triggered by checkpoint_timeout — expected, healthy |
| checkpoints_req | Checkpoints triggered by WAL volume or explicit request — a warning sign if high |
| checkpoint_write_time | Total ms spent writing dirty buffers to disk during checkpoints |
| checkpoint_sync_time | Total ms spent in fsync — high values indicate I/O subsystem pressure |
| buffers_checkpoint | Buffers written by checkpoints |
| buffers_clean | Buffers written by the background writer between checkpoints |
| buffers_backend | Buffers written directly by backend processes — bad; it means the bgwriter and checkpointer can't keep up |
| maxwritten_clean | Times the bgwriter stopped because it hit bgwriter_lru_maxpages |

WAL Generation

| Column | What It Tells You |
| --- | --- |
| wal_records | Total WAL records generated since last reset |
| wal_bytes | Total WAL volume in bytes |
| wal_buffers_full | Times WAL had to be written because the WAL buffer was full |
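Raw counters are more useful as a rate. Dividing by the time since the stats were last reset gives a rough average; a sketch against pg_stat_wal (PostgreSQL 14+):

```sql
-- Approximate average WAL generation rate since the last stats reset
SELECT pg_size_pretty(
         (wal_bytes / GREATEST(extract(epoch FROM now() - stats_reset), 1))::numeric
       ) || '/s' AS avg_wal_rate,
       wal_buffers_full
FROM pg_stat_wal;
```

Note this is an average over the whole window since the reset, so a quiet weekend can mask a weekday write burst; reset the stats or sample twice and diff the counters if you need the current rate.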

What to Watch For

| Signal | What It Means |
| --- | --- |
| checkpoints_req > checkpoints_timed | WAL is filling up faster than the timeout fires — increase max_wal_size |
| High buffers_backend | Backends are writing their own buffers — bgwriter cannot keep up with dirty pages |
| High checkpoint_sync_time | Disk I/O is the bottleneck during fsync — storage performance issue |
| maxwritten_clean climbing | bgwriter is hitting its rate limit — tune bgwriter_lru_maxpages |
| wal_buffers_full > 0 | WAL buffer is too small — consider increasing wal_buffers |
| Short interval between checkpoints | Combined with high checkpoints_req, indicates write pressure that needs tuning |
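The first and last signals can be checked in one query. A sketch that estimates the average interval between checkpoints and the share that were forced rather than scheduled (pg_stat_bgwriter column names, PostgreSQL 14 through 16):

```sql
-- Average seconds between checkpoints, and the percentage that were forced
SELECT round(extract(epoch FROM now() - stats_reset)
             / GREATEST(checkpoints_timed + checkpoints_req, 1)) AS avg_secs_between,
       round(100.0 * checkpoints_req
             / GREATEST(checkpoints_timed + checkpoints_req, 1)) AS pct_requested
FROM pg_stat_bgwriter;
```

If avg_secs_between is well below your checkpoint_timeout and pct_requested is high, WAL volume is driving the checkpoints, not the timer.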

Quick Tuning Reference

| Parameter | Effect |
| --- | --- |
| checkpoint_timeout | Time between scheduled checkpoints (default 5 min) — increase to reduce frequency |
| max_wal_size | Max WAL size before a forced checkpoint — increase if checkpoints_req is high |
| bgwriter_lru_maxpages | Max pages bgwriter writes per round — increase if maxwritten_clean is high |
| wal_buffers | WAL buffer size in shared memory — increase if wal_buffers_full > 0 |
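Applied with ALTER SYSTEM, a tuning pass might look like the sketch below. The values are illustrative, not recommendations; size them from your own WAL volume. The first three settings take effect on a config reload, while wal_buffers requires a server restart:

```sql
-- Illustrative values only; derive real numbers from your WAL stats
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET bgwriter_lru_maxpages = 400;
SELECT pg_reload_conf();  -- picks up the three settings above

-- wal_buffers takes effect only after a restart
ALTER SYSTEM SET wal_buffers = '64MB';
```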

Gareth Winterman