Database Administration — Fri Apr 17

← Home | ← database-admin

PostgreSQL - DBA Toolkit - Replication Lag

Fri Apr 17 — Check streaming replication lag on all standby servers, WAL send and replay positions, and whether replicas are keeping up with the primary.
#postgresql #dba #monitoring #replication #high-availability #tool-kit

Your Replica Looks Fine — Until It Does Not

Streaming replication lag is a silent failure.

The replica is connected. WAL is flowing. Everything looks healthy in a dashboard that only checks whether the process is running. But the replica is ten minutes behind. If the primary fails right now, that is ten minutes of committed transactions that do not exist on the standby.

This script queries pg_stat_replication on the primary to show every connected standby, how far behind each one is in bytes and in estimated time, and what stage of the replication pipeline each is at.

ReplicationLag.sql
    Loading…
  

Reading the Results

pg_stat_replication (run on primary)

ColumnWhat It Tells You
client_addrIP address of the standby server
usenameReplication user
application_nameStandby name — set in recovery.conf or postgresql.conf on the replica
statestreaming = healthy; catchup = recovering; backup = base backup in progress
sent_lsnWAL position sent to this standby
write_lsnWAL position written to standby disk
flush_lsnWAL position flushed (durable) on standby
replay_lsnWAL position applied to standby data files
write_lagTime from primary WAL write to standby write
flush_lagTime from primary WAL write to standby flush
replay_lagTime from primary WAL write to standby replay — the real replication lag
sync_stateasync, sync, quorum — whether this standby participates in synchronous commit
sent_lag_bytesBytes of WAL not yet sent to standby
replay_lag_bytesBytes of WAL sent but not yet replayed

What to Watch For

SignalWhat It Means
state = catchup for extended timeReplica is behind and recovering — check network and replica I/O capacity
replay_lag growing steadilyThe standby cannot keep up with WAL generation rate
replay_lag_bytes > 0 on a synchronous standbyCommits on the primary are waiting for this standby — latency impact
Missing rows (expected standby not appearing)Replica has disconnected — check standby logs and connectivity
sync_state = sync with high replay_lagSynchronous replication is slowing your primary commits
Large gap between sent_lsn and replay_lsnWAL is arriving but not being applied — standby I/O or CPU bottleneck

Checking Lag from the Standby

If you need to check lag from the standby server itself (when you cannot access the primary):

CheckingLagfromtheStandby.sql
    Loading…
  

replay_delay shows how old the last replayed transaction is — a reliable measure of data freshness.

Gareth Winterman