Your Replica Looks Fine — Until It Does Not
Streaming replication lag is a silent failure.
The replica is connected. WAL is flowing. Everything looks healthy in a dashboard that only checks whether the process is running. But the replica is ten minutes behind. If the primary fails right now, that is ten minutes of committed transactions that do not exist on the standby.
This script queries pg_stat_replication on the primary to show every connected standby, how far behind each one is in bytes and in estimated time, and what stage of the replication pipeline each is at.
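A minimal sketch of such a script, not necessarily the exact query used here. It assumes PostgreSQL 10 or later (for the `pg_current_wal_lsn()` and `pg_wal_lsn_diff()` names) and any DB-API 2.0 driver. The constant `PRIMARY_LAG_QUERY` and the helper `fetch_standbys` are hypothetical names chosen for illustration. The two `*_lag_bytes` aliases are derived with `pg_wal_lsn_diff()`, since the view itself has no such columns.

```python
# Sketch only: the aliases mirror the result table in this article.
# Function names assume PostgreSQL 10 or later.
PRIMARY_LAG_QUERY = """
SELECT client_addr,
       usename,
       application_name,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       write_lag,
       flush_lag,
       replay_lag,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lag_bytes,
       pg_wal_lsn_diff(sent_lsn, replay_lsn)           AS replay_lag_bytes
FROM pg_stat_replication
ORDER BY application_name
"""


def fetch_standbys(conn):
    """Run the query on a DB-API 2.0 connection to the primary.

    Returns one dict per connected standby, keyed by column name.
    """
    cur = conn.cursor()
    try:
        cur.execute(PRIMARY_LAG_QUERY)
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

With a driver such as psycopg2, the connection would come from `psycopg2.connect(...)` pointed at the primary. An empty result means no standby is currently connected.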
Reading the Results
pg_stat_replication (run on primary)
| Column | What It Tells You |
|---|---|
| client_addr | IP address of the standby server |
| usename | Replication user |
| application_name | Standby name, set through application_name in the replica's primary_conninfo (recovery.conf before PostgreSQL 12, postgresql.conf from 12 onward) |
| state | streaming = healthy; catchup = recovering; backup = base backup in progress |
| sent_lsn | WAL position sent to this standby |
| write_lsn | WAL position written to standby disk |
| flush_lsn | WAL position flushed (durable) on standby |
| replay_lsn | WAL position applied to standby data files |
| write_lag | Time between the primary flushing WAL locally and the standby confirming it has written it |
| flush_lag | Time between the primary flushing WAL locally and the standby confirming it has flushed it |
| replay_lag | Time between the primary flushing WAL locally and the standby confirming it has replayed it; the real replication lag |
| sync_state | async, potential, sync, or quorum: whether this standby participates in synchronous commit |
| sent_lag_bytes | Bytes of WAL not yet sent to standby (a derived value, not a column of the view) |
| replay_lag_bytes | Bytes of WAL sent but not yet replayed (a derived value, not a column of the view) |
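The byte figures come from subtracting one LSN from another. An LSN is a 64-bit position printed as two hexadecimal halves separated by a slash, so the arithmetic can be reproduced outside the database. The sketch below mirrors what `pg_wal_lsn_diff()` computes. The helper names `parse_lsn`, `lsn_diff_bytes`, and `lag_bytes` are hypothetical, and the sample LSNs are made up for illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to its 64-bit integer value.

    The part before the slash is the high 32 bits, the part after is the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(newer: str, older: str) -> int:
    """Bytes of WAL between two LSNs (negative if 'newer' is actually behind)."""
    return parse_lsn(newer) - parse_lsn(older)


def lag_bytes(current_lsn: str, sent_lsn: str, replay_lsn: str) -> dict:
    """Derive the two byte-lag figures described in the table above."""
    return {
        # WAL generated on the primary but not yet sent to the standby
        "sent_lag_bytes": lsn_diff_bytes(current_lsn, sent_lsn),
        # WAL sent to the standby but not yet replayed there
        "replay_lag_bytes": lsn_diff_bytes(sent_lsn, replay_lsn),
    }


# Illustrative values only
print(lag_bytes("2/10", "2/0", "1/FFFFFFF0"))
# {'sent_lag_bytes': 16, 'replay_lag_bytes': 16}
```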
What to Watch For
| Signal | What It Means |
|---|---|
| state = catchup for extended time | Replica is behind and recovering; check network and replica I/O capacity |
| replay_lag growing steadily | The standby cannot keep up with the WAL generation rate |
| replay_lag_bytes > 0 on a synchronous standby | Commits on the primary may be waiting for this standby, adding latency; they wait for replay itself only when synchronous_commit = remote_apply |
| Missing rows (expected standby not appearing) | Replica has disconnected; check standby logs and connectivity |
| sync_state = sync with high replay_lag | Synchronous replication is likely slowing your primary commits |
| Large gap between sent_lsn and replay_lsn | WAL is arriving but not being applied; standby I/O or CPU bottleneck |
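Several of these signals can be checked mechanically against one sample of the query output. The sketch below is one possible shape for such a check, under stated assumptions: rows are dicts keyed by the column names above, `replay_lag` arrives as a `timedelta` (or `None`), and the thresholds of 30 seconds and 64 MB are arbitrary placeholders, not recommendations. The function name `replication_warnings` is hypothetical. A single sample cannot tell whether lag is growing or how long a standby has been in catchup; that needs repeated samples over time.

```python
from datetime import timedelta


def replication_warnings(rows, expected_standbys,
                         max_replay_lag=timedelta(seconds=30),   # placeholder threshold
                         max_gap_bytes=64 * 1024 * 1024):        # placeholder threshold
    """Return human-readable warnings for one sample of pg_stat_replication rows."""
    warnings = []

    # Expected standbys with no row at all have disconnected.
    connected = {row["application_name"] for row in rows}
    for name in sorted(set(expected_standbys) - connected):
        warnings.append(f"{name}: missing from pg_stat_replication")

    for row in rows:
        name = row["application_name"]
        is_sync = row["sync_state"] in ("sync", "quorum")

        # Anything other than 'streaming' deserves a look.
        if row["state"] != "streaming":
            warnings.append(f"{name}: state is {row['state']}")

        # Time-based lag; flagged more loudly on synchronous standbys.
        replay_lag = row["replay_lag"]
        if replay_lag is not None and replay_lag > max_replay_lag:
            note = " (synchronous standby, primary commits may be affected)" if is_sync else ""
            warnings.append(f"{name}: replay_lag {replay_lag} exceeds {max_replay_lag}{note}")

        # Byte gap between what was sent and what was replayed.
        gap = row["replay_lag_bytes"]
        if gap is not None and gap > max_gap_bytes:
            warnings.append(f"{name}: {gap} bytes sent but not yet replayed")

    return warnings
```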
Checking Lag from the Standby
If you need to check lag from the standby server itself (when you cannot access the primary):
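A minimal sketch of a standby-side check, not necessarily the exact query used here. It assumes PostgreSQL 10 or later for the function names. The alias `replay_delay` is taken to mean `now() - pg_last_xact_replay_timestamp()`, which is an assumption about the definition. The constant `STANDBY_LAG_QUERY` and the helper `effective_replay_delay` are hypothetical names. The helper treats the delay as zero when everything received has been replayed, because the raw timestamp difference keeps growing while the primary is idle.

```python
from datetime import datetime, timedelta, timezone

# Run this on the standby. Sketch only; aliases are illustrative.
STANDBY_LAG_QUERY = """
SELECT pg_is_in_recovery()                     AS is_standby,
       pg_last_wal_receive_lsn()               AS receive_lsn,
       pg_last_wal_replay_lsn()                AS replay_lsn,
       pg_last_xact_replay_timestamp()         AS last_replay_time,
       now() - pg_last_xact_replay_timestamp() AS replay_delay
"""


def effective_replay_delay(receive_lsn, replay_lsn, last_replay_time, now=None):
    """Estimate data staleness on the standby from one row of the query above.

    Returns None when no transaction has been replayed yet, and zero when
    all received WAL has been replayed. Caveat: a standby that has lost its
    connection to the primary can also show receive_lsn == replay_lsn, so
    this should be combined with a connectivity check.
    """
    if last_replay_time is None:
        return None
    if receive_lsn is not None and receive_lsn == replay_lsn:
        return timedelta(0)
    now = now or datetime.now(timezone.utc)
    # last_replay_time is expected to be timezone-aware (timestamptz)
    return now - last_replay_time
```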
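replay_delay shows how old the last replayed transaction is, which makes it a direct measure of data freshness. Keep in mind that it continues to grow while the primary is idle, even when the standby has replayed everything it received.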
Gareth Winterman