Database Administration — Tue Apr 07

← Home | ← database-admin

MySQL - DBA Toolkit - Replication Lag and Status

Tue Apr 07 — Check replica status, seconds behind source, I/O and SQL thread health, and GTID consistency to catch replication problems before they become outages.
#mysql #dba #monitoring #replication #tool-kit

Replication Lag: The Silent Killer of Read Replicas

Replication lag is one of those problems that builds slowly and then fails suddenly.

Your replica is running. The I/O thread is connected. The SQL thread is applying events. Everything looks fine — until an application reads stale data, a failover happens, or the replica falls so far behind that it can never catch up before the binary logs on the source expire.

A quick lag check should be part of any DBA’s routine, especially before failovers, major writes, or any operation that produces large binary log volume.

ReplicationLagandStatus.sql
    Loading…
  

Reading the Results

Key Fields from SHOW REPLICA STATUS

FieldWhat to Check
Replica_IO_RunningMust be Yes — the thread reading from the source
Replica_SQL_RunningMust be Yes — the thread applying events locally
Seconds_Behind_SourceLag in seconds — NULL means the I/O thread is not connected
Last_IO_ErrorLast error from the I/O thread — usually a network or auth issue
Last_SQL_ErrorLast error from the SQL thread — usually a data conflict or skipped GTID
Relay_Log_SpaceTotal space used by relay logs — grows when SQL thread falls behind
Retrieved_Gtid_SetGTIDs received from source
Executed_Gtid_SetGTIDs applied locally — compare with source @@gtid_executed to find gaps

Parallel Applier Variables

VariableWhat It Means
replica_parallel_workersNumber of parallel SQL threads (0 = single-threaded)
replica_parallel_typeLOGICAL_CLOCK is preferred for minimal lag on MySQL 8.0

What to Watch For

SignalWhat It Means
Seconds_Behind_Source growingReplica is falling behind — writes on source exceed apply rate
Seconds_Behind_Source = NULLI/O thread is disconnected — replica is receiving nothing
Replica_SQL_Running = NoSQL thread stopped — check Last_SQL_Error immediately
Replica_IO_Running = NoI/O thread stopped — check Last_IO_Error for the cause
Last_SQL_Error contains “duplicate entry”A row already exists on the replica that the source tried to insert
Large Relay_Log_SpaceSQL thread is far behind — relay logs are accumulating
GTID gap between source and replicaEvents not yet applied — measure with GTID_SUBTRACT()

Gareth Winterman