Replication Lag: The Silent Killer of Read Replicas
Replication lag is one of those problems that builds slowly and then fails suddenly.
Your replica is running. The I/O thread is connected. The SQL thread is applying events. Everything looks fine — until an application reads stale data, a failover happens, or the replica falls so far behind that it can never catch up before the binary logs on the source expire.
A quick lag check should be part of any DBA’s routine, especially before failovers, major writes, or any operation that produces large binary log volume.
ReplicationLagandStatus.sql
Loading…
Reading the Results
Key Fields from SHOW REPLICA STATUS
| Field | What to Check |
|---|---|
Replica_IO_Running | Must be Yes — the thread reading from the source |
Replica_SQL_Running | Must be Yes — the thread applying events locally |
Seconds_Behind_Source | Lag in seconds — NULL means the I/O thread is not connected |
Last_IO_Error | Last error from the I/O thread — usually a network or auth issue |
Last_SQL_Error | Last error from the SQL thread — usually a data conflict or skipped GTID |
Relay_Log_Space | Total space used by relay logs — grows when SQL thread falls behind |
Retrieved_Gtid_Set | GTIDs received from source |
Executed_Gtid_Set | GTIDs applied locally — compare with source @@gtid_executed to find gaps |
Parallel Applier Variables
| Variable | What It Means |
|---|---|
replica_parallel_workers | Number of parallel SQL threads (0 = single-threaded) |
replica_parallel_type | LOGICAL_CLOCK is preferred for minimal lag on MySQL 8.0 |
What to Watch For
| Signal | What It Means |
|---|---|
Seconds_Behind_Source growing | Replica is falling behind — writes on source exceed apply rate |
Seconds_Behind_Source = NULL | I/O thread is disconnected — replica is receiving nothing |
Replica_SQL_Running = No | SQL thread stopped — check Last_SQL_Error immediately |
Replica_IO_Running = No | I/O thread stopped — check Last_IO_Error for the cause |
Last_SQL_Error contains “duplicate entry” | A row already exists on the replica that the source tried to insert |
Large Relay_Log_Space | SQL thread is far behind — relay logs are accumulating |
| GTID gap between source and replica | Events not yet applied — measure with GTID_SUBTRACT() |
Gareth Winterman