To Nha Notes | May 28, 2026, 10:57 p.m.
You've just created a MySQL replica — whether for a Blue/Green deployment, a read replica, or a disaster recovery standby. The source database has years of accumulated data, and your new replica is showing:
Seconds_Behind_Source: 9814
Nearly 3 hours of lag. And it's barely moving.
You need it at zero before you can proceed with your maintenance window. The clock is ticking.
When MySQL replication starts, the replica's SQL (applier) thread must replay every binlog event from the source in order. On a large production database, that backlog alone can take hours to clear.
By default, MySQL is configured for maximum durability: innodb_flush_log_at_trx_commit = 1 flushes the InnoDB redo log to disk at every commit, and sync_binlog = 1 syncs the binary log to disk at every commit.
These settings are correct for a production primary: they ensure committed transactions survive a crash. But on a replica that is only catching up, they add disk flushes to every single replayed transaction.
Temporarily relax the durability settings on the replica only while it catches up:
| Parameter | Default | Catch-up value | Effect |
|---|---|---|---|
| innodb_flush_log_at_trx_commit | 1 | 2 | Writes the redo log to the OS cache at each commit; flushes it to disk about once per second |
| sync_binlog | 1 | 0 | MySQL stops syncing the binlog itself; the OS decides when to flush it |
This removes the per-commit disk flush of both the redo log and the binlog, so the SQL thread spends far less time waiting on storage and can apply events much faster.
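To put rough numbers on that, here is a simplified model of MySQL-initiated fsync calls per second on the replica. It is a sketch under stated assumptions, not MySQL's exact behaviour: it ignores group commit (which batches several commits per fsync) and assumes innodb_flush_log_at_timeout is at its default of 1 second.

```python
def fsyncs_per_second(commits_per_second: int,
                      flush_log_at_trx_commit: int,
                      sync_binlog: int) -> float:
    """Rough count of MySQL-initiated fsyncs per second (simplified model).

    Assumptions:
    - group commit is ignored, so every commit is counted on its own;
    - innodb_flush_log_at_timeout = 1, so settings 0 and 2 flush the redo
      log about once per second instead of at every commit;
    - sync_binlog = 0 means MySQL never syncs the binlog (the OS does),
      and sync_binlog = N syncs once every N commits.
    """
    # Redo log: one flush per commit with setting 1, otherwise ~1 per second.
    redo = commits_per_second if flush_log_at_trx_commit == 1 else 1.0
    # Binlog: no MySQL-initiated syncs with 0, otherwise one per N commits.
    binlog = 0.0 if sync_binlog == 0 else commits_per_second / sync_binlog
    return redo + binlog
```

At a hypothetical 2,000 replayed commits per second, the model gives 4,000 fsyncs per second with the defaults and about 1 in catch-up mode. Real counts are lower because group commit batches transactions, but the direction and scale of the change is the point.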
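Is this safe? On a replica that is catching up, yes, with caveats.
Risk: if the replica's OS crashes or the host loses power while catching up, you could lose roughly the last second of applied transactions from its redo log, and possibly more from its binlog, since with sync_binlog = 0 the OS decides when that file reaches disk. A crash of the mysqld process alone, with the OS still running, does not lose these writes, because they already sit in the OS cache. However, the source still holds every transaction, so the worst case is re-syncing or rebuilding the replica, not losing production data.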
Non-negotiable rule: revert both settings to 1 before the replica becomes a primary. If you switch over to or promote the replica with these settings still in place, a crash of the new primary can lose committed transactions.
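In RDS, modify the replica's parameter group (both are dynamic parameters, so no reboot is needed). Make sure the replica has its own parameter group; changing a group shared with the source would relax durability there too: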
```
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0
```
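If you script the RDS change instead of using the console, a minimal sketch with boto3 might look like this. It assumes a boto3 RDS client and a hypothetical parameter group attached only to the replica; `build_parameter_updates` and `apply_settings` are illustrative helpers, not part of boto3.

```python
from typing import Any, Dict, List

# Catch-up values and the durable defaults to restore, as described above.
CATCH_UP_SETTINGS: Dict[str, str] = {
    "innodb_flush_log_at_trx_commit": "2",
    "sync_binlog": "0",
}
DURABLE_SETTINGS: Dict[str, str] = {
    "innodb_flush_log_at_trx_commit": "1",
    "sync_binlog": "1",
}


def build_parameter_updates(settings: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {name: value} into the Parameters list modify_db_parameter_group expects."""
    return [
        {
            "ParameterName": name,
            "ParameterValue": value,
            # Both parameters are dynamic, so they can be applied without a reboot.
            "ApplyMethod": "immediate",
        }
        for name, value in settings.items()
    ]


def apply_settings(rds_client: Any, group_name: str, settings: Dict[str, str]) -> None:
    """Push settings to a parameter group.

    Assumes rds_client is boto3.client("rds") and that group_name is a
    hypothetical group attached to the replica only, never to the source.
    """
    rds_client.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=build_parameter_updates(settings),
    )
```

Usage, with a hypothetical group name: `apply_settings(boto3.client("rds"), "green-replica-params", CATCH_UP_SETTINGS)` to start, and the same call with `DURABLE_SETTINGS` once the lag reaches zero.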
Verify immediately on the replica:
```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'sync_binlog';
-- Must return 2 and 0
```
Then monitor progress:
```sql
SHOW REPLICA STATUS\G
-- Watch: Seconds_Behind_Source decreasing
-- Watch: Relay_Source_Log_File advancing
```
Check every 15–30 minutes. You should see the lag dropping significantly faster.
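If you poll from a script rather than by eye, the vertical output can be parsed into a dict. This sketch assumes you capture the text the mysql client prints for `SHOW REPLICA STATUS\G` (for example via `mysql -e`); `parse_replica_status` and `lag_seconds` are hypothetical helper names.

```python
from typing import Dict, Optional


def parse_replica_status(text: str) -> Dict[str, str]:
    """Parse the vertical output of SHOW REPLICA STATUS into {field: value}.

    Assumes one 'Field: value' pair per line. The '*** 1. row ***' banner has
    no colon and is skipped. Wrapped GTID-set continuation lines start with a
    UUID (which contains hyphens), so they are skipped as well.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Field names are letters, digits and underscores only.
        if sep and key.replace("_", "").isalnum():
            fields[key] = value.strip()
    return fields


def lag_seconds(fields: Dict[str, str]) -> Optional[int]:
    """Seconds_Behind_Source as an int, or None when it is NULL (applier not running)."""
    raw = fields.get("Seconds_Behind_Source")
    if raw is None or raw == "NULL":
        return None
    return int(raw)
```

Logging `lag_seconds(parse_replica_status(output))` at each check gives you a lag history to compare between checks instead of eyeballing single readings.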
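Once Seconds_Behind_Source reaches 0, set the replica's parameter group back to the durable values:
```
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
```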
Verify:
```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'sync_binlog';
-- Must both return 1 before any promotion or switchover
```
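A scripted pre-promotion check can turn this rule into a hard gate. The sketch assumes you have collected the two SHOW VARIABLES results into a dict of Variable_name to Value strings; `promotion_blockers` is a hypothetical helper, not a MySQL or RDS feature.

```python
from typing import Dict, List

# Both durability settings must be back at 1 before promotion or switchover.
REQUIRED_FOR_PROMOTION: Dict[str, str] = {
    "innodb_flush_log_at_trx_commit": "1",
    "sync_binlog": "1",
}


def promotion_blockers(variables: Dict[str, str]) -> List[str]:
    """Return one message per setting that is not at its required value.

    An empty list means the replica passed the check. A missing variable
    counts as a blocker, since it could not be verified.
    """
    blockers = []
    for name, required in REQUIRED_FOR_PROMOTION.items():
        actual = variables.get(name)
        if actual != required:
            blockers.append(f"{name} is {actual!r}, expected {required!r}")
    return blockers
```

A switchover script would refuse to continue whenever the returned list is non-empty.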
During a MySQL 8.0 → 8.4 Blue/Green upgrade on a ~2TB RDS production instance:
| Phase | Seconds_Behind_Source | Drop per 30 min |
|---|---|---|
| Initial catch-up (default settings) | 9,814 → 9,700 | ~114s |
| After storage I/O warmed up | 9,700 → 7,307 | ~2,393s |
| Steady state | ~5,000–7,000 | ~1,500–2,000s |
Applying innodb_flush_log_at_trx_commit=2 + sync_binlog=0 accelerated the remaining catch-up, reducing estimated time to zero by 1–2 hours on a 2TB dataset.
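The table's numbers also show why the first phase felt stuck. With the source writing continuously, Seconds_Behind_Source falls only when the replica applies more than one second of source history per wall-clock second: a drop of D seconds over T seconds means it applied T + D seconds of source time. The sketch below turns two lag samples into that apply speed and a linear time-to-zero; it assumes a steady source write rate, which real workloads only approximate, and the function names are illustrative.

```python
from typing import Optional


def apply_speed(lag_before: float, lag_after: float, elapsed: float) -> float:
    """Seconds of source history applied per wall-clock second.

    1.0 means the replica only keeps pace; above 1.0 it is catching up.
    """
    return (elapsed + (lag_before - lag_after)) / elapsed


def minutes_to_zero(lag_before: float, lag_after: float, elapsed: float) -> Optional[float]:
    """Linear ETA from two Seconds_Behind_Source samples; None if lag is not falling."""
    drop = lag_before - lag_after
    if drop <= 0:
        return None
    drop_per_second = drop / elapsed
    return lag_after / drop_per_second / 60
```

With the table's figures and 30-minute (1,800 s) intervals, the first phase works out to about 1.06x source speed and roughly 42 hours to zero if nothing changed, while the warmed-up phase (9,700 → 7,307) gives about 2.33x and roughly 92 minutes.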
Before reaching for durability settings, always check parallelism first:
```sql
SHOW VARIABLES LIKE 'replica_parallel_workers';
SHOW VARIABLES LIKE 'replica_parallel_type';
```
If replica_parallel_workers is 0 or 1, raising it (e.g. to 8 or 16) with replica_parallel_type = LOGICAL_CLOCK is the first optimization to try: it applies independent transactions in parallel and has no durability tradeoff. The new worker count takes effect once the replication applier threads are restarted.
The durability relaxation technique is the second lever — for when parallelism is already maxed out and I/O is the remaining bottleneck.
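The order of the two levers can be written down as a small checklist function. This is a hypothetical helper that only encodes the decision described above; `io_bound` is an input you set after looking at disk metrics yourself, since nothing here measures I/O.

```python
from typing import Dict


def next_catchup_lever(variables: Dict[str, str], io_bound: bool) -> str:
    """Suggest the next catch-up step in the order recommended above.

    `variables` maps Variable_name -> Value from SHOW VARIABLES.
    A missing replica_parallel_workers is treated as 0 (an assumption).
    """
    workers = int(variables.get("replica_parallel_workers", "0"))
    # Lever 1: parallelism, which costs no durability.
    if workers <= 1:
        return "raise replica_parallel_workers (e.g. 8 or 16) with LOGICAL_CLOCK"
    # Lever 2: relax durability only when I/O is the remaining bottleneck.
    if io_bound:
        return "temporarily relax innodb_flush_log_at_trx_commit and sync_binlog"
    return "no change suggested by this checklist"
```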
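Quick reference for self-managed MySQL (SET GLOBAL needs the SYSTEM_VARIABLES_ADMIN or SUPER privilege, which the RDS master user typically lacks; on RDS, use the parameter group instead):
```sql
-- Apply catch-up settings (replica only)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL sync_binlog = 0;

-- Monitor
SHOW REPLICA STATUS\G

-- Revert when Seconds_Behind_Source = 0
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;
```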
⚠️ Never leave innodb_flush_log_at_trx_commit=2 or sync_binlog=0 on a production primary. Always revert before promotion or switchover.
| | Default | Catch-up mode |
|---|---|---|
| innodb_flush_log_at_trx_commit | 1 (flush every commit) | 2 (flush every second) |
| sync_binlog | 1 (sync every transaction) | 0 (OS-managed) |
| Safe on primary? | ✅ Yes | ❌ No |
| Safe on catching-up replica? | ✅ Yes | ✅ Yes |
| Revert before promotion? | N/A | Required |
A simple, low-risk technique that can shave hours off your replica catch-up time — as long as you remember to revert before the replica goes live.