Bug #92093 Replication crash safety needs relay_log_recovery even with GTID.
Submitted: 20 Aug 21:24 Modified: 21 Aug 17:57
Reporter: Jean-François Gagné
Status: Verified
Category: MySQL Server: Documentation  Severity: S2 (Serious)
Version: 5.7.23  OS: Any
Assigned to:  CPU Architecture: Any

[20 Aug 21:24] Jean-François Gagné
Description:
Hi,

I suspect this also impacts 8.0, but I did not test.

In [1], we can read contradictory information (already reported in Bug#90935):

1) In the single-threaded table, it is claimed that with gtid_mode = ON and with MASTER_AUTO_POSITION, replication will survive any crash even with relay_log_recovery = 0.

2) Below the table, it is written "When using GTIDs and MASTER_AUTO_POSITION, set relay_log_recovery=1.".

[1]: https://dev.mysql.com/doc/refman/5.7/en/replication-solutions-unexpected-slave-halt.html

Bug#90935 claims that 1) is true and 2) is false.  It is the contrary: 2) is true and 1) is false.

Without relay_log_recovery, after an OS crash the relay logs might be in an inconsistent state, and this will either break replication or lead to silent data corruption.

Many thanks for looking into that,

JFG

How to repeat:
Set up a test replication environment:

$ dbdeployer deploy replication 5.7.23 --gtid;

Enable MASTER_AUTO_POSITION on the slaves, put some data in the master and look at the slave statuses:

$ cd rsandbox_5_7_23; \
echo "stop slave; change master to master_auto_position = 1; start slave;" | tee >(./s1) | ./s2; \
./m <<< "create database test_jfg; create table test_jfg.t (id int primary key);"; \
for i in $(seq 1 9); do echo "BEGIN;"; \
for j in $(seq 0 9); do echo "INSERT INTO t VALUE ($i$j);"; done; \
echo "COMMIT;"; done | ./m test_jfg; \
sleep 5; ./s1 <<< "show slave status\G"

At this point, both slaves should be in sync with the master with very similar SHOW SLAVE STATUS output.

For demonstration, let's see what gets into the slave when the SQL_THREAD is stopped:

$ echo "stop slave; start slave io_thread;" | tee >(./s1) | ./s2; \
for i in $(seq 10 19); do echo "BEGIN;"; \
for j in $(seq 0 9); do echo "INSERT INTO t VALUE ($i$j);"; done; \
echo "COMMIT;"; done | ./m test_jfg; \
sleep 5; ./s1 <<< "show slave status\G"

At this point, both slaves have unexecuted relay logs as shown by Retrieved_Gtid_Set greater than Executed_Gtid_Set in SHOW SLAVE STATUS output.
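The same gap can be computed directly with MySQL's built-in GTID_SUBTRACT() function instead of eyeballing SHOW SLAVE STATUS; a minimal sketch, assuming the sandbox's ./s1 wrapper:

$ ./s1 <<< "SELECT GTID_SUBTRACT(
    (SELECT RECEIVED_TRANSACTION_SET
       FROM performance_schema.replication_connection_status),
    @@GLOBAL.gtid_executed) AS not_yet_applied\G"

A non-empty result means the slave has received transactions in its relay logs that it has not yet applied.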

Let's restart s1 and s2, without starting the slave, and with relay_log_recovery enabled on s2:

$ echo "skip-slave-start" >> ./node1/my.sandbox.cnf; \
./node1/restart; \
./s1 <<< "show slave status\G"

$ echo -e "skip-slave-start\nrelay_log_recovery = ON" >> ./node2/my.sandbox.cnf; \
./node2/restart; \
./s2 <<< "show slave status\G"

At this point, on s1, because relay_log_recovery was not run, the relay logs are still there; if we start the SQL_THREAD, the transactions they contain will be executed even though the slave does not reconnect to the master (IO_THREAD still stopped).

$ ./s1 <<< "start slave sql_thread; do sleep(5); show slave status\G"

But on s2, because relay_log_recovery was run, the relay logs were discarded.  To get s2 into the same state as s1, we need to run:

$ ./s2 <<< "stop slave; start slave; do sleep(5); show slave status\G"

Ok, now the fun part: let's run only the IO_THREAD on both s1 and s2, inject work on the master, and crash the OS (do not run this on a server you care about, as we will kernel-panic Linux; I am running this on a test VM in GCP):

$ echo "stop slave; start slave io_thread;" | tee >(./s1) | ./s2; \
for i in $(seq 20 4999); do echo "BEGIN;"; \
for j in $(seq 0 9); do echo "INSERT INTO t VALUE ($i$j);"; done; \
echo "COMMIT;"; done | ./m test_jfg; \
sleep 1; sudo bash -c "echo c > /proc/sysrq-trigger"

When the VM is restarted, there is a good chance that the relay logs are corrupted on both s1 and s2.  To see that, run mysqlbinlog on the last relay logs.
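A sketch of that check, assuming the default sandbox datadir layout (the relay log basename and number below are hypothetical; find the current file with SHOW SLAVE STATUS or ls -t):

$ ls -t node1/data/*relay*.[0-9]* | head -1
$ mysqlbinlog --verify-binlog-checksum node1/data/mysql-relay.000005 > /dev/null

If the last relay log is truncated or corrupted, mysqlbinlog exits with an error (for example a read_log_event or checksum failure) instead of decoding the file cleanly.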

At this point, if we restart s2 and resume replication, everything will be fine, as the corrupted relay logs will be discarded thanks to relay_log_recovery.  But if we do the same on s1, replication will break on the relay log corruption.  I think there is another failure mode for s1 if the relay log truncation happens not in the middle of an event but at an event boundary in the middle of a transaction: the transaction will be partially executed, which will result in silent data corruption.

Suggested fix:
There is no way to trust the relay logs after an OS crash.  The only way to reliably resume replication is to start MySQL with relay_log_recovery enabled.
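For reference, a minimal my.cnf sketch of the crash-safe slave settings involved here; relay_log_recovery is the subject of this report, and the TABLE repositories are the related recommendation from the manual page in [1]:

[mysqld]
# Discard possibly-corrupted relay logs after a restart and
# re-fetch them from the master (requires the IO thread position
# to be trustworthy).
relay_log_recovery = ON
# Store replication state in InnoDB tables so it is updated
# transactionally with the applied events.
relay_log_info_repository = TABLE
master_info_repository = TABLE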
[21 Aug 14:40] Bogdan Kecman
Hi,
Thanks for the report. It's not hard to reproduce this; now, whether this is "just" a Documentation issue or not, I'll let the Documentation team decide :)

kind regards
Bogdan
[21 Aug 17:57] Jean-François Gagné
Somehow related: Bug#92109.