Bug #73397 make MTS work with relay_log_recovery=1 when GTID is enabled
Submitted: 25 Jul 2014 17:53
Modified: 19 May 2015 8:24
Reporter: Santosh Praneeth Banda
Status: Closed
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.6.21
OS: Any
Assigned to:
CPU Architecture: Any
Triage: Needs Triage: D3 (Medium)

[25 Jul 2014 17:53] Santosh Praneeth Banda
See "How to repeat" below.

Without sync_relay_log=1, the slave's relay log may be corrupted after an OS crash, in which case it needs to be purged during crash recovery. relay_log_recovery=1 makes this much easier to achieve.

Even with sync_relay_log=1, the relay log may end up with partial transactions and trigger the bug described in http://bugs.mysql.com/bug.php?id=72794.

When GTIDs are used, MTS doesn't need to care about gaps, because the auto-position replication protocol makes the dump thread handle all of the gaps on the master side.
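To illustrate why the gaps stop mattering, here is a minimal sketch of the idea, not the server's actual code. It assumes a single source UUID and writes GTID sets as interval strings such as "1-5:8-9". The function names parse_intervals and transactions_to_send are hypothetical. The slave reports what it has executed, and the master side sends everything that is missing, whether it lies in a gap or past the end.

```python
def parse_intervals(spec):
    """Parse an interval string such as '1-5:8-9' into a set of transaction numbers.

    Assumption: one source UUID only, so the UUID prefix is left out.
    """
    txns = set()
    for part in spec.split(":"):
        lo, _, hi = part.partition("-")
        # A single number such as '8' has no upper bound, so reuse the lower one.
        txns.update(range(int(lo), int(hi or lo) + 1))
    return txns


def transactions_to_send(master_spec, slave_spec):
    """Return the transactions the master has that the slave has not executed."""
    return sorted(parse_intervals(master_spec) - parse_intervals(slave_spec))


# An MTS slave was killed after applying 1-5 and 8-9; workers had not finished 6-7.
missing = transactions_to_send("1-12", "1-5:8-9")
print(missing)  # [6, 7, 10, 11, 12]
```

The gap (6-7) and the tail (10-12) come out of the same set difference, so the slave does not need separate gap bookkeeping in this model.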

How to repeat:
Run mysqld with --relay_log_recovery=1. Turn on MTS. Run kill -9 on mysqld, then start mysqld again. Replication hits the following error:

"relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode"
[29 Jul 2014 17:55] Sveta Smirnova
Thank you for the report.

Our manual at http://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#sysvar_sync_relay_lo... says: "A value of 1 is the safest choice because in the event of a crash you lose at most one event from the relay log." Did you lose more than 1 event?
[29 Jul 2014 18:00] Santosh Praneeth Banda
Yes, with sync_relay_log=1 we lose only one event. We cannot use sync_relay_log=1 because it has a negative performance impact.
[29 Jul 2014 18:20] Sveta Smirnova
Thank you for the feedback.

But in this case what is the difference between this bug and bug #72794?
[29 Jul 2014 18:36] Santosh Praneeth Banda
These are completely different bugs. This bug is about not being able to use relay_log_recovery=1 when MTS is enabled. I think I talked about sync_relay_log unnecessarily and confused you.
[29 Jul 2014 19:00] Sveta Smirnova
Thank you for the feedback.

I cannot repeat the described behavior with a simple crash caused by `kill -9`, so we need the particular corruption that leads to this behavior. Do you observe this failure only with corruption similar to the one described in bug #72794, or can you repeat this bug consistently?
[29 Jul 2014 20:04] Sveta Smirnova
Please ignore my last comment: I made a mistake when running the test.

Verified as described.

To repeat:

1. Start MTR:

./mtr --start --suite=rpl rpl_alter --mysqld=--gtid_mode=ON --mysqld=--log-slave-updates --mysqld=--enforce-gtid-consistency --mysqld=--innodb_buffer_pool_size=1G --mysqld=--tmp_table_size=1G --mysqld=--max_heap_table_size=1G --mysqld=--relay_log_info_repository=table --mysqld=--master_info_repository=table --mysqld=--sync-master-info=1 --mysqld=--slave_parallel_workers=8 --mysqld=--relay_log_recovery=1 &

2. Connect to slave, run CHANGE MASTER TO master_host='', master_port=13000, master_user='root', master_password='', MASTER_AUTO_POSITION = 1;

3. Connect to master, create database foo.

4. In a parallel client, start some load:

mysqlslap --user=root --host= --port=13000 --create-schema=foo --query="create table if not exists t1(f1 int); insert into t1 values(1); drop table if exists t1;" -c 10 -i 10000

5. kill -9 slave process

6. Restart MTR in dirty mode:

./mtr --start-dirty --suite=rpl rpl_alter --mysqld=--gtid_mode=ON --mysqld=--log-slave-updates --mysqld=--enforce-gtid-consistency --mysqld=--innodb_buffer_pool_size=1G --mysqld=--tmp_table_size=1G --mysqld=--max_heap_table_size=1G --mysqld=--relay_log_info_repository=table --mysqld=--master_info_repository=table --mysqld=--sync-master-info=1 --mysqld=--slave_parallel_workers=8 --mysqld=--relay_log_recovery=1 &

7. Connect to slave, start it, run SHOW SLAVE STATUS.

If master_auto_position is 0 the option works fine.
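
Steps 1 and 6 use the same server options and differ only in --start versus --start-dirty. A small helper can build both command lines from one option list, so the two cannot drift apart. This is only a sketch: mtr_command and MYSQLD_OPTIONS are hypothetical names, every flag is copied from the steps above, and the script prints the commands without running them.

```python
import shlex

# Server options shared by the first start (step 1) and the dirty restart (step 6).
MYSQLD_OPTIONS = [
    "--gtid_mode=ON",
    "--log-slave-updates",
    "--enforce-gtid-consistency",
    "--innodb_buffer_pool_size=1G",
    "--tmp_table_size=1G",
    "--max_heap_table_size=1G",
    "--relay_log_info_repository=table",
    "--master_info_repository=table",
    "--sync-master-info=1",
    "--slave_parallel_workers=8",
    "--relay_log_recovery=1",
]


def mtr_command(start_flag):
    """Build the mtr argument list; start_flag is '--start' or '--start-dirty'."""
    argv = ["./mtr", start_flag, "--suite=rpl", "rpl_alter"]
    # Each server option is passed through to mysqld via --mysqld=<option>.
    argv += ["--mysqld=" + opt for opt in MYSQLD_OPTIONS]
    return argv


first_start = mtr_command("--start")
dirty_restart = mtr_command("--start-dirty")

# Print shell-ready command lines; append '&' by hand to run them in the background.
print(shlex.join(first_start))
print(shlex.join(dirty_restart))
```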
[19 May 2015 8:24] David Moss
Hello Santosh,
thanks for your feedback. This has been fixed in upcoming releases and the following was noted in the 5.6.26 and 5.7.8 changelogs:

When using GTIDs, a multi-threaded slave which had relay_log_recovery=1 and that stopped unexpectedly could encounter a "relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode" error upon restart. The fix ensures that the relay log recovery process checks if GTIDs are in use or not. If GTIDs are in use, the multi-threaded slave recovery process uses the GTID protocol to fill any unprocessed transactions.