MySQL Bugs: #73397: make MTS work with relay_log

Bug #73397	make MTS work with relay_log_recovery=1 when GTID is enabled
Submitted:	25 Jul 2014 17:53	Modified:	19 May 2015 8:24
Reporter:	Santosh Praneeth Banda
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.21	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
See "How to repeat" below.

Without sync_relay_log=1, the slave relay log may be corrupted after an OS crash and then needs to be purged during crash recovery. relay_log_recovery=1 makes this much easier to achieve.

Even with sync_relay_log=1, the relay log may end up with partial transactions and cause the bug described in http://bugs.mysql.com/bug.php?id=72794.

When GTIDs are used, MTS doesn't need to care about gaps, because the auto-position replication protocol makes the dump thread handle all of the gaps on the master side.

How to repeat:
Run mysqld with --relay_log_recovery=1. Turn on MTS. Run kill -9 on mysqld, then start mysqld again. Replication hits the following error:

"relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode"
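A small sketch for checking whether a restarted slave hit this error. It assumes the message is also written to the slave's error log; the helper name has_mts_recovery_error and the log path passed to it are hypothetical and not part of MySQL.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the error log at path $1
# contains the message quoted in this report. The log location is an
# assumption and depends on how mysqld was started.
has_mts_recovery_error() {
    # -F: match the text literally; -q: no output, only the exit status.
    grep -qF 'relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode' "$1"
}
```

For example, `has_mts_recovery_error /path/to/slave.err && echo "bug reproduced"`, where the path is a placeholder for the actual error log.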

Thank you for the report.

Our manual at http://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#sysvar_sync_relay_lo... says: "A value of 1 is the safest choice because in the event of a crash you lose at most one event from the relay log." Did you lose more than 1 event?

Yes, with sync_relay_log=1 we lose only one event. However, we cannot use sync_relay_log=1 because it has a negative performance impact.

Thank you for the feedback.

But in this case what is the difference between this bug and bug #72794?

These are completely different bugs. This bug is about not being able to use relay_log_recovery=1 when MTS is enabled. I think I unnecessarily talked about sync_relay_log and confused you.

Thank you for the feedback.

I cannot repeat the described behavior with a simple crash caused by `kill -9`, so we need the particular corruption that leads to this behavior. Do you observe this failure only with corruption similar to the one described in bug #72794, or can you repeat this bug consistently?

Please ignore my last comment: I made a mistake when running the test.

Verified as described.

To repeat:

1. Start MTR:

./mtr --start --suite=rpl rpl_alter --mysqld=--gtid_mode=ON --mysqld=--log-slave-updates --mysqld=--enforce-gtid-consistency --mysqld=--innodb_buffer_pool_size=1G --mysqld=--tmp_table_size=1G --mysqld=--max_heap_table_size=1G --mysqld=--relay_log_info_repository=table --mysqld=--master_info_repository=table --mysqld=--sync-master-info=1 --mysqld=--slave_parallel_workers=8 --mysqld=--relay_log_recovery=1 &

2. Connect to slave, run CHANGE MASTER TO master_host='127.0.0.1', master_port=13000, master_user='root', master_password='', MASTER_AUTO_POSITION = 1;

3. Connect to master, create database foo.

4. In a parallel client, start some load:

mysqlslap --user=root --host=127.0.0.1 --port=13000 --create-schema=foo --query="create table if not exists t1(f1 int); insert into t1 values(1); drop table if exists t1;" -c 10 -i 10000

5. Run kill -9 on the slave process.

6. Restart MTR in dirty mode:

./mtr --start-dirty --suite=rpl rpl_alter --mysqld=--gtid_mode=ON --mysqld=--log-slave-updates --mysqld=--enforce-gtid-consistency --mysqld=--innodb_buffer_pool_size=1G --mysqld=--tmp_table_size=1G --mysqld=--max_heap_table_size=1G --mysqld=--relay_log_info_repository=table --mysqld=--master_info_repository=table --mysqld=--sync-master-info=1 --mysqld=--slave_parallel_workers=8 --mysqld=--relay_log_recovery=1 &

7. Connect to slave, start it, run SHOW SLAVE STATUS.

If master_auto_position is 0 the option works fine.
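Steps 1 and 6 pass the same eleven mysqld options and differ only in the MTR start flag. A minimal sketch that keeps the option list in one place is shown here; the function names mtr_mysqld_opts and run_mtr are hypothetical helpers, not part of MTR, and the sketch assumes it is run from the mysql-test directory.

```shell
#!/bin/sh
# Hypothetical helper: print the mysqld options shared by steps 1 and 6,
# one --mysqld=--<option> argument per line.
mtr_mysqld_opts() {
    for opt in \
        gtid_mode=ON \
        log-slave-updates \
        enforce-gtid-consistency \
        innodb_buffer_pool_size=1G \
        tmp_table_size=1G \
        max_heap_table_size=1G \
        relay_log_info_repository=table \
        master_info_repository=table \
        sync-master-info=1 \
        slave_parallel_workers=8 \
        relay_log_recovery=1
    do
        printf '%s\n' "--mysqld=--$opt"
    done
}

# Hypothetical wrapper: $1 is the MTR start flag (--start or --start-dirty).
# The unquoted command substitution is intentional: each printed line becomes
# one argument, and none of the options contains whitespace.
run_mtr() {
    ./mtr "$1" --suite=rpl rpl_alter $(mtr_mysqld_opts)
}
```

With these definitions, step 1 becomes `run_mtr --start &` and step 6 becomes `run_mtr --start-dirty &`.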

Hello Santosh,
thanks for your feedback. This has been fixed in upcoming releases and the following was noted in the 5.6.26 and 5.7.8 changelogs:

When using GTIDs, a multi-threaded slave which had relay_log_recovery=1 and that stopped unexpectedly could encounter a "relay-log-recovery cannot be executed when the slave was stopped with an error or killed in MTS mode" error upon restart. The fix ensures that the relay log recovery process checks if GTIDs are in use or not. If GTIDs are in use, the multi-threaded slave recovery process uses the GTID protocol to fill any unprocessed transactions.
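The check described in the changelog can be pictured with a toy model. This is only an illustration of the decision as worded above, not the server's implementation; the function name and the returned labels are hypothetical, and the changelog does not say what happens when GTIDs are not in use, so that branch is simply labeled "unchanged".

```shell
#!/bin/sh
# Toy model, not server code. Applies to a multi-threaded slave with
# relay_log_recovery=1 that stopped unexpectedly.
#   $1  1 if GTIDs are in use (e.g. MASTER_AUTO_POSITION = 1), otherwise 0
mts_recovery_action() {
    if [ "$1" -eq 1 ]; then
        # Per the changelog: unprocessed transactions are filled in
        # through the GTID protocol.
        echo "fill-via-gtid-protocol"
    else
        # Not covered by the changelog; labeled as unchanged here.
        echo "unchanged"
    fi
}
```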