Bug #75574 | Can not execute change master after Error occured in MTS mode | ||
---|---|---|---|
Submitted: | 21 Jan 2015 12:06 | Modified: | 4 May 2015 15:29 |
Reporter: | zhang yingqiang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6.16, 5.6.23 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | replication; mts; slave error; |
[21 Jan 2015 12:06]
zhang yingqiang
[21 Jan 2015 12:07]
zhang yingqiang
we should clear mts recovery info in function reset_slave.
[22 Jan 2015 20:29]
Sveta Smirnova
Thank you for the report. The test case fails with error "mysqltest: At line 89: query 'insert into d1.t1 values(63,"")' failed: 1062: Duplicate entry '63' for key 'PRIMARY'" for me. If I try to change numbers of d1.t1 table, so test can reach next line, waiting for error I cannot repeat the problem and get error "mysqltest: At line 112: query 'change master to Master_Log_File='master-bin.000002',Master_Log_Pos=120' succeeded - should have failed with errno 1802..." Do I have something missed in my environment?
[23 Jan 2015 6:23]
zhang yingqiang
My mistake --source include/master-slave.inc connection slave; show slave status; show databases; stop slave; set global slave_parallel_workers=8; set global relay_log_info_repository='TABLE'; start slave; show processlist; set global BINLOG_FORMAT=row; set session BINLOG_FORMAT=row; show global variables like 'BINLOG_FORMAT'; show session variables like 'BINLOG_FORMAT'; stop slave; connection master; set global BINLOG_FORMAT=row; set session BINLOG_FORMAT=row; show global variables like 'BINLOG_FORMAT'; show session variables like 'BINLOG_FORMAT'; show databases; create database d1; create database d2; create database d3; create database d4; create database d5; create database d6; create database d7; create database d8; create table d1.t1(id int auto_increment primary key, v varchar(1000)) engine=innodb; create table d2.t1 like d1.t1; create table d3.t1 like d1.t1; create table d4.t1 like d1.t1; create table d5.t1 like d1.t1; create table d6.t1 like d1.t1; create table d7.t1 like d1.t1; create table d8.t1 like d1.t1; connection slave; start slave; connection master; --sync_slave_with_master connection slave; show tables from d8; stop slave; connection master; insert into d1.t1(v) values(repeat('a',9)),(repeat('a',9)),(repeat('a',9)),(repeat('a',9)); insert into d1.t1(v) select v from d1.t1; insert into d1.t1(v) select v from d1.t1; insert into d1.t1(v) select v from d1.t1; insert into d1.t1(v) select v from d1.t1; #64 lines insert into d2.t1(v) values(""); insert into d3.t1(v) values(""); insert into d4.t1(v) values(""); insert into d5.t1(v) values(""); insert into d6.t1(v) values(""); insert into d7.t1(v) values(""); insert into d8.t1(v) values(""); connection slave; insert into d1.t1 values(63,""); start slave; show processlist; --let $slave_sql_errno= convert_error(ER_DUP_ENTRY) --source include/wait_for_slave_sql_error.inc show slave status; select * from mysql.slave_worker_info; select * from d2.t1; select * from d1.t1 order by id limit 60,10; connection master; flush logs; show master status; connection slave; stop slave; --error ER_MTS_CHANGE_MASTER_CANT_RUN_WITH_GAPS change master to Master_Log_File='master-bin.000002',Master_Log_Pos=120; reset slave; change master to Master_Log_File='master-bin.000002',Master_Log_Pos=120; connection master; drop database d1; drop database d2; drop database d3; drop database d4; drop database d5; drop database d6; drop database d7; drop database d8; set global BINLOG_FORMAT=statement; connection slave; set global relay_log_info_repository='FILE'; set global slave_parallel_workers=0; stop slave;
[23 Jan 2015 20:38]
Sveta Smirnova
Thank you for the feedback. Verified as described.
[4 May 2015 15:29]
David Moss
Thanks for your feedback. This is fixed in upcoming versions. Noted in the 5.6.25 and 5.7.8 changelogs: If an error occurred when using a multi-threaded slave, issuing a CHANGE MASTER TO statement which resulted in an ER_MTS_CHANGE_MASTER_CANT_RUN_WITH_GAPS error, and then issuing RESET SLAVE, made it impossible to change master due to repeated ER_MTS_CHANGE_MASTER_CANT_RUN_WITH_GAPS errors. Running the debug version of mysqld caused an unexpected exit in this case. The fix ensures that the recovery process for multi-threaded slaves avoids this.
[23 Jun 2015 16:52]
Laurynas Biveinis
commit b86694019206edf667af2f268530c781666b45ca Author: Sujatha Sivakumar <sujatha.sivakumar@oracle.com> Date: Mon Mar 30 12:27:04 2015 +0530 Bug#20411374:CAN NOT EXECUTE CHANGE MASTER AFTER ERROR OCCURED IN MTS MODE Problem: ======== When error occurred in MTS mode, If user first change master (ER_MTS_CHANGE_MASTER_CANT_RUN_WITH_GAPS received here), and then reset slave, user can never change master (error ER_MTS_CHANGE_MASTER_CANT_RUN_WITH_GAPS occurred again). The debug version mysqld will crash at this case. Analysis: ======== When gaps are generated with MTS mode and a change master command is executed the code for change master command checks for the following two pre conditions. i.e first it expects "rli->mts_recovery_group_cnt" to be set and it also expects "rli->recovery_parallel_workers>0". If the two conditions are met it will generate the following error. ER_MTS_CHANGE_MASTER_CANT_RUN_WITH_GAPS After the first change master command RESET SLAVE command is issued which will bring back the slave to a fresh start stage by resetting all the workers information. As part of this clean up code "rli->recovery_parallel_workers" is set to "0". Hence when the change master command is executed once again one of the preconditions will become false and it results in an assert. Fix: === Added code to clean up MTS related recovery information during the execution of RESET SLAVE command.
[24 Jun 2015 4:03]
Laurynas Biveinis
commit 8c15ec68633d07c87170439d916bbc0fc495d7f1 Author: Sujatha Sivakumar <sujatha.sivakumar@oracle.com> Date: Mon Apr 6 11:57:02 2015 +0530 Bug#20411374:CAN NOT EXECUTE CHANGE MASTER AFTER ERROR OCCURED IN MTS MODE Fixing a post push test issue.