Bug #74528 | STOP SLAVE + MTS leave Exec_Master_Log_Pos in an inconsistent state | ||
---|---|---|---|
Submitted: | 23 Oct 2014 14:56 | Modified: | 29 Sep 2015 15:41 |
Reporter: | Alexey Kopytov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
Version: | 5.6 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[23 Oct 2014 14:56]
Alexey Kopytov
[23 Oct 2014 14:57]
Alexey Kopytov
MTR test case
Attachment: rpl_exec_master_log_pos.test (application/octet-stream, text), 2.86 KiB.
[31 Oct 2014 13:35]
MySQL Verification Team
Hello Alexey Kopytov, Thank you for the bug report and test case. Thanks, Umesh
[31 Oct 2014 13:36]
MySQL Verification Team
// 5.6.21 - mtr test case results [ushastry@cluster-repo mysql-test]$ ./mtr --mysqld=--binlog_format=row rpl_exec_master_log_pos Logging: ./mtr --mysqld=--binlog_format=row rpl_exec_master_log_pos 2014-11-02 20:39:12 11817 [Warning] Buffered warning: Changed limits: max_open_files: 1024 (requested 5000) 2014-11-02 20:39:12 11817 [Warning] Buffered warning: Changed limits: table_cache: 431 (requested 2000) 2014-11-02 20:39:12 11817 [Note] Plugin 'FEDERATED' is disabled. 2014-11-02 20:39:12 11817 [Note] Binlog end 2014-11-02 20:39:12 11817 [Note] Shutting down plugin 'MyISAM' 2014-11-02 20:39:12 11817 [Note] Shutting down plugin 'CSV' MySQL Version 5.6.21 Using binlog format 'row' Checking supported features... - SSL connections supported Collecting tests... Checking leftover processes... Removing old var directory... Creating var directory '/data/ushastry/server/mysql-5.6.21/mysql-test/var'... Installing system database... ============================================================================== TEST RESULT TIME (ms) or COMMENT -------------------------------------------------------------------------- worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 include/master-slave.inc Warnings: Note #### Sending passwords in plain text without SSL/TLS is extremely insecure. Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. [connection master] include/stop_slave.inc set @save.slave_parallel_workers= @@global.slave_parallel_workers; set @@global.slave_parallel_workers= 2; SET @@global.relay_log_info_repository="TABLE"; SET @@global.sync_relay_log_info= 1; include/start_slave.inc Warnings: Note 1753 slave_transaction_retries is not supported in multi-threaded slave mode. In the event of a transient failure, the slave will not retry the transaction and will stop. create database d1; create database d2; create table d1.t1 (a int primary key) engine=innodb; create table d2.t1 (a int primary key) engine=innodb; create table d1.t2m (a int) engine=myisam; insert into d1.t1 values (1),(2),(3); insert into d2.t1 values (1),(2),(3); insert into d1.t2m values (1); include/sync_slave_sql_with_master.inc begin; update d1.t1 set a=31 where a=3; insert into d1.t1 values (5),(6),(7); begin; update d1.t1 set a=20 where a=2; insert into d1.t2m values (2); update d1.t1 set a=30 where a=3; insert into d1.t1 values (4); commit; delete from d2.t1; stop slave; Exec_Master_Log_Pos = 1507 select Id, Master_log_pos from mysql.slave_worker_info; Id Master_log_pos 1 2067 2 1507 main.rpl_exec_master_log_pos [ fail ] Test ended at 2014-11-02 20:39:28 CURRENT_TEST: main.rpl_exec_master_log_pos mysqltest: At line 81: *** Exec_Master_Log_Pos is inconsistent! - saving '/data/ushastry/server/mysql-5.6.21/mysql-test/var/log/main.rpl_exec_master_log_pos/' to '/data/ushastry/server/mysql-5.6.21/mysql-test/var/log/main.rpl_exec_master_log_pos/' -------------------------------------------------------------------------- The servers were restarted 0 times Spent 0.000 of 16 seconds executing testcases Completed: Failed 1/1 tests, 0.00% were successful. Failing test(s): main.rpl_exec_master_log_pos The log files in var/log may give you some hint of what went wrong. If you want to report this error, please read first the documentation at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html mysql-test-run: *** ERROR: there were failing test cases [ushastry@cluster-repo mysql-test]$
[10 Feb 2015 15:03]
Andrii Nikitin
see also FR bug 75841 "Disabling MTS must be safe"
[27 Apr 2015 7:22]
Sven Sandberg
Thank you for the bug report and for the detailed description. STOP SLAVE *does* wait for ongoing transactions to complete. It is a bug in the documentation that it says that STOP SLAVE may leave Exec_master_log_pos in an inconsistent state. In the test case provided above, the reason for the wrong value for Exec_master_log_pos is that the slave threads are stopped due to an error, not due to STOP SLAVE. (The test case does not work as intended, because the STOP SLAVE statement is issued on the same connection where the 'blocking' transaction has started. STOP SLAVE has an implicit commit (see http://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html). This makes the 'blocking' transaction commit before STOP SLAVE. The 'blocking' transaction will then remove the value 3 from the table and release the row lock. Then worker 1 gets to execute. Since the value 3 was removed, the worker will fail on the row event corresponding to the second UPDATE statement.) So STOP SLAVE works as you expect. If you need Exec_master_log_pos to be consistent even after an error, the only way would be to use slave_preserve_commit_order (which was introduced in 5.7.5). However, the test case uses per-database parallelization (the only scheme possible in 5.6), whereas slave_preserve_commit_order is only implemented for the LOGICAL_CLOCK parallelization scheme (cf http://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#sysvar_slave_paralle... ). So this bug can potentially be interpreted as a feature request to implement slave_preserve_commit_order also for per-database parallelization (and maybe to backport that to 5.6). For now, we choose to interpret this as a request to fix the documentation. If you need slave_preserve_commit_order to work for per-database parallelization, please file a separate request.
[27 Apr 2015 7:28]
Sven Sandberg
The following test case verifies that STOP SLAVE waits for transactions so that Exec_master_log_pos is correct (in the absence of errors): --source include/master-slave.inc --echo #### Initialize #### --connection slave --source include/stop_slave.inc if (`SELECT @@VERSION LIKE '5.7.%'`) { SET @@GLOBAL.SLAVE_PARALLEL_TYPE= 'DATABASE'; } SET @@GLOBAL.SLAVE_PARALLEL_WORKERS= 2; --source include/start_slave.inc --connection master CREATE DATABASE d1; CREATE DATABASE d2; CREATE TABLE d1.t (a INT PRIMARY KEY, b INT) engine=innodb; CREATE TABLE d2.t (a INT PRIMARY KEY, b INT) engine=innodb; --echo #### Prepare test scenario #### # Create 'conflicting'/'blocking' transaction on slave. --source include/sync_slave_sql_with_master.inc BEGIN; INSERT INTO d2.t VALUES (2, 2); # Issue transactions on master. --connection master INSERT INTO d1.t VALUES (1, 1); # T1 INSERT INTO d2.t VALUES (1, 1); # T2 INSERT INTO d2.t VALUES (2, 1); # This will be a gap when executed on slave INSERT INTO d2.t VALUES (3, 1); # This will be a gap when executed on slave INSERT INTO d1.t VALUES (2, 1); # T4 --connection slave1 --let $table=d2.t --let $count=1 --source include/wait_until_rows_count.inc --let $table=d1.t --let $count=2 --source include/wait_until_rows_count.inc --echo # Now d1.t has two rows and d2.t has one row. # Wait for coordinator to populate worker's queues. --let $show_statement= SHOW PROCESSLIST --let $field= State --let $condition= = 'Slave has read all relay log; waiting for more updates' --source include/wait_show_condition.inc --echo # Now coordinator has read the entire relay log and populated workers' queues. # There is now a gap at T3 SELECT * FROM d2.t; SELECT * FROM d1.t; --echo #### Verify that STOP SLAVE will consume the entire worker queue #### --send STOP SLAVE --connection slave # Despite time elapsed, the slave should still be running, waiting for the # queue to be completed. --sleep 4 --let $show_statement= SHOW PROCESSLIST --let $field= State --let $condition= = 'Waiting for Slave Worker to release partition' --source include/wait_show_condition.inc --echo # Now coordinator is waiting for the worker to consume its queue. SELECT * FROM d2.t; ROLLBACK; --source include/wait_for_slave_sql_to_stop.inc --echo # Now all slave threads have stopped. Verify that worker completed its queue: --echo # Should contain 1, 2, 3. SELECT * FROM d2.t; --echo #### Clean up #### SET @@GLOBAL.SLAVE_PARALLEL_WORKERS = 0; --source include/start_slave.inc --connection master DROP DATABASE d1; DROP DATABASE d2; --source include/sync_slave_sql_with_master.inc --source include/rpl_end.inc
[29 Sep 2015 15:41]
David Moss
Thanks for your feedback. This has been fixed throughout the 5.7 documentation. The main change is this new section: http://dev.mysql.com/doc/refman/5.7/en/replication-features-transaction-inconsistencies.ht... Additionally many of the variables and statements impacted by this have been linked to this new section.