Bug #74528 STOP SLAVE + MTS leave Exec_Master_Log_Pos in an inconsistent state
Submitted: 23 Oct 2014 14:56 Modified: 29 Sep 2015 15:41
Reporter: Alexey Kopytov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[23 Oct 2014 14:56] Alexey Kopytov
Description:
The manual says Exec_Master_Log_Pos in SHOW SLAVE STATUS cannot be trusted on a multi-threaded slave:

"When using a multi-threaded slave (by setting slave_parallel_workers to a nonzero value in MySQL 5.6.3 and later), the value in this column actually represents a “low-water” mark, before which no uncommitted transactions remain. Because the current implementation allows execution of transactions on different databases in a different order on the slave than on the master, this is not necessarily the position of the most recently executed transaction. "

It is not clear from this description whether this caveat applies only to a running slave, or STOP SLAVE brings Exec_Master_Log_Pos to a consistent state, i.e. the one that can be used by backup utilities to clone a slave. Some people assume the latter is correct, see for example:

"When you execute STOP SLAVE SQL_THREAD on a slave with slave_parallel_workers>0, the server waits until the furthest-behind slave can catch up before the STOP SLAVE command returns."

https://mariadb.com/blog/how-get-mysql-56-parallel-replication-and-percona-xtrabackup-play...

I could not find any confirmation of the above statement in the source code, so attached is an MTR test case which proves it wrong.

How to repeat:
Run the attached MTR test case.

Suggested fix:
I understand this may be hard to fix, because STOP SLAVE cannot wait for in-progress transactions forever. But the fact that Exec_Master_Log_Pos can never be trusted with a multi-threaded slave, even when the slave is stopped, has some far-reaching implications.

1. First of all, if fixing it is not feasible, the limitation should be stated clearly in the documentation, so users do not make wrong assumptions. (Documenting MTS in general would be even better).

2. When a multi-threaded slave is stopped normally (i.e. with STOP SLAVE) in an inconsistent state, starting it with START SLAVE UNTIL_NO_MTS_GAPS is the only option. Description of START SLAVE UNTIL_NO_MTS_GAPS in the manual sounds like it should only be used when a slave failed with an error. Which is misleading, so it should be corrected.

3. mysqldump (like all other backup utilities currently) does not take this fact into account. Which means mysqldump --dump-slave may produce inconsistent backups for a multi-threaded slave, even though it stops the slave temporarily.

4. Finally, since MTS is not fully functional without GTID, perhaps there should be at least a warning in the error log when a non-GTID MTS slave is started?
[23 Oct 2014 14:57] Alexey Kopytov
MTR test case

Attachment: rpl_exec_master_log_pos.test (application/octet-stream, text), 2.86 KiB.

[31 Oct 2014 13:35] MySQL Verification Team
Hello Alexey Kopytov,

Thank you for the bug report and test case.

Thanks,
Umesh
[31 Oct 2014 13:36] MySQL Verification Team
// 5.6.21 - mtr test case results

[ushastry@cluster-repo mysql-test]$ ./mtr  --mysqld=--binlog_format=row rpl_exec_master_log_pos
Logging: ./mtr  --mysqld=--binlog_format=row rpl_exec_master_log_pos
2014-11-02 20:39:12 11817 [Warning] Buffered warning: Changed limits: max_open_files: 1024 (requested 5000)

2014-11-02 20:39:12 11817 [Warning] Buffered warning: Changed limits: table_cache: 431 (requested 2000)

2014-11-02 20:39:12 11817 [Note] Plugin 'FEDERATED' is disabled.
2014-11-02 20:39:12 11817 [Note] Binlog end
2014-11-02 20:39:12 11817 [Note] Shutting down plugin 'MyISAM'
2014-11-02 20:39:12 11817 [Note] Shutting down plugin 'CSV'
MySQL Version 5.6.21
Using binlog format 'row'
Checking supported features...
 - SSL connections supported
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/data/ushastry/server/mysql-5.6.21/mysql-test/var'...
Installing system database...

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
include/master-slave.inc
Warnings:
Note    ####    Sending passwords in plain text without SSL/TLS is extremely insecure.
Note    ####    Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[connection master]
include/stop_slave.inc
set @save.slave_parallel_workers= @@global.slave_parallel_workers;
set @@global.slave_parallel_workers= 2;
SET @@global.relay_log_info_repository="TABLE";
SET @@global.sync_relay_log_info= 1;
include/start_slave.inc
Warnings:
Note    1753    slave_transaction_retries is not supported in multi-threaded slave mode. In the event of a transient failure, the slave will not retry the transaction and will stop.
create database d1;
create database d2;
create table d1.t1 (a int primary key) engine=innodb;
create table d2.t1 (a int primary key) engine=innodb;
create table d1.t2m (a int) engine=myisam;
insert into d1.t1 values (1),(2),(3);
insert into d2.t1 values (1),(2),(3);
insert into d1.t2m values (1);
include/sync_slave_sql_with_master.inc
begin;
update d1.t1 set a=31 where a=3;
insert into d1.t1 values (5),(6),(7);
begin;
update d1.t1 set a=20 where a=2;
insert into d1.t2m values (2);
update d1.t1 set a=30 where a=3;
insert into d1.t1 values (4);
commit;
delete from d2.t1;
stop slave;
Exec_Master_Log_Pos = 1507
select Id, Master_log_pos from mysql.slave_worker_info;
Id      Master_log_pos
1       2067
2       1507
main.rpl_exec_master_log_pos             [ fail ]
        Test ended at 2014-11-02 20:39:28

CURRENT_TEST: main.rpl_exec_master_log_pos
mysqltest: At line 81: *** Exec_Master_Log_Pos is inconsistent!

 - saving '/data/ushastry/server/mysql-5.6.21/mysql-test/var/log/main.rpl_exec_master_log_pos/' to '/data/ushastry/server/mysql-5.6.21/mysql-test/var/log/main.rpl_exec_master_log_pos/'
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.000 of 16 seconds executing testcases

Completed: Failed 1/1 tests, 0.00% were successful.

Failing test(s): main.rpl_exec_master_log_pos

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases
[ushastry@cluster-repo mysql-test]$
[10 Feb 2015 15:03] Andrii Nikitin
see also FR bug 75841 "Disabling MTS must be safe"
[27 Apr 2015 7:22] Sven Sandberg
Thank you for the bug report and for the detailed description.

STOP SLAVE *does* wait for ongoing transactions to complete. It is a bug in the documentation that it says that STOP SLAVE may leave Exec_master_log_pos in an inconsistent state.

In the test case provided above, the reason for the wrong value for Exec_master_log_pos is that the slave threads are stopped due to an error, not due to STOP SLAVE. (The test case does not work as intended, because the STOP SLAVE statement is issued on the same connection where the 'blocking' transaction has started. STOP SLAVE has an implicit commit (see http://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html). This makes the 'blocking' transaction commit before STOP SLAVE. The 'blocking' transaction will then remove the value 3 from the table and release the row lock. Then worker 1 gets to execute. Since the value 3 was removed, the worker will fail on the row event corresponding to the second UPDATE statement.)

So STOP SLAVE works as you expect.

If you need Exec_master_log_pos to be consistent even after an error, the only way would be to use slave_preserve_commit_order (which was introduced in 5.7.5). However, the test case uses per-database parallelization (the only scheme possible in 5.6), whereas slave_preserve_commit_order is only implemented for the LOGICAL_CLOCK parallelization scheme (cf http://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#sysvar_slave_paralle... ). So this bug can potentially be interpreted as a feature request to implement slave_preserve_commit_order also for per-database parallelization (and maybe to backport that to 5.6).

For now, we choose to interpret this as a request to fix the documentation. If you need slave_preserve_commit_order to work for per-database parallelization, please file a separate request.
[27 Apr 2015 7:28] Sven Sandberg
The following test case verifies that STOP SLAVE waits for transactions so that Exec_master_log_pos is correct (in the absence of errors):

--source include/master-slave.inc

--echo #### Initialize ####

--connection slave
--source include/stop_slave.inc
if (`SELECT @@VERSION LIKE '5.7.%'`)
{
  SET @@GLOBAL.SLAVE_PARALLEL_TYPE= 'DATABASE';
}
SET @@GLOBAL.SLAVE_PARALLEL_WORKERS= 2;
--source include/start_slave.inc

--connection master
CREATE DATABASE d1;
CREATE DATABASE d2;
CREATE TABLE d1.t (a INT PRIMARY KEY, b INT) engine=innodb;
CREATE TABLE d2.t (a INT PRIMARY KEY, b INT) engine=innodb;

--echo #### Prepare test scenario ####

# Create 'conflicting'/'blocking' transaction on slave.
--source include/sync_slave_sql_with_master.inc
BEGIN;
INSERT INTO d2.t VALUES (2, 2);

# Issue transactions on master. 
--connection master
INSERT INTO d1.t VALUES (1, 1); # T1
INSERT INTO d2.t VALUES (1, 1); # T2
INSERT INTO d2.t VALUES (2, 1); # This will be a gap when executed on slave
INSERT INTO d2.t VALUES (3, 1); # This will be a gap when executed on slave
INSERT INTO d1.t VALUES (2, 1); # T4

--connection slave1
--let $table=d2.t
--let $count=1
--source include/wait_until_rows_count.inc

--let $table=d1.t
--let $count=2
--source include/wait_until_rows_count.inc

--echo # Now d1.t has two rows and d2.t has one row.

# Wait for coordinator to populate worker's queues.
--let $show_statement= SHOW PROCESSLIST
--let $field= State
--let $condition= = 'Slave has read all relay log; waiting for more updates'
--source include/wait_show_condition.inc

--echo # Now coordinator has read the entire relay log and populated workers' queues.

# There is now a gap at T3
SELECT * FROM d2.t;
SELECT * FROM d1.t;

--echo #### Verify that STOP SLAVE will consume the entire worker queue ####

--send STOP SLAVE
--connection slave

# Despite time elapsed, the slave should still be running, waiting for the
# queue to be completed.
--sleep 4

--let $show_statement= SHOW PROCESSLIST
--let $field= State
--let $condition= = 'Waiting for Slave Worker to release partition'
--source include/wait_show_condition.inc
--echo # Now coordinator is waiting for the worker to consume its queue.

SELECT * FROM d2.t;

ROLLBACK;
--source include/wait_for_slave_sql_to_stop.inc
--echo # Now all slave threads have stopped. Verify that worker completed its queue:

--echo # Should contain 1, 2, 3.
SELECT * FROM d2.t;

--echo #### Clean up ####

SET @@GLOBAL.SLAVE_PARALLEL_WORKERS = 0;
--source include/start_slave.inc

--connection master
DROP DATABASE d1;
DROP DATABASE d2;

--source include/sync_slave_sql_with_master.inc
--source include/rpl_end.inc
[29 Sep 2015 15:41] David Moss
Thanks for your feedback. This has been fixed throughout the 5.7 documentation. The main change is this new section:

http://dev.mysql.com/doc/refman/5.7/en/replication-features-transaction-inconsistencies.ht...

Additionally many of the variables and statements impacted by this have been linked to this new section.