| Bug #59037 | MASTER_POS_WAIT may return prematurely after CHANGE MASTER TO RELAY_LOG_POS | ||
|---|---|---|---|
| Submitted: | 19 Dec 2010 13:53 | Modified: | 13 Dec 2012 18:03 |
| Reporter: | Sven Sandberg | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.1+ | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | CHANGE MASTER, disabled, master_pos_wait | ||
[19 Dec 2010 14:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127257 3213 Sven Sandberg 2010-12-19 This part of the test fails sporadically because of BUG#59037.
[19 Dec 2010 14:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127258 3448 Sven Sandberg 2010-12-19 [merge] Merged patch that disabled part of rpl_change_master due to BUG#59037 from 5.5-bugteam to trunk-bugfixing.
[19 Dec 2010 14:44]
Sven Sandberg
No patch has been pushed for this bug. A patch that disables part of rpl_change_master.test that fails due to this bug will soon be pushed to 5.5-bugteam and trunk-bugfixing.
[19 Dec 2010 15:30]
Valeriy Kravchuk
I was not yet able to repeat with 5.5.8, but on 5.1.54 it is easily repeatable: macbook-pro:mysql-test openxs$ ./mtr bug59037 Logging: ./mtr bug59037 101219 17:28:27 [Warning] Setting lower_case_table_names=2 because file system for /var/folders/dX/dXCzvuSlHX4Op1g-o1jIWk+++TI/-Tmp-/FL7ceeI9De/ is case insensitive 101219 17:28:27 [Note] Plugin 'FEDERATED' is disabled. 101219 17:28:27 [Note] Plugin 'ndbcluster' is disabled. MySQL Version 5.1.54 Checking supported features... - skipping ndbcluster - SSL connections supported - binaries are debug compiled Collecting tests... vardir: /Users/openxs/dbs/5.1/mysql-test/var Checking leftover processes... Removing old var directory... Creating var directory '/Users/openxs/dbs/5.1/mysql-test/var'... Installing system database... Using server port 63053 ============================================================================== TEST RESULT TIME (ms) ------------------------------------------------------------ worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009 main.bug59037 [ fail ] Test ended at 2010-12-19 17:28:38 CURRENT_TEST: main.bug59037 --- /Users/openxs/dbs/5.1/mysql-test/r/bug59037.result 2010-12-19 18:28:24.000000000 +0300 +++ /Users/openxs/dbs/5.1/mysql-test/r/bug59037.reject 2010-12-19 18:28:38.000000000 +0300 @@ -0,0 +1,19 @@ +stop slave; +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; +reset master; +reset slave; +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9; +start slave; +CREATE TABLE t1 (a INT); +STOP SLAVE; +DROP TABLE t1; +CHANGE MASTER TO RELAY_LOG_POS = 4; +START SLAVE; +SELECT MASTER_POS_WAIT('master-bin.000001', 192, 0); +MASTER_POS_WAIT('master-bin.000001', 192, 0) +0 +SHOW TABLES; +Tables_in_test +SHOW TABLES; +Tables_in_test +t1 ...
[22 Dec 2010 21:31]
Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101222212842-y0t3ibtd32wd9qaw) (version source revid:alexander.nozdrin@oracle.com-20101222212842-y0t3ibtd32wd9qaw) (merge vers: 5.6.1) (pib:24)
[29 Dec 2010 12:53]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:alexander.nozdrin@oracle.com-20101229113652-km2v993aurv7h79j) (version source revid:alexander.nozdrin@oracle.com-20101229113132-uonlbcc2uopff8yb) (merge vers: 5.5.9) (pib:24)
[13 Dec 2012 18:03]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.
If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at
http://dev.mysql.com/doc/en/installing-source.html
[13 Dec 2012 18:16]
Jon Stephens
Fixed in trunk. Documented in the 5.7.1 changelog as follows:
It was possible for the MASTER_POS_WAIT() function to return
prematurely following a CHANGE MASTER TO statement that updated
the RELAY_LOG_POS or RELAY_LOG_NAME. This could happen because
CHANGE MASTER TO did not update the master log position in such
cases, causing MASTER_POS_WAIT() to read an invalid log position
and to return immediately.
To fix this problem, the master log position is flagged as
invalid until the position is set to a valid value when the SQL
thread reads the first event, after which it is flagged as
valid. Functions such as MASTER_POS_WAIT() now defer any
comparison with the master log position until a valid value can
be obtained (that is, after the first event following the CHANGE
MASTER TO statement has been applied).
Closed.

Description: The function MASTER_POS_WAIT waits for the slave SQL thread to execute up to a given position. The position is given in master binlog coordinates. The statement CHANGE MASTER TO RELAY_LOG_POS sets the read position for the SQL thread. Here, the position is given in slave relay log coordinates. Suppose the following happens: 1. CHANGE MASTER TO RELAY_LOG_POS is executed in such a way that the position is moved back in the relay log, say from position B to A where A < B 2. Then START SLAVE is executed 3. Then MASTER_POS_WAIT is called, and the position that MASTER_POS_WAIT waits for is between A and B, say position C where A < C <= B. Then MASTER_POS_WAIT can return prematurely, before the position C has been reached. How to repeat: # The following test shows that even after MASTER_POS_WAIT, # table t1 does not exist on slave. It contains a race, so # it is not guaranteed to always show the bug. --source include/have_binlog_format_statement.inc --source include/master-slave.inc CREATE TABLE t1 (a INT); --let $master_file= query_get_value(SHOW MASTER STATUS, File, 1) --let $master_pos= query_get_value(SHOW MASTER STATUS, Position, 1) --sync_slave_with_master STOP SLAVE; DROP TABLE t1; CHANGE MASTER TO RELAY_LOG_POS = 4; START SLAVE; eval SELECT MASTER_POS_WAIT('$master_file', $master_pos, 0); SHOW TABLES; --sleep 1 SHOW TABLES; Suggested fix: Probably CHANGE MASTER TO RELAY_LOG_POS only updates the SQL thread's slave relay log coordinates, not the SQL thread's master binlog coordinates. The subsequent call to MASTER_POS_WAIT then thinks that the master binlog coordinates that were active before CHANGE MASTER are still active. There are two possible approaches to fix this bug: 1. Make CHANGE MASTER TO RELAY_LOG_POS set a flag that indicates the master binlog coordinates are invalid, and make MASTER_POS_WAIT wait for this flag to be updated. It will be updated after the first event has been read. 2. Make CHANGE MASTER TO RELAY_LOG_POS read the master position in the first event.