Bug #73035 START SLAVE UNTIL will timeout if waiting for the last transaction in relaylog
Submitted: 18 Jun 2014 11:30 Modified: 30 Jun 2014 8:10
Reporter: João Gramacho Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S3 (Non-critical)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[18 Jun 2014 11:30] João Gramacho
Description:
While designing some test cases I found an issue that was preventing
"START SLAVE UNTIL" with master binlog file/position or slave relay
log file/position clauses to stop if the position in the until clause
was of the last transaction present on the relay log.

The UNTIL verification for master binlog file/position or slave
relay log file/position clauses is only checked inside next_event()
function, after calling read_log_event(). If read_log_event() return
no event because it reached the end of the relay log, it will not
check the UNTIL clause, and will wait for the relay log update by
calling relay_log.wait_for_update_relay_log().

There is a UNTIL verification in the exec_relay_log_event() function,
before calling next_event() function, but only for until clauses
containing SQL_AFTER_GTIDS option.

How to repeat:
Use the following MTR test cases:

=== FIRST TEST CASE ===

$ cat rpl_until_pos_in_last_group_of_master_binlog.test 
#
# Check if SQL thread stops correctly
# after executing the last event of the master binlog
# based on master binlog file and position
#
# This test with GTIDs will hit BUG#18306199, so disabling GTIDs by now
--source include/not_gtid_enabled.inc
--source include/master-slave.inc
# Stop the SQL thread after syncing
--source include/sync_slave_sql_with_master.inc
--source include/stop_slave_sql.inc
# Save master position to use on UNTIL clause of START SLAVE
--source include/rpl_connection_master.inc
--let $master_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $master_pos= query_get_value(SHOW MASTER STATUS, Position, 1)
--inc $master_pos
# Create a table in the master
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
# Sync the IO thread with the master
--source include/sync_slave_io_with_master.inc
# Start SQL thread until it executed the CREATE TABLE
eval START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE = "$master_file", MASTER_LOG_POS = $master_pos;
# Wait until SQL thread reaches last master binlog file
--let $slave_param= Relay_Master_Log_File
--let $slave_param_value= $master_file
--let $slave_param_comparison= =
--source include/wait_for_slave_param.inc
# Wait until SQL thread reaches desired master binlog position
--let $slave_param= Exec_Master_Log_Pos
--let $slave_param_value= $master_pos
--let $slave_param_comparison= >=
--source include/wait_for_slave_param.inc
# If the desired position was reached, SQL thread should stop
--let $slave_param_comparison= =
--source include/wait_for_slave_sql_to_stop.inc
# Cleanup
--source include/start_slave_sql.inc
--source include/rpl_connection_master.inc
DROP TABLE t1;
--source include/rpl_end.inc

=== SECOND TEST CASE ===

$ cat rpl_until_pos_in_last_group_of_slave_relaylog.test 
#
# Check if SQL thread stops correctly
# after executing the last event of the slave relaylog
# based on slave relaylog file and position
#
# This test with GTIDs will hit BUG#18306199, so disabling GTIDs by now
--source include/not_gtid_enabled.inc
--source include/master-slave.inc
# Stop the SQL thread after syncing
--source include/sync_slave_sql_with_master.inc
--source include/stop_slave_sql.inc
# Save slave position to use on UNTIL clause of START SLAVE
--source include/rpl_connection_slave.inc
--let $slave_file= query_get_value(SHOW SLAVE STATUS, Relay_Log_File, 1)
--let $slave_pos= query_get_value(SHOW SLAVE STATUS, Relay_Log_Pos, 1)
--inc $slave_pos
# Create a table in the master
--source include/rpl_connection_master.inc
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
# Sync the IO thread with the master
--source include/sync_slave_io_with_master.inc
--echo # Start SQL thread until it executed the CREATE TABLE
eval START SLAVE SQL_THREAD UNTIL RELAY_LOG_FILE = "$slave_file", RELAY_LOG_POS = $slave_pos;
# Wait until SQL thread reaches last slave relaylog file
--let $slave_param= Relay_Log_File
--let $slave_param_value= $slave_file
--let $slave_param_comparison= =
--source include/wait_for_slave_param.inc
# Wait until SQL thread reaches desired slave relaylog position
--let $slave_param= Relay_Log_Pos
--let $slave_param_value= $slave_pos
--let $slave_param_comparison= >=
--source include/wait_for_slave_param.inc
# If the desired position was reached, SQL thread should stop
--let $slave_param_comparison= =
--source include/wait_for_slave_sql_to_stop.inc
# Cleanup
--source include/start_slave_sql.inc
--source include/rpl_connection_master.inc
DROP TABLE t1;
--source include/rpl_end.inc

Suggested fix:
=== modified file 'mysql-test/include/wait_for_slave_sql_to_stop.inc'
--- mysql-test/include/wait_for_slave_sql_to_stop.inc	revid:sujatha.sivakumar@oracle.com-20140616044742-3ibgiv08ygux3a9j
+++ mysql-test/include/wait_for_slave_sql_to_stop.inc	2014-06-18 11:08:07 +0000
@@ -31,6 +31,7 @@
 
 --let $slave_param= Slave_SQL_Running
 --let $slave_param_value= No
+--let $slave_param_comparison= =
 --let $slave_error_param= Last_SQL_Errno
 --source include/wait_for_slave_param.inc
 --let $slave_error_param= 

=== modified file 'sql/rpl_slave.cc'
--- sql/rpl_slave.cc	revid:sujatha.sivakumar@oracle.com-20140616044742-3ibgiv08ygux3a9j
+++ sql/rpl_slave.cc	2014-06-18 11:05:09 +0000
@@ -3796,14 +3796,20 @@
   mysql_mutex_lock(&rli->data_lock);
 
   /*
-    UNTIL_SQL_AFTER_GTIDS requires special handling since we have to check
-    whether the until_condition is satisfied *before* the SQL threads goes on
-    a wait inside next_event() for the relay log to grow. This is reuired since
-    if we have already applied the last event in the waiting set but since he
-    check happens only at the start of the next event we may end up waiting
-    forever the next event is not available or is delayed.
+    UNTIL_SQL_AFTER_GTIDS, UNTIL_MASTER_POS and UNTIL_RELAY_POS requires
+    special handling since we have to check whether the until_condition is
+    satisfied *before* the SQL threads goes on a wait inside next_event()
+    for the relay log to grow.
+    This is required in the following case: We have already applied the last
+    event in the waiting set, but the relay log ends after this event. Then it
+    is not enough to check the condition in next_event; we also have to check
+    it here, before going to sleep. Otherwise, if no updates were coming from
+    the master, we would sleep forever despite having reached the required
+    position.
   */
-  if (rli->until_condition == Relay_log_info::UNTIL_SQL_AFTER_GTIDS &&
+  if ((rli->until_condition == Relay_log_info::UNTIL_SQL_AFTER_GTIDS ||
+       rli->until_condition == Relay_log_info::UNTIL_MASTER_POS ||
+       rli->until_condition == Relay_log_info::UNTIL_RELAY_POS) &&
        rli->is_until_satisfied(thd, NULL))
   {
     rli->abort_slave= 1;
[30 Jun 2014 8:10] Jon Stephens
Fixed in 5.7.5. Testing only, no further action required. Closed.

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html