Bug #47749 rpl_slave_skip fails sporadically on PB2 (mysql-5.1-rep+2 tree).
Submitted: 30 Sep 2009 16:35 Modified: 12 Nov 2009 12:12
Reporter: Luis Soares
Status: Closed
Category: Tests: Replication Severity: S3 (Non-critical)
Version: 5.1 OS: Any
Assigned to: Luis Soares CPU Architecture: Any

[30 Sep 2009 16:35] Luis Soares
Description:
After backporting BUG#30703, test rpl_slave_skip started failing
sporadically with:

CURRENT_TEST: rpl.rpl_slave_skip
mysqltest: At line 38: query 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1' failed: 1198: This operation cannot be performed with a running slave; run STOP SLAVE first

Looking at the test case, one finds that the slave issues: 

START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=762;
wait_for_slave_to_stop;
[...]
START SLAVE;

wait_for_slave_to_stop is implemented in a way that repeatedly
checks the status variable 'Slave_running' until it returns
0. After the patch for BUG#30703 was backported, this variable
returns 1 only when:

 1. the IO thread is connected:
    active_mi->slave_running == MYSQL_SLAVE_RUN_CONNECT 

 2. and SQL thread is running 
    active_mi->rli.slave_running > 0.
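
A minimal sketch of the two conditions above, written as a standalone
Python model rather than server code. The IoState names and their
numeric values are assumptions made for illustration; only
MYSQL_SLAVE_RUN_CONNECT and the two slave_running fields come from
this report.

```python
# Hypothetical model of the Slave_running status check; not server code.
from enum import IntEnum


class IoState(IntEnum):
    # Names and values are illustrative assumptions.
    NOT_RUN = 0          # IO thread not started
    RUN_NOT_CONNECT = 1  # IO thread started, still connecting to the master
    RUN_CONNECT = 2      # stands in for MYSQL_SLAVE_RUN_CONNECT


def slave_running(io_state: IoState, sql_running: int) -> int:
    """Return 1 only if the IO thread is connected AND the SQL thread runs."""
    return int(io_state == IoState.RUN_CONNECT and sql_running > 0)


# IO thread still connecting while the SQL thread already runs: reports 0.
print(slave_running(IoState.RUN_NOT_CONNECT, 1))
# Both conditions hold: reports 1.
print(slave_running(IoState.RUN_CONNECT, 1))
```

In this model a 0 does not distinguish "slave stopped" from "IO thread
not connected yet", which is the ambiguity discussed next.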

So for the failure above, it can be that
wait_for_slave_to_stop gets 0 for variable Slave_running while the
IO thread is in fact still bootstrapping, i.e., connecting to the
master and before transferring any data.

Given that wait_for_slave_to_stop may succeed too early in the
process, i.e., without the slave having actually started and
stopped, the test continues and later issues a START SLAVE while
the slave is running. The outcome? ... Well... Just check the
failure output above.

How to repeat:
URL: http://pb2.norway.sun.com/web.py?template=push_details&push=551168
Log: http://trollheim.norway.sun.com/archive/780866.log

Suggested fix:
Replace the

wait_for_slave_to_stop;

with

-- source include/wait_for_slave_sql_to_stop.inc;
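
The difference between the two waits can be sketched with a small
Python simulation. This is a hedged illustration only: the TIMELINE
states and the helper names are hypothetical, and real polling goes
through the server, not through a list.

```python
# Hypothetical timeline of (io_connected, sql_running) as seen by
# successive polls after START SLAVE UNTIL. Values are illustrative.
TIMELINE = [
    (False, True),  # IO thread still connecting, SQL thread already running
    (True, True),   # IO thread connected, SQL thread applying events
    (True, False),  # SQL thread reached the UNTIL position and stopped
]


def first_poll_where(predicate, timeline):
    """Return the index of the first poll at which the wait would end."""
    for step, state in enumerate(timeline):
        if predicate(*state):
            return step
    return None


def combined_status_is_zero(io_connected, sql_running):
    # Models waiting for Slave_running to report 0.
    return not (io_connected and sql_running)


def sql_thread_stopped(io_connected, sql_running):
    # Models waiting for the SQL thread only.
    return not sql_running


early = first_poll_where(combined_status_is_zero, TIMELINE)
correct = first_poll_where(sql_thread_stopped, TIMELINE)
print(early, correct)
```

In this model the wait on the combined status ends at the first poll,
while the SQL thread is still running, whereas the wait on the SQL
thread alone ends only once that thread has stopped.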
[30 Sep 2009 16:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85247

3123 Luis Soares	2009-09-30
      BUG#47749: rpl_slave_skip fails sporadically on PB2 (mysql-5.1-rep+2 tree).
      
      rpl_slave_skip fails randomly on PB2. This patch fixes the failure by
      setting an explicit wait for the SQL thread to stop, instead of the 
      wait_for_slave_to_stop mysqltest command, after a START SLAVE UNTIL
      command is executed.
[30 Sep 2009 19:29] Luis Soares
Pushed to mysql-5.1-rep+2.
[27 Oct 2009 9:49] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)
[27 Oct 2009 18:02] Jon Stephens
Change in test suite only, no end-user changes to document.

Closed w/o further action.
[12 Nov 2009 8:21] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 12:12] Jon Stephens
Re-closed w/o further action; see previous comments.