Bug #37717 rpl.rpl_stm_until 'stmt' fails sporadically on pushbuild
Submitted: 28 Jun 2008 11:26 Modified: 30 Jan 2009 17:12
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S7 (Test Cases)
Version:5.1 OS:Any
Assigned to: Sven Sandberg CPU Architecture:Any
Tags: 51rpl, pushbuild, rpl.rpl_stm_until 'stmt', sporadic, test failure

[28 Jun 2008 11:26] Sven Sandberg
Description:
TEST: rpl.rpl_stm_until 'stmt'

FAILURE:

rpl.rpl_stm_until 'stmt'       [ fail ]

--- /Users/henry/pb/bzr_mysql-6.0/15/mysql-6.0.6-alpha-pb15/mysql-test/suite/rpl/r/rpl_stm_until.result	Thu Jun 26 16:23:31 2008
+++ /Users/henry/pb/bzr_mysql-6.0/15/mysql-6.0.6-alpha-pb15/mysql-test/suite/rpl/r/rpl_stm_until.reject	Thu Jun 26 23:52:41 2008
@@ -162,7 +162,7 @@
 Relay_Log_File	#
 Relay_Log_Pos	#
 Relay_Master_Log_File	master-bin.000001
-Slave_IO_Running	Yes
+Slave_IO_Running	No
 Slave_SQL_Running	No
 Replicate_Do_DB	
 Replicate_Ignore_DB	

mysqltest: Result content mismatch

Stopping All Servers

How to repeat:
 WHERE: 6.0/mleich on Thu Jun 26 13:18:34 2008/'powermacg5' -max/ps_stm
 URL: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0&order=15
 --
 WHERE: 6.0-bugteam/mattiasj on Mon Jun 9 18:39:19 2008/'sles10-ia64-a' no innodb/ps_stm_threadpool
 URL: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-bugteam&order=18
 --
 WHERE: 6.0-bugteam/gshchepa on Fri Jun 27 19:46:30 2008/'sles10-ia64-a' no innodb/ps_stm_threadpool
 URL: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-bugteam&order=32
 --
 WHERE: 5.1/hf on Tue May 20 13:04:40 2008/'powermacg5' -max/ps_stm
 URL: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-5.1&order=1
 --
 Fails on other locations, too.

Suggested fix:
Possibly the test is wrongly written in the same way as rpl_server_id1 (BUG#36818). It's not safe to do "start slave" followed by "wait_for_slave_to_stop", because the condition that the slave is stopped can be temporarily met during reconnects during the slave startup. 

Btw, the test does not end with sync_slave_with_master. Whoever fixes this should check the cleanup code of the test.
[23 Jul 2008 10:37] Sven Sandberg
The problem is as follows.

When the slave starts, it starts the IO thread and the SQL thread. The SQL thread has only two states: not running/running. The IO thread has three states: not running/running disconnected/running connected.

When the slave starts, it waits until both the IO and the SQL thread are running, but does not wait until the IO thread is connected. The Slave_IO_Running field of SHOW SLAVE STATUS is 'Yes' if and only if the IO thread is running and connected.

Hence, for any SHOW SLAVE STATUS that appears after 'START SLAVE', there is a race. The fix is to wait until Slave_IO_Running = Yes after each START SLAVE.
[23 Jul 2008 11:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50293

2633 Sven Sandberg	2008-07-23
      BUG#37717: rpl.rpl_stm_until 'stmt' fails sporadically on pushbuild
      Problem: After START SLAVE, the Slave_IO_Status column of
      SHOW SLAVE STATUS goes from No to Yes asynchronously. That
      caused sporadic failures on pushbuild in rpl_stm_until since
      the test contains SHOW SLAVE STATUS right after START SLAVE.
      Fix: Wait until Slave_IO_Status becomes Yes after each
      START SLAVE.
[23 Jul 2008 11:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50295

2633 Sven Sandberg	2008-07-23
      BUG#37717: rpl.rpl_stm_until 'stmt' fails sporadically on pushbuild
      Problem: After START SLAVE, the Slave_IO_Status column of
      SHOW SLAVE STATUS goes from No to Yes asynchronously. That
      caused sporadic failures on pushbuild in rpl_stm_until since
      the test contains SHOW SLAVE STATUS right after START SLAVE.
      Fix: Wait until Slave_IO_Status becomes Yes after each
      START SLAVE.
[23 Jul 2008 14:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50339

2633 Sven Sandberg	2008-07-23
      BUG#37717: rpl.rpl_stm_until 'stmt' fails sporadically on pushbuild
      Problem: After START SLAVE, the Slave_IO_Status column of
      SHOW SLAVE STATUS goes from No to Yes asynchronously. That
      caused sporadic failures on pushbuild in rpl_stm_until since
      the test contains SHOW SLAVE STATUS right after START SLAVE.
      Fix: Wait until Slave_IO_Status becomes Yes after each
      START SLAVE.
[23 Jul 2008 14:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50340

2633 Sven Sandberg	2008-07-23
      BUG#37717: rpl.rpl_stm_until 'stmt' fails sporadically on pushbuild
      Problem: After START SLAVE, the Slave_IO_Status column of
      SHOW SLAVE STATUS goes from No to Yes asynchronously. That
      caused sporadic failures on pushbuild in rpl_stm_until since
      the test contains SHOW SLAVE STATUS right after START SLAVE.
      Fix: Wait until Slave_IO_Status becomes Yes after each
      START SLAVE.
[23 Jul 2008 16:24] Sven Sandberg
pushed to 5.1-rpl
[24 Jul 2008 14:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50420

2634 Sven Sandberg	2008-07-24
      Add doxygen files to .bzrignore
[7 Nov 2008 15:47] Georgi Kodinov
Still present in the current 5.1 main tree on the valgrind run : http://tinyurl.com/6abphu
[30 Jan 2009 13:30] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)
[30 Jan 2009 15:10] Bugs System
Pushed into 5.1.32 (revid:luis.soares@sun.com-20090129165946-d6jnnfqfokuzr09y) (version source revid:msvensson@mysql.com-20080724213037-v70a92v0489g2s2o) (merge vers: 5.1.28) (pib:6)
[30 Jan 2009 17:12] Paul DuBois
Test case changes. No changelog entry needed.
[17 Feb 2009 14:58] Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090217131017-6u8qz1edkjfiobef) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 2009 16:46] Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:tomas.ulin@sun.com-20090217134419-5ha6xg4dpedrbmau) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 2009 18:22] Bugs System
Pushed into 5.1.32-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090217134216-5699eq74ws4oxa0j) (version source revid:tomas.ulin@sun.com-20090201210519-vehobc4sy3g9s38e) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)