Bug #36818 rpl_server_id1 fails expecting slave has stopped
Submitted: 20 May 2008 13:24 Modified: 28 Jul 2008 15:23
Reporter: Andrei Elkin
Status: Closed
Category:MySQL Server: Tests Severity:S7 (Test Cases)
Version:5.1 OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any
Tags: pushbuild, sporadic, test failures

[20 May 2008 13:24] Andrei Elkin
Description:
On a slow environment such as valgrind

https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-5.1-bugteam&entry=gshchepa...

the test is vulnerable because it does not check whether the slave has stopped
by the time the new session issues `start slave;'

The differences like
-Slave_IO_State	
+Slave_IO_State	Checking master version
-Slave_IO_Running	No
+Slave_IO_Running	Yes
etc
can be explained by the effects of the preceding `stop slave' not yet having taken place; the test needs to wait until the slave has actually stopped.
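
The waiting idea described above can be sketched outside the test framework. This is only an illustration in Python, not the contents of wait_for_slave_to_stop.inc: `wait_until` and `FakeSlave` are hypothetical names, and the fake slave simply reports "Yes" for a few polls before reporting "No", standing in for a stop that has not yet become visible.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.1):
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeSlave:
    """Hypothetical stand-in for a slave whose stop becomes visible only
    after a given number of status polls."""

    def __init__(self, polls_until_stopped):
        self._remaining = polls_until_stopped

    def io_running(self):
        # Report the stale "Yes" state until the stop has taken effect.
        if self._remaining > 0:
            self._remaining -= 1
            return "Yes"
        return "No"
```

Checking the status once right after the stop request can still see "Yes"; polling with `wait_until(lambda: slave.io_running() == "No")` returns only once the state has settled or the timeout expires.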

How to repeat:
Check pushbuild (pb).

Suggested fix:
stop slave;
+source include/wait_for_slave_to_stop.inc;
[20 May 2008 13:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46861

ChangeSet@1.2647, 2008-05-20 16:27:46+03:00, aelkin@mysql1000.dsl.inet.fi +1 -0
  Bug #36818  	rpl_server_id1 fails expecting slave has stopped
  
  the test is vulnerable because it does not check whether the slave has stopped
  by the time the new session issues `start slave;'
  
  Fixed by explicitly deploying the wait_for_slave_to_stop synchronization macro.
[17 Jun 2008 19:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48025

2662 Andrei Elkin	2008-06-17
      Bug #36818  rpl_server_id1 fails expecting slave has stopped
      
      the test was vulnerable because the slave IO thread could start reconnecting
      between two cycles of source include/wait_for_slave_io_to_stop.inc.
      The slave, which was supposed to stay stopped, managed to restart because the delay between
      reconnection attempts was apparently small: 1 sec.
      On restart, show slave status can find the IO thread running, which can happen before the master id
      is compared with the local id, the comparison that is supposed to stop the IO thread.
      
      Fixed by changing master_connect_retry from a small default to a value too large to be exceeded.
[19 Jun 2008 14:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48181

2662 Andrei Elkin	2008-06-19
      Bug #36818  rpl_server_id1 fails expecting slave has stopped
      
      the test was vulnerable because the slave IO thread could
      start reconnecting after it had been stopped at wait_for_slave_io_to_stop.inc.
      This was possible because of the small 1 sec reconnect interval set via change master, so that in a
      slow environment the following show slave status could find the slave connected again.
      
      Fixed by changing master_connect_retry from a small default to a value too large
      to be exceeded.
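
The timing argument in this commit message can be illustrated with a toy model. This is a sketch in Python under simplifying assumptions, not the server's logic: `io_thread_connected` is a hypothetical helper, the reconnect attempt is assumed to succeed instantly, and 3600 is only an example of a retry interval the test would never reach.

```python
def io_thread_connected(check_time, stop_time, connect_retry):
    """Toy model: after stopping at stop_time, the IO thread makes its next
    connection attempt connect_retry seconds later and is assumed to succeed."""
    return check_time >= stop_time + connect_retry


# A slow host runs the status check 1.5 s after the IO thread stopped.
slow_check = 1.5
print(io_thread_connected(slow_check, 0.0, connect_retry=1))     # True: the check races with the reconnect
print(io_thread_connected(slow_check, 0.0, connect_retry=3600))  # False: the retry cannot fire in time
```

With a 1 sec interval any check delayed by more than a second can observe a reconnected thread; with a very large interval the outcome no longer depends on how slow the host is.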
[19 Jun 2008 16:51] Andrei Elkin
Restoring the status. No push has been done.
[27 Jun 2008 15:58] Andrei Elkin
Need more info from pb's slave logs.
`stop|start slave' are synchronous calls with respect to the Slave IO status. Hence, enforcing synchronization with `wait_for_slave_io_start|stop' is unnecessary.
The cause could be that the slave IO thread, connecting to its own server as the master, does not succeed at the first attempt. That is why the info from the logs is needed.
[1 Jul 2008 16:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48827

2618 Andrei Elkin	2008-07-01
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      A "working" commit to provoke the failure again in order to inspect the remaining logs.
[2 Jul 2008 9:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48861

2671 Andrei Elkin	2008-07-02
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      a "null" push in order to summon the failure.
      Need more info from pb's slave logs in order to fix the bug.
[2 Jul 2008 9:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48862

2671 Andrei Elkin	2008-07-02
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      a "null" push in order to summon the failure.
      Need more info from pb's slave logs in order to fix the bug.
[18 Jul 2008 10:09] Andrei Elkin
The reason for the bug is that the slave's IO state is set to NO even though the thread itself has started and responded to the client connection that issued `START SLAVE'. Responding to the client thread later, upon connecting to the master, cannot be done as it would undo the idea of bug#31024.
[18 Jul 2008 11:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50021

2706 Andrei Elkin	2008-07-18
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      the reason for the failure is that the IO thread passes through a sequence of state changes before
      it eventually settles at the expected running state of NO.
      It's unreasonable to wait for the running status when the whole idea of the test is to get
      to the IO thread error.
      
      Fixed by changing the waiting condition.
[18 Jul 2008 11:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50027

2706 Andrei Elkin	2008-07-18
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      the reason for the failure is that the IO thread passes through a sequence of state
      changes before it eventually settles at the expected running state of NO.
      It's unreasonable to wait for the running status when the whole idea of the test is
      to get to the IO thread error.
      
      Fixed by changing the waiting condition.
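
Why the choice of waiting condition matters can be shown with a small sketch. This is an illustration in Python, not the actual test change: the snapshot sequence is invented, `first_match` is a hypothetical helper, and the use of the `Last_IO_Errno` status column with a nonzero value is an assumption about what "the IO thread error" would look like in the status output.

```python
def first_match(snapshots, predicate):
    """Return the index of the first status snapshot satisfying predicate, or None."""
    for index, status in enumerate(snapshots):
        if predicate(status):
            return index
    return None


# Invented sequence of states the IO thread might pass through after START SLAVE.
snapshots = [
    {"Slave_IO_Running": "No", "Last_IO_Errno": 0},   # thread started, not yet connected
    {"Slave_IO_Running": "Yes", "Last_IO_Errno": 0},  # connected, checking the master
    {"Slave_IO_Running": "No", "Last_IO_Errno": 1},   # stopped with an error (placeholder code)
]

# Waiting on the running flag matches the very first, transient snapshot.
by_running_flag = first_match(snapshots, lambda s: s["Slave_IO_Running"] == "No")

# Waiting on the error matches only the final state the test is after.
by_error = first_match(snapshots, lambda s: s["Last_IO_Errno"] != 0)

print(by_running_flag, by_error)  # 0 2
```

In this model a wait keyed to the running flag can be satisfied before the thread has reached the error, while a wait keyed to the error itself cannot.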
[18 Jul 2008 14:21] Andrei Elkin
pushed to 5.1-bugteam
[22 Jul 2008 18:36] Bugs System
Pushed into 5.1.28
[22 Jul 2008 20:11] Paul DuBois
Test case change. No changelog entry needed.

Setting report to Patch queued pending push into 6.0.x
[28 Jul 2008 13:26] Georgi Kodinov
Pushed into 6.0.7-alpha
[28 Jul 2008 14:45] Bugs System
Pushed into 6.0.7-alpha  (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[28 Jul 2008 15:23] Paul DuBois
Test case change. No changelog entry needed.
[28 Jul 2008 16:44] Bugs System
Pushed into 5.1.28  (revid:davi.arnaut@sun.com-20080722182431-0i2f1yc4uocime9q) (version source revid:davi.arnaut@sun.com-20080722182431-0i2f1yc4uocime9q) (pib:3)
[13 Sep 2008 19:47] Bugs System
Pushed into 6.0.6-alpha  (revid:aelkin@mysql.com-20080718115316-wbnusxnr07y4p6qe) (version source revid:sergefp@mysql.com-20080611231653-nmuqmw6dedjra79i) (pib:3)
[30 Jan 2009 13:32] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)