Bug #36818 rpl_server_id1 fails expecting slave has stopped
Submitted: 20 May 2008 15:24 Modified: 28 Jul 2008 17:23
Reporter: Andrei Elkin
Status: Closed
Category: Server: Tests Severity: S2 (Serious)
Version: 5.1 OS: Any
Assigned to: Andrei Elkin Target Version: 5.1+
Tags: test failures, pushbuild, sporadic
Triage: D3 (Medium)

[20 May 2008 15:24] Andrei Elkin
Description:
In a slow environment such as valgrind

https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-5.1-bugteam&entry=gshchepa...

the test is vulnerable because it does not check whether the slave has stopped by the time
the new session issues `start slave;'

The differences like
-Slave_IO_State	
+Slave_IO_State	Checking master version
-Slave_IO_Running	No
+Slave_IO_Running	Yes
etc
can be explained by the effects of the preceding `stop slave' not having taken place yet;
the test needs to wait until the slave has actually stopped.

How to repeat:
check pb

Suggested fix:
stop slave;
+source include/wait_for_slave_to_stop.inc;
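
The suggested fix amounts to polling the slave status until the stop has taken effect
before moving on. A minimal Python sketch of that idea follows. `wait_for` and `FakeSlave`
are hypothetical names used only for illustration; they are not part of mysqltest or the
server, and the real include file may be implemented differently.

```python
import time
from typing import Callable, Optional


def wait_for(condition: Callable[[], bool],
             timeout: float = 30.0,
             interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check so a condition that became true at the deadline is not missed.
    return condition()


class FakeSlave:
    """Hypothetical stand-in for a slave whose stop takes effect after a delay."""

    def __init__(self, stop_delay: float) -> None:
        self.stop_delay = stop_delay
        self._stop_requested_at: Optional[float] = None

    def stop(self) -> None:
        # Record the request; the IO thread keeps "running" for stop_delay seconds.
        self._stop_requested_at = time.monotonic()

    def io_running(self) -> bool:
        if self._stop_requested_at is None:
            return True
        return time.monotonic() - self._stop_requested_at < self.stop_delay
```

A test that checks the status right after `stop()` can still see the thread running;
calling `wait_for(lambda: not slave.io_running())` first removes that dependence on timing.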
[20 May 2008 15:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46861

ChangeSet@1.2647, 2008-05-20 16:27:46+03:00, aelkin@mysql1000.dsl.inet.fi +1 -0
  Bug #36818  	rpl_server_id1 fails expecting slave has stopped
  
  the test is vulnerable because it does not check whether the slave has stopped by the time
  the new session issues `start slave;'
  
  Fixed by explicitly deploying the wait_for_slave_to_stop synchronization macro.
[17 Jun 2008 21:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48025

2662 Andrei Elkin	2008-06-17
      Bug #36818  rpl_server_id1 fails expecting slave has stopped
      
      the test was vulnerable because the slave io thread could start reconnecting
      in between two cycles of source include/wait_for_slave_io_to_stop.inc.
      The slave, which was supposed to stay stopped, managed to restart because the delay
      between reconnection attempts was apparently small: 1 sec.
      On restart, show slave status can find the IO thread running, which can happen before
      the master id is compared with the local id, the check that is supposed to stop
      the IO thread.
      
      Fixed by changing master_connect_retry from a small default to a value too large
      to be exceeded.
[19 Jun 2008 16:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48181

2662 Andrei Elkin	2008-06-19
      Bug #36818  rpl_server_id1 fails expecting slave has stopped
      
      the test was vulnerable because the slave io thread could start reconnecting
      after it had been stopped at wait_for_slave_io_to_stop.inc.
      This was possible because of the small 1 sec reconnect parameter of change master, so
      that in a slow env the following show slave status could find the slave connected again.
      
      Fixed by changing master_connect_retry from a small default to a value too large
      to be exceeded.
[19 Jun 2008 18:51] Andrei Elkin
Restoring the status. No push has been done.
[27 Jun 2008 17:58] Andrei Elkin
Need more info from pb's slave logs.
`stop|start slave' are synchronous calls with respect to the Slave IO status. Hence, enforcing
synchronization with `wait_for_slave_io_start|stop' is unnecessary.
The problem could be that the slave io thread, connecting to its own server as the master, might
not succeed at the first attempt. That is why the info from the logs is needed.
[1 Jul 2008 18:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48827

2618 Andrei Elkin	2008-07-01
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      A "working" commit to call back the failure in order to inspect remaining logs.
[2 Jul 2008 11:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48861

2671 Andrei Elkin	2008-07-02
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      a "null" push in order to summon the failure.
      Need more info from pb's slave logs in order to fix the bug.
[2 Jul 2008 11:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48862

2671 Andrei Elkin	2008-07-02
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      a "null" push in order to summon the failure.
      Need more info from pb's slave logs in order to fix the bug.
[18 Jul 2008 12:09] Andrei Elkin
The reason for the bug is that the slave's IO state is set to NO even though the thread itself
has started and responded to the client connection that issued `START SLAVE'. Responding to
the client thread later, upon connecting to the master, cannot be done as it would
roll back the idea of bug#31024.
[18 Jul 2008 13:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50021

2706 Andrei Elkin	2008-07-18
      Bug #36818 rpl_server_id1 fails expecting slave has stopped
      
      the reason for the failure is that the io thread passes through a sequence of state
      changes before it eventually gets stuck with the expected running state NO.
      It's unreasonable to wait for the running status while the whole idea of the test is
      to get to the IO thread error.
      
      Fixed by changing the waiting condition.
[18 Jul 2008 13:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50027

2706 Andrei Elkin	2008-07-18
       Bug #36818 rpl_server_id1 fails expecting slave has stopped
            
      the reason for the failure is that the io thread passes through a sequence of state
      changes before it eventually gets stuck with the expected running state NO.
      It's unreasonable to wait for the running status while the whole idea of the test is
      to get to the IO thread error.
      
      Fixed by changing the waiting condition.
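
The difference between the two waiting conditions can be sketched in Python as follows.
This is an illustration under stated assumptions, not the committed test change: the
snapshot sequence, the helper names, and the error number `SAME_SERVER_ID_ERRNO` are all
hypothetical and are not taken from the server or from mysqltest.

```python
from typing import Callable, Dict, List, Optional

Status = Dict[str, object]

# Hypothetical error number used only in this sketch; not a real server value.
SAME_SERVER_ID_ERRNO = 9999

# Illustrative sequence of states the IO thread passes through, in order.
SNAPSHOTS: List[Status] = [
    {"Slave_IO_Running": "No", "Last_IO_Errno": 0},    # thread starting up
    {"Slave_IO_Running": "Yes", "Last_IO_Errno": 0},   # transient running state
    {"Slave_IO_Running": "No", "Last_IO_Errno": SAME_SERVER_ID_ERRNO},  # final state
]


def first_match(snapshots: List[Status],
                predicate: Callable[[Status], bool],
                start: int = 0) -> Optional[int]:
    """Return the index of the first snapshot at or after `start` that satisfies
    `predicate`, or None if the observer never sees such a snapshot."""
    for i in range(start, len(snapshots)):
        if predicate(snapshots[i]):
            return i
    return None


def is_running(status: Status) -> bool:
    # Transient condition: true only while the thread is briefly connected.
    return status["Slave_IO_Running"] == "Yes"


def has_io_error(status: Status) -> bool:
    # Terminal condition: stays true once the IO thread has hit the error.
    return status["Last_IO_Errno"] == SAME_SERVER_ID_ERRNO
```

An observer that starts polling late (for example `start=2`) never sees the transient
running state, so waiting on `is_running` can time out, while waiting on `has_io_error`
succeeds no matter when polling begins.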
[18 Jul 2008 16:21] Andrei Elkin
pushed to 5.1-bugteam
[22 Jul 2008 20:36] Bugs System
Pushed into 5.1.28
[22 Jul 2008 22:11] Paul DuBois
Test case change. No changelog entry needed.

Setting report to Patch queued pending push into 6.0.x
[28 Jul 2008 15:26] Georgi Kodinov
Pushed into 6.0.7-alpha
[28 Jul 2008 16:45] Bugs System
Pushed into 6.0.7-alpha (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k)
(version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[28 Jul 2008 17:23] Paul DuBois
Test case change. No changelog entry needed.
[28 Jul 2008 18:44] Bugs System
Pushed into 5.1.28 (revid:davi.arnaut@sun.com-20080722182431-0i2f1yc4uocime9q)
(version source revid:davi.arnaut@sun.com-20080722182431-0i2f1yc4uocime9q) (pib:3)
[13 Sep 2008 21:47] Bugs System
Pushed into 6.0.6-alpha (revid:aelkin@mysql.com-20080718115316-wbnusxnr07y4p6qe)
(version source revid:sergefp@mysql.com-20080611231653-nmuqmw6dedjra79i) (pib:3)
[30 Jan 14:32] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463)
(version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt)
(merge vers: 6.0.10-alpha) (pib:6)