Bug #28497 wait_for_slave_to_stop can cause random replication mysql-test failures
Submitted: 17 May 2007 12:04
Modified: 18 Jun 2007 16:30
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Server: Tests
Severity: S3 (Non-critical)
Version: 5.1
OS: Linux
Assigned to: Jonathan Miller
CPU Architecture: Any

[17 May 2007 12:04] Jonathan Miller
Description:
Hi Jeb, Serge,

Can you please implement solution 1 as described below (do_wait_for_slave_to_stop should wait for both threads to stop)?

Open a bug report if you want to.

Best wishes,
Lars

On Thu, May 10, 2007 at 04:14:00PM +0200, Guilhem Bichot wrote:
> Hi,
> 
> as Tomas observed, do_wait_for_slave_to_stop monitors "Slave_running"
> which is defined as
> "slave sql thread running AND slave I/O thread running"
> This definition is correct IMHO as a user expects to see "YES" when 
> the replication is fully running i.e. both threads are.
> But when we want to wait for the slave to stop, we want to wait
> for both threads to have stopped. Currently we stop waiting as soon as
> one thread has stopped, and this is causing random test failures:
> https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=mysql-5.1-new-ndb&order=453
> (click on rpl_ndb_extraCol).
> 
> So, there are three choices:
> - modify do_wait_for_slave_to_stop so that it instead looks for
> slave_sql_running==No and slave_io_running==No in SHOW SLAVE STATUS,
> and does not look at Slave_running.
> - or modify test cases to not use wait_for_slave_to_stop but instead
> use Matthias's script:
> include/wait_slave_status.inc
> to wait for %No%No%.
> - <tomas> guilhem, another option is to give "Slave_running" more 
> states... than "ON", "OFF"
> <guilhem> tomas: yes, if that can be done without breaking old 
> monitoring apps
> 
> I did my duty of transmitting. Good luck to the implementors :)
> 
> -- 
>    __  ___     ___ ____  __
>   /  |/  /_ __/ __/ __ \/ /    Mr. Guilhem Bichot <guilhem@mysql.com>
>  / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Full-Time Software Developer
> /_/  /_/\_, /___/\___\_\___/   Bordeaux, France
>        <___/   www.mysql.com   

--
Dr. Lars Thalmann
Replication and Clustering Technology
MySQL AB, www.mysql.com

How to repeat:
see above
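
The early exit described above can be modelled in a few lines. This is a minimal Python sketch, not mysqltest code: the `SlaveStatus` class and both check functions are hypothetical names used only for illustration. It assumes, as stated in the email, that the aggregate flag is ON only while both threads are running.

```python
from dataclasses import dataclass


@dataclass
class SlaveStatus:
    """Hypothetical snapshot of the two replication threads."""
    io_running: bool
    sql_running: bool

    @property
    def slave_running(self) -> bool:
        # Aggregate flag as defined in the email: ON only if both threads run.
        return self.io_running and self.sql_running


def stopped_by_aggregate(status: SlaveStatus) -> bool:
    # Old check: the slave counts as stopped once the aggregate flag is OFF.
    return not status.slave_running


def stopped_by_both_threads(status: SlaveStatus) -> bool:
    # Check from choice 1: both threads must have stopped.
    return not status.io_running and not status.sql_running


# State while the slave is stopping: SQL thread is down, I/O thread still up.
half_stopped = SlaveStatus(io_running=True, sql_running=False)
print(stopped_by_aggregate(half_stopped))     # True: the wait ends too early
print(stopped_by_both_threads(half_stopped))  # False: keep waiting
```

With the aggregate check, a test can continue while one thread is still running, which matches the random failures reported here.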
[17 May 2007 12:15] Jonathan Miller
bk commit - 5.1 tree (jmiller:1.2580)
bk commit - 5.1 tree (jmiller:1.2579)
bk commit - 5.1 tree (jmiller:1.2578)
[29 May 2007 13:10] Jonathan Miller
Hi,

Just so we are clear, there are 3 cases that we need to test for.

1) Both the slave IO and SQL threads have stopped.

wait_for_slave_to_stop

2) The slave SQL Thread has stopped.

This is for error injection, when we are testing that the slave SQL thread has seen the error and stopped as expected.

3) Wait for both threads to start. This will allow the sleeps that can cause random failures to be removed from test cases.

Best wishes,
/Jeb
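
The three cases can be sketched as one polling loop with three conditions. This is a rough Python model under stated assumptions, not the include files that were eventually committed: `wait_for`, the three condition functions, and the dictionary of status fields are hypothetical, and `get_status` stands in for whatever reads SHOW SLAVE STATUS.

```python
import time
from typing import Callable, Dict

Status = Dict[str, str]  # assumed shape: one SHOW SLAVE STATUS row as a dict


def both_stopped(s: Status) -> bool:
    # Case 1: both the I/O and SQL threads have stopped.
    return s["Slave_IO_Running"] == "No" and s["Slave_SQL_Running"] == "No"


def sql_stopped(s: Status) -> bool:
    # Case 2: only the SQL thread is required to have stopped.
    return s["Slave_SQL_Running"] == "No"


def both_started(s: Status) -> bool:
    # Case 3: both threads are running.
    return s["Slave_IO_Running"] == "Yes" and s["Slave_SQL_Running"] == "Yes"


def wait_for(condition: Callable[[Status], bool],
             get_status: Callable[[], Status],
             timeout: float = 30.0,
             interval: float = 0.1) -> bool:
    """Poll get_status until condition holds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if condition(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated status source: the SQL thread stops first, then the I/O thread.
snapshots = iter([
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes"},
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No"},
    {"Slave_IO_Running": "No", "Slave_SQL_Running": "No"},
])
print(wait_for(both_stopped, lambda: next(snapshots), interval=0.0))  # True
```

Polling on an explicit condition with a timeout is what lets fixed sleeps be dropped from test cases: the test proceeds as soon as the state is reached and fails cleanly if it never is.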
[1 Jun 2007 10:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27893

ChangeSet@1.2659, 2007-06-01 12:01:42+02:00, msvensson@pilot.(none) +3 -0
  Bug#28497 wait_for_slave_to_stop can cause random replication mysql-test failures
   - Add function "query_get_value" to allow reading a field's value
     into a $variable
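
The idea behind reading a field's value into a variable can be sketched as below. This is a Python model only: the real function lives in mysqltest and runs a query against the server, whereas this sketch looks up a value in rows that are already fetched. The argument order, the 1-based row number, and the sentinel returned for a missing row are assumptions of the sketch, not taken from the patch.

```python
from typing import Dict, List


def query_get_value(rows: List[Dict[str, str]], column: str, row_number: int) -> str:
    """Return the value of `column` in the 1-based `row_number` of a result set."""
    if not 1 <= row_number <= len(rows):
        # Assumption of this sketch: report a missing row with a sentinel string.
        return "No such row"
    row = rows[row_number - 1]
    if column not in row:
        raise KeyError(f"no such column: {column}")
    return row[column]


# Hypothetical result of SHOW SLAVE STATUS, reduced to the two thread fields.
result = [{"Slave_IO_Running": "No", "Slave_SQL_Running": "No"}]
io_running = query_get_value(result, "Slave_IO_Running", 1)
print(io_running)  # No
```

Once a single field can be captured like this, a test include script can loop until that field reaches the expected value instead of relying on the aggregate status.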
[5 Jun 2007 20:08] Jonathan Miller
Hi,

The tool Magnus made is good, but it does not resolve this bug by itself. We still need to create the new test include scripts to replace the deprecated wait_for_slave_to_stop command in mysqltest.c.

I am currently assigning this bug to Serge as part of work log https://intranet.mysql.com/worklog/QA-Sprint/index.pl?tid=3894 (pl_*.test refactored), to be completed, hopefully in short order, for 5.1 and above.

Setting Lars as one of the reviewers; he can change that to whoever he wishes.

Best wishes,
/Jeb
[6 Jun 2007 16:54] Bugs System
Pushed into 5.1.20-beta
[6 Jun 2007 16:57] Bugs System
Pushed into 5.0.44
[6 Jun 2007 17:01] Bugs System
Pushed into 4.1.24
[11 Jun 2007 14:58] Jonathan Miller
Calvin wants this patch ASAP.
[12 Jun 2007 2:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28549

ChangeSet@1.2547, 2007-06-12 04:28:58+02:00, jmiller@mysql.com +13 -0
  Many files:
    Updated test files. Updated to use new include. Result files had no changes. Bug#28497
    New include files to replace the old mysqltest.c wait_for_slave_to_stop function. This should help with the stability of the tests that use stop slave or depend on slave failure. This should complete Bug#28497
[13 Jun 2007 3:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28623

ChangeSet@1.2488, 2007-06-13 05:52:43+02:00, jmiller@mysql.com +14 -0
  Updated patch for Bug#28497 based off of Magnus's review
[13 Jun 2007 3:56] Jonathan Miller
Pushed into mysql-5.1-maint
[14 Jun 2007 6:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28714

ChangeSet@1.2497, 2007-06-14 08:48:00+02:00, msvensson@pilot.(none) +1 -0
  Bug#28497 wait_for_slave_to_stop can cause random replication mysql-test failures
   - touch up
[16 Jun 2007 4:50] Bugs System
Pushed into 5.1.20-beta
[18 Jun 2007 16:30] Paul DuBois
Test suite change. No changelog entry needed.
[22 Jun 2007 18:07] Bugs System
Pushed into 5.1.20-beta