Bug #37975 wait_for_slave_* should increase the timeout
Submitted: 8 Jul 2008 17:14 Modified: 30 Jan 2009 17:17
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S7 (Test Cases)
Version:5.1+ OS:Any
Assigned to: Sven Sandberg CPU Architecture:Any
Tags: 51rpl, timeout, wait_for_slave_io_to_stop, wait_for_slave_param, wait_for_slave_sql_error, wait_for_slave_sql_to_start, wait_for_slave_sql_to_stop, wait_for_slave_to_start, wait_for_slave_to_stop

[8 Jul 2008 17:14] Sven Sandberg
Description:
It is a quite common error in pushbuild that one of the wait_for_slave_* macros fails. In some cases, it has been difficult to come up with any better explanation than the host was overloaded and the timeout exceeded. Therefore, it would be good to increase the timeout. That would have the following consequences:

 - Tests where nothing is wrong, they are only run on a slow server, will more likely succeed.

 - For tests that have real errors, we get stronger evidence of the error.

 - It takes longer time to execute tests that have real errors.

How to repeat:
See this xref log, for instance: http://tinyurl.com/5dy3rh

Suggested fix:
Increase the timeout to 5 minutes.
While I'm there, I will add output from show binlog events to the error message.
Also, I will refactor so that all wait_for_slave_* use wait_for_slave_param as an atom.
[10 Jul 2008 13:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49437

2624 Sven Sandberg	2008-07-10
      BUG#37975: wait_for_slave_* should increase the timeout
      Problem 1: tests often fail in pushbuild with a timeout when waiting
      for the slave to start/stop/receive error.
      Fix 1: Updated the wait_for_slave_* macros in the following way:
      - The timeout is increased by a factor ten
      - Refactored the macros so that wait_for_slave_param does the work for
      the other macros.
      Problem 2: Tests are often incorrectly written, lacking a
      source include/wait_for_slave_to_[start|stop].inc.
      Fix 2: Improved the chance to get it right by adding
      include/start_slave.inc and include/stop_slave.inc, and updated tests
      to use these.
      Problem 3: The the built-in test language command
      wait_for_slave_to_stop is a misnomer (does not wait for the slave io
      thread) and does not give as much debug info in case of failure as
      the otherwise equivalent macro
      source include/wait_for_slave_sql_to_stop.inc
      Fix 3: Replaced all calls to the built-in command by a call to the
      macro.
      Problem 4: Some, but not all, of the wait_for_slave_* macros had an
      implicit connection slave. This made some tests confusing to read,
      and made it more difficult to use the macro in circular replication
      scenarios, where the connection named master needs to wait.
      Fix 4: Removed the implicit connection slave from all
      wait_for_slave_* macros, and updated tests to use an explicit
      connection slave where necessary.
      Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
      were unused. Moreover, using them is difficult and error-prone.
      Fix 5: remove these macros.
      Problem 6: log_bin_trust_function_creators_basic failed when running
      tests because it assumed @@global.log_bin_trust_function_creators=1,
      and some tests modified this variable without resetting it to its
      original value.
      Fix 6: All tests that use this variable have been updated so that
      they reset the value at end of test.
[10 Jul 2008 15:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49477

2624 Sven Sandberg	2008-07-10
      BUG#37975: wait_for_slave_* should increase the timeout
      Problem 1: tests often fail in pushbuild with a timeout when waiting
      for the slave to start/stop/receive error.
      Fix 1: Updated the wait_for_slave_* macros in the following way:
      - The timeout is increased by a factor ten
      - Refactored the macros so that wait_for_slave_param does the work for
      the other macros.
      Problem 2: Tests are often incorrectly written, lacking a
      source include/wait_for_slave_to_[start|stop].inc.
      Fix 2: Improved the chance to get it right by adding
      include/start_slave.inc and include/stop_slave.inc, and updated tests
      to use these.
      Problem 3: The the built-in test language command
      wait_for_slave_to_stop is a misnomer (does not wait for the slave io
      thread) and does not give as much debug info in case of failure as
      the otherwise equivalent macro
      source include/wait_for_slave_sql_to_stop.inc
      Fix 3: Replaced all calls to the built-in command by a call to the
      macro.
      Problem 4: Some, but not all, of the wait_for_slave_* macros had an
      implicit connection slave. This made some tests confusing to read,
      and made it more difficult to use the macro in circular replication
      scenarios, where the connection named master needs to wait.
      Fix 4: Removed the implicit connection slave from all
      wait_for_slave_* macros, and updated tests to use an explicit
      connection slave where necessary.
      Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
      were unused. Moreover, using them is difficult and error-prone.
      Fix 5: remove these macros.
      Problem 6: log_bin_trust_function_creators_basic failed when running
      tests because it assumed @@global.log_bin_trust_function_creators=1,
      and some tests modified this variable without resetting it to its
      original value.
      Fix 6: All tests that use this variable have been updated so that
      they reset the value at end of test.
[10 Jul 2008 16:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49479

2624 Sven Sandberg	2008-07-10
      BUG#37975: wait_for_slave_* should increase the timeout
      Problem 1: tests often fail in pushbuild with a timeout when waiting
      for the slave to start/stop/receive error.
      Fix 1: Updated the wait_for_slave_* macros in the following way:
      - The timeout is increased by a factor ten
      - Refactored the macros so that wait_for_slave_param does the work for
      the other macros.
      Problem 2: Tests are often incorrectly written, lacking a
      source include/wait_for_slave_to_[start|stop].inc.
      Fix 2: Improved the chance to get it right by adding
      include/start_slave.inc and include/stop_slave.inc, and updated tests
      to use these.
      Problem 3: The the built-in test language command
      wait_for_slave_to_stop is a misnomer (does not wait for the slave io
      thread) and does not give as much debug info in case of failure as
      the otherwise equivalent macro
      source include/wait_for_slave_sql_to_stop.inc
      Fix 3: Replaced all calls to the built-in command by a call to the
      macro.
      Problem 4: Some, but not all, of the wait_for_slave_* macros had an
      implicit connection slave. This made some tests confusing to read,
      and made it more difficult to use the macro in circular replication
      scenarios, where the connection named master needs to wait.
      Fix 4: Removed the implicit connection slave from all
      wait_for_slave_* macros, and updated tests to use an explicit
      connection slave where necessary.
      Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
      were unused. Moreover, using them is difficult and error-prone.
      Fix 5: remove these macros.
      Problem 6: log_bin_trust_function_creators_basic failed when running
      tests because it assumed @@global.log_bin_trust_function_creators=1,
      and some tests modified this variable without resetting it to its
      original value.
      Fix 6: All tests that use this variable have been updated so that
      they reset the value at end of test.
[14 Jul 2008 8:36] Sven Sandberg
pushed to 5.1-rpl
[14 Jul 2008 9:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49663

2626 Sven Sandberg	2008-07-14
      BUG#37975: wait_for_slave_* should increase the timeout
      Post-push fixes: I forgot to include the new files
      start_slave.inc and stop_slave.inc in the previuos push.
[14 Jul 2008 9:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49664

2626 Sven Sandberg	2008-07-14
      BUG#37975: wait_for_slave_* should increase the timeout
      Post-push fixes: I forgot to include the new files
      start_slave.inc and stop_slave.inc in the previuos push.
[14 Jul 2008 14:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49678

2627 Sven Sandberg	2008-07-14
      BUG#37975: wait_for_slave_* should increase the timeout
      Post-push fixes: forgot to commit an updated result file.
[14 Jul 2008 14:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49679

2627 Sven Sandberg	2008-07-14
      BUG#37975: wait_for_slave_* should increase the timeout
      Post-push fixes: forgot to commit an updated result file.
[30 Jan 2009 13:31] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)
[30 Jan 2009 15:07] Bugs System
Pushed into 5.1.32 (revid:luis.soares@sun.com-20090129165946-d6jnnfqfokuzr09y) (version source revid:sven@mysql.com-20080714140603-dry2qafaj7msno6j) (merge vers: 5.1.28) (pib:6)
[30 Jan 2009 17:17] Paul DuBois
Test case changes. No changelog entry needed.
[17 Feb 2009 14:59] Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090217131017-6u8qz1edkjfiobef) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 2009 16:47] Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:tomas.ulin@sun.com-20090217134419-5ha6xg4dpedrbmau) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)
[17 Feb 2009 18:23] Bugs System
Pushed into 5.1.32-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090217134216-5699eq74ws4oxa0j) (version source revid:tomas.ulin@sun.com-20090201210519-vehobc4sy3g9s38e) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)