Bug #37714 | rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout | ||
---|---|---|---|
Submitted: | 28 Jun 2008 10:52 | Modified: | 24 Jun 2009 11:09 |
Reporter: | Sven Sandberg | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Tests: Replication | Severity: | S7 (Test Cases) |
Version: | 6.0 | OS: | Any |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
Tags: | disabled, pushbuild, rpl.rpl_heartbeat, sporadic, test failure, timeout |
[28 Jun 2008 10:52]
Sven Sandberg
[28 Jun 2008 11:11]
Sven Sandberg
WHERE: 6.0/azundris on Sat Jun 21 09:16:21 2008/'powermacg5' -max/n_mix
URL: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0&order=11
--
WHERE: 6.0-rpl/skozlov on Mon Jun 23 21:26:22 2008/'vm-win2003-64-b' Win64 VS2005 -max-nt/n_mix
URL: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=17
[15 Jul 2008 7:19]
Alexander Nozdrin
Test case has been disabled because it fails too often.
[12 Dec 2008 17:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61533
2807 Andrei Elkin 2008-12-12
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
The cause of the failure on Windows has not been identified. However, the heartbeat code had a flaw that was fixed under Bug #39077. The patch for that bug, which is going to be pushed to 6.0 main, will probably help with this one as well. Attempted to fix with the patch for Bug #39077.
[12 Dec 2008 21:34]
Andrei Elkin
The fixes for the possibly related Bug #39077 have been pushed so that we can monitor whether the test passes. The status is set to in-progress until further runs confirm that the bug is really gone.
[19 Dec 2008 9:36]
Sven Sandberg
Setting to "Can't repeat" since it has not happened since 2008-07-04. Please re-open the bug if it happens again.
[19 Dec 2008 17:35]
Sven Sandberg
xref: http://tinyurl.com/3q7nr9
[20 Jan 2009 18:57]
Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[30 Jan 2009 14:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/64652
2988 Andrei Elkin 2009-01-30
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
The timeout has finally occurred again: https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-bugteam&order=45... The test is now conditionally disabled so that it does not run on Windows. Todo: remove the added line "-- source include/not_windows.inc" once the bug has been fixed.
[4 Feb 2009 11:15]
Bugs System
Pushed into 6.0.10-alpha (revid:kostja@sun.com-20090204104420-mw1i2u9lum4bxjo6) (version source revid:joro@sun.com-20090131161307-ydhtowoaf0m3nzu0) (merge vers: 6.0.10-alpha) (pib:6)
[5 Feb 2009 13:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/65335
3027 Andrei Elkin 2009-02-05
mysql-test/suite/rpl/t/rpl_heartbeat.test is allowed to run on Windows again, to watch whether Bug #37714 shows up now that mtr2 has been pushed; the former mtr may have contributed to the failure.
[6 Feb 2009 21:25]
Andrei Elkin
Setting to in-progress to gather the regression evidence that pushbuild can supply. If the timeout failure does not show up, we can attribute it to the old mtr.
[14 Feb 2009 13:00]
Bugs System
Pushed into 6.0.10-alpha (revid:matthias.leich@sun.com-20090212211028-y72faag15q3z3szy) (version source revid:alexey.kopytov@sun.com-20090206100220-tkvd9v83791i895x) (merge vers: 6.0.10-alpha) (pib:6)
[23 Feb 2009 13:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/67188
3075 Andrei Elkin 2009-02-23
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
Logs on PB show that the IO thread was already down by the time the test reached its clean-up (drop table t1). A probable reason for the IO thread stopping is the small value of slave_net_timeout, chosen as a tradeoff between the need to test heartbeat counting and the test execution time. In a slow environment the timeout can elapse before any heartbeat has arrived. Fixed by performing the clean-up separately on the master and on the slave.
[23 Feb 2009 18:54]
Andrei Elkin
Alfranio, I think either you or Luis should be substituted by Serge, who is involved in rpl heartbeat testing. This patch should interest him, not least because he spotted the test failure last time. I hope you're okay with giving him a chance :-) De nada, Andrei.
[23 Feb 2009 18:56]
Andrei Elkin
Two patches have been committed. The bug stays in-progress until the second patch confirms that the small slave_net_timeout value correlates with the failure. So far I have been watching whether the test passes.
[24 Feb 2009 15:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/67394
3075 Andrei Elkin 2009-02-24
Only comments regarding Bug #37714. The push is to make pb execute rpl_heartbeat.
[18 Mar 2009 13:17]
Bugs System
Pushed into 6.0.11-alpha (revid:joro@sun.com-20090318122208-1b5kvg6zeb4hxwp9) (version source revid:azundris@mysql.com-20090224072212-51w0xg6doju2drup) (merge vers: 6.0.10-alpha) (pib:6)
[3 Apr 2009 16:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71335
3177 Andrei Elkin 2009-04-03
Bug #37714: a debug printout is added to the test.
[3 Apr 2009 16:27]
Andrei Elkin
Still in-progress, a debug push is about to be done.
[6 May 2009 14:09]
Bugs System
Pushed into 6.0.12-alpha (revid:svoj@sun.com-20090506125450-yokcmvqf2g7jhujq) (version source revid:aelkin@mysql.com-20090403162450-66ih5occv33rsc6a) (merge vers: 6.0.11-alpha) (pib:6)
[3 Jun 2009 15:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75539
2858 Andrei Elkin 2009-06-03
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
The cause of the bug is a property of pthread_cond_timedwait(): there is a time window between the timer expiring, which wakes up the thread, and the thread re-acquiring the mutex. Signals could be sent to the dump thread during that window, so the dump thread was not aware that the binlog had been updated and stayed in the loop. Fixed by augmenting the MYSQL_BIN_LOG class with a counter that is checked before and after the wake-up to detect that the binlog was updated.
[3 Jun 2009 15:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75540
2859 Andrei Elkin 2009-06-03
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
Removing the debug printout from the test.
[8 Jun 2009 17:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75868
2860 Andrei Elkin 2009-06-08
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
Restoring the slave net timeout and heartbeat values from before the debug push (aelkin@mysql.com-20090223133029-31b45i2aw9uaompa) to halve the test run time.
[15 Jun 2009 14:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76283
2864 Andrei Elkin 2009-06-15
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
The cause of the bug is a property of pthread_cond_timedwait(): there is a time window between the timer expiring, which wakes up the thread, and the thread re-acquiring the mutex. Signals could be sent to the dump thread during that window, so the dump thread was not aware that the binlog had been updated and stayed in the loop. Fixed by augmenting the MYSQL_BIN_LOG class with a counter that is checked before and after the wake-up to detect that the binlog was updated.
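Editorial sketch: the counter technique described in the commit message above can be illustrated roughly as follows. This is not the code from the patch. The class UpdateNotifier and all of its members are hypothetical names, and std::condition_variable stands in for the pthread primitives. The point it tries to show is that the waiter decides from the counter, not from the timeout status of the wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the update notification state of a binlog.
// None of these names come from the MySQL source.
class UpdateNotifier {
 public:
  // Snapshot of the counter; a waiter takes this before it starts waiting.
  unsigned long current() {
    std::lock_guard<std::mutex> guard(lock_);
    return signal_cnt_;
  }

  // Writer side: record the update under the mutex, then wake the waiters.
  void signal_update() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      ++signal_cnt_;
    }
    cond_.notify_all();
  }

  // Waiter side: returns true if an update happened since `seen`.
  // The wait status is deliberately ignored: a timeout status does not
  // prove that nothing was signalled while the mutex was being re-acquired.
  bool wait_for_update(unsigned long seen, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> guard(lock_);
    if (signal_cnt_ != seen) return true;  // checked before the wait
    (void)cond_.wait_for(guard, timeout);
    return signal_cnt_ != seen;            // checked after the wake-up
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  unsigned long signal_cnt_ = 0;
};

// Small self-check showing the intended call pattern.
inline void notifier_self_check() {
  UpdateNotifier n;
  const unsigned long seen = n.current();
  n.signal_update();  // update lands before the waiter goes to sleep
  assert(n.wait_for_update(seen, std::chrono::milliseconds(10)));
}
```

In this sketch a waiter takes current() while it still knows which updates it has consumed, and passes that snapshot to wait_for_update(). An update that races with the timeout is then reported as an update rather than lost.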
[16 Jun 2009 12:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76388
2866 Andrei Elkin 2009-06-16
Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
rpl_backup_multi revealed that the assert DBUG_ASSERT(ret == 0 && signal_cnt != mysql_bin_log.signal_cnt || thd->killed) does not hold when there are multiple dump threads. A thread waiting for a binlog update can catch a broadcast signal without the binlog actually having been refreshed. The assert is removed.
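Editorial sketch: the commit above only removes the assert. The following is one possible way, not taken from the patch, for a waiter to tolerate a wake-up that carries no new data: it treats every wake-up as a hint and re-checks its state in a loop. UpdateChannel, WaitResult and wait_for_growth are hypothetical names; the killed flag merely stands in for thd->killed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical shared state; not the structures used in sql/sql_repl.cc.
struct UpdateChannel {
  std::mutex lock;
  std::condition_variable cond;
  unsigned long signal_cnt = 0;  // bumped only when the log really grows
  bool killed = false;           // stands in for thd->killed

  // Real update: bump the counter, then wake everyone.
  void publish() {
    { std::lock_guard<std::mutex> g(lock); ++signal_cnt; }
    cond.notify_all();
  }
  // Wake-up with no new data, the case the removed assert did not allow.
  void broadcast_only() { cond.notify_all(); }
  void kill() {
    { std::lock_guard<std::mutex> g(lock); killed = true; }
    cond.notify_all();
  }
};

enum class WaitResult { Updated, TimedOut, Killed };

// Waits until the counter moves past `seen`, the waiter is killed, or the
// deadline passes. State is re-checked after every wake-up, so a broadcast
// without an update simply sends the waiter back to sleep.
inline WaitResult wait_for_growth(UpdateChannel& ch, unsigned long seen,
                                  std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  std::unique_lock<std::mutex> g(ch.lock);
  bool timed_out = false;
  for (;;) {
    if (ch.killed) return WaitResult::Killed;
    if (ch.signal_cnt != seen) return WaitResult::Updated;
    if (timed_out) return WaitResult::TimedOut;
    timed_out = ch.cond.wait_until(g, deadline) == std::cv_status::timeout;
  }
}

// Small self-check showing the intended call pattern.
inline void channel_self_check() {
  UpdateChannel ch;
  ch.publish();
  assert(wait_for_growth(ch, 0, std::chrono::milliseconds(10)) ==
         WaitResult::Updated);
}
```

The design choice here is that no invariant ties "woken without timeout" to "counter changed". Only the counter and the kill flag, both read under the mutex, decide the result.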
[19 Jun 2009 7:54]
Bugs System
Pushed into 5.4.4-alpha (revid:zhenxing.he@sun.com-20090619074435-4mlfkqqol4nzpq10) (version source revid:zhenxing.he@sun.com-20090619074435-4mlfkqqol4nzpq10) (merge vers: 5.4.4-alpha) (pib:11)
[24 Jun 2009 11:09]
Jon Stephens
Test failure only, no user-facing changes to document. Closed.
[27 Oct 2009 9:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/88270
3137 He Zhenxing 2009-10-27
Backport Bug #37714 rpl.rpl_heartbeat fails sporadically in pushbuild due to timeout
rpl_backup_multi revealed that the assert DBUG_ASSERT(ret == 0 && signal_cnt != mysql_bin_log.signal_cnt || thd->killed) does not hold when there are multiple dump threads. A thread waiting for a binlog update can catch a broadcast signal without the binlog actually having been refreshed. The assert is removed.
@ sql/sql_repl.cc
The assert does not hold when there are multiple dump threads. A thread waiting for a binlog update can catch a broadcast signal without the binlog actually having been refreshed.
[23 Dec 2009 11:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95491
3074 Andrei Elkin 2009-12-23
Bug #49802: backport Bug #37714 rpl.rpl_heartbeat to telco
Fixed by backporting the two patches for Bug #37714.