Bug #44799 | rpl_backup_shutdown test fails on PB1 in pb-valgrind platform with timeout | ||
---|---|---|---|
Submitted: | 11 May 2009 18:31 | Modified: | 12 Jun 2009 13:31 |
Reporter: | Hema Sridharan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | mysql-6.0-backup | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
[11 May 2009 18:31]
Hema Sridharan
[11 May 2009 18:33]
Hema Sridharan
Please take a look at the valgrind error that is visible in pb-valgrind platform
Attachment: valgrind_error.txt (text/plain), 16.66 KiB.
[12 May 2009 9:26]
Sveta Smirnova
Thank you for the report. Verified as described: mysqltest: In included file "./include/wait_for_slave_param.inc": Result content mismatch ==27424== ==27424== ERROR SUMMARY: 3322732 errors from 100 contexts (suppressed: 5 from 1) ==27424== malloc/free: in use at exit: 2,104 bytes in 2 blocks. ==27424== malloc/free: 72,323 allocs, 72,321 frees, 165,784,892 bytes allocated. ==27424== For counts of detected errors, rerun with: -v ==27424== searching for pointers to 2 not-freed blocks. ==27424== checked 2,296,104 bytes. ==27424== ==27424== ==27424== 2,048 bytes in 1 blocks are still reachable in loss record 2 of 2 ==27424== at 0x4A05809: malloc (vg_replace_malloc.c:149) ==27424== by 0x4A05883: realloc (vg_replace_malloc.c:306) ==27424== by 0x3429E06D19: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so) ==27424== by 0x4C98C8F: my_thread_global_init (my_thr_init.c:148) ==27424== by 0x4C8FC4D: my_init (my_init.c:99) ==27424== by 0x417542: main (mysqltest.cc:7512) ==27424== ==27424== LEAK SUMMARY: ==27424== definitely lost: 0 bytes in 0 blocks. ==27424== possibly lost: 0 bytes in 0 blocks. ==27424== still reachable: 2,048 bytes in 1 blocks. ==27424== suppressed: 56 bytes in 1 blocks.
[14 May 2009 0:01]
Luis Soares
The mentioned failure (rpl_backup_shutdown) is due to timeout in valgrind build (test case hangs while waiting for the slave to stop when facing a RESTORE event). This seems to be causing some extra valgrind output. However, it is unclear why it is failing on valgrind build only. Aside from valgrind error/warnings the following is also found in PB output: --- /data0/pushbuild/pb1/pb/bzr_mysql-6.0-backup/245/mysql-6.0.11-alpha-pb245/mysql-test/suite/rpl/r/rpl_backup_shutdown.result 2009-05-12 22:04:53.000000000 +0300 +++ /data0/pushbuild/pb1/pb/bzr_mysql-6.0-backup/245/mysql-6.0.11-alpha-pb245/mysql-test/suite/rpl/r/rpl_backup_shutdown.reject 2009-05-13 10:20:34.000000000 +0300 @@ -116,31 +116,38 @@ # # Slave will be stopped as result of restore incident event # Connecting to slave... -# Check the slave status +**** ERROR: timeout after seconds while waiting for slave parameter Slave_SQL_Running = No **** +Message: Failed while waiting for slave SQL thread to stop +Current connection is 'slave' +Note: the following output may have changed since the failure was detected + (...) Going through the test case (rpl_backup_shutdown), the test seems to hang at this point (at least by cross-referencing from output from failure above): 76 --echo # Slave will be stopped as result of restore incident event 77 --echo # Connecting to slave... 78 connection slave; 79 source include/wait_for_slave_sql_to_stop.inc; Placing an exit; just after line 79 and running the test case locally with valgrind enabled, seems to corroborate the suspicion.
[14 May 2009 0:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73986 2810 Luis Soares 2009-05-14 BUG#44799: Some rpl tests fails in pushbuild because of bad usage of dbug.c The motivation for this bug report was due to fact that rpl_backup_shutdown was failing on PB in valgrind run. It happens that the test case actually times out causing extra valgrind output. This test case explicitly stops and restarts the master server. This is fine in all PB runs except one, the one in the valgrind host. Since valgrind is makes execution slower, it may happen that the master takes too long to restart, making the slave reconnection routine to give up before the master is ready again. In the case of an MTR run the reconnect period seems small enough for this to happen, ultimately causing the test case to timeout. This patch addresses this, by setting the --master-retry-count to a large value on the slave. @ mysql-test/suite/rpl/t/rpl_backup_shutdown-slave.opt Added slave option to reconfigure master-retry-count.
[14 May 2009 15:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/74081 2810 Luis Soares 2009-05-14 BUG#44799: rpl_backup_shutdown test fails on PB1 in pb-valgrind platform with timeout The motivation for this bug report was due to fact that rpl_backup_shutdown was failing on PB in valgrind run. It happens that the test case actually times out causing extra valgrind output. This test case explicitly stops and restarts the master server. This is fine in all PB runs except one, the one in the valgrind host. Since valgrind makes execution slower, it may happen that the master takes too long to restart, making the slave reconnection routine to give up before the master is ready again. In the case of an MTR run the reconnect period seems small enough for this to happen, ultimately causing the test case to timeout. This patch addresses this, by setting the --master-retry-count to a large value on the slave.
[14 May 2009 15:46]
Luis Soares
This last commit is just to fix bug synopsis.
[14 May 2009 15:55]
Luis Soares
Pushed to 6.0-backup.
[9 Jun 2009 6:41]
Jørgen Løland
Merged to Azalea June 2
[12 Jun 2009 13:31]
Jon Stephens
Does not appear to be any user-facing changes associated with this fix, so nothing to document for the Manual; closed without taking further action.