Bug #58426 | Crashing tests not failing as they are supposed to on Solaris 10 debug | ||
---|---|---|---|
Submitted: | 23 Nov 2010 19:25 | Modified: | 20 Dec 2010 3:12 |
Reporter: | Bjørn Munch | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Tests | Severity: | S3 (Non-critical) |
Version: | 5.1+ | OS: | Solaris (S10, debug) |
Assigned to: | Tor Didriksen | CPU Architecture: | Any |
[23 Nov 2010 19:25]
Bjørn Munch
[23 Nov 2010 23:34]
Luis Soares
This is not specific to binlog_index test case. This test case is "unfortunate" to make heavy use of "crash the server" test pattern. For instance, the same behavior is observed in another test case (that crashes the server), main.crash_commit_before: main.crash_commit_before [ fail ] Test ended at 2010-11-24 00:21:40 CURRENT_TEST: main.crash_commit_before mysqltest: At line 22: query 'COMMIT' succeeded - should have failed with errno 2013... The result from queries just before the failure was: CREATE TABLE t1(a int) engine=innodb; START TRANSACTION; insert into t1 values(9); SET SESSION debug="d,crash_commit_before"; - saving '/export/home/tmp/ls229627/tests/mysql-5.5.8-solaris10-x86_64/mysql-test/var/log/main.crash_commit_before/' to '/export/home/tmp/ls229627/tests/mysql-5.5.8-solaris10-x86_64/mysql-test/var/log/main.crash_commit_before/' ------------------------------------------------------------ The servers were restarted 0 times Spent 0.000 of 8 seconds executing testcases Completed: Failed 1/1 tests, 0.00% were successful. Failing test(s): main.crash_commit_before How-to repeat: 1. Take a debug build for solaris10 2. run the test case main.crash_commit_before several times 3. Observe it failing sometimes with timeout (this is BUG#58424) and sometimes with the server not crashing where it was supposed to (this is BUG#58426 - this one).
[23 Nov 2010 23:57]
Luis Soares
This may have to do with with the recent change on how crashes are injected, DBUG_ABORT was replaced by DBUG_SUICIDE in BUG#52002. DBUG_SUICIDE is implemented using SIGKILL + pause(): #define DBUG_SUICIDE() (_db_flush_(), kill(getpid(), SIGKILL), pause()) Also, the behavior described in this bug report seems to match closely on this review comments, which would happen when DBUG_SUICIDE was implemented as without the pause() (which was one of the original proposals): http://lists.mysql.com/commits/119761 This makes me think that even with pause(), there is still some code that runs before the process actually dies? or pause() wakes earlier than expected, due to some other signal!? As a consequence, crashing tests tend to exhibit this random behavior on solaris10.
[26 Nov 2010 3:58]
Anitha Gopi
Tor, Assigning to you since this may have been introduced by your fix for Bug#52002
[26 Nov 2010 10:24]
Tor Didriksen
with this definition of DBUG_SUICIDE #define DBUG_SUICIDE() (_db_flush_(), fprintf(stderr, "SIGKILL myself\n"), fflush(stderr), kill(getpid(), SIGKILL), pause(), fprintf(stderr, "Continue after SIGKILL\n"), fflush(stderr)) we can actually get this server log: SIGKILL myself Continue after SIGKILL So we have received some other signal (alarm maybe?) that interrupted the pause(). Simple fix: just loop forever waiting for SIGKILL to be delivered.
[26 Nov 2010 10:41]
Tor Didriksen
Interesting, with this implementation of DBUG_SUICIDE do { fprintf(stderr, "SIGKILL myself\n"); fflush(stderr); kill(getpid(), SIGKILL); pause(); } while(1); I sometimes see several hundred lines of SIGKILL myself in the server log.
[26 Nov 2010 14:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/125156 3521 Tor Didriksen 2010-11-26 Bug #58426 Crashing tests not failing as they are supposed to on Solaris 10 debug On this platform we either - get lots of other signals - or maybe we lose the KILL signal Solution: loop with { kill; pause; } until process dies. @ include/my_dbug.h Let DBUG_SUICE loop forever until the KILL signal is delivered, and process dies.
[2 Dec 2010 11:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/125801 3512 Tor Didriksen 2010-11-26 Bug #58426 Crashing tests not failing as they are supposed to on Solaris 10 debug On this platform we either - get lots of other signals - or maybe we lose the KILL signal Solution: loop with { kill; pause; } until process dies. @ include/my_dbug.h Let DBUG_SUICE loop forever until the KILL signal is delivered, and process dies.
[8 Dec 2010 8:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126292 3513 Tor Didriksen 2010-11-26 Bug #58426 Crashing tests not failing as they are supposed to on Solaris 10 debug On this platform we either - get lots of other signals - or maybe we lose the KILL signal Solution: loop with { kill; pause; } until process dies. @ include/my_dbug.h Let DBUG_SUICE loop forever until the KILL signal is delivered, and process dies.
[13 Dec 2010 12:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126624 3518 Tor Didriksen 2010-12-13 Bug #58426 Crashing tests not failing as they are supposed to on Solaris 10 debug On this platform we seem to get lots of other signals while waiting for SIGKILL to be delivered. Solution: use sigsuspend(<all signals blocked>) @ dbug/dbug.c New function _db_suicide_() which does kill(myself, -9) and then waits forever. @ include/my_dbug.h Let DBUG_SUICE wait forever until the KILL signal is delivered, and process dies.
[13 Dec 2010 16:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126681 3519 Tor Didriksen 2010-12-13 Bug #58426 Crashing tests not failing as they are supposed to on Solaris 10 debug On this platform we seem to get lots of other signals while waiting for SIGKILL to be delivered. Solution: use sigsuspend(<all signals blocked>) @ dbug/dbug.c New function _db_suicide_() which does kill(myself, -9) and then waits forever. @ include/my_dbug.h Let DBUG_SUICE wait forever until the KILL signal is delivered, and process dies.
[13 Dec 2010 16:18]
Tor Didriksen
pushed to 5.1-bugteam, 5.5-bugteam, trunk-bugfixing.
[17 Dec 2010 12:48]
Bugs System
Pushed into mysql-5.1 5.1.55 (revid:georgi.kodinov@oracle.com-20101217124435-9imm43geck5u55qw) (version source revid:tor.didriksen@oracle.com-20101213161301-81lprlbune7r98dl) (merge vers: 5.1.55) (pib:24)
[17 Dec 2010 12:52]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:georgi.kodinov@oracle.com-20101217124733-p1ivu6higouawv8l) (version source revid:sergey.glukhov@oracle.com-20101214104600-v0ndu721rf61nbml) (merge vers: 5.5.8) (pib:24)
[17 Dec 2010 12:56]
Bugs System
Pushed into mysql-trunk 5.6.1 (revid:georgi.kodinov@oracle.com-20101217125013-y8pb3az32rtbplc9) (version source revid:sergey.glukhov@oracle.com-20101214111513-9j68fg7s48a986ng) (merge vers: 5.6.1) (pib:24)
[20 Dec 2010 3:12]
Paul DuBois
Changes to test suite. No changelog entry needed.