Bug #35319 | Almost certain crash(es) of mysqld with the "rpl*" tests from mysql-test-run.pl | ||
---|---|---|---|
Submitted: | 16 Mar 2008 12:34 | Modified: | 30 Mar 2010 6:16 |
Reporter: | Axel Luttgens | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | Tests: Replication | Severity: | S7 (Test Cases) |
Version: | 5.0.51a + 5.1 | OS: | MacOS (10.4 + 10.5, PPC) |
Assigned to: | Daogang Qu | CPU Architecture: | Any |
[16 Mar 2008 12:34]
Axel Luttgens
[16 Mar 2008 12:36]
Axel Luttgens
Reports for the 32-bit distribution
Attachment: Distribution32.txt (text/plain), 105.68 KiB.
[16 Mar 2008 12:36]
Axel Luttgens
Reports for the 64-bit distribution
Attachment: Distribution64.txt (text/plain), 16.61 KiB.
[16 Mar 2008 12:38]
Axel Luttgens
Reports for a 32-bit "debug" compiled binary
Attachment: DebugBinaries.txt (text/plain), 29.61 KiB.
[24 Mar 2008 23:49]
Sveta Smirnova
Thank you for the report. I cannot repeat the described behavior. My guess is that you are hitting some system port allocation conflict. Please check the system logs located in /var/log (system.log, netinfo.log) in case they contain related information.
[25 Apr 2008 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[28 Apr 2008 13:00]
Axel Luttgens
[25 Mar 0:49] Sveta Smirnova wrote:

> Thank you for the report.
>
> I cannot repeat the described behavior.
>
> My guess is that you are hitting some system port allocation conflict. Please check
> the system logs located in /var/log (system.log, netinfo.log) in case they
> contain related information.

Please accept my apologies: I didn't realize how quickly time had flown.

Here, I can still reproduce the problem rather consistently on various boxes. But there's a point I didn't immediately notice: it looks like one needs some "luck" to observe the crashes. So, I may sometimes need to execute a command like this one:

    ./mysql-test-run.pl --force --skip-ndb --do-test=rpl

several times (10-15 times?) before getting a crash; thereafter, crashes will occur almost surely for several subsequent runs. But those crashes may also occur immediately, even on a freshly rebooted machine...

As far as the system logs are concerned, I never could find any relevant info. I even tried today to increase syslogd's verbosity (in both system.log and asl.log), in case something could be emitted at a DEBUG level only: no more info to be found. That is, other parts of the system don't seem to complain because of unsuspectedly unavailable resources, or the like.

In fact, the problem seems to be strictly internal to mysqld, possibly at cleanup/quit time. For example, with the i386 code, the crashed thread seems to always show the following pattern:

    Thread 2 Crashed:
    0  libSystem.B.dylib  0x92f255ba  __pthread_kill + 10
    1  mysqld             0x00170229  write_core + 43
    2  mysqld             0x00079714  handle_segfault + 656
    3  libSystem.B.dylib  0x92eab5eb  _sigtramp + 43
    4  ???                0xffffffff  0 + 4294967295
    5  libSystem.B.dylib  0x92e44aed  free + 106
    6  mysqld             0x0016d9ba  mysql_close_free_options + 136
    7  mysqld             0x0016f5b9  mysql_close + 133
    8  mysqld             0x00165618  handle_slave_io + 4664
    9  libSystem.B.dylib  0x92e6fc55  _pthread_start + 321
    10 libSystem.B.dylib  0x92e6fb12  thread_start + 34

But I definitely am not a debug wizard...

Now, I've reported the problem as "critical", because such seemingly self-inflicted crashes are never good news for a daemon...

Thanks for your follow-up,
Axel
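Since the crash reportedly needs several runs to show up, a small retry loop can save some typing when trying to reproduce it. The following is only a sketch: `run_until_failure` is a hypothetical helper name, not part of the MySQL test suite, and it assumes that the command it wraps exits with a non-zero status when a test fails or the server crashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of mysql-test): re-run a command until it
# fails or the attempt budget is spent.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Assumption: COMMAND exits non-zero when a test fails.
run_until_failure() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "attempt $attempt of $max_attempts"
        if ! "$@"; then
            echo "command failed on attempt $attempt"
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 0
}

# Example, to be run from the mysql-test directory:
# run_until_failure 15 ./mysql-test-run.pl --force --skip-ndb --do-test=rpl
```

The helper stops at the first failing run so that the logs under var/log are not overwritten by a later attempt.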
[8 Jul 2008 18:39]
Joerg Bruehe
I had this symptom in some tests of the 5.1.26-rc on our OS X machines with PowerPC (both 32 and 64 bit). Some examples:

=====

    rpl.rpl_extraColmaster_innodb 'mix' [ fail ]

    mysqltest: In included file "./extra/rpl_tests/rpl_extraMaster_Col.test": At line 728: query 'STOP SLAVE' failed: 2013: Lost connection to MySQL server during query

    The result from queries just before the failure was:
    < snip >
    Last_IO_Errno #
    Last_IO_Error #
    Last_SQL_Errno 1091
    Last_SQL_Error Error 'Can't DROP 'c7'; check that column/key exists' on query. Default database: 'test'. Query: 'ALTER TABLE t14 DROP COLUMN c7'
    STOP SLAVE;
    RESET SLAVE;

    *** Drop
    DROP TABLE t14;
    DROP TABLE t14;
    RESET MASTER;
    START SLAVE;

    *************************************************
    * - Alter Master adding columns at end of table *
    * Expect: Error 1054 *
    *************************************************

    *** Create
    STOP SLAVE;

    More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_extraColmaster_innodb.log

=====

Similar for log mode 'stmt', and for test "rpl.rpl_extraColmaster_myisam" in log modes 'mix', 'row', and 'stmt'.

=====

    rpl.rpl_slave_status 'mix' [ fail ]

    mysqltest: At line NNN: query 'stop slave' failed: 2013: Lost connection to MySQL server during query

    The result from queries just before the failure was:
    stop slave;
    drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
    reset master;
    reset slave;
    drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
    start slave;
    grant replication slave on *.* to rpl@127.0.0.1 identified by 'rpl';
    stop slave;

    More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_slave_status.log

    rpl.rpl_slave_status 'mix' [ fail ]

    mysqltest: At line NNN: query 'stop slave' failed: 2013: Lost connection to MySQL server during query

    The result from queries just before the failure was:
    stop slave;
    drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
    reset master;
    reset slave;
    drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
    start slave;
    grant replication slave on *.* to rpl@127.0.0.1 identified by 'rpl';
    stop slave;

    More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_slave_status.log

=====

Similar in log mode 'stmt'.

There are other tests that fail with this symptom. Also, this does not start with 5.1.26-rc, it has a longer tradition.

Searching the bug DB led me to report #15671, which reports a similar issue for OS X using the x86 CPU and claims it got fixed by July 2006. Scanning all isolated test failures in the build archive for crashes reported with "stop slave", I found a spike in versions 5.0.20 - 5.0.22 on such platforms, and this then really dropped off (July 2006).

So it seems this problem got fixed for x86, but still exists for PPC.

I will attach a file with a list of all such occurrences after July 2006 which got isolated; the test logs are available in the archive.
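A scan like the one described above can be approximated with a recursive grep over a directory of test logs. This is a sketch under assumptions, not the procedure actually used for the build archive: `list_stop_slave_2013` is a hypothetical helper name, the directory layout is whatever the caller passes in, and the match string is taken from the error lines quoted in this comment. It relies on `grep -r`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helper: list the files under a directory that contain the
# "stop slave ... 2013" failure line quoted in this report.
# The match is case-insensitive because the logs show both
# 'STOP SLAVE' and 'stop slave'.
# Usage: list_stop_slave_2013 LOG_DIR
list_stop_slave_2013() {
    log_dir=$1
    grep -rli "query 'stop slave' failed: 2013" "$log_dir" | sort
}

# Example (the path is a placeholder):
# list_stop_slave_2013 /PATH/mysql-test/var/log
```

The output is one file name per line, sorted, so lists from different runs can be compared with diff.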
[8 Jul 2008 18:49]
Joerg Bruehe
List of isolated occurrences in the test suite runs
Attachment: list-stopslave-2013 (application/octet-stream, text), 7.43 KiB.
[18 Nov 2008 23:12]
Trudy Pelzer
Per Vlad, bug#40796 is probably related: "After searching a bit in the bugdb, heap corruption with rpl involved appears to be reported on MacOSX http://bugs.mysql.com/bug.php?id=35319 Stack looks similar to what I've seen in http://bugs.mysql.com/bug.php?id=40796"
[29 Mar 2010 2:50]
Daogang Qu
The bug can't be repeated. I ran the './mysql-test-run.pl --force --skip-ndb --do-test=rpl' command on the newest versions of mysql-5.0-bugteam and mysql-5.1-bugteam more than twenty times, but the bug was not reproduced on either tree.
[29 Mar 2010 2:53]
Daogang Qu
The above tests were done on slid07.Norway.Sun.COM with Mac OS X 10.5, PPC arch.