Bug #20843 | tests fails randomly with assertion in completeClusterFailed | ||
---|---|---|---|
Submitted: | 4 Jul 2006 8:40 | Modified: | 6 Jul 2006 9:02 |
Reporter: | Tomas Ulin | Status: | Closed |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 5.1 | OS: | |
Assigned to: | Tomas Ulin | CPU Architecture: | Any |
[4 Jul 2006 8:40]
Tomas Ulin
[4 Jul 2006 9:23]
Kristian Nielsen
This one is difficult to repeat, but not impossible. It occurs in Pushbuild, quite often in the Valgrind build, but also occasionally on other hosts. I was able to repeat it by running the test in a loop under Valgrind:

(cd mysql-test && for i in `seq 1 100`; do echo XXX $i XXX; MTR_BUILD_THREAD=4 perl mysql-test-run.pl --tmpdir=/dev/shm/t4 --vardir=/dev/shm/v4 --timer --ps-protocol --mysqld=--binlog-format=row --valgrind-all ndb_autodiscover3 | tee /tmp/1; fgrep -q '[ fail ]' /tmp/1 && exit 1; done)

(It failed on the 9th run.)

I do not think the problem is caused by Valgrind; it just happens more often under Valgrind, perhaps due to different thread scheduling. The same crash is seen on most/all hosts in Pushbuild, just much less frequently.

From the master1.err log:

060703 17:33:15 [ERROR] /usr/local/mysql/mysql-5.1-pristine/sql/mysqld: Incorrect information in file: './test/t2.frm'
060703 17:33:16 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t2
060703 17:33:16 [Note] NDB Binlog: logging ./test/t2
out of order bucket detected at cluster disconnect, data.gci: 27. tmp->m_gci: 6
mysqld: NdbEventOperationImpl.cpp:1634: void NdbEventBuffer::completeClusterFailed(): Assertion `false' failed.
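The repeat loop quoted in this comment can be generalized into a small reusable helper. The following is only a sketch under stated assumptions: `repeat_until_fail` is a hypothetical name, not part of mysql-test-run.pl or the MySQL test suite, and it assumes a failure can be recognized by a fixed string in the test output, as with '[ fail ]' above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MySQL test suite): run a command up to
# MAX times and stop at the first run whose output contains PATTERN.
# Usage: repeat_until_fail MAX PATTERN LOGFILE command [args...]
repeat_until_fail() {
    max=$1; pattern=$2; log=$3
    shift 3
    i=1
    while [ "$i" -le "$max" ]; do
        echo "XXX $i XXX"
        # Overwrite the log on every run, so that after a failure the file
        # holds the output of the failing run only.
        "$@" 2>&1 | tee "$log"
        # -F treats the pattern as a fixed string, like fgrep.
        if grep -F -q -- "$pattern" "$log"; then
            echo "failed on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

With this helper, the loop from the comment would correspond roughly to running `repeat_until_fail 100 '[ fail ]' /tmp/1 env MTR_BUILD_THREAD=4 perl mysql-test-run.pl` followed by the same options and test name, from inside the mysql-test directory. The helper returns non-zero at the first failing run, so the log of that run is left in place for inspection.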
A stack trace from Valgrind:

==10880== Thread 2:
==10880== Conditional jump or move depends on uninitialised value(s)
==10880== at 0x410264A: vfprintf (in /lib/tls/libc-2.3.6.so)
==10880== by 0x4100C99: buffered_vfprintf (in /lib/tls/libc-2.3.6.so)
==10880== by 0x4100F5D: vfprintf (in /lib/tls/libc-2.3.6.so)
==10880== by 0x4109D61: fprintf (in /lib/tls/libc-2.3.6.so)
==10880== by 0x840FD9E: print_stacktrace (stacktrace.c:158)
==10880== by 0x824D3BD: handle_segfault (mysqld.cc:2145)
==10880== by 0x4052657: (within /lib/tls/libpthread-2.3.6.so)
==10880== by 0x40EF06A: abort (in /lib/tls/libc-2.3.6.so)
==10880== by 0x40E6734: __assert_fail (in /lib/tls/libc-2.3.6.so)
==10880== by 0x86958F4: NdbEventBuffer::completeClusterFailed() (NdbEventOperationImpl.cpp:1634)
==10880== by 0x867844D: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:264)
==10880== by 0x8678523: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:224)
==10880== by 0x868497D: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:834)
==10880== by 0x86CB55B: ClusterMgr::execNF_COMPLETEREP(unsigned const*) (ClusterMgr.cpp:393)
==10880== by 0x86CB700: ClusterMgr::reportNodeFailed(unsigned short) (ClusterMgr.cpp:474)
==10880== by 0x86CCAD7: ClusterMgr::reportDisconnected(unsigned short) (ClusterMgr.cpp:436)
[5 Jul 2006 13:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8766
[5 Jul 2006 17:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8793
[5 Jul 2006 21:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/8802