Bug #35148 Error '4009 Cluster Failure' in various tests on various platforms
Submitted: 7 Mar 2008 16:15
Modified: 6 Jul 2009 11:49
Reporter: Joerg Bruehe
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.0
OS: Any
Assigned to:
CPU Architecture: Any
Tags: 5.0.58
Triage: Triaged: D3 (Medium)

[7 Mar 2008 16:15] Joerg Bruehe
Description:
Judging from the symptoms, this is related to bugs 32735 and 32737 (maybe also 32675), but it occurs (now?) on various platforms:
Whereas the bugs listed above were restricted to platforms on which we frequently had issues with cluster (OS X and AIX), we now see this symptom on Linux and Solaris as well.

This report is about failures in the build of release 5.0.58.

First, the two failures with the largest spread of platforms:

=====
loaddata_autocom_ndb           [ fail ]

mysqltest: In included file "./include/loaddata_autocom.inc": At line 10: query 'create table t1 (a text, b text)' failed: 157: Could not connect to storage engine

The result from queries just before the failure was:
SET SESSION STORAGE_ENGINE = ndbcluster;
drop table if exists t1;
create table t1 (a text, b text);
...
Warnings from just before the error:
Error 1296 Got error 4009 'Cluster Failure' from NDB

Stopping All Servers
===---===---===---===---===---===--- OCCURRED:
Linux SLES9 (RPM),  i686: debug + ps
Linux SLES10 (RPM), x86_64: ps
Linux RHEL4 (RPM),  x86_64: debug
Solaris 8, x86: ps
OS X 10.4, PPC-64: ps
OS X 10.3, PPC-32: debug
Linux, i686 (ICC): normal + ps
=====

=====
ndb_restore_print              [ fail ]

mysqltest: At line NNN: query 'create table t1
(pk int key
,a1 BIT(1), a2 BIT(5), a3 BIT(33), a4 BIT(63), a5 BIT(64)
,b1 TINYINT, b2 TINYINT UNSIGNED
,c1 SMALLINT, c2 SMALLINT UNSIGNED
,d1 INT, d2 INT UNSIGNED
,e1 BIGINT, e2 BIGINT UNSIGNED
,f1 CHAR(1) BINARY, f2 CHAR(32) BINARY, f3 CHAR(255) BINARY
,g1 VARCHAR(32) BINARY, g2 VARCHAR(255) BINARY, g3 VARCHAR(1000) BINARY
,h1 BINARY(1), h2 BINARY(8), h3 BINARY(255)
,i1 VARBINARY(32), i2 VARBINARY(255), i3 VARBINARY(1000)
) engine myisam' failed: 157: Could not connect to storage engine

The result from queries just before the failure was:
use test;
drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9,t10;
create table t1
(pk int key
,a1 BIT(1), a2 BIT(5), a3 BIT(33), a4 BIT(63), a5 BIT(64)
,b1 TINYINT, b2 TINYINT UNSIGNED
,c1 SMALLINT, c2 SMALLINT UNSIGNED
,d1 INT, d2 INT UNSIGNED
,e1 BIGINT, e2 BIGINT UNSIGNED
,f1 CHAR(1) BINARY, f2 CHAR(32) BINARY, f3 CHAR(255) BINARY
,g1 VARCHAR(32) BINARY, g2 VARCHAR(255) BINARY, g3 VARCHAR(1000) BINARY
,h1 BINARY(1), h2 BINARY(8), h3 BINARY(255)
,i1 VARBINARY(32), i2 VARBINARY(255), i3 VARBINARY(1000)
) engine myisam;
...
Warnings from just before the error:
Error 1296 Got error 4009 'Cluster Failure' from NDB

Stopping All Servers
===---===---===---===---===---===--- OCCURRED:
OS X 10.3, PPC-32: normal
OS X 10.4, PPC-32: debug
OS X 10.4, x86: ps
OS X, 10.5, x86: ps
OS X, 10.5, x86_64: debug + ps
Linux, i686 (ICC): ps
Linux, x86: ps
Linux, x86_64 (ICC): ps
Linux, x86_64: normal
Linux, s390: normal
Solaris 10, Sparc (both 32 + 64): normal
Solaris 9, x86: normal
=====

The following tests show a similar symptom on fewer platforms. Each entry is abbreviated to
test name  +  list of platforms / test modes:

=====
ndb_alter_table                [ fail ]

OS X, 10.5, x86: ps
=====

=====
ndb_autodiscover3              [ fail ]

Linux SLES9 (RPM),  i686: debug
Linux RHEL4 (RPM),  x86_64: ps
Linux, i686 (ICC): debug
=====

=====
ndb_autodiscover3              [ fail ]

Linux RHEL4 (RPM),  x86_64: debug
OS X, 10.5, x86: normal + ps
=====

=====
ndb_autodiscover3              [ fail ]

OS X, 10.5, x86: debug
=====

=====
ndb_autodiscover               [ fail ]

OS X, 10.5, x86: normal + ps
=====

=====
ndb_basic                      [ fail ]

Linux SLES9 (RPM),  i686: debug
Linux RHEL4 (RPM),  x86_64: debug
OS X, 10.5, x86: ps
=====

=====
ndb_bitfield                   [ fail ]

Linux RHEL4 (RPM),  x86_64: debug
OS X, 10.5, x86: ps
=====

=====
ndb_blob                       [ fail ]

Linux RHEL4 (RPM),  x86_64: debug
OS X, 10.5, x86: ps
=====

=====
ndb_cache                      [ fail ]

Linux, x86: debug
=====

=====
ndb_charset                    [ fail ]

OS X 10.4, PPC-64: ps
=====

=====
ps_7ndb                        [ fail ]

Linux SLES9 (RPM),  i686: debug + ps
Linux RHEL4 (RPM),  x86_64: debug + normal + ps
Linux, i686 (ICC): ps
Linux, x86: ps
Linux, IA64 (ICC): debug
=====

=====
strict_autoinc_5ndb            [ fail ]

Linux SLES10 (RPM), i686 : debug
Linux SLES10 (RPM), x86_64 : debug
Linux RHEL4 (RPM),  x86_64: debug + normal
Linux, i686 (ICC): normal
Linux, x86: normal
Linux, IA64 (ICC): ps
OS X, 10.5, x86_64: debug
HP-UX 11.11, HP-PA: normal
=====
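
To compare how widely each test fails, the abbreviated blocks above can be tallied mechanically. The following is a minimal sketch, assuming the listing is available as plain text in the same `=====` / `[ fail ]` layout as the abbreviated entries (the two full entries with mysqltest output would need extra filtering). The function name `parse_failures` and the sample string are illustrative only and not part of the MySQL test tooling.

```python
import re
from collections import defaultdict


# Hypothetical helper: not part of mysql-test-run, just a way to tally the listing.
def parse_failures(report: str) -> dict:
    """Map each test name to the 'platform: modes' lines listed under it."""
    failures = defaultdict(list)
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        header = re.match(r"^(\S+)\s+\[ fail \]$", line)
        if header:
            current = header.group(1)
            failures[current]  # register the test even if no platform follows
        elif line.startswith("====="):
            current = None  # block delimiter: stop attributing lines to a test
        elif current and line:
            failures[current].append(line)
    return dict(failures)


# Two abbreviated blocks copied from the listing above.
sample = """\
=====
ndb_cache                      [ fail ]

Linux, x86: debug
=====

=====
ndb_bitfield                   [ fail ]

Linux RHEL4 (RPM),  x86_64: debug
OS X, 10.5, x86: ps
=====
"""

counts = {test: len(lines) for test, lines in parse_failures(sample).items()}
print(counts)
```

Blocks that repeat a test name (as ndb_autodiscover3 does above) are merged into one entry, so the count reflects all platform lines listed for that test.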

How to repeat:
Run the test suite ...
[23 May 2008 14:06] Joerg Bruehe
Bug also occurs in the 5.0.62 release build.

Affected tests, with symptoms identical to those already classified here
(most likely from 5.0.58), are:
   loaddata_autocom_ndb
   ndb_alter_table
   ndb_autodiscover3
   ndb_autodiscover
   ndb_basic
   ndb_bitfield
   ndb_cache
   ps_7ndb
   strict_autoinc_5ndb
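
As a cross-check, the two lists of test names can be compared as sets. This is a minimal sketch with both lists copied by hand from this report; the variable names are illustrative.

```python
# Tests reported as failing in the 5.0.58 build (from the original description).
failed_5058 = {
    "loaddata_autocom_ndb", "ndb_restore_print", "ndb_alter_table",
    "ndb_autodiscover3", "ndb_autodiscover", "ndb_basic", "ndb_bitfield",
    "ndb_blob", "ndb_cache", "ndb_charset", "ps_7ndb", "strict_autoinc_5ndb",
}

# Tests listed for the 5.0.62 build in this comment.
failed_5062 = {
    "loaddata_autocom_ndb", "ndb_alter_table", "ndb_autodiscover3",
    "ndb_autodiscover", "ndb_basic", "ndb_bitfield", "ndb_cache",
    "ps_7ndb", "strict_autoinc_5ndb",
}

# Names that appear only in the 5.0.58 list, and names new in the 5.0.62 list.
only_5058 = sorted(failed_5058 - failed_5062)
only_5062 = sorted(failed_5062 - failed_5058)
print(only_5058)  # ['ndb_blob', 'ndb_charset', 'ndb_restore_print']
print(only_5062)  # []
```

A name missing from the 5.0.62 list only means it was not listed here with identical symptoms; it does not by itself show that the test passed.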
[6 Jul 2009 9:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77988

2793 Georgi Kodinov	2009-07-06
      Bug#38315 and Bug#35148: disabled sporadically failing NDB tests
[6 Jul 2009 9:04] Georgi Kodinov
Moving back to Verified: this is just a test case disablement.
[6 Jul 2009 10:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78003

2796 Georgi Kodinov	2009-07-06
      Bug #35148: disabled testcase loaddata_autocom_ndb
[6 Jul 2009 10:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78004

2797 Georgi Kodinov	2009-07-06
      Bug #35148: ndb_autodiscover3 disabled
[6 Jul 2009 11:49] Georgi Kodinov
Back to Verified: test case disablement.
[7 Jul 2009 7:52] Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:joro@sun.com-20090706102719-2g256ax0ndg9jrlh) (merge vers: 5.0.84) (pib:11)
[8 Jul 2009 13:30] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:joro@sun.com-20090706103609-62l08z5dpwqy29wt) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 7:35] Bugs System
Pushed into 5.0.84 (revid:joro@sun.com-20090707074938-ksah1ibn0vs92cem) (version source revid:joro@sun.com-20090706102719-2g256ax0ndg9jrlh) (merge vers: 5.0.84) (pib:11)
[9 Jul 2009 7:37] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:joro@sun.com-20090706103609-62l08z5dpwqy29wt) (merge vers: 5.1.37) (pib:11)
[10 Jul 2009 11:20] Bugs System
Pushed into 5.4.4-alpha (revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source revid:joro@sun.com-20090706104514-4dj1xron15x44x6h) (merge vers: 5.4.4-alpha) (pib:11)
[26 Aug 2009 13:46] Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Aug 2009 13:46] Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[26 Aug 2009 13:48] Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)
[27 Aug 2009 16:32] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Jul 2010 12:19] Sven Sandberg
I have seen this a few times recently when running the entire suite on my machine. E.g.:

rpl_ndb.rpl_ndb_circular_2ch             [ pass ]  22837

MTR's internal check of the test case 'rpl_ndb.rpl_ndb_circular_2ch' failed.
This means that the test case does not preserve the state that existed
before the test case was executed.  Most likely the test case did not
do a proper clean-up.
This is the diff of the states of the servers before and after the
test case was executed:
mysqltest: Logging to '/home/sven/bzr/b49978-cleanup_rpl_tests/5.1-bugteam/mysql-test/var/tmp/check-mysqld_1_1.log'.
mysqltest: Results saved in '/home/sven/bzr/b49978-cleanup_rpl_tests/5.1-bugteam/mysql-test/var/tmp/check-mysqld_1_1.result'.
mysqltest: Connecting to server localhost:12603 (socket /home/sven/bzr/b49978-cleanup_rpl_tests/5.1-bugteam/mysql-test/var/tmp/mysqld.1.1.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from './include/check-testcase.test' ...
mysqltest: ... Done processing test commands.
--- /home/sven/bzr/b49978-cleanup_rpl_tests/5.1-bugteam/mysql-test/var/tmp/check-mysqld_1_1.result	2010-07-26 14:41:43.000000000 +0300
+++ /home/sven/bzr/b49978-cleanup_rpl_tests/5.1-bugteam/mysql-test/var/tmp/check-mysqld_1_1.reject	2010-07-26 14:42:07.000000000 +0300
@@ -513,7 +513,3 @@
 mysql.time_zone_transition	3895294076
 mysql.time_zone_transition_type	168184411
 mysql.user	3850358533
-Warnings:
-Error	1296	Got error 4009 'Cluster Failure' from NDB
-Error	1296	Got error 4009 'Cluster Failure' from NDB
-Error	1296	Got error 4009 'Cluster Failure' from NDB
                                                                      

mysqltest: Result length mismatch

not ok
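
When going through many result or log files, warning lines like the ones in the diff above can be picked out with a small script. This is a minimal sketch, assuming the text has already been read into a string; `count_cluster_failures` is a hypothetical helper name, and the pattern only matches the exact warning wording shown in this report.

```python
import re

# Matches the warning text as it appears in this report; other NDB errors are ignored.
CLUSTER_FAILURE = re.compile(r"Got error 4009 'Cluster Failure' from NDB")


def count_cluster_failures(text: str) -> int:
    """Count the lines that contain the 4009 'Cluster Failure' warning."""
    return sum(1 for line in text.splitlines() if CLUSTER_FAILURE.search(line))


# Excerpt modelled on the check-testcase diff above.
sample = (
    " mysql.user\t3850358533\n"
    "-Warnings:\n"
    "-Error\t1296\tGot error 4009 'Cluster Failure' from NDB\n"
    "-Error\t1296\tGot error 4009 'Cluster Failure' from NDB\n"
    "-Error\t1296\tGot error 4009 'Cluster Failure' from NDB\n"
)

print(count_cluster_failures(sample))  # 3
```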