Bug #32737 Ndb is unreliable, fails with varying symptoms (platform-specific): optim. serv.
Submitted: 26 Nov 2007 19:59
Modified: 24 Aug 2010 7:40
Reporter: Joerg Bruehe
Status: Won't fix
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.0
OS: Mac OS X (10.5)
Assigned to:
CPU Architecture: Any
Triage: Triaged: D3 (Medium)

[26 Nov 2007 19:59] Joerg Bruehe
Description:
During a build using the 5.0.51 (community) sources,
I had several failures of NDB-related tests.
This bug report only covers testing the optimized server;
for the "debug" server, see bug #32735 (same sources).

Run using the normal protocol:

mysqlshow                      [ pass ]             78
mysqltest                      [ pass ]          24609
ndb_alter_table                [ fail ]  timeout

Stopping All Servers
Warning;  Aborted waiting on pid file: '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/run/master.pid' after 70 seconds
mysql-test-run: WARNING: Forcing kill of process 92488
Restoring snapshot of databases
Resuming Tests

ndb_alter_table2               [ pass ]           3198
ndb_autodiscover               [ fail ]  timeout

Stopping All Servers
Restoring snapshot of databases
Resuming Tests

ndb_autodiscover2              [ fail ]

mysqltest: At line 10: query 'select * from t9 order by a' failed: 1105: Failed to open 't9', error while unpacking from engine

The result from queries just before the failure was:
select * from t9 order by a;

More results from queries before failure can be found in /Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/log/ndb_autodiscover2.log

Warnings from just before the error:
Error 1146 Table 'test.t9' doesn't exist
Error 1296 Got error 4009 'Cluster Failure' from NDB

Stopping All Servers
Warning;  Aborted waiting on pid file: '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/run/master.pid' after 70 seconds
Warning;  Aborted waiting on pid file: '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/run/master1.pid' after 70 seconds
mysql-test-run: WARNING: Forcing kill of process 92617
mysql-test-run: WARNING: Forcing kill of process 92618
Restoring snapshot of databases
Resuming Tests

ndb_autodiscover3              [ fail ]

mysqltest: At line 29: query 'insert into t1 values (2)' failed with wrong errno 1015: 'Can't lock file (errno: 157)', instead of 1297...

The result from queries just before the failure was:
drop table if exists t1, t2;
create table t1 (a int key) engine=ndbcluster;
begin;
insert into t1 values (1);
insert into t1 values (2);
ERROR HY000: Can't lock file (errno: 157)

More results from queries before failure can be found in /Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/log/ndb_autodiscover3.log

Warnings from just before the error:
Error 1296 Got error 4009 'Cluster Failure' from NDB

Stopping All Servers
Warning;  Aborted waiting on pid file: '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/run/master.pid' after 70 seconds
Warning;  Aborted waiting on pid file: '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/run/master1.pid' after 70 seconds
mysql-test-run: WARNING: Forcing kill of process 92773
mysql-test-run: WARNING: Forcing kill of process 92774
Restoring snapshot of databases
Resuming Tests

ndb_backup_print               [ pass ]           2634
ndb_basic                      [ pass ]          34612
... (subsequent tests pass)
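When comparing runs, it can help to list the failing tests mechanically. The following is a minimal Python sketch, not part of the MySQL test tooling; it assumes only the "name [ status ] detail" layout of the result lines quoted above, and the function name `failed_tests` is hypothetical.

```python
import re

# Matches result lines such as
#   "ndb_alter_table                [ fail ]  timeout"
#   "ndb_alter_table2               [ pass ]           3198"
# Assumption: test name, then "[ status ]", then an optional detail column.
RESULT_LINE = re.compile(r"^(\S+)\s+\[\s*(\w+)\s*\]\s*(.*)$")


def failed_tests(log_text):
    """Return the names of tests reported as "fail", in log order."""
    failures = []
    for line in log_text.splitlines():
        m = RESULT_LINE.match(line.strip())
        if m and m.group(2) == "fail":
            failures.append(m.group(1))
    return failures


# Result lines copied from the normal-protocol run in this report.
normal_run = """\
mysqlshow                      [ pass ]             78
mysqltest                      [ pass ]          24609
ndb_alter_table                [ fail ]  timeout
ndb_alter_table2               [ pass ]           3198
ndb_autodiscover               [ fail ]  timeout
ndb_autodiscover2              [ fail ]
ndb_autodiscover3              [ fail ]
ndb_backup_print               [ pass ]           2634
ndb_basic                      [ pass ]          34612
"""

print(failed_tests(normal_run))
```

For this run it lists ndb_alter_table, ndb_autodiscover, ndb_autodiscover2 and ndb_autodiscover3. Applied to the result lines of another run (for example the PS-protocol run), it gives a second list that can be compared with this one using a set difference.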

=====

Run using the PS protocol:

mysqlshow                      [ pass ]             85
mysqltest                      [ pass ]          24818
ndb_alter_table                [ fail ]  timeout

Stopping All Servers
Warning;  Aborted waiting on pid file: '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/run/master.pid' after 70 seconds
mysql-test-run: WARNING: Forcing kill of process 94627
Restoring snapshot of databases
Resuming Tests

ndb_alter_table2               [ pass ]           3197
ndb_autodiscover               [ pass ]          67000
ndb_autodiscover2              [ pass ]           3969
ndb_autodiscover3              [ fail ]

waitNodeState(STARTED, -1) timeout after 121 attemps
mysqltest: At line 44: command "$NDB_TOOLS_DIR/ndb_waiter --no-defaults -c $connect_str >> $NDB_TOOLS_OUTPUT" failed

The result from queries just before the failure was:
drop table if exists t1, t2;
create table t1 (a int key) engine=ndbcluster;
begin;
insert into t1 values (1);
insert into t1 values (2);
ERROR HY000: Got temporary error 4025 'Node failure caused abort of transaction' from ndbcluster
commit;
ERROR HY000: Got error 4350 'Transaction already aborted' from ndbcluster
drop table t1;
create table t2 (a int, b int, primary key(a,b)) engine=ndbcluster;
insert into t2 values (1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(7,1),(8,1),(9,1),(10,1);
select * from t2 order by a limit 3;
a       b
1       1
2       1
3       1
exec of '/Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/bin/ndb_waiter --no-defaults -c "nodeid=6;host=localhost:12005" >> /Users/mysqldev/tmp-200711150808-5.0.51-30861/xserve-e/test/mysql-5.0.51-osx10.5-x86/mysql-test/var/log/ndb_testrun.log' failed, error: 256, status: 1, errno: 0

Now, the symptoms are
   157: Could not connect to storage engine
and
   Warnings from just before the error:
   Error 1296 Got error 4009 'Cluster Failure' from NDB

This occurs for "ndb_backup_print" and (much later) "ndb_charset".
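The NDB error codes quoted in these logs (4009, 4025, 4350) can be collected from a log file with a regular expression. This is a minimal sketch that assumes only the message wording shown in this report; `ndb_errors` is a hypothetical helper, and its "temporary" flag reflects nothing more than whether the log line says "temporary error", not NDB's own error classification.

```python
import re

# Matches NDB error texts in the form quoted in this report, e.g.
#   "Got error 4009 'Cluster Failure' from NDB"
#   "Got temporary error 4025 'Node failure caused abort of transaction' from ndbcluster"
NDB_ERROR = re.compile(
    r"Got (temporary )?error (\d+) '([^']*)' from (?:NDB|ndbcluster)"
)


def ndb_errors(log_text):
    """Map each NDB error code in log_text to (message, marked_temporary)."""
    found = {}
    for m in NDB_ERROR.finditer(log_text):
        code = int(m.group(2))
        # "temporary" reflects only the wording of the log line.
        found[code] = (m.group(3), m.group(1) is not None)
    return found


# Lines copied from the logs in this report.
sample = """\
Error 1296 Got error 4009 'Cluster Failure' from NDB
ERROR HY000: Got temporary error 4025 'Node failure caused abort of transaction' from ndbcluster
ERROR HY000: Got error 4350 'Transaction already aborted' from ndbcluster
"""

for code, (message, temporary) in sorted(ndb_errors(sample).items()):
    label = "temporary" if temporary else "not marked temporary"
    print(code, label, message)
```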

How to repeat:
Occurred while running the test suite using the optimized server.

[24 Jul 2008 17:00] Jonathan Perkin
More failures from 5.0.67, e.g. (osx10.5-powerpc):

ndb_autodiscover3              [ fail ]

mysqltest: At line NNN: command "$NDB_MGM --no-defaults -e "all restart -i" >> $NDB_TOOLS_OUTPUT" failed

The result from queries just before the failure was:
drop table if exists t1, t2;
create table t1 (a int key) engine=ndbcluster;
begin;
insert into t1 values (1);
insert into t1 values (2);
ERROR HY000: Got temporary error 4025 'Node failure caused abort of transaction' from ndbcluster
commit;
ERROR HY000: Got error 4350 'Transaction already aborted' from ndbcluster
drop table t1;
create table t2 (a int, b int, primary key(a,b)) engine=ndbcluster;
insert into t2 values (1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(7,1),(8,1),(9,1),(10,1);
select * from t2 order by a limit 3;
a       b
1       1
2       1
3       1
exec of '/PATH/bin/ndb_mgm --no-defaults -e "all restart -i" >> /PATH/mysql-test/var/log/ndb_testrun.log' failed, error: 65280, status: 255, errno: 0

More results from queries before failure can be found in /PATH/mysql-test/var/log/ndb_autodiscover3.log
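The failed "exec" lines in this report carry two numbers: "error: 256, status: 1" for ndb_waiter and "error: 65280, status: 255" for ndb_mgm. These pairs are consistent with "error" being the raw POSIX wait status of the child process and "status" its exit code (256 = 1 × 256, 65280 = 255 × 256); that reading of the mysqltest output is an assumption here. A minimal Python sketch of the decoding follows; it is Unix-only because it relies on `os.WIFEXITED` and `os.WEXITSTATUS`, and `decode_wait_status` is a hypothetical helper.

```python
import os


def decode_wait_status(raw_status):
    """Split a raw POSIX wait status into (exited_normally, exit_code).

    exit_code is None when the child did not exit normally (for example,
    when it was killed by a signal).
    """
    if os.WIFEXITED(raw_status):
        return True, os.WEXITSTATUS(raw_status)
    return False, None


# Raw values taken from the "exec of ... failed" lines in this report.
for raw in (256, 65280):
    exited, code = decode_wait_status(raw)
    print(f"error: {raw} -> exited normally: {exited}, exit code: {code}")
```

Under this reading, both tools terminated normally with non-zero exit codes (1 and 255) rather than being killed by a signal.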