Bug #32675 NDB fails with "1296: Got error 157 'Unknown error code'"
Submitted: 23 Nov 2007 18:12
Reporter: Joerg Bruehe
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.0
OS: IBM AIX (64 bit)
Assigned to:
CPU Architecture: Any
Tags: 5.0.51
Triage: Triaged: D1 (Critical)

[23 Nov 2007 18:12] Joerg Bruehe
Description:
All of the following test failures showed up in a build of version 5.0.51 on AIX 5.2.
Typically, the log also contains warnings like these:

Warnings from just before the error:
Error 1296 Got error 4028 'Node failure caused abort of transaction' from NDB
Error 1296 Got error 4028 'Node failure caused abort of transaction' from NDB
Error 1296 Got error 4009 'Cluster Failure' from NDB

This seems to be the simplest case:

=====
ndb_limit                      [ fail ]

mysqltest: At line NNN: query 'select count(*) from t2' failed: 1296: Got error 157 'Unknown error code' from ndbcluster

The result from queries just before the failure was:
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (
a bigint unsigned NOT NULL PRIMARY KEY,
b int unsigned not null,
c int unsigned
) engine=ndbcluster;
select count(*) from t2;

More results from queries before failure can be found in /PATH/mysql-test/var/log/ndb_limit.log

Warnings from just before the error:
Error 1296 Got error 270 'Transaction aborted due to node shutdown' from NDB
Error 1296 Got error 4028 'Node failure caused abort of transaction' from NDB
Error 1296 Got error 4009 'Cluster Failure' from NDB

Stopping All Servers
=====

Here is the complete list:

=====

ndb_charset                    [ fail ]

mysqltest: In included file "./suite/funcs_2/include/check_charset_ucs2.inc": At line 15: query 'ALTER TABLE test.t1 CHANGE a a CHAR(4) CHARACTER SET $cset COLLATE $coll' failed: 1296: Got error 157 'Unknown error code' from ndbcluster

=====

ndb_insert                     [ fail ]

mysqltest: At line NNN: query 'SELECT COUNT(*) FROM t1' failed: 1296: Got error 157 'Unknown error code' from ndbcluster

=====

ndb_limit                      [ fail ]

mysqltest: At line NNN: query 'select count(*) from t2' failed: 1296: Got error 157 'Unknown error code' from ndbcluster

=====

ndb_loaddatalocal              [ fail ]

mysqltest: At line NNN: query 'select count(*) from t1' failed: 1296: Got error 157 'Unknown error code' from ndbcluster

=====

ndb_truncate                   [ fail ]

mysqltest: At line NNN: query 'select count(*) from t1' failed: 1296: Got error 157 'Unknown error code' from ndbcluster

=====

How to repeat:
Found by running the test suite in a build on AIX 5.2 (64 bit).
[4 Jan 2008 13:22] Joerg Bruehe
In 6.0.4-alpha, similar problems exist.

The first one shows up in this way:

=====
rpl_ndb.rpl_ndb_dd_advance     [ fail ]

=== SHOW MASTER STATUS ===
---- 1. ----
File    slave-bin.000002
Position        106
Binlog_Do_DB
Binlog_Ignore_DB
==========================

=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State  Waiting for master to send event
Master_Host     127.0.0.1
Master_User     root
Master_Port     12010
Connect_Retry   1
Master_Log_File master-bin.000001
Read_Master_Log_Pos     17231
Relay_Log_File  slave-relay-bin.000002
Relay_Log_Pos   17002
Relay_Master_Log_File   master-bin.000001
Slave_IO_Running        Yes
Slave_SQL_Running       No
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno      1296
Last_Error      Error 'Got error 157 'Unknown error code' from NDBCLUSTER' on query. Default database: 'test'. Query: 'CREATE UNIQUE INDEX t1_i2 ON t1(c2)'
Skip_Counter    0
Exec_Master_Log_Pos     16856
Relay_Log_Space 17833
Until_Condition None
Until_Log_File
Until_Log_Pos   0
Master_SSL_Allowed      No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master
Master_SSL_Verify_Server_Cert   No
Last_IO_Errno   0
Last_IO_Error
Last_SQL_Errno  1296
Last_SQL_Error  Error 'Got error 157 'Unknown error code' from NDBCLUSTER' on query. Default database: 'test'. Query: 'CREATE UNIQUE INDEX t1_i2 ON t1(c2)'
=========================

mysqltest: At line NNN: could not sync with master ('select master_pos_wait('master-bin.000001', 17231)' returned NULL)

The result from queries just before the failure was:
< snip >
**** Do First Set of ALTERs in the master table ****
CREATE INDEX t1_i ON t1(c2, c3);
CREATE UNIQUE INDEX t1_i2 ON t1(c2);
ALTER TABLE t1 ADD c4 TIMESTAMP;
ALTER TABLE t1 ADD c5 DOUBLE;
ALTER TABLE t1 ADD INDEX (c5);
SHOW CREATE TABLE t1;
Table   Create Table
t1      CREATE TABLE `t1` (
  `c1` int(11) NOT NULL,
  `c2` int(11) NOT NULL,
  `c3` int(11) NOT NULL,
  `c4` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `c5` double DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `t1_i2` (`c2`),
  KEY `t1_i` (`c2`,`c3`),
  KEY `c5` (`c5`)
) /*!50100 TABLESPACE `ts1` STORAGE DISK */ ENGINE=ndbcluster DEFAULT CHARSET=latin1
**** Show first set of ALTERs on SLAVE ****
=====

More typical is this:

=====
rpl_ndb.rpl_ndb_sync           [ fail ]

ERROR: 4009 Cluster Failure
           Status: Unknown result, Classification: Unknown result error
           File: select_all.cpp (Line: 227)
mysqltest: In included file "./include/ndb_backup.inc": At line 9: command "$NDB_TOOLS_DIR/ndb_select_all --ndb-connectstring="localhost:$NDBCLUSTER_PORT" -d sys --delimiter=',' SYSTAB_0 | grep 520093696 > $MYSQLTEST_VARDIR/tmp.dat" failed

The result from queries just before the failure was:
< snip >
USE ndbsynctest;
CREATE DATABASE ndbsynctest;
USE ndbsynctest;
CREATE TABLE t1 (c1 BIT(1) NOT NULL, c2 BIT(1) NOT NULL, c3 CHAR(15), PRIMARY KEY(c3)) ENGINE = NDB ;
INSERT INTO t1 VALUES (1,1,"row1"),(0,1,"row2"),(1,0,"row3"),(0,0,"row4");
CREATE TABLE t2 (c1 CHAR(15), c2 BIT(1) NOT NULL, c3 BIT(1) NOT NULL, PRIMARY KEY(c1)) ENGINE = NDB ;
INSERT INTO t2 VALUES ("ABC",1,1),("BCDEF",0,1),("CD",1,0),("DEFGHIJKL",0,0);
SELECT hex(c1),hex(c2),c3 FROM t1 ORDER BY c3;
hex(c1) hex(c2) c3
1       1       row1
0       1       row2
1       0       row3
0       0       row4
SELECT hex(c2),hex(c3),c1 FROM t2 ORDER BY c1;
hex(c2) hex(c3) c1
1       1       ABC
0       1       BCDEF
1       0       CD
0       0       DEFGHIJKL
exec of '/PATH/bin/ndb_select_all --ndb-connectstring="localhost:12015" -d sys --delimiter=',' SYSTAB_0 | grep 520093696 > /PATH/mysql-test/var/tmp.dat' failed, error: 256, status: 1, errno: 0
=====

And in tests that do not run external commands, I still get:

=====
ndb.ndb_insert                 [ fail ]

mysqltest: At line NNN: query 'insert into t1 select * from t1 where b < 10 order by pk1' failed with wrong errno 1297: 'Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER', instead of 1022...

The result from queries just before the failure was:
< snip >
(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),
(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
ERROR 23000: Can't write; duplicate key in table 't1'
INSERT INTO t1 values (4000, 40, 44);
ERROR HY000: Got error 4350 'Transaction already aborted' from NDBCLUSTER
rollback;
select * from t1 where pk1=1;
pk1     b       c
1       1       1
select * from t1 where pk1=10;
pk1     b       c
10      10      10
select count(*) from t1 where pk1 <= 10 order by pk1;
count(*)
11
select count(*) from t1;
count(*)
2000
insert into t1 select * from t1 where b < 10 order by pk1;
ERROR HY000: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER

More results from queries before failure can be found in /PATH/mysql-test/var/log/ndb_insert.log

Warnings from just before the error:
Error 1297 Got temporary error 4028 'Node failure caused abort of transaction' from NDB
Error 1297 Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
Error 1028 Sort aborted
Error 1296 Got error 4350 'Transaction already aborted' from NDB
Error 1296 Got error 4350 'Transaction already aborted' from NDBCLUSTER
=====

Several other tests fail in a similar way.
[2 Feb 2009 18:37] Joerg Bruehe
This bug still exists:

In the builds of 5.0.77 (community), I see it on all 64-bit PPC platforms: AIX 5.2, AIX 5.3, and i5os.

Affected tests:
ndb_charset
ndb_insert
ndb_limit
ndb_loaddatalocal
ndb_restore
ndb_restore_print
ndb_truncate