MySQL Bugs: #67523: temporary error 20016 'Query aborted due to node failure' from NDBCLUSTER

Bug #67523	temporary error 20016 'Query aborted due to node failure' from NDBCLUSTER
Submitted:	8 Nov 2012 18:06	Modified:	15 Mar 2016 18:23
Reporter:	Ben Im
Status:	Can't repeat
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-cluster-gpl-7.3.0-linux2.6-x86_64	OS:	Linux (3.2.0-29-generic #46-Ubuntu)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
mysql-cluster-gpl-7.3.0-linux2.6-x86_64.tar.gz
3.2.0-29-generic #46-Ubuntu

Got the following error:
07.11.2012 10:02:57 ERROR [JDBCExceptionReporter] Got temporary error 20016 'Query aborted due to node failure' from NDBCLUSTER

All ndb data nodes are up, connected, and communicating with each other according to the ndbinfo nodes and transporters views.

Setup consists of 2 clusters
2 ndb data and sql nodes per cluster
ndb data and sql nodes are on the same host
2 ndb mgmt nodes for the clusters, 10.10.104.48 & 50

Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=3    @10.10.104.121  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0, Master)
id=4    @10.10.104.42  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0)
id=5    @10.10.104.43  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 1)
id=6    @10.10.104.122  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 1)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @10.10.104.48  (mysql-5.5.25 ndb-7.3.0)
id=2    @10.10.104.50  (mysql-5.5.25 ndb-7.3.0)

[mysqld(API)]   4 node(s)
id=20   @10.10.104.121  (mysql-5.5.25 ndb-7.3.0)
id=21   @10.10.104.42  (mysql-5.5.25 ndb-7.3.0)
id=22   @10.10.104.43  (mysql-5.5.25 ndb-7.3.0)
id=23   @10.10.104.122  (mysql-5.5.25 ndb-7.3.0)
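A listing in this format can be checked mechanically rather than by eye. The sketch below assumes the listing came from `ndb_mgm -e show` and that a node which is down is printed as `id=N (not connected, accepting connect from ...)`. The function name `list_disconnected_nodes` is made up for this illustration.

```shell
# Hypothetical helper: reads `ndb_mgm -e show` style output on stdin and
# prints the id of every node reported as not connected, one per line.
# Returns 0 when no such node is found, 1 otherwise.
list_disconnected_nodes() {
    status=0
    # The `|| [ -n "$line" ]` part keeps a final line that has no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            id=*"not connected"*)
                id=${line#id=}      # drop the "id=" prefix
                id=${id%%[!0-9]*}   # keep only the leading digits
                printf '%s\n' "$id"
                status=1
                ;;
        esac
    done
    return "$status"
}
```

Possible usage: `ndb_mgm -e show | list_disconnected_nodes`. For the listing above it would print nothing, since every node shows an address and version.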

Thanks,
Ben

How to repeat:
1. Clusters were functioning fine.
2. Stopped the mysql.server service on 10.10.104.42.
3. Then issued "4 stop" from ndb_mgm to stop the ndb data node.
4. After that I was not able to reconnect node 4. I had to shut down all ndb data nodes and then restart the ndb data nodes and mysql.server on all nodes.
5. Repeated #2, #3, #4 for testing, maybe 2 or 3 times.
6. The show command and ndbinfo indicate that all nodes are connected and functioning.

Then I got the above error, 20016. Error log attached.

The error log is 960K. Is there a way to generate an error log smaller than 500K? The command below generates a 960K log:

root@NDB-Mgmt:/usr/local/mysql/bin# ./ndb_error_reporter ../config.ini username --fs

 Copying data from node 3

vidder@10.10.104.121's password:
ndb_3_error.log                               100% 4560     4.5KB/s   00:00
scp: /var/lib/mysql-cluster/ndb_3_fs: Permission denied
ndb_3_out.log                                 100%  192KB 192.4KB/s   00:00
ndb_3.pid                                     100%    4     0.0KB/s   00:00
ndb_3_trace.log.1                             100%  942KB 942.3KB/s   00:00
.....
and continues to other ndb data nodes.

Hi Ben,

Please only use the --fs option if asked to. If you skip it you will get manageable error reports that you can attach to the bug.

Sadly the error report you have attached only contains your config.ini file.

Do you get the same issue with 7.3.1? If so, please create a new error report and upload it after verifying that it contains the logs it's supposed to, for example by running:

$ tar tjf <name of error report>

$ ls -l <name of error report>

/Gustaf
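The two checks suggested above can be scripted before uploading. This is only a sketch: the helper names are hypothetical, the file-name patterns are taken from the scp output earlier in this report (ndb_3_error.log, ndb_3_out.log, ndb_3_trace.log.1), and reading "500K" as 512000 bytes is an assumption.

```shell
# Hypothetical helper: reads a `tar tjf` listing on stdin and succeeds only
# if it names at least one ndb_<id>_error.log, ndb_<id>_out.log or
# ndb_<id>_trace.log.<n> file.
report_has_logs() {
    grep -Eq '(^|/)ndb_[0-9]+_(error|out)\.log$|(^|/)ndb_[0-9]+_trace\.log\.[0-9]+$'
}

# Hypothetical helper: succeeds if file $1 is at most $2 bytes.
report_under_limit() {
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$2" ]
}
```

Possible usage, with the report name left as a placeholder: `tar tjf <name of error report> | report_has_logs && report_under_limit <name of error report> 512000`. A report that holds only config.ini would fail the first check.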

Unfortunately I moved to 7.3.1 and have no plan to go back to 7.3.0. I will keep an eye on it to see if I can reproduce it.

I have the same issue in 7.3.2. It failed after a hot shutdown. I did a rolling restart and that solved the problem.
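The comment does not say which nodes were restarted or in what order. As a rough sketch only, assuming a rolling restart here means restarting the data nodes one at a time from ndb_mgm, the hypothetical helper below prints such a plan without running anything.

```shell
# Hypothetical helper: prints (does not execute) the ndb_mgm commands for
# restarting the given data node ids one at a time.
print_rolling_restart_plan() {
    for id in "$@"; do
        printf 'ndb_mgm -e "%s restart"\n' "$id"
        printf '# wait until: ndb_mgm -e "%s status" reports the node as started\n' "$id"
    done
}
```

Possible usage with the data node ids from the original report: `print_rolling_restart_plan 3 4 5 6`. Waiting for each node to report started before moving on keeps one node of every node group up at all times.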