Bug #67523 temporary error 20016 'Query aborted due to node failure' from NDBCLUSTER
Submitted: 8 Nov 2012 18:06 Modified: 15 Mar 2016 18:23
Reporter: Ben Im
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: mysql-cluster-gpl-7.3.0-linux2.6-x86_64 OS: Linux (3.2.0-29-generic #46-Ubuntu)
Assigned to: MySQL Verification Team CPU Architecture: Any

[8 Nov 2012 18:06] Ben Im
Description:
mysql-cluster-gpl-7.3.0-linux2.6-x86_64.tar.gz
3.2.0-29-generic #46-Ubuntu

Got the following error:
07.11.2012 10:02:57 ERROR [JDBCExceptionReporter] Got temporary error 20016 'Query aborted due to node failure' from NDBCLUSTER

All ndb data nodes are up, connected, and communicating with each other according to the ndbinfo views nodes and transporters.

Setup consists of 2 clusters
2 ndb data and sql nodes per cluster
ndb data and sql nodes are on the same host
2 ndb mgmt nodes for the clusters, 10.10.104.48 & 50

Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=3    @10.10.104.121  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0, Master)
id=4    @10.10.104.42  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0)
id=5    @10.10.104.43  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 1)
id=6    @10.10.104.122  (mysql-5.5.25 ndb-7.3.0, Nodegroup: 1)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @10.10.104.48  (mysql-5.5.25 ndb-7.3.0)
id=2    @10.10.104.50  (mysql-5.5.25 ndb-7.3.0)

[mysqld(API)]   4 node(s)
id=20   @10.10.104.121  (mysql-5.5.25 ndb-7.3.0)
id=21   @10.10.104.42  (mysql-5.5.25 ndb-7.3.0)
id=22   @10.10.104.43  (mysql-5.5.25 ndb-7.3.0)
id=23   @10.10.104.122  (mysql-5.5.25 ndb-7.3.0)

Thanks,
Ben

How to repeat:
1. Clusters were functioning fine. 
2. Stopped the mysql.server service on 10.10.104.42.
3. Then ran "4 stop" from ndb_mgm to stop the ndb data node.
4. After that I was not able to connect to this node (4). I had to shut down all ndb data nodes and then restart the ndb data nodes and mysql.server on all nodes.
5. Repeated #2, #3, #4 for testing. Maybe 2 or 3 times. 
6. show command and ndbinfo indicate that all nodes are connected and functioning. 

Then I got the above error, 20016. Error log attached.
[8 Nov 2012 18:15] Ben Im
The error log is 960K. Is there a way to generate an error log smaller than 500K? The command below generates a 960K log:

root@NDB-Mgmt:/usr/local/mysql/bin# ./ndb_error_reporter ../config.ini username --fs

 Copying data from node 3

vidder@10.10.104.121's password:
ndb_3_error.log                               100% 4560     4.5KB/s   00:00
scp: /var/lib/mysql-cluster/ndb_3_fs: Permission denied
ndb_3_out.log                                 100%  192KB 192.4KB/s   00:00
ndb_3.pid                                     100%    4     0.0KB/s   00:00
ndb_3_trace.log.1                             100%  942KB 942.3KB/s   00:00
.....
and continues to other ndb data nodes.
[2 May 2013 7:11] Gustaf Thorslund
Hi Ben,

Please only use the --fs option if asked to. If you skip it you will get manageable error reports you can attach to the bug.

Sadly the error report you have attached only contains your config.ini file.

Do you get the same issue when trying with 7.3.1? If so, please create a new error report and upload it after verifying it contains the logs it's supposed to. For example, by running:

$ tar tjf <name of error report>

$ ls -l <name of error report>

/Gustaf
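
The two checks suggested above (list the archive contents, look at its size) could also be scripted. The following is only a minimal sketch, not part of ndb_error_reporter: the function names are hypothetical, the ".log" test is an assumed stand-in for "contains the logs it's supposed to", and the 500K attachment limit is assumed to mean 500 KiB.

```python
import os
import tarfile

# Assumption: the "500K" limit mentioned in this thread means 500 KiB.
MAX_ATTACHMENT_BYTES = 500 * 1024


def summarize_report(path):
    """Return (member names, archive size in bytes) for a bzip2 tar archive.

    Roughly what `tar tjf <report>` and `ls -l <report>` show.
    """
    with tarfile.open(path, "r:bz2") as archive:
        names = archive.getnames()
    return names, os.path.getsize(path)


def has_logs(names):
    """True if any member looks like a log file (hypothetical check).

    Matches names such as ndb_3_out.log or ndb_3_trace.log.1.
    """
    return any(".log" in os.path.basename(name) for name in names)
```

Calling summarize_report() on the generated report and then checking has_logs(names) and size <= MAX_ATTACHMENT_BYTES would flag a report that, like the one attached here, only contains config.ini.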
[2 May 2013 15:56] Ben Im
Unfortunately I moved to 7.3.1 and have no plan to go back to 7.3.0. I will keep an eye on it to see if I can duplicate it.
[16 Sep 2014 0:29] XX XX
I have the same issue in 7.3.2. It failed after a hot shutdown. I did a rolling restart and that solved the problem.
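
The comment above does not say how the rolling restart was carried out. As a rough sketch only, assuming it means restarting the data nodes one at a time through the management client, the command sequence could be generated like this. The node IDs and the management host are taken from the cluster configuration in the original report; the function name is hypothetical, and nothing here is executed against a cluster.

```python
import shlex


def rolling_restart_plan(data_node_ids, connectstring):
    """Return ndb_mgm argument lists that restart one data node at a time.

    Each RESTART is followed by a STATUS call. In practice STATUS would have
    to be repeated until the node reports that it has started before the
    next node is restarted (assumption: that polling is left to the operator).
    """
    plan = []
    for node_id in data_node_ids:
        for action in ("RESTART", "STATUS"):
            plan.append(
                ["ndb_mgm", "-c", connectstring, "-e", f"{node_id} {action}"]
            )
    return plan


# Data node IDs 3-6 and management node 10.10.104.48 come from the report.
for argv in rolling_restart_plan([3, 4, 5, 6], "10.10.104.48"):
    print(shlex.join(argv))
```

Restarting a single node at a time means each node group keeps one running replica, which is the point of a rolling restart.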