Bug #75530: Unknown Error
Submitted: 16 Jan 2015 16:28    Modified: 4 Mar 2015 7:49
Reporter: Joel Hanger    Email Updates:
Status: No Feedback    Impact on me: None
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S2 (Serious)
Version: 5.6.17-ndb-7.3.5    OS: Linux (CentOS release 6.5 (Final))
Assigned to:    CPU Architecture: Any
Tags: 2341, cluster shutdown, failed, halt, internal, ndbrequire, programming, Temporary error, unknown

[16 Jan 2015 16:28] Joel Hanger
Description:
Cluster size:
  2 management nodes
  3 api nodes
  4 data node groups with 2 replicas
  
The 3 API nodes are replicating data into the cluster from 3 active, sharded production servers. This setup has been running for approximately 6 months without any issues beyond what was expected.
This morning the cluster shut down with the following error message:

Management server log: 
2015-01-16 12:35:39 [MgmtSrvr] ALERT    -- Node 10: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Node 10 error log:
Time: Friday 16 January 2015 - 12:35:38
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 18397) 0x00000002
Program: ndbmtd
Pid: 9017 thr: 0
Version: mysql-5.6.17 ndb-7.3.5
Trace: /mnt/storage/mysql/data/ndb_10_trace.log.2 [t1..t4]
***EOM***

Yesterday some indexing changes were made to reduce DataMemory usage.
A planned change to MaxNoOfTriggers was to be applied today, to make room for converting indexes to hash indexes and further reduce DataMemory usage.
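For reference, MaxNoOfTriggers is set in the cluster's config.ini under [ndbd default]; the value below is illustrative only, not the value used on this cluster:

```
[ndbd default]
# MaxNoOfTriggers bounds the pool of triggers shared by
# ordered indexes, unique hash indexes, and backups.
# Example value only; size it to the planned index changes.
MaxNoOfTriggers=1400
```

Changing this parameter requires a rolling restart of the data nodes for it to take effect.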

After restarting the data nodes, the cluster was able to come back online.

How to repeat:
Not sure how this could be replicated.
[16 Jan 2015 23:27] Joel Hanger
After bringing the cluster up, 2 nodes (from separate node groups) lagged in starting until the cluster was online.

Nodes 8 & 9 were the ones lagging:
[ndbd(NDB)]     8 node(s)
id=3    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0, *)
id=4    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0)
id=5    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)
id=6    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)
id=7    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 2)
id=8    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 2)
id=9    @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 3)
id=10   @10.1.5.XXX  (mysql-5.6.17 ndb-7.3.5, Nodegroup: 3)

Node 7: Data usage is 59%(401062 32K pages of total 671328)
Node 8: Data usage is 66%(444062 32K pages of total 671328)

Node 9: Data usage is 68%(459314 32K pages of total 671328)
Node 10: Data usage is 60%(403285 32K pages of total 671328)

All the other node groups have exactly the same usage, however, e.g.:

Node 3: Data usage is 58%(393240 32K pages of total 671328)
Node 4: Data usage is 58%(393240 32K pages of total 671328)
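The per-node data usage figures above match the format of the management client's memory report; as a sketch, they can be gathered from the management node like so (assuming ndb_mgm can reach the management server):

```
# Ask all data nodes to report DataMemory/IndexMemory usage
ndb_mgm -e "all report memoryusage"
```

Each data node then prints a "Data usage" and an "Index usage" line in the form shown above.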
[4 Feb 2015 7:49] MySQL Verification Team
Thank you for the report.
I could not reproduce this issue at my end (tried with 7.3.7/8).
Could you please provide a repeatable test case to trigger this issue at our end? Also, see Bug #70217.

Thanks,
Umesh
[5 Mar 2015 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".