Bug #62343 Data Node don't restart
Submitted: 4 Sep 2011 8:17
Modified: 5 Sep 2011 16:57
Reporter: Edu GF
Status: Analyzing
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1.56 ndb-7.1.15
OS: Linux (Fedora 15)
Assigned to: Matthew Montgomery
CPU Architecture: Any

[4 Sep 2011 8:17] Edu GF
Description:
After stopping a data node and trying to restart it, the node fails with this error:

2011-09-04 04:35:04 [ndbd] ALERT    -- Node 10: Forced node shutdown completed. Occured during startphase 5. Caused by error 2306: 'Pointer too large(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The node stays down:

id=10 (not connected, accepting connect from 10.235.235.136)
id=11   @10.235.235.138  (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0)
id=12   @10.235.235.140  (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0)
id=13   @10.235.235.142  (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 3 node(s)
id=1    @10.235.235.3  (mysql-5.1.56 ndb-7.1.15)
id=2    @10.235.235.4  (mysql-5.1.56 ndb-7.1.15)
id=3    @10.235.235.5  (mysql-5.1.56 ndb-7.1.15)

[mysqld(API)]   6 node(s)
id=20   @10.235.235.136  (mysql-5.1.56 ndb-7.1.15)
id=21   @10.235.235.138  (mysql-5.1.56 ndb-7.1.15)
id=22   @10.235.235.140  (mysql-5.1.56 ndb-7.1.15)
id=23   @10.235.235.142  (mysql-5.1.56 ndb-7.1.15)
id=24 (not connected, accepting connect from any host)
id=25 (not connected, accepting connect from any host)
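
For anyone checking this state from a script, the listing above can be scanned for nodes that are not connected. This is only a minimal sketch: the function name `disconnected_nodes` is made up for illustration, and it assumes the line format matches the output pasted in this report.

```python
import re

# Assumption: a disconnected node appears as
#   id=<n> (not connected, accepting connect from ...)
# exactly as in the listing pasted in this report.
NOT_CONNECTED = re.compile(r"^id=(\d+)\s+\(not connected")

def disconnected_nodes(show_output):
    """Return the ids of all nodes reported as not connected (hypothetical helper)."""
    ids = []
    for line in show_output.splitlines():
        match = NOT_CONNECTED.match(line.strip())
        if match:
            ids.append(int(match.group(1)))
    return ids

sample = """\
id=10 (not connected, accepting connect from 10.235.235.136)
id=11   @10.235.235.138  (mysql-5.1.56 ndb-7.1.15, Nodegroup: 0)
id=24 (not connected, accepting connect from any host)
"""
print(disconnected_nodes(sample))  # [10, 24]
```

The sketch does not tell data nodes from unused API slots (ids 24 and 25 here). To separate them, it would also have to track section headers such as `[mysqld(API)]`.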

How to repeat:
Kill a data node with kill -9 to simulate a node failure, then try to bring that node back online.

[30 Sep 2011 16:32] Matthew Bowie
I've had the same issue using Cluster 7.2 on CentOS 5.6 (Amazon EC2) and on OpenIndiana (local hardware). Error log from OpenIndiana:

Time: Friday 30 September 2011 - 10:15:15
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbtup/DbtupDiskAlloc.cpp
Error object: DBTUP (Line: 1921) 0x00000002
Program: ndbd
Pid: 12969
Version: mysql-5.1.51 ndb-7.2.0-beta
Trace: /mysqlcluster//ndb_5_trace.log.2
***EOM***
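
To compare entries like the one above across nodes, the fields can be pulled into a dictionary. This is a rough sketch, not an official tool: the helper name `parse_error_entry` is hypothetical, and it assumes each field is a `Key: value` line and the entry ends at `***EOM***`, as in the paste.

```python
def parse_error_entry(text):
    """Split one error-log entry into a dict of its `Key: value` fields.

    Assumption: the layout matches the entry pasted in this report.
    """
    fields = {}
    for line in text.splitlines():
        if line.strip() == "***EOM***":
            break  # end-of-message marker closes the entry
        # Split on the first ": " only, so values such as
        # "DBTUP (Line: 1921) 0x00000002" stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

entry = """\
Status: Temporary error, restart node
Error: 2306
Error data: dbtup/DbtupDiskAlloc.cpp
Error object: DBTUP (Line: 1921) 0x00000002
Version: mysql-5.1.51 ndb-7.2.0-beta
***EOM***
"""
info = parse_error_entry(entry)
print(info["Error"], info["Error data"], info["Error object"])
```

Grouping entries by `Error`, `Error data` and `Error object` is one way to check whether different nodes stop at the same source line.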

Trace log at:  http://luna.edu/media/ndb_5_trace.log.2

A stopped or killed node cannot rejoin the cluster unless it is started with --initial. If the whole cluster is stopped, it cannot restart unless every node is started with --initial and the data is restored from a backup. With the distributed permissions added in 7.2, this causes even more pain.

Are there any changes we can make to config.ini to mitigate this until it's patched?