Bug #56256 Cluster failure while performing standard rolling restart (increased DataMemory)
Submitted: 25 Aug 2010 14:44 Modified: 25 Aug 2010 14:53
Reporter: Jeffrey R
Status: Not a Bug
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:5.1.44 OS:Linux (Fedora)
Assigned to: CPU Architecture:Any

[25 Aug 2010 14:44] Jeffrey R
Description:
While I was performing a standard rolling restart of the cluster to increase the DataMemory allocated to each data node, one of the data nodes caused the master to crash. After updating the configuration file on both management nodes and restarting both ndb_mgmd processes at the same time, I restarted the first data node using --initial and waited for it to fully synchronize. I then restarted the second data node with --initial, and this caused the first data node to fail.

How to repeat:
This is not 100% reproducible: I have performed similar procedures before without this failure. These are the steps that led to this particular event.

Updated config.ini on both ndb_mgmd nodes
Stopped both ndb_mgmd nodes
Started both ndb_mgmd nodes using "ndb_mgmd -f /var/lib/mysql-cluster/config.ini --initial"
Stopped the first data node using "3 STOP" in the ndb_mgm console
Started data node 3 with "ndbmtd --initial" and waited for it to come online
Stopped the second data node using "4 STOP" in the ndb_mgm console
Started data node 4 with "ndbmtd --initial"
CRASH OCCURRED HERE
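The reported sequence can be sketched as a small Python helper that only builds the command strings and runs nothing. The function name `rolling_restart_plan` and the use of `ndb_mgm -e` to send management-client commands non-interactively are assumptions for illustration, not part of the original report. The `initial` flag reproduces the `--initial` starts used above; per the maintainer's reply, it should stay `False` when only DataMemory is being changed.

```python
from typing import List


def rolling_restart_plan(config_path: str, data_node_ids: List[int],
                         initial: bool = False) -> List[str]:
    """Return the shell commands for a one-node-at-a-time rolling restart.

    Hypothetical helper: it only builds command strings and executes nothing.
    Each ndb_mgmd / ndbmtd command is meant to be run on the host that owns
    that process, with its connect string already configured.
    """
    # Restart the management server(s) so they load the edited config.ini.
    plan = [f"ndb_mgmd -f {config_path} --initial"]
    for node_id in data_node_ids:
        # Stop exactly one data node through the management client.
        plan.append(f'ndb_mgm -e "{node_id} STOP"')
        # Start it again. --initial wipes the node's file system and forces a
        # full copy from its peer, which a DataMemory change does not need.
        plan.append("ndbmtd --initial" if initial else "ndbmtd")
        # Poll this command until the node reports it has started,
        # and only then move on to the next data node.
        plan.append(f'ndb_mgm -e "{node_id} STATUS"')
    return plan


if __name__ == "__main__":
    for cmd in rolling_restart_plan("/var/lib/mysql-cluster/config.ini", [3, 4]):
        print(cmd)
```

Keeping the plan as plain strings makes it easy to review before anything is run; the important design point is that the next node is only stopped after the previous one reports as started.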

Suggested fix:
No idea
[25 Aug 2010 14:47] Andrew Hutchings
Please note that you should not start the data nodes with --initial when increasing DataMemory.

Can you please send us the file generated by ndb_error_reporter for this?
[25 Aug 2010 14:48] Jeffrey R
ndbmtd error log

Attachment: ndbmtd_3_logs.tar.gz (application/x-gzip, text), 304.01 KiB.

[25 Aug 2010 14:53] Andrew Hutchings
This is a gcp_commit-based GCP stop. Basically, the data node's disks cannot handle the I/O traffic generated while the data node is starting.

Please see the section on GCP Stop in:
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html

If this does not help, please increase TimeBetweenEpochsTimeout:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-ti...
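Both changes discussed in this thread (raising DataMemory and raising TimeBetweenEpochsTimeout) are edits to the [ndbd default] section of config.ini. A minimal sketch of such an edit, assuming `key=value` syntax and using a hypothetical helper name `set_ndbd_default`, is below. It edits lines directly rather than using `configparser`, because config.ini normally repeats the [ndbd] section once per data node. The value 32000 in the usage note is only an illustrative number, not a recommendation; check the documented range for your version.

```python
from typing import List


def set_ndbd_default(ini_text: str, key: str, value: str) -> str:
    """Set key=value in the [ndbd default] section of an NDB config.ini.

    Hypothetical helper: plain line-based editing, so repeated [ndbd]
    sections are left untouched. Assumes "key=value" lines and compares
    section and parameter names case-insensitively. Appends an
    [ndbd default] section if none exists.
    """
    out: List[str] = []
    in_section = False
    found_section = False
    written = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [ndbd default] without having seen the key: add it here.
            if in_section and not written:
                out.append(f"{key}={value}")
                written = True
            in_section = stripped.lower() == "[ndbd default]"
            found_section = found_section or in_section
            out.append(line)
            continue
        name = stripped.split("=", 1)[0].strip().lower()
        if in_section and "=" in stripped and name == key.lower():
            out.append(f"{key}={value}")  # replace the existing setting
            written = True
            continue
        out.append(line)
    # [ndbd default] was the last section and the key was not present.
    if in_section and not written:
        out.append(f"{key}={value}")
    if not found_section:
        out += ["", "[ndbd default]", f"{key}={value}"]
    return "\n".join(out) + "\n"
```

For example, `set_ndbd_default(text, "TimeBetweenEpochsTimeout", "32000")` inserts the parameter into [ndbd default], and `set_ndbd_default(text, "DataMemory", "160M")` replaces an existing DataMemory line. The management servers still have to reload the file, and the data nodes have to be restarted one at a time, for the new value to take effect.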