MySQL Bugs: #56256: Cluster failure while performing standard rolling restart (increased DataMemory)

Bug #56256	Cluster failure while performing standard rolling restart (increased DataMemory)
Submitted:	25 Aug 2010 14:44	Modified:	25 Aug 2010 14:53
Reporter:	Jeffrey R
Status:	Not a Bug
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.1.44	OS:	Linux (Fedora)
Assigned to:		CPU Architecture:	Any

Description:
While performing a standard rolling restart of the cluster to increase the DataMemory allocated to each data node, one of the data nodes forced the master to crash. After updating the configuration file on both management nodes and restarting the ndb_mgmd processes simultaneously, I restarted the first data node using --initial and waited for it to fully synchronize. I then restarted the second data node with --initial, which forced the first data node to fail.

How to repeat:
This is not 100% reproducible, as I have performed similar procedures before, but these are the steps that led to this particular event.

Update config.ini on both ndb_mgmd nodes
Stop both ndb_mgmd nodes
Start both ndb_mgmd nodes using "ndb_mgmd -f /var/lib/mysql-cluster/config.ini --initial"
Stop first data node using "3 STOP" in the ndb_mgm console
Start data node (id 3) with "ndbmtd --initial" and wait for the data node to come online
Stop second data node using "4 STOP" in the ndb_mgm console
Start data node (id 4) with "ndbmtd --initial"
CRASH OCCURRED HERE

Suggested fix:
No idea

Please note that to increase DataMemory you should not start the data nodes with --initial.
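
For illustration, a rolling restart that keeps each node's local data could be planned roughly as below. This is a minimal sketch, assuming the node IDs 3 and 4 from the report; plan_rolling_restart is a hypothetical helper that only builds ndb_mgm command strings and runs nothing. RESTART and STATUS are regular ndb_mgm client commands; waiting until a node reports that it has started before moving on is left to the operator.

```python
from typing import List


def plan_rolling_restart(node_ids: List[int]) -> List[str]:
    """Return the ndb_mgm invocations for a rolling restart, one node at a time.

    Hypothetical helper for illustration: it only builds command strings.
    Each node is restarted WITHOUT --initial, so its local data files are kept.
    """
    commands = []
    for node_id in node_ids:
        # Plain node restart; the node fetches the updated configuration
        # (including the new DataMemory) from the management server.
        commands.append(f'ndb_mgm -e "{node_id} RESTART"')
        # Check that the node is back before touching the next one.
        commands.append(f'ndb_mgm -e "{node_id} STATUS"')
    return commands


if __name__ == "__main__":
    for cmd in plan_rolling_restart([3, 4]):
        print(cmd)
```

Restarting one node at a time and confirming its status before continuing means the other node in the group stays available throughout.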

Can you please send us the file generated by ndb_error_reporter for this?

ndbmtd error log

Attachment: ndbmtd_3_logs.tar.gz (application/x-gzip, text), 304.01 KiB.

This is a gcp_commit-based GCP stop. Basically, the disks on the data node cannot handle the I/O traffic required while the data node is starting.

Please see the section on GCP Stop in:
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html

If this does not help, please increase TimeBetweenEpochsTimeout:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-ti...
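
As a rough sketch of where such a setting would go, the snippet below renders an [ndbd default] section for config.ini. render_ndbd_default is a hypothetical helper, and the values shown (2G, 8000 milliseconds) are placeholders chosen for illustration; neither comes from this report, and suitable values depend on the hardware.

```python
from typing import Dict, Union


def render_ndbd_default(params: Dict[str, Union[str, int]]) -> str:
    """Render an [ndbd default] section for config.ini from a dict of settings."""
    lines = ["[ndbd default]"]
    for name, value in params.items():
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only, not recommendations.
    settings = {"DataMemory": "2G", "TimeBetweenEpochsTimeout": 8000}
    print(render_ndbd_default(settings), end="")
```

After editing config.ini, the management servers and data nodes still have to be restarted for the change to take effect.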