Bug #61173 Exit single user mode resets ndb status to running.
Submitted: 13 May 2011 17:00 Modified: 6 Jun 2011 11:17
Reporter: S McCarthy
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: mysql-5.1.51 ndb-7.1.9 OS: Linux (Ubuntu Natty)
Assigned to: Assigned Account CPU Architecture:Any
Tags: ndb cluster ndb_mgm startup corruption

[13 May 2011 17:00] S McCarthy
Description:
While a data node is starting up, running 'exit single user mode' from ndb_mgm (whether or not single user mode is actually active) causes the manager to report all nodes as running.

In that condition, stopping a node from the manager returns a generic "application error".

This can lead to a state where the cluster appears to be functioning (and ready for rolling restart, failover, maintenance etc) but in fact is not, and must be resynced on startup.

How to repeat:
Stop a data node.
Restart the node (with --initial if needed, to generate a long startup delay.) It is not necessary to enable single user mode at any point.
"ndb_mgm -e show" will show it starting up.
Run "ndb_mgm -e 'exit single user mode'"
"ndb_mgm -e show" will show all nodes running, as will "all status".
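The check in the last three steps can be scripted. The following is a minimal sketch, not part of the original report: it assumes ndb_mgm is on the PATH and can reach the management server with its default connect string, and the helper names (mgm, show_around_exit, status_was_reset) are made up for illustration. It also assumes that a node which is still starting is labelled "starting" in the "show" output.

```python
import subprocess


def mgm(command, runner=subprocess.run):
    """Run a single ndb_mgm command via -e and return its stdout as text."""
    result = runner(["ndb_mgm", "-e", command], capture_output=True, text=True)
    return result.stdout


def show_around_exit(runner=subprocess.run):
    """Capture 'show' output before and after 'exit single user mode'."""
    before = mgm("show", runner)
    mgm("exit single user mode", runner)
    after = mgm("show", runner)
    return before, after


def status_was_reset(before, after):
    """Heuristic (assumption): the word 'starting' appears in 'show'
    output for a node that is still starting, and vanishes once the
    manager reports the node as running."""
    return "starting" in before and "starting" not in after
```

Run show_around_exit() while the restarted node is still in its start phases, then pass both outputs to status_was_reset(); a True result matches the behaviour described above.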

Suggested fix:
Check whether single user mode is active before disabling it. Save and restore individual node states around single user mode.
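As a rough illustration of that idea, here is a toy model in Python. It is not NDB code: the ManagerView class, its method names and the status strings are all hypothetical, and it only shows the intended bookkeeping (exit is a no-op unless single user mode is active, and only nodes that were switched on entry are switched back).

```python
from dataclasses import dataclass, field


@dataclass
class ManagerView:
    """Hypothetical model of the manager's per-node status table."""
    node_status: dict                          # node id -> status string
    single_user: bool = False
    _switched: set = field(default_factory=set, init=False)

    def enter_single_user(self):
        """Mark started nodes as single user and remember which ones."""
        if self.single_user:
            return False
        self._switched = {n for n, s in self.node_status.items()
                          if s == "started"}
        for node in self._switched:
            self.node_status[node] = "single user"
        self.single_user = True
        return True

    def exit_single_user(self):
        """Do nothing unless single user mode is active; otherwise
        restore only the nodes that were switched on entry."""
        if not self.single_user:
            return False                       # guard: leave states alone
        for node in self._switched:
            self.node_status[node] = "started"
        self._switched = set()
        self.single_user = False
        return True
```

With this guard, calling exit_single_user() while node 3 is "starting" and single user mode is off leaves node 3 as "starting" instead of reporting it as running.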
[6 Jun 2011 11:17] Geert Vanderkelen
I could not reproduce this entirely, but that's probably because start phase 2 goes by too fast. However, I could reproduce the data node crash during start phase 5 when issuing EXIT SINGLE USER MODE while the node is starting.

Steps to reproduce:
1) Simple cluster configuration, 2 data nodes (I don't think an API node even has to be running)
2) Start cluster
3) Stop the first data node and remove its filesystem (i.e. do an initial start)
4) Start the data node again and keep running ndb_mgm -e "EXIT SINGLE USER MODE" continuously, ignoring errors from ndb_mgm
5) Watch data node going down
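
Step 4 can be automated with a small loop. This is only a sketch under assumptions: ndb_mgm is on the PATH and uses its default connect string, and the function name, attempt count and delay are arbitrary choices, not values from the test above. Failures are counted rather than raised, so errors from ndb_mgm are ignored as described in step 4.

```python
import subprocess
import time


def hammer_exit_single_user(attempts, delay=0.2,
                            runner=subprocess.run, sleep=time.sleep):
    """Send EXIT SINGLE USER MODE repeatedly while a data node starts.

    Returns how many invocations of ndb_mgm exited with a non-zero
    status; those are counted but otherwise ignored.
    """
    failures = 0
    for _ in range(attempts):
        result = runner(["ndb_mgm", "-e", "EXIT SINGLE USER MODE"],
                        capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
        sleep(delay)                 # brief pause between attempts
    return failures
```

Start the loop right after restarting the data node, for example hammer_exit_single_user(500), and watch the node's log or "ndb_mgm -e show" in another terminal.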

Verified using MySQL Cluster 7.1.9