Bug #21996 Misleading error message
Submitted: 4 Sep 2006 14:42
Modified: 25 Jul 2008 15:55
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.0
OS: Any
Assigned to:
CPU Architecture: Any
Tags: cluster, error message, startup

[4 Sep 2006 14:42] Hartmut Holzgraefe
Description:
The log message produced when a system restart fails because not all nodes were started within StartPartialTimeout is not very helpful:

  Conflict when selecting restart type (Internal error, programming error or missing error message, please report a bug)

How to repeat:
Shut down a cluster, then restart only some of the nodes, so that at least one node group is completely missing.

Suggested fix:
Produce a more meaningful error message that mentions the StartPartialTimeout setting.
[20 Jan 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[2 Sep 2007 10:39] Hartmut Holzgraefe
I'm still getting this with 5.0.45, running a 2x2 cluster with all nodes on one box (although I don't think that really matters):

- fresh cluster start
- after cluster is up: ALL STOP
- restart only 2 ndbd processes
- these become the nodes of the first node group,
  no nodes in the 2nd node group
- after waiting 30s for the remaining nodes to show
  up everything is shut down
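
The condition behind these steps can be sketched in a few lines of Python. This is only an illustration, not code from MySQL Cluster: the helper names are hypothetical, and the layout (data node IDs 2 to 5, two replicas, node groups formed in node ID order) is an assumption about a typical 2x2 setup, not taken from the attached config.ini.

```python
# Hypothetical helpers, not part of MySQL Cluster. They only illustrate
# the "at least one node per node group" condition described above.

def node_groups(data_node_ids, replicas):
    """Split data node IDs into groups of `replicas` nodes, in ID order (assumed)."""
    ids = sorted(data_node_ids)
    return [ids[i:i + replicas] for i in range(0, len(ids), replicas)]

def missing_groups(data_node_ids, replicas, started):
    """Return the node groups in which no node has been started."""
    started = set(started)
    return [g for g in node_groups(data_node_ids, replicas)
            if not started & set(g)]

# Assumed layout: four data nodes with IDs 2..5 and two replicas.
all_nodes = [2, 3, 4, 5]
print(missing_groups(all_nodes, 2, started=[2, 3]))  # second group is missing
print(missing_groups(all_nodes, 2, started=[2, 4]))  # every group is covered
```

With only the first two nodes started, the second group is returned as missing, which is the situation that ends in the shutdown described above. Starting one node from each group leaves no group missing.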
[2 Sep 2007 10:41] Hartmut Holzgraefe
config.ini and log files

Attachment: ndb_error_report_20070902123950.tar.bz2 (application/x-tar, text), 31.11 KiB.

[25 Jul 2008 15:45] Hartmut Holzgraefe
Can't reproduce in 6.3.14. Here the first node to fail reports:

  Time: Friday 25 July 2008 - 17:38:52
  Status: Temporary error, restart node
  Message: Insufficent nodes for system restart (Restart error)
  Error: 2353
  Error data: Unable to start missing node group!  starting: 0000000000000030   (missing fs for: 0000000000000000)
  Error object: QMGR (Line: 1567) 0x0000000a
  Program: ndbd
  Pid: 29325
  Trace: /data2/bug/8494/cluster/ndb_5_trace.log.2
  Version: mysql-5.1.24 ndb-6.3.14-RC
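
The "starting" value in the error data above looks like a hexadecimal node bitmask. A minimal Python sketch for reading it, assuming bit n stands for node ID n (this is an assumption; it is consistent with the ndb_5 trace file name above but is not confirmed anywhere in this report), and using a hypothetical helper name:

```python
def decode_node_mask(hex_mask):
    """Return the node IDs whose bit is set, assuming bit n means node ID n."""
    value = int(hex_mask, 16)
    return [n for n in range(value.bit_length()) if (value >> n) & 1]

print(decode_node_mask("0000000000000030"))  # "starting" mask -> [4, 5]
print(decode_node_mask("0000000000000000"))  # "missing fs for" mask -> []
```

Under that assumption, the mask 0000000000000030 names nodes 4 and 5 as the ones that were starting.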

and the next node fails with

  Time: Friday 25 July 2008 - 17:39:27
  Status: Temporary error, restart node
  Message: Another node failed during system restart, please investigate error(s) on other node(s) (Restart error)
  Error: 2308
  Error data: Node 5 disconnected
  Error object: QMGR (Line: 2766) 0x0000000a
  Program: ndbd
  Pid: 6390
  Trace: /data2/bug/8494/cluster/ndb_4_trace.log.1
  Version: mysql-5.1.24 ndb-6.3.14-RC

The messages "Insufficent nodes for system restart (Restart error)" and "Unable to start missing node group!" on the first failing node, together with "Node 5 disconnected" on the other node as a pointer to where the real error is, are perfectly OK now, so I think we can close this one. A backport to older versions should not be necessary.
[25 Jul 2008 15:55] Hartmut Holzgraefe
Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/