Bug #28215 MySQL Cluster version 5.0.37 is unable to start due to file system inconsistency
Submitted: 3 May 2007 8:24
Modified: 3 May 2007 12:57
Reporter: Nir Simionovich
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.0.37
OS: Linux
Assigned to:
CPU Architecture: Any

[3 May 2007 8:24] Nir Simionovich
Description:
After a cluster failure whose cause has not yet been determined, the following appears in the ndb_1_cluster log:

2007-05-03 10:59:44 [MgmSrvr] INFO     -- Node 3: DICT: index 6 rebuild done
2007-05-03 10:59:44 [MgmSrvr] INFO     -- Node 3: DICT: index 7 rebuild done
2007-05-03 10:59:44 [MgmSrvr] INFO     -- Node 3: DICT: index 9 rebuild done
2007-05-03 10:59:45 [MgmSrvr] ALERT    -- Node 1: Node 3 Disconnected
2007-05-03 10:59:45 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed, restarting. Occured during startphase 8. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
2007-05-03 10:59:45 [MgmSrvr] INFO     -- Mgmt server state: nodeid 3 reserved for ip 192.114.69.36, m_reserved_nodes 000000000000000a.
2007-05-03 10:59:45 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2007-05-03 10:59:46 [MgmSrvr] INFO     -- Node 3: Communication to Node 2 opened
2007-05-03 10:59:46 [MgmSrvr] INFO     -- Node 3: Waiting 30 sec for nodes 0000000000000004 to connect, nodes [ all: 000000000000000c connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-05-03 10:59:46 [MgmSrvr] INFO     -- Mgmt server state: nodeid 3 freed, m_reserved_nodes 0000000000000002.
2007-05-03 10:59:49 [MgmSrvr] INFO     -- Node 3: Waiting 27 sec for nodes 0000000000000004 to connect, nodes [ all: 000000000000000c connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-05-03 10:59:52 [MgmSrvr] INFO     -- Node 3: Waiting 24 sec for nodes 0000000000000004 to connect, nodes [ all: 000000000000000c connected: 0000000000000008 no-wait: 0000000000000000 ]
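
The hexadecimal node masks in the messages above can be decoded into node IDs. The following is a minimal Python sketch, not part of any MySQL tooling; the helper name decode_node_mask is hypothetical, and it assumes that bit N of a mask stands for node ID N. That assumption is consistent with the log itself: m_reserved_nodes reads 000000000000000a while node ID 3 is reserved and 0000000000000002 once it is freed.

```python
def decode_node_mask(mask_hex):
    """Return the node IDs whose bits are set in a hexadecimal node mask.

    Assumption (not confirmed by the report): bit N stands for node ID N.
    """
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Masks copied from the "Waiting ... sec for nodes" messages in the cluster log.
for label, mask in [
    ("waiting for", "0000000000000004"),
    ("all", "000000000000000c"),
    ("connected", "0000000000000008"),
]:
    print(label, decode_node_mask(mask))
```

Under that assumption, node 3 is connected and is waiting for node 2, the only other data node in the "all" mask.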

After reviewing the data nodes, I removed one of them to see whether the cluster would come up with a single node only, but the problem persists. In addition, I observed the following in the data node log files:

2007-05-03 10:49:19 [ndbd] INFO     -- NDB Cluster -- DB node 3
2007-05-03 10:49:19 [ndbd] INFO     -- Version 5.0.37 --
2007-05-03 10:49:19 [ndbd] INFO     -- Configuration fetched at 192.114.69.34 port 1186
2007-05-03 10:49:19 [ndbd] INFO     -- Start initiated (version 5.0.37)
2007-05-03 10:59:44 [ndbd] INFO     -- Error handler restarting system
2007-05-03 10:59:45 [ndbd] INFO     -- Error handler shutdown completed - exiting
2007-05-03 10:59:45 [ndbd] ALERT    -- Node 3: Forced node shutdown completed, restarting. Occured during startphase 8. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
2007-05-03 10:59:45 [ndbd] INFO     -- Ndb has terminated (pid 26699) restarting
2007-05-03 10:59:45 [ndbd] INFO     -- Angel pid: 26660 ndb pid: 26741
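
To check whether other nodes report the same failure, the forced-shutdown entries can be pulled out of a log automatically. The following Python sketch is one way to do that; the function name find_forced_shutdowns and the regular expression are illustrative assumptions based only on the message wording quoted in this report, and it assumes each log entry sits on a single line.

```python
import re

# Pattern derived from the ALERT line quoted in this report; other NDB
# versions may word the message differently (assumption).
SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed.*?"
    r"during startphase (?P<phase>\d+)\. Caused by error (?P<code>\d+)"
)


def find_forced_shutdowns(lines):
    """Return one dict (node, phase, code) per forced-shutdown log entry."""
    results = []
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            results.append({k: int(v) for k, v in match.groupdict().items()})
    return results


# Two entries taken from the data node log in this report.
sample = [
    "2007-05-03 10:59:45 [ndbd] INFO     -- Error handler shutdown completed - exiting",
    "2007-05-03 10:59:45 [ndbd] ALERT    -- Node 3: Forced node shutdown completed, "
    "restarting. Occured during startphase 8. Caused by error 2815: "
    "'File not found(Ndbd file system inconsistency error, please report a bug). "
    "Ndbd file system error, restart node initial'.",
]
print(find_forced_shutdowns(sample))
```

Running the same function over the log of each data node would show whether they all fail in the same start phase with the same error code.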

How to repeat:
Unknown at this point; I do not know what caused this to happen.
[3 May 2007 12:14] Hartmut Holzgraefe
Can't repeat, as the original cause is not known.
Restarting node 3 with --initial should resolve
the current situation, provided it is the only node
reporting file system problems.
[3 May 2007 12:57] Nir Simionovich
Well, the situation is identical on both nodes in the cluster, which leaves the entire cluster unusable. I tried bringing the cluster up with node3 and then with node4; both showed exactly the same issue.