Description:
After a cluster failure whose cause has not yet been determined, the following appears in the ndb_1_cluster log:
2007-05-03 10:59:44 [MgmSrvr] INFO -- Node 3: DICT: index 6 rebuild done
2007-05-03 10:59:44 [MgmSrvr] INFO -- Node 3: DICT: index 7 rebuild done
2007-05-03 10:59:44 [MgmSrvr] INFO -- Node 3: DICT: index 9 rebuild done
2007-05-03 10:59:45 [MgmSrvr] ALERT -- Node 1: Node 3 Disconnected
2007-05-03 10:59:45 [MgmSrvr] ALERT -- Node 3: Forced node shutdown completed, restarting. Occured during startphase 8. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
2007-05-03 10:59:45 [MgmSrvr] INFO -- Mgmt server state: nodeid 3 reserved for ip 192.114.69.36, m_reserved_nodes 000000000000000a.
2007-05-03 10:59:45 [MgmSrvr] INFO -- Node 1: Node 3 Connected
2007-05-03 10:59:46 [MgmSrvr] INFO -- Node 3: Communication to Node 2 opened
2007-05-03 10:59:46 [MgmSrvr] INFO -- Node 3: Waiting 30 sec for nodes 0000000000000004 to connect, nodes [ all: 000000000000000c connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-05-03 10:59:46 [MgmSrvr] INFO -- Mgmt server state: nodeid 3 freed, m_reserved_nodes 0000000000000002.
2007-05-03 10:59:49 [MgmSrvr] INFO -- Node 3: Waiting 27 sec for nodes 0000000000000004 to connect, nodes [ all: 000000000000000c connected: 0000000000000008 no-wait: 0000000000000000 ]
2007-05-03 10:59:52 [MgmSrvr] INFO -- Node 3: Waiting 24 sec for nodes 0000000000000004 to connect, nodes [ all: 000000000000000c connected: 0000000000000008 no-wait: 0000000000000000 ]
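The hexadecimal values in the "Waiting ... for nodes" and "m_reserved_nodes" lines look like node bitmasks. The following is a minimal sketch for decoding them, assuming that bit n (counted from the least significant bit, starting at 0) stands for node ID n. That assumption is not taken from any documentation; it is only consistent with the log above, where node 3 opens communication to node 2 and then waits for mask 0000000000000004. The function name decode_node_mask is hypothetical.

```python
from typing import List


def decode_node_mask(hex_mask: str) -> List[int]:
    """Return the node IDs whose bits are set in a hexadecimal node bitmask.

    Assumption (not confirmed by the report): bit n, counted from the least
    significant bit starting at 0, represents node ID n.
    """
    value = int(hex_mask, 16)
    # Test every bit position up to the highest set bit.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Masks copied from the cluster log excerpt.
    masks = [
        ("waiting for", "0000000000000004"),
        ("all", "000000000000000c"),
        ("connected", "0000000000000008"),
        ("no-wait", "0000000000000000"),
        ("m_reserved_nodes (reserved)", "000000000000000a"),
        ("m_reserved_nodes (freed)", "0000000000000002"),
    ]
    for label, mask in masks:
        print(f"{label}: {decode_node_mask(mask)}")
```

Under that assumption, "all" decodes to nodes 2 and 3, "connected" to node 3 only, and the mask being waited for to node 2, i.e. node 3 restarts and waits for the other data node.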
After reviewing the data nodes, I removed one of them to see whether the cluster would come up with a single data node, but the problem persists. In addition, I observed the following in the data node log files:
2007-05-03 10:49:19 [ndbd] INFO -- NDB Cluster -- DB node 3
2007-05-03 10:49:19 [ndbd] INFO -- Version 5.0.37 --
2007-05-03 10:49:19 [ndbd] INFO -- Configuration fetched at 192.114.69.34 port 1186
2007-05-03 10:49:19 [ndbd] INFO -- Start initiated (version 5.0.37)
2007-05-03 10:59:44 [ndbd] INFO -- Error handler restarting system
2007-05-03 10:59:45 [ndbd] INFO -- Error handler shutdown completed - exiting
2007-05-03 10:59:45 [ndbd] ALERT -- Node 3: Forced node shutdown completed, restarting. Occured during startphase 8. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
2007-05-03 10:59:45 [ndbd] INFO -- Ndb has terminated (pid 26699) restarting
2007-05-03 10:59:45 [ndbd] INFO -- Angel pid: 26660 ndb pid: 26741
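To check how often this shutdown recurs across restarts, the ALERT lines can be pulled out of the logs with a short script. This is only a sketch: the pattern is derived from the single message shown above (other ndbd versions may word it differently), it assumes each log entry is on one unwrapped line, and the name parse_forced_shutdown is hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Pattern based solely on the ALERT line quoted in this report.
SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed.*?"
    r"startphase (?P<phase>\d+)\. "
    r"Caused by error (?P<code>\d+): '(?P<message>.*)'"
)


def parse_forced_shutdown(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract node ID, start phase, error code and message from one log line.

    Returns None when the line is not a forced-shutdown ALERT of this shape.
    """
    match = SHUTDOWN_RE.search(line)
    if match is None:
        return None
    return {
        "node": int(match.group("node")),
        "phase": int(match.group("phase")),
        "code": int(match.group("code")),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = (
        "2007-05-03 10:59:45 [ndbd] ALERT -- Node 3: Forced node shutdown "
        "completed, restarting. Occured during startphase 8. Caused by error "
        "2815: 'File not found(Ndbd file system inconsistency error, please "
        "report a bug). Ndbd file system error, restart node initial'."
    )
    print(parse_forced_shutdown(sample))
```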
How to repeat:
Unknown at this point; I do not know what caused this to happen.