Bug #65480 ndbmtd REDO log corruption
Submitted: 31 May 2012 22:10
Modified: 28 Jun 2016 16:12
Reporter: Chris Brown
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.1.17
OS: Linux (Fedora 12)
Assigned to: MySQL Verification Team
CPU Architecture: Any

[31 May 2012 22:10] Chris Brown
Description:
ndbmtd fails to start and corrupts its REDO log if it is started after a 'shutdown' is issued through ndb_mgm but before ndb_mgmd has completely shut down.

The first error in the ndb_1_error.log is:

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 16932) 0x00000002
Program: /usr/sbin/ndbmtd

Subsequent errors:

Status: Ndbd file system error, restart node initial
Message: Error while reading the REDO log (Ndbd file system inconsistency error, please report a bug)
Error: 2310
Error data: Error while reading REDO log. from 18396
part: 1 D=9, F=0 Mb=0 FP=1 W1=1891 W2=1395549294 : Invalid logword gci: 2943
Error object: DBLQH (Line: 18444) 0x00000002
Program: /usr/sbin/ndbmtd

How to repeat:
1. Configure and start a cluster with noOfReplicas=2: one ndbd node, one inactive ndbd node, and one ndb_mgmd.
2. Issue 'ndb_mgm --execute shutdown'.
3. Before ndb_mgmd has shut down (as soon as ndbmtd stops), start the new ndbmtd instance.

Repro rate seems to be high (2/2 so far).

Workaround: allow ndb_mgmd to fully stop before starting ndbmtd. However, once the REDO log is corrupted, ndbmtd does not appear to be recoverable (except from backup).
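
For anyone scripting restarts, the workaround might be sketched roughly as below. This is only an illustration under assumptions, not a tested fix: it assumes pgrep is available, that the management server's process name is exactly ndb_mgmd, and that ndbmtd can be started with no extra arguments on this host. The helper names (process_running, wait_until_stopped, shutdown_then_start) are hypothetical.

```python
import subprocess
import time


def process_running(name):
    """Return True if a process with exactly this name exists.

    Assumption: pgrep is installed; exit status 0 means at least one match.
    """
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_stopped(is_running, timeout=60.0, interval=0.5):
    """Poll is_running() until it returns False.

    Returns True once the process is gone, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def shutdown_then_start(start_cmd=("ndbmtd",), timeout=60.0):
    """Shut the cluster down, wait for ndb_mgmd to exit, then start ndbmtd.

    Assumption: start_cmd is whatever command line starts the data node
    on this host; the default here is a placeholder.
    """
    subprocess.run(["ndb_mgm", "-e", "shutdown"], check=True)
    if not wait_until_stopped(lambda: process_running("ndb_mgmd"), timeout):
        raise RuntimeError("ndb_mgmd still running; not starting ndbmtd")
    subprocess.run(list(start_cmd), check=True)
```

Calling shutdown_then_start() refuses to start ndbmtd while ndb_mgmd is still alive, which is the ordering the workaround asks for. It does nothing to repair a REDO log that is already corrupted.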
[31 May 2012 22:14] Chris Brown
Sorry, an amendment to the steps. After step 1:

1a. Change the cluster configuration so that the 'inactive' node has a new IP address (not sure if this is required).
[28 Jun 2016 16:12] MySQL Verification Team
Cannot reproduce with 7.1.37 or 7.2.23.