MySQL Bugs: #65480: ndbmtd REDO log corruption

Bug #65480	ndbmtd REDO log corruption
Submitted:	31 May 2012 22:10	Modified:	28 Jun 2016 16:12
Reporter:	Chris Brown
Status:	Can't repeat
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.1.17	OS:	Linux (Fedora 12)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
ndbmtd fails to start and corrupts its REDO log if it is started after a 'shutdown' is issued through ndb_mgm but before ndb_mgmd has completely shut down.

The first error in ndb_1_error.log is:

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 16932) 0x00000002
Program: /usr/sbin/ndbmtd

Subsequent errors:

Status: Ndbd file system error, restart node initial
Message: Error while reading the REDO log (Ndbd file system inconsistency error, please report a bug)
Error: 2310
Error data: Error while reading REDO log. from 18396
part: 1 D=9, F=0 Mb=0 FP=1 W1=1891 W2=1395549294 : Invalid logword gci: 2943
Error object: DBLQH (Line: 18444) 0x00000002
Program: /usr/sbin/ndbmtd
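
For anyone triaging a similar failure, the two entries above can be pulled apart mechanically. The following is only a minimal sketch: it assumes the error log consists of "Field: value" lines with exactly the field names quoted above (Status, Message, Error, Error data, Error object, Program), that each entry begins with a Status line, and that any other line (such as the "part: 1 D=9, ..." line) continues the previous field. The function name parse_error_entries and the SAMPLE text are illustrative and not part of any MySQL Cluster tool.

```python
# Field names exactly as they appear in the entries quoted in this report.
KNOWN_FIELDS = ("Status", "Message", "Error", "Error data", "Error object", "Program")

def parse_error_entries(text):
    """Split error-log text into a list of dicts, one dict per entry.

    Assumption: a "Status:" line starts a new entry; lines that do not
    begin with a known field name are appended to the previous field.
    """
    entries = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key in KNOWN_FIELDS:
            if key == "Status" or current is None:
                current = {}
                entries.append(current)
            current[key] = value.strip()
            last_key = key
        elif current is not None and last_key is not None:
            # Continuation line, e.g. the second line of "Error data".
            current[last_key] += " " + line
    return entries

# The two entries from this report, used as sample input.
SAMPLE = """\
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 16932) 0x00000002
Program: /usr/sbin/ndbmtd

Status: Ndbd file system error, restart node initial
Message: Error while reading the REDO log (Ndbd file system inconsistency error, please report a bug)
Error: 2310
Error data: Error while reading REDO log. from 18396
part: 1 D=9, F=0 Mb=0 FP=1 W1=1891 W2=1395549294 : Invalid logword gci: 2943
Error object: DBLQH (Line: 18444) 0x00000002
Program: /usr/sbin/ndbmtd
"""

if __name__ == "__main__":
    for entry in parse_error_entries(SAMPLE):
        print(entry["Error"], "|", entry["Status"])
```

Run on the sample, this separates the initial failed ndbrequire (error 2341) from the later REDO read failure (error 2310), whose Status asks for an initial node restart.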

How to repeat:
1. Configure and start a cluster using noOfReplicas=2, with one ndbd node, one inactive ndbd node, and one ndb_mgmd.
2. Issue 'ndb_mgm --execute shutdown'.
3. Before ndb_mgmd has shut down (as soon as ndbmtd stops), start the new ndbmtd instance.

Repro rate seems to be high (2/2 so far).

Workaround: allow ndb_mgmd to stop fully before starting ndbmtd. However, once the REDO log is corrupted, ndbmtd does not appear to be recoverable (except from backup).
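
One way to apply the workaround from a restart script is to poll until the ndb_mgmd process is gone before launching ndbmtd. The sketch below is a rough illustration, not a tested procedure: the helper names process_running and wait_until_stopped are made up for this example, the process check assumes a Linux host with pgrep available, and the timeout and polling interval are arbitrary.

```python
import subprocess
import time

def process_running(name):
    """Return True if a process with exactly this name exists.

    Assumption: pgrep is installed; 'pgrep -x' exits with 0 when at
    least one process matches the name exactly.
    """
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def wait_until_stopped(is_running, timeout=60.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_running() until it returns False.

    Returns True if the process stopped before the timeout expired,
    False otherwise. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True

# Example use (not executed here): only start the data node once the
# management server has fully exited.
#
#   if wait_until_stopped(lambda: process_running("ndb_mgmd")):
#       subprocess.run(ndbmtd_command)  # ndbmtd_command: your usual start command
#   else:
#       raise SystemExit("ndb_mgmd is still running; not starting ndbmtd")
```

Refusing to start ndbmtd on timeout is deliberate here, since the report suggests a corrupted REDO log can only be recovered from backup.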

Sorry, an amendment to the steps. After step 1:

1a. Change the cluster configuration so that the 'inactive' node has a new IP address. (Not sure if this is required.)

Cannot reproduce with 7.1.37 (nor with 7.2.23).