Bug #42564 Ndbd file system error
Submitted: 3 Feb 2009 9:24 Modified: 4 Nov 2009 14:43
Reporter: Walter Schloegl Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:mysql-5.1-telco-6.4 OS:Solaris
Assigned to: CPU Architecture:Any
Triage: Triaged: D1 (Critical)

[3 Feb 2009 9:24] Walter Schloegl
Description:
We are using:
revid jonas@mysql.com-20090131154212-kunoo34f0hqmiz4z dated 
2009-02-02 from repo mysql-5.1-telco-6.4 compiled with Solaris Workshop 12 
as follows:

CC=cc CXX=CC CFLAGS="-xO5 -fast -g -mt -m64" CXXFLAGS="-xO5 -fast -g -mt 
-m64" ./configure --prefix=<somepath> --with-plugins=all --without-docs 
--without-man

When trying to check the correction for bugid 42450
the following problem occured:

ndb_mgm> Node 3: Forced node shutdown completed. Occured during startphase 5. Caused by error 2310: 'Error while reading the REDO log(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.  

In ndb_3_out.log are messages like:
...
2009-02-02 17:59:34 [ndbd] WARNING  -- Ndb kernel thread 6 is stuck in: Job Hand
ling elapsed=7920
2009-02-02 17:59:34 [ndbd] INFO     -- Watchdog: User time: 241  System time: 94
4
2009-02-02 17:59:34 [ndbd] WARNING  -- Ndb kernel thread 6 is stuck in: Job Hand
ling elapsed=8030
2009-02-02 17:59:34 [ndbd] INFO     -- Watchdog: User time: 241  System time: 94
5
2009-02-02 17:59:35 [ndbd] WARNING  -- Ndb kernel thread 6 is stuck in: Job Hand
ling elapsed=8140
2009-02-02 17:59:35 [ndbd] INFO     -- Watchdog: User time: 242  System time: 94
5
...
saving 102fb0000 at 1005e2d10 (0)saving 102ee8000 at 1005df010 (0)

saving 103140000 at 1005ea710 (0)saving 103078000 at 1005e6a10 (0)

saving 102e20000 at 1005db310 (0)
restartCreateObj(1) file: 1
restartCreateObj(2) file: 1
restartCreateObj(3) file: 1
restartCreateObj(4) file: 1
restartCreateObj(5) file: 1
restartCreateObj(6) file: 1
restartCreateObj(8) file: 1
restartCreateObj(9) file: 1
restartCreateObj(10) file: 1
restartCreateObj(11) file: 1
restartCreateObj(12) file: 1
restartCreateObj(16) file: 1
restartCreateObj(17) file: 1
restartCreateObj(18) file: 1
restartCreateObj(19) file: 1
restartCreateObj(20) file: 1
restartCreateObj(21) file: 1
restartCreateObj(22) file: 1
restartCreateObj(23) file: 1
restartCreateObj(24) file: 1
restartCreateObj(26) file: 1
restartCreateObj(30) file: 1
restartCreateObj(32) file: 1
tab: 2 frag: 0 replicaP->nextLcp: 0
scanning idx: 1 lcpId: 578 774900
scanning idx: 0 lcpId: 0
- scanning idx: 2 lcpId: 0
Didnt find any LCP for node: 3 tab: 2 frag: 0
tab: 2 frag: 1 replicaP->nextLcp: 0
scanning idx: 1 lcpId: 578 774900
scanning idx: 0 lcpId: 0
...
Didnt find any LCP for node: 3 tab: 32 frag: 5
tab: 32 frag: 6 replicaP->nextLcp: 0
scanning idx: 1 lcpId: 578 774900
scanning idx: 0 lcpId: 577 774900
- scanning idx: 2 lcpId: 0
Didnt find any LCP for node: 3 tab: 32 frag: 6
tab: 32 frag: 7 replicaP->nextLcp: 0
scanning idx: 1 lcpId: 578 774900
scanning idx: 0 lcpId: 577 774900
- scanning idx: 2 lcpId: 0
Didnt find any LCP for node: 3 tab: 32 frag: 7
2009-02-02 17:59:50 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck in: Print Jo
b Buffers at crash elapsed=109
2009-02-02 17:59:50 [ndbd] INFO     -- Watchdog: User time: 2518  System time: 1
349
2009-02-02 17:59:50 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck in: Print Jo
b Buffers at crash elapsed=219
2009-02-02 17:59:50 [ndbd] INFO     -- Watchdog: User time: 2518  System time: 1
349
2009-02-02 17:59:50 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck in: Print Jo
b Buffers at crash elapsed=329
...
2009-02-02 17:59:52 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck in: Print Jo
b Buffers at crash elapsed=549
2009-02-02 17:59:52 [ndbd] INFO     -- Watchdog: User time: 2551  System time: 1
358
2009-02-02 17:59:52 [ndbd] INFO     -- Error while reading REDO log. from 16262
D=8, F=4 Mb=142 FP=4575 W1=3386 W2=7 : end of log wo/ having found last GCI gci:
 784127
2009-02-02 17:59:52 [ndbd] INFO     -- DBLQH (Line: 16369) 0x0000000a
2009-02-02 17:59:52 [ndbd] INFO     -- Error handler startup shutting down syste
m
2009-02-02 17:59:52 [ndbd] INFO     -- Error handler shutdown completed - exitin
g
2009-02-02 17:59:52 [ndbd] INFO     -- Angel received ndbd startup failure count
 1.
2009-02-02 17:59:56 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. O
ccured during startphase 5. Caused by error 2310: 'Error while reading the REDO
log(Ndbd file system inconsistency error, please report a bug). Ndbd file system
 error, restart node initial'.

How to repeat:
On Node A:
- running an "insert" Process (insert into ....)
- running a "delete" Process (delete from ...)
(Both processes dealing with the same table)

On Node B:
- restart ndbmtd
- during start phase, kill -9 "ndbmtd"
[11 Feb 2009 8:24] Bernd Ocklin
This is not reproducible with the given information. Also our autotest environment stress testing node starts did not catch such a redo log corruption ever. Can you please upload a full cluster logs, out files, trace files and mysql server logs? (ndb_error_reporter tool).

Bernd
[13 Mar 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[27 Apr 2009 13:00] Ole John Aske
I have tried to reproduce this bug wo/ any success. However, during this effort bug#44291 has been discovered & fixed - which may result in inconsistent REDO logs  as described in this bug.
[4 Nov 2009 14:43] Jon Stephens
Duplicate of BUG#44952.