Bug #62650 Data Node crashes in DBDIH during node restart (followed by restore of mysqldump)
Submitted: 7 Oct 2011 8:12
Modified: 7 Nov 2011 11:02
Reporter: Johan Andersson
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 7.1.15a
OS: Any
Assigned to:
CPU Architecture: Any
Tags: dbdih, Failure, ndbmtd, node restart

[7 Oct 2011 8:12] Johan Andersson
Description:
After restoring from a .sql file with all 4 nodes up and running, I tried
to restart a single node. The first error was the same DBDIH failure:

Time: Wednesday 5 October 2011 - 00:39:59
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 14611) 0x00000000
Program: /usr/local/mysql/mysql-7.1.15a-linux-x86_64/bin/ndbmtd
Pid: 3609 thr: 0
Version: mysql-5.1.56 ndb-7.1.15a
Trace: /data/mysqlcluster//ndb_4_trace.log.1 /data/mysqlcluster//ndb_4_trace.log

Then the subsequent node restart gave:

Time: Wednesday 5 October 2011 - 01:08:49
Status: Ndbd file system error, restart node initial
Message: Invalid LCP (Ndbd file system inconsistency error, please report a bug)
Error: 2352
Error data: Error 899 (line: 1188) during restore of  0/T1608F15
Error object: RESTORE (Line: 1213) 0x00000000
Program: /usr/local/mysql/mysql-7.1.15a-linux-x86_64/bin/ndbmtd
Pid: 3720 thr: 5
Version: mysql-5.1.56 ndb-7.1.15a
Trace: /data/mysqlcluster//ndb_4_trace.log.2 /data/mysqlcluster//ndb_4_trace.log.2_t
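
Both entries above use the same "Field: value" layout, so they can be compared mechanically. The following is a minimal sketch, assuming each field sits on its own line and the field name ends at the first ": "; the function name parse_error_entry is made up for illustration and is not part of any NDB tool.

```python
def parse_error_entry(text):
    """Split one error-log entry into a dict of field name -> value.

    Assumes one "Field: value" pair per line, as in the entries quoted
    in this report. Lines without ": " are ignored.
    """
    fields = {}
    for line in text.splitlines():
        # Only the first ": " separates the name from the value; the
        # value itself may contain further colons (e.g. "(line: 1188)").
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


entry = """Time: Wednesday 5 October 2011 - 01:08:49
Status: Ndbd file system error, restart node initial
Message: Invalid LCP (Ndbd file system inconsistency error, please report a bug)
Error: 2352
Error data: Error 899 (line: 1188) during restore of  0/T1608F15
Error object: RESTORE (Line: 1213) 0x00000000"""

fields = parse_error_entry(entry)
print(fields["Error"], "|", fields["Error object"])
```

Running the same function over the first entry gives error 2341 in DBDIH, which makes it easy to see that the two restarts failed in different blocks.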

How to repeat:
Hard to say, but if we can find out which table is involved, we can provide the data for that table.
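
One possible way to narrow down the table: the "0/T1608F15" token in the second error looks like the LCP file naming scheme, where the leading number is the LCP directory, T is followed by the table id and F by the fragment id. That reading is an assumption here and has not been confirmed in this report. Under that assumption, a small sketch (decode_lcp_fragment is a made-up helper name) extracts the ids:

```python
import re


def decode_lcp_fragment(error_data):
    """Extract LCP number, table id and fragment id from an error string.

    Assumes the token has the form <lcp>/T<table>F<fragment>, which is
    an interpretation of the log line and not confirmed by the report.
    Returns None when no such token is present.
    """
    match = re.search(r"(\d+)/T(\d+)F(\d+)", error_data)
    if match is None:
        return None
    lcp_no, table_id, fragment_id = (int(group) for group in match.groups())
    return {"lcp_no": lcp_no, "table_id": table_id, "fragment_id": fragment_id}


info = decode_lcp_fragment("Error 899 (line: 1188) during restore of  0/T1608F15")
print(info)
```

If the interpretation holds, the numeric table id could then be matched against the object ids listed by a tool such as ndb_show_tables to get the table name.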
[7 Oct 2011 8:14] Johan Andersson
I would love to ftp tracefiles, but ftp.mysql.com is not working.
[7 Oct 2011 8:25] Johan Andersson
file is split using 'split' -  4 parts in total

Attachment: xaa (application/octet-stream, text), 500.00 KiB.

[7 Oct 2011 8:26] Johan Andersson
file is split using 'split' -  4 parts in total

Attachment: xac (application/octet-stream, text), 500.00 KiB.

[7 Oct 2011 8:26] Johan Andersson
file is split using 'split' -  4 parts in total

Attachment: xab (application/octet-stream, text), 500.00 KiB.

[7 Oct 2011 8:27] Johan Andersson
file is split using 'split' -  4 parts in total  - original filename: ndb_error_report_20111005013312.tar.bz2

Attachment: xad (application/octet-stream, text), 216.64 KiB.
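
The four parts were attached in the order xaa, xac, xab, xad, but they need to be joined in name order (xaa, xab, xac, xad) to get the original archive back. A minimal sketch of the reassembly, assuming all four parts have been downloaded into the current directory under their attachment names; join_parts is an illustrative helper, not an existing tool:

```python
import shutil
from pathlib import Path

# Part names in the order 'split' produced them, and the original
# archive name as given in the comment above.
PART_NAMES = ["xaa", "xab", "xac", "xad"]
ORIGINAL_NAME = "ndb_error_report_20111005013312.tar.bz2"


def join_parts(part_paths, output_path):
    """Concatenate the parts, in the given order, into output_path."""
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    parts = [Path(name) for name in PART_NAMES]
    # Only write the archive when every part is actually present.
    if all(part.exists() for part in parts):
        join_parts(parts, ORIGINAL_NAME)
```

The result should be byte-identical to the original tar.bz2, which can then be unpacked as usual.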

[7 Oct 2011 11:00] Jonas Oreland
Hi Johan,

It's LCP related.
And (current) LCP is as everyone knows fubar.
But your case looks like a corner case.

Anyway, attaching a quick and dirty maybe-fix
(I didn't manage to reproduce your problem)

/Jonas
[7 Oct 2011 11:01] Jonas Oreland
maybe fix

Attachment: johan.patch (text/x-patch), 430 bytes.

[7 Oct 2011 11:01] Jonas Oreland
btw, was the crash reproducible?
[7 Oct 2011 12:07] Johan Andersson
Thanks for the incredibly fast turnaround.

Yes, reproducible every time.
Does the fix cover both the DBDIH and the LCP problem?
[7 Oct 2011 12:14] Jonas Oreland
1) hold on with the blessing until we see if it fixes the problem
2) pretty sure the LCP error is caused by the first problem (although not 100%)

/Jonas
[9 Nov 2011 11:08] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".