MySQL Bugs: #49737: failed ndbrequire in restore.cpp during node restart

Bug #49737	failed ndbrequire in restore.cpp during node restart
Submitted:	16 Dec 2009 11:52	Modified:	18 Jan 2010 14:46
Reporter:	Gustaf Thorslund
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-6.3	OS:	Linux (SUSE 10 SP2, x86_64)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	6.3.27a

Description:
This sometimes happens when restarting a node. Further attempts to restart it also fail.

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: restore.cpp
Error object: RESTORE (Line: 490) 0x0000000a
Program: ndbd
Pid: 28138
Trace: /mysql/ndbd/data/ndb_3_trace.log.3
Version: mysql-5.1.37 ndb-6.3.27a-GA

from restore.cpp

Restore::restore_next(Signal* signal, FilePtr file_ptr)
{
.
.
.
     if(4 * len > left)
     {
       /**
        * Not enough bytes to read "record"
        */
       ndbout_c("records: %d len: %x left: %d",
                status & File::READING_RECORDS, 4*len, left);

       if (unlikely((status & File::FILE_THREAD_RUNNING) == 0))
       {
         ndbrequire(false); // line 490 in 6.3.27a
       }
       len= 0;
       break;
     }
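
The branch above can be read as a three-way decision: the record fits in the buffered data, the record does not fit yet but the file thread is still reading, or the record does not fit and no more data is coming. A minimal sketch of that decision follows. The names `ReadOutcome` and `classify_short_read` are hypothetical and are not part of the NDB source, and labelling the last case "truncated" is an assumption based on the later commit message.

```cpp
#include <cstdint>

// Hypothetical model of the decision in the excerpt above; not NDB code.
enum class ReadOutcome {
  RecordAvailable,   // enough buffered bytes to parse the record
  WaitForMoreData,   // short read, but the file thread is still running
  FileTruncated      // short read and nothing more will arrive
};

// len_words:           record length in 32-bit words (the excerpt tests 4 * len)
// left_bytes:          bytes buffered and not yet consumed ("left" in the excerpt)
// file_thread_running: models (status & File::FILE_THREAD_RUNNING) != 0
constexpr ReadOutcome classify_short_read(std::uint32_t len_words,
                                          std::uint32_t left_bytes,
                                          bool file_thread_running)
{
  // Widen before multiplying so a large length cannot overflow 32 bits.
  const std::uint64_t need = 4ull * len_words;
  if (need <= left_bytes)
    return ReadOutcome::RecordAvailable;
  return file_thread_running ? ReadOutcome::WaitForMoreData
                             : ReadOutcome::FileTruncated;
}

// Compile-time check of the case that corresponds to the failed ndbrequire.
static_assert(classify_short_read(4, 8, false) == ReadOutcome::FileTruncated,
              "short read with a stopped file thread is the failing case");
```

In this model, the failed `ndbrequire` at line 490 corresponds to the `FileTruncated` outcome, which is where a more descriptive error could be reported.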

How to repeat:
stop node 1
stop node 2
start node 2
start node 1

But it doesn't always happen, or even often.

Got logs.

Suggested fix:
A more verbose error message would be a start. This also doesn't appear to be a temporary error, since further restart attempts fail as well.

Reproduced this problem.  Steps to reproduce:

1. Load a cluster with over 100M of data
2. Go into ndb_2_fs/LCP/x/ (where x is the last LCP)
3. shell> truncate --size 10000000 T2F0.Data (assuming that is the table)
4. Start cluster

*bang*
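
To illustrate why truncating the data file triggers the "not enough bytes" branch, here is a small self-contained sketch. It assumes a made-up file layout (a 32-bit word count followed by that many 32-bit words per record), which is NOT the real LCP file format. The helpers `write_records` and `scan_records` are hypothetical. `std::filesystem::resize_file` plays the role of the `truncate --size` command in step 3.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical layout, not the real LCP format:
// each record is a 32-bit word count followed by that many 32-bit words.
void write_records(const fs::path& p, int count, std::uint32_t words_per_record)
{
  std::ofstream out(p, std::ios::binary | std::ios::trunc);
  const std::vector<std::uint32_t> payload(words_per_record, 0xABCD1234u);
  for (int i = 0; i < count; ++i) {
    out.write(reinterpret_cast<const char*>(&words_per_record),
              sizeof words_per_record);
    out.write(reinterpret_cast<const char*>(payload.data()),
              static_cast<std::streamsize>(payload.size() * sizeof(std::uint32_t)));
  }
}

struct ScanResult {
  int complete_records;  // records that were fully present
  bool truncated;        // true if the file ended inside a record
};

// Walk the file record by record and report whether it ends mid-record.
ScanResult scan_records(const fs::path& p)
{
  const std::uintmax_t size = fs::file_size(p);
  std::ifstream in(p, std::ios::binary);
  ScanResult result{0, false};
  std::uintmax_t pos = 0;
  while (pos < size) {
    std::uint32_t len = 0;
    if (size - pos < sizeof len) {       // not even a full length word left
      result.truncated = true;
      break;
    }
    in.seekg(static_cast<std::streamoff>(pos));
    in.read(reinterpret_cast<char*>(&len), sizeof len);
    pos += sizeof len;
    const std::uintmax_t need = 4ull * len;
    if (need > size - pos) {             // same shape as "4 * len > left"
      result.truncated = true;
      break;
    }
    pos += need;                         // skip the record body
    ++result.complete_records;
  }
  return result;
}
```

For example, writing 10 records of 5 words each gives a 240-byte file. After `fs::resize_file(path, 100)`, `scan_records` finds 4 complete records and then reports truncation. A cut that happens to fall exactly on a record boundary would not be detected by this sketch.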

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97233

3080 Jonas Oreland	2010-01-18
      ndb - bug#49737 - fix correct error message if encountering truncated LCP file

Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20100118111104-1za86tch6mtilp2j) (version source revid:jonas@mysql.com-20100118111104-1za86tch6mtilp2j) (merge vers: 5.1.41-ndb-6.3.31) (pib:16)

Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20100118111217-x0zg2b6o9j8i19sy) (version source revid:jonas@mysql.com-20100118111217-x0zg2b6o9j8i19sy) (merge vers: 5.1.41-ndb-7.0.11) (pib:16)

Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20100118112129-nm7iovqd6l6rhngh) (version source revid:jonas@mysql.com-20100118112129-nm7iovqd6l6rhngh) (merge vers: 5.1.41-ndb-7.1.0) (pib:16)

Pushed. Note: only the error message was improved
(as no suggestion of *how* the error condition came to be has been supplied in
 the bug report).

No user changes to document. Closed without further action.