Bug #49737 failed ndbrequire in restore.cpp during node restart
Submitted: 16 Dec 2009 11:52 Modified: 18 Jan 2010 14:46
Reporter: Gustaf Thorslund
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: mysql-5.1-telco-6.3 OS: Linux (SUSE 10 SP2, x86_64)
Assigned to: Jonas Oreland CPU Architecture: Any
Tags: 6.3.27a

[16 Dec 2009 11:52] Gustaf Thorslund
Description:
This happens sometimes when restarting a node. Further attempts to restart it also fail.

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: restore.cpp
Error object: RESTORE (Line: 490) 0x0000000a
Program: ndbd
Pid: 28138
Trace: /mysql/ndbd/data/ndb_3_trace.log.3
Version: mysql-5.1.37 ndb-6.3.27a-GA

from restore.cpp

Restore::restore_next(Signal* signal, FilePtr file_ptr)
{
.
.
.
     if(4 * len > left)
     {
       /**
        * Not enough bytes to read "record"
        */
       ndbout_c("records: %d len: %x left: %d",
                status & File::READING_RECORDS, 4*len, left);

       if (unlikely((status & File::FILE_THREAD_RUNNING) == 0))
       {
         ndbrequire(false); // line 490 in 6.3.27a
       }
       }
       len= 0;
       break;
     }
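
To make the branch above easier to reason about, here is a minimal standalone sketch of the same decision. It is not the RESTORE block code: the enum, the function name and the parameter names are hypothetical, and it assumes (from the `4 * len > left` comparison) that `len` counts 32-bit words while `left` counts bytes.

```cpp
#include <cstdint>

// Hypothetical names; this mirrors the shape of the quoted check,
// not the actual code in restore.cpp.
enum class ReadState {
  RecordReady,      // the whole record is already in the buffer
  WaitForMoreData,  // record incomplete, but the file thread is still reading
  TruncatedFile     // record incomplete and no more data will arrive
};

// len_words:           record length in 32-bit words (assumption)
// left_bytes:          bytes still unread in the buffer
// file_thread_running: true while more file data may still be delivered
inline ReadState classify_record(std::uint32_t len_words,
                                 std::uint32_t left_bytes,
                                 bool file_thread_running) {
  // Widen before multiplying so 4 * len cannot overflow 32 bits.
  const std::uint64_t needed_bytes = 4ULL * len_words;
  if (needed_bytes <= left_bytes) {
    return ReadState::RecordReady;
  }
  // Same split as the quoted code: wait if more data can come,
  // otherwise this is the case that hits ndbrequire(false).
  return file_thread_running ? ReadState::WaitForMoreData
                             : ReadState::TruncatedFile;
}
```

In this sketch the third state is the one that corresponds to line 490: the record header asks for more bytes than remain, and nothing is left to read.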

How to repeat:
stop node 1
stop node 2
start node 2
start node 1

But it doesn't always happen; in fact, not even often.

Logs are available.

Suggested fix:
A more verbose error message might be a start. It doesn't appear to be a temporary error, since further restart attempts also fail.
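
As an illustration of what "more verbose" could mean here, the sketch below builds a message that names the file and the byte counts involved. The helper name, its parameters and the wording are all hypothetical; this is not the text used by ndbd or by any patch.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: describes a record that cannot be read because
// the data file ended early. Wording and parameters are illustrative only.
std::string describe_truncated_record(const std::string& file_name,
                                      std::uint64_t record_bytes,
                                      std::uint64_t bytes_left) {
  std::ostringstream msg;
  msg << "Truncated data file '" << file_name << "': next record needs "
      << record_bytes << " bytes but only " << bytes_left
      << " bytes remain and no more data is being read";
  return msg.str();
}
```

A message of this kind would at least tell the operator which file to inspect, which the generic "failed ndbrequire" text does not.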
[17 Dec 2009 21:27] Andrew Hutchings
Reproduced this problem. Steps to reproduce:

1. Load a cluster with over 100M of data
2. Go into ndb_2_fs/LCP/x/ (where x is the last LCP)
3. shell> truncate --size 10000000 T2F0.Data (assuming that is the table)
4. Start cluster

*bang*
[18 Jan 2010 11:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97233

3080 Jonas Oreland	2010-01-18
      ndb - bug#49737 - fix correct error message if encountering truncated LCP file
[18 Jan 2010 11:23] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20100118111104-1za86tch6mtilp2j) (version source revid:jonas@mysql.com-20100118111104-1za86tch6mtilp2j) (merge vers: 5.1.41-ndb-6.3.31) (pib:16)
[18 Jan 2010 11:23] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20100118111217-x0zg2b6o9j8i19sy) (version source revid:jonas@mysql.com-20100118111217-x0zg2b6o9j8i19sy) (merge vers: 5.1.41-ndb-7.0.11) (pib:16)
[18 Jan 2010 11:24] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20100118112129-nm7iovqd6l6rhngh) (version source revid:jonas@mysql.com-20100118112129-nm7iovqd6l6rhngh) (merge vers: 5.1.41-ndb-7.1.0) (pib:16)
[18 Jan 2010 11:25] Jonas Oreland
Pushed. Note: only the error message was improved
(as no suggestion of *how* the error condition came to be has been supplied in
 the bug report)
[18 Jan 2010 14:46] Jon Stephens
No user changes to document. Closed without further action.