Bug #16738 DD: Cluster does not handle running out of space in disk data file
Submitted: 24 Jan 2006 1:46 Modified: 26 Jan 2006 0:45
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: 5.1.6-alpha OS: Linux (Linux)
Assigned to: Jonas Oreland

[24 Jan 2006 1:46] Jonathan Miller
Description:
Created a small disk data file of 5 MB using the following:

    $sth = $dbhM->prepare("CREATE LOGFILE GROUP TPCB_LOG
                           ADD UNDOFILE './tpcb_log/undofile.dat'
                           INITIAL_SIZE 150M
                           UNDO_BUFFER_SIZE = 1M
                           ENGINE=NDB;")
      or die "Prepare CREATE LOGFILE GROUP error: ", $dbhM->errstr;
    $sth->execute()
      or die "CREATE LOGFILE GROUP error: ", $sth->errstr;
    $sth->finish();

    $sth = $dbhM->prepare("ALTER LOGFILE GROUP TPCB_LOG
                           ADD UNDOFILE './tpcb_log/undofile2.dat'
                           INITIAL_SIZE 150M
                           ENGINE=NDB;")
      or die "Prepare ALTER LOGFILE GROUP error: ", $dbhM->errstr;
    $sth->execute()
      or die "ALTER LOGFILE GROUP error: ", $sth->errstr;
    $sth->finish();

    $sth = $dbhM->prepare("CREATE TABLESPACE TPCB_TS
                           ADD DATAFILE './tpcb_ts/datafile.dat'
                           USE LOGFILE GROUP TPCB_LOG
                           INITIAL_SIZE 5M
                           ENGINE=NDB;")
      or die "Prepare CREATE TABLESPACE error: ", $dbhM->errstr;
    $sth->execute()
      or die "CREATE TABLESPACE error: ", $sth->errstr;
    $sth->finish();

Started inserting 10,000,000 rows into the accounts table. At row 39,600 the NDB data node aborted with:

Time: Tuesday 24 January 2006 - 01:18:48
Status: Ndbd file system error, restart node initial
Message: Read underflow (Ndbd file system inconsistency error, please report a bug)
Error: 2816
Error data: PGMAN: File system read failed. OS errno: 1000
Error object: PGMAN (Line: 1814) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 7155
Trace: /space/run/ndb_4_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

--------------- Signal ----------------
r.bn: 261 "PGMAN", r.proc: 4, r.sigId: 8496200 gsn: 263 "FSREADREF" prio: 1
s.bn: 253 "NDBFS", s.proc: 4, s.sigId: 8496199 length: 4 trace: 1 #sec: 0 fragInf: 0
 UserPointer: 129
 ErrorCode: 2816, Read underflow
 OS ErrorCode: 1000
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 4, r.sigId: 8496199 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 4, s.sigId: 8496193 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 4, r.sigId: 8496198 gsn: 267 "FSSYNCREQ" prio: 0
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 8496194 length: 3 trace: 2 #sec: 0 fragInf: 0

How to repeat:
Edit load_tpcb.pl: change the data file INITIAL_SIZE to 5M, change the account condition to == 10000000, and run the Perl script.

Suggested fix:
The cluster should detect that the data file is almost out of space while some room is still left in it. It should issue warnings or errors and not allow the file to fill completely.
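
As a rough illustration of the kind of check this implies, free space in a Disk Data tablespace can be inspected from SQL before it runs out. This is only a sketch: it assumes a server version that exposes INFORMATION_SCHEMA.FILES, and the second data file name (datafile2.dat) and its size are hypothetical, not part of the original test.

```sql
-- Show how many extents remain in each data file of the tablespace.
-- FREE_EXTENTS * EXTENT_SIZE gives the unallocated space in bytes.
SELECT FILE_NAME,
       FREE_EXTENTS,
       TOTAL_EXTENTS,
       FREE_EXTENTS * EXTENT_SIZE AS free_bytes
  FROM INFORMATION_SCHEMA.FILES
 WHERE TABLESPACE_NAME = 'TPCB_TS'
   AND FILE_TYPE = 'DATAFILE';

-- If free space is running low, add another data file to the tablespace.
-- The file name and size here are illustrative assumptions.
ALTER TABLESPACE TPCB_TS
    ADD DATAFILE './tpcb_ts/datafile2.dat'
    INITIAL_SIZE 150M
    ENGINE=NDB;
```

A load script could run the first query periodically and stop inserting, or add a data file, once free_bytes drops below a chosen threshold.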
[24 Jan 2006 9:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1553
[25 Jan 2006 10:11] Jonas Oreland
Pushed fix into 5.1.6.
[26 Jan 2006 0:45] Mike Hillyer
Noted in 5.1.6 changelog:

      <listitem>
        <para>
          NDB Cluster returned wrong error when tablespace on disk
          was full. (Bug #16738)
        </para>
      </listitem>