Bug #48037 ndbmtd cluster crash during insert of data (dbtup)
Submitted: 14 Oct 2009 10:09 Modified: 20 Oct 2009 14:34
Reporter: Johan Andersson
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any
Tags: cluster crash, dbtup

[14 Oct 2009 10:09] Johan Andersson
Description:
Loading a mysqldump file into cluster (approx 3.1GB of data) crashes the cluster:

ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1297 (HY000): Got temporary error 286 'Node failure caused abort of transaction' from NDBCLUSTER
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
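
To see at a glance how the errors above split between "table is full" and node-failure aborts, the client output can be tallied by MySQL error number and embedded NDB error code. This is only a rough sketch: the regular expressions and the tally_errors helper are my own assumptions based on the lines shown here, not part of any MySQL tool.

```python
import re
from collections import Counter

# Matches client lines such as:
#   ERROR 1297 (HY000): Got temporary error 286 '...' from NDBCLUSTER
# The pattern is an assumption derived from the output in this report.
ERROR_LINE = re.compile(r"^ERROR (\d+) \((\w+)\): (.*)$")
# Picks out the NDB error code embedded in the message, when there is one.
NDB_CODE = re.compile(r"Got (?:temporary )?error (\d+) ")


def tally_errors(lines):
    """Count (mysql_errno, ndb_code) pairs; ndb_code is None when absent."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match is None:
            continue  # not an error line, skip it
        ndb = NDB_CODE.search(match.group(3))
        ndb_code = int(ndb.group(1)) if ndb else None
        counts[(int(match.group(1)), ndb_code)] += 1
    return counts


# The error output from this report, copied verbatim.
SAMPLE = """\
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1297 (HY000): Got temporary error 286 'Node failure caused abort of transaction' from NDBCLUSTER
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1114 (HY000): The table 'XYZ' is full
ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
"""

if __name__ == "__main__":
    for (errno, ndb_code), count in sorted(tally_errors(SAMPLE.splitlines()).items(),
                                           key=lambda item: item[0][0]):
        print(errno, ndb_code, count)
```

The node-failure aborts (1297) are interleaved with the "table is full" errors (1114), so the data node failure happens while the load is still running.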

[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M

[NDB_MGMD DEFAULT]
PortNumber=1186
Datadir=/data1/mysqlcluster/

[NDB_MGMD]
Id=1
Hostname=C
LogDestination=FILE:filename=ndb__cluster.log,maxsize=10000000,maxfiles=6
ArbitrationRank=1

[NDB_MGMD]
Id=2
Hostname=D
LogDestination=FILE:filename=ndb__cluster.log,maxsize=10000000,maxfiles=6
ArbitrationRank=1

[NDBD DEFAULT]
NoOfReplicas=2
Datadir=/data1/mysqlcluster/
FileSystemPathDD=/data1/mysqlcluster/
DataMemory=1024M
IndexMemory=128M
LockPagesInMainMemory=1

MaxNoOfConcurrentOperations=100000

StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=10000
MaxNoOfUniqueHashIndexes=2500
MaxNoOfAttributes=120000
DiskCheckpointSpeedInRestart=100M
FragmentLogFileSize=256M
InitFragmentLogFiles=FULL
NoOfFragmentLogFiles=12
RedoBuffer=32M

TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100

MemReportFrequency=30
BackupReportFrequency=10

### Params for setting logging 
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

### Params for increasing Disk throughput 
BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M

TimeBetweenWatchdogCheckInitial=60000

MaxNoOfExecutionThreads=8
LongMessageBuffer=32M

BatchSizePerLocalScan=512
[NDBD]
Id=3
Hostname=A

[NDBD]
Id=4
Hostname=B
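
For reference, a rough estimate of how much DataMemory this configuration gives the whole cluster. With two data nodes and NoOfReplicas=2 there is a single node group, so both nodes hold the same data and the usable total is about one node's DataMemory. The sketch below is an approximation under that assumption; the read_config, parse_size and usable_data_memory helpers are hypothetical names of mine, and the estimate ignores per-row and index overhead.

```python
import re

# Size suffixes in config.ini are powers of 1024.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a config.ini size such as '1024M' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"not a size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def read_config(text):
    """Return a list of (section_name, options) pairs.

    Hand-rolled because config.ini repeats section names ([NDBD] appears
    once per data node), which configparser rejects in its default
    strict mode.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def usable_data_memory(text):
    """Approximate cluster-wide DataMemory in bytes (overheads ignored)."""
    sections = read_config(text)
    defaults = next(opts for name, opts in sections if name == "NDBD DEFAULT")
    data_nodes = sum(1 for name, _ in sections if name == "NDBD")
    replicas = int(defaults["NoOfReplicas"])
    node_groups = data_nodes // replicas  # each group stores one copy set
    return node_groups * parse_size(defaults["DataMemory"])


# Trimmed copy of the configuration in this report.
SAMPLE = """\
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=1024M
IndexMemory=128M

[NDBD]
Id=3
Hostname=A

[NDBD]
Id=4
Hostname=B
"""

if __name__ == "__main__":
    print(usable_data_memory(SAMPLE) // 1024 ** 2, "MiB usable DataMemory")
```

A 3.1GB dump file does not translate directly into in-memory size, but with roughly 1GB of usable DataMemory the "table is full" errors by themselves are plausible. The node failure is the actual problem reported here.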

How to repeat:
Load the dump file (see the Private section for its location).

Two node cluster with:

MaxNoOfExecutionThreads=8
DataMemory=1024M
IndexMemory=128M

This should be enough to reproduce the crash; a smaller configuration probably works too.
[19 Oct 2009 11:42] Jonas Oreland
reproduced :-(
[20 Oct 2009 12:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87458

3021 Jonas Oreland	2009-10-20
      ndb - bug#48037
        Fix various abort cases with indexes that could in worst case 
          cause node-failure
[20 Oct 2009 12:33] Jonas Oreland
pushed to 6.2.19, 6.3.28 and 7.0.9
[20 Oct 2009 14:34] Jon Stephens
Documented bug fix in the NDB-6.2.19, 6.3.28, and 7.0.9 changelogs, as follows:

        In certain cases, performing very large inserts on NDB tables caused 
        the memory allocations for ordered unique indexes to be exceeded. 
        This could cause aborted transactions and possibly lead to data
        node failures.

        See also BUG#48113.

Closed.
[28 Oct 2009 9:25] Johan Andersson
Verified - can load data now without crash!