Bug #21892 Cluster restart fails with disk tables
Submitted: 29 Aug 2006 5:40
Modified: 30 Aug 2006 0:02
Reporter: Jason Downing
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1.11
OS: Linux (Debian 2.6.17 (32bit))
Assigned to: Assigned Account
CPU Architecture: Any
Tags: failed ndb require, failed restart

[29 Aug 2006 5:40] Jason Downing
Description:
The cluster starts up fine with --initial, and I can load data from a backup file without problems. If the backup file has no disk-based tables and I have not created any tablespace or data files on disk, I can shut down the cluster and restart it without problems.

If I have disk data files and my backup file declares tables that use disk data storage, I can still load my data into the cluster without problems, but once I shut down the cluster it will not restart. This is what ndb_mgm reports when I attempt a restart:

Node 3: Forced node shutdown completed, restarting. Occured during startphase 4. Initiated by signal 0. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed, restarting. Occured during startphase 4. Initiated by signal 0. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

I can restart with --initial without problem.

Log files/tracelogs are attached.

Here is my config:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=250M
IndexMemory=50M
MaxNoOfAttributes=3000
MaxNoOfConcurrentOperations=1000000
StartFailureTimeout=1000000
StartPartialTimeout=200000
LogLevelStartup=15
LogLevelShutdown=15
LogLevelStatistic=15
LogLevelCheckpoint=15
LogLevelNodeRestart=15
LogLevelConnection=15
LogLevelError=15
LogLevelInfo=15
StopOnError=N

[NDB_MGMD]
hostname=192.168.2.12
datadir=/var/lib/mysql-cluster
Id=1

[NDBD]
hostname=192.168.2.18
datadir=/usr/local/mysql/data
Id=2

[NDBD]
hostname=192.168.2.19
datadir=/usr/local/mysql/data
Id=3

[MYSQLD]
hostname=192.168.2.13
Id=4

These were the commands I used to create the tablespace:

CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.dat'
    INITIAL_SIZE 10M
    UNDO_BUFFER_SIZE 2M
    ENGINE NDB;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 1000M
    ENGINE NDB;

ALTER TABLESPACE ts_1
    ADD DATAFILE 'data_2.dat'
    INITIAL_SIZE 1000M
    ENGINE NDB;
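
For illustration, a table that uses disk data storage in ts_1 would be declared roughly as shown below. This is only a sketch: the table and column names are hypothetical, and the actual schema is the one in the backup file, which is not reproduced here. The second statement assumes the INFORMATION_SCHEMA.FILES table is available in this version and can be used to list the files registered for the tablespace.

```sql
-- Hypothetical disk-based table; non-indexed columns are stored in ts_1.
CREATE TABLE dd_example (
    id INT NOT NULL PRIMARY KEY,
    payload VARCHAR(255)
)
    TABLESPACE ts_1 STORAGE DISK
    ENGINE NDB;

-- List the data files registered for the tablespace created above.
SELECT FILE_NAME, FILE_TYPE, TABLESPACE_NAME, LOGFILE_GROUP_NAME
FROM INFORMATION_SCHEMA.FILES
WHERE TABLESPACE_NAME = 'ts_1';
```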

I also filed bug 21172, which was a similar problem except that it occurred before any data was added, and only when there were disk data files present. Jonas supplied a patch for 21172, which I applied and tested. He also said a 2.6 kernel would fix the problem as well. I have verified that the 2.6 kernel fixes that problem, but it definitely does not fix the current problem. I also applied the patch for 21172 and tested it with a 2.6 kernel, and that did not fix this problem either.

The error log on a data node says this:

Error object: ../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 379 (block: DBLQH)

So perhaps this is where the problem is. I have tried a few things, like removing null values from a blob column and removing the blob column altogether. The results have been inconsistent: without the blob column, the cluster restarted once and failed to restart once. If the problem is in my data file, I'm not sure where.

Thanks, Jason

How to repeat:
See below
[29 Aug 2006 5:46] Jason Downing
Tracelogs/errorlogs

Attachment: 29-8-06 Bug.zip (application/zip, text), 67.79 KiB.

[29 Aug 2006 6:44] Jonas Oreland
Hi,

This is a duplicate of http://bugs.mysql.com/bug.php?id=21271,
which is fixed; the fix will hopefully make it into 5.1.12.
[30 Aug 2006 0:02] Jason Downing
OK, thanks for letting me know. You can consider this bug closed. I'll try the new version, or maybe the patch, and file another bug if I have further problems.

Thanks, Jason