Bug #34207 Cluster will not restart, phase 3 takes 20 minutes then crash
Submitted: 31 Jan 2008 20:45 Modified: 13 Mar 2009 8:45
Reporter: Jeff Wang
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 5.1.22 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[31 Jan 2008 20:45] Jeff Wang

I'm using version 5.1.22 with on-disk data.  After inserting some disk data, I performed a full shutdown and restart of the cluster.  Phase 3 takes 20 minutes and the nodes crash in phase 4.  Looking at the ndb logs, I see:

2008-01-31 12:29:32 [ndbd] WARNING  -- Ndb kernel is stuck in: Job Handling
2008-01-31 12:29:32 [ndbd] INFO     -- Watchdog: User time: 2129  System time: 3081
2008-01-31 12:29:55 [ndbd] INFO     -- You have found a bug! Failed op (INSERT) during REDO table: 1218 fragment: 0 err: 827
2008-01-31 12:29:55 [ndbd] INFO     -- DBLQH (Line: 15778) 0x0000000a
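
For anyone scanning a large ndbd log for the same failure, the REDO error line above can be picked apart mechanically. The following is only a minimal sketch: the regular expression is inferred from the single log line quoted in this report, and the names `REDO_FAILURE` and `parse_redo_failure` are hypothetical, not part of any MySQL tool.

```python
import re

# Pattern inferred from the one REDO failure line quoted in this report;
# other NDB versions may format the message differently (assumption).
REDO_FAILURE = re.compile(
    r"Failed op \((?P<op>\w+)\) during REDO "
    r"table: (?P<table>\d+) fragment: (?P<fragment>\d+) err: (?P<err>\d+)"
)


def parse_redo_failure(line):
    """Return op/table/fragment/err from a REDO failure line, or None."""
    match = REDO_FAILURE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "op": fields["op"],
        "table": int(fields["table"]),
        "fragment": int(fields["fragment"]),
        "err": int(fields["err"]),
    }


line = (
    "2008-01-31 12:29:55 [ndbd] INFO     -- You have found a bug! "
    "Failed op (INSERT) during REDO table: 1218 fragment: 0 err: 827"
)
print(parse_redo_failure(line))
# prints {'op': 'INSERT', 'table': 1218, 'fragment': 0, 'err': 827}
```

The extracted `err` value is the number that can then be looked up with `perror --ndb`.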

My setup is:

-2 node cluster
-1000 tables
-100,000 rows of 1KB on disk data

How to repeat:
Not sure if this is generally reproducible or if it has something to do with my setup.
[31 Jan 2008 20:48] Jeff Wang
trace file

Attachment: trace.log (text), 330.38 KiB.

[2 Feb 2008 9:26] Jonas Oreland
> perror --ndb 827
NDB error code 827: Out of memory in Ndb Kernel, table data (increase DataMemory): Permanent error: Insufficient space

What does your "all dump 1000" look like before restart?
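
The two checks mentioned in this comment could be scripted roughly as follows. This is a sketch under assumptions: it assumes `perror` and `ndb_mgm` are on the PATH and that `ndb_mgm` can reach the management server with its default connection settings. The helper names `diagnostic_commands` and `run_if_available` are hypothetical. As far as I know, the result of `all dump 1000` is written to the cluster log rather than to the terminal, so the captured output may be short.

```python
import shutil
import subprocess


def diagnostic_commands(error_code):
    """Build argv lists for the two commands mentioned in the comment above."""
    return [
        ["perror", "--ndb", str(error_code)],
        ["ndb_mgm", "-e", "all dump 1000"],
    ]


def run_if_available(argv, timeout=30):
    """Run argv and return its stdout, or None if it cannot be run in time."""
    if shutil.which(argv[0]) is None:
        return None  # binary not installed on this host
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, check=False, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None  # e.g. management server not reachable
    return result.stdout


if __name__ == "__main__":
    for argv in diagnostic_commands(827):
        output = run_if_available(argv)
        print(" ".join(argv))
        print(output if output is not None else "(not available)")
```

Running the dump before the shutdown, while the data nodes are still up, is what gives a memory usage figure to compare against the configured DataMemory.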

[3 Feb 2008 0:11] Jeff Wang
I don't have the cluster up anymore as I completely nuked it and did a fresh setup with ndbd --initial.  I have been able to do system and node restarts without any problems now.
[3 Feb 2008 21:57] Adam Dixon
Do you still have your cluster log file (ndb_1_cluster.log from your mgm node), by any chance? If so, please attach it.
[4 Feb 2008 21:30] Jeff Wang
ndb cluster mgm log

Attachment: ndb_1_cluster.log (text), 4.00 KiB.

[4 Feb 2008 21:33] Jeff Wang
ndb_cluster mgm log (retrying)

Attachment: ndb_1_cluster.log (text), 6.74 KiB.

[13 Mar 2009 8:45] Jonas Oreland
Error 827 can happen during NR (node restart).
Closing as not a bug.