MySQL Bugs: #50696: Switching from ndbd to ndbmtd causes file system error

Bug #50696	Switching from ndbd to ndbmtd causes file system error
Submitted:	28 Jan 2010 18:41	Modified:	26 Dec 2010 15:50
Reporter:	Matthew Boehm
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-7.0	OS:	Linux (RHEL5.4)
Assigned to:		CPU Architecture:	Any
Tags:	cluster, crash, file system, mysql-5.1.39 ndb-7.0.9, ndbd, ndbmtd

Description:
According to the docs, switching from single-threaded ndbd to multi-threaded ndbmtd should be a seamless transition: just stop one node and start up the ndbmtd binary in its place. Alas, I’ve had no success with this.

Timeline:
* Initial start of a fresh 2-node cluster (IndexMemory 1GB, DataMemory 12GB).
* Loaded up about 10GB of data and 750MB of index.
* Ran some benchmarks (dbt2, sysbench, internal). All passed.
* ndb_mgm> shutdown
* Restarted the management server with no config changes.
* Started ndbmtd on node 1 and node 2 and got the errors below:

Node 2: Forced node shutdown completed, restarting. Occured during startphase 4. Caused by error 2352: 'Invalid LCP(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.

Node 1: Forced node shutdown completed, restarting. Occured during startphase 4. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

From ndb_2_out.log:
jbalock thr: 0 waiting for lock, contentions: 9 spins: 48797
RESTORE table: 73 750530 rows applied
RESTORE table: 74 749470 rows applied
2010-01-28 12:20:00 [ndbd] INFO     -- Error 625 during restore of  0/T74F1
2010-01-28 12:20:00 [ndbd] INFO     -- RESTORE (Line: 1194) 0x00000008
2010-01-28 12:20:00 [ndbd] INFO     -- Error handler startup restarting system
2010-01-28 12:20:00 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-01-28 12:20:00 [ndbd] INFO     -- Angel received ndbd startup failure count 3.
2010-01-28 12:20:00 [ndbd] ALERT    -- Ndbd has failed 3 consecutive startups. Not restarting
2010-01-28 12:20:00 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 2352: 'Invalid LCP(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.

From ndb_1_out.log:
RESTORE table: 61 2497210 rows applied
RESTORE table: 70 224477 rows applied
RESTORE table: 70 225523 rows applied
RESTORE table: 79 23 rows applied
RESTORE table: 79 27 rows applied
RESTORE table: 65 7123592 rows applied
2010-01-28 12:20:01 [ndbd] INFO     -- Node 2 disconnected
2010-01-28 12:20:01 [ndbd] INFO     -- QMGR (Line: 2971) 0x00000008
2010-01-28 12:20:01 [ndbd] INFO     -- Error handler startup restarting system
2010-01-28 12:20:01 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-01-28 12:20:01 [ndbd] INFO     -- Angel received ndbd startup failure count 3.
2010-01-28 12:20:01 [ndbd] ALERT    -- Ndbd has failed 3 consecutive startups. Not restarting
2010-01-28 12:20:01 [ndbd] ALERT    -- Node 1: Forced node shutdown completed. Occured during startphase 4. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

How to repeat:
See "timeline" section above.

Suggested fix:
None at this time. I will try to repeat the timeline, but starting with an --initial'd ndbmtd cluster instead, and see if that works.

Uploaded bug-data-50696.tar.bz2 to the FTP site.

With no changes made, I attempted to start the cluster back up with ndbd (single-threaded), and that worked. So the MT version must be expecting something that it isn't finding in the file system.

Error 625 is:

NDB error code 625: Out of memory in Ndb Kernel, hash index part (increase IndexMemory): Permanent error: Insufficient space

Please increase IndexMemory and the node should start.

If this doesn't work, please let us know.
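The key diagnostic step here was spotting the numeric code in the `Error 625 during restore of ...` line of ndb_2_out.log and looking it up with `perror --ndb`. A minimal Python sketch of that triage, assuming log lines shaped like the excerpts above; the lookup table is hand-copied from the perror messages quoted in this report and is not an official or complete list of NDB error codes.

```python
import re

# Hand-copied from the perror output quoted in this bug report.
# Assumption: this is NOT a complete or official NDB error table.
KNOWN_NDB_ERRORS = {
    625: "Out of memory in Ndb Kernel, hash index part (increase IndexMemory)",
    904: "Out of fragment records (increase MaxNoOfOrderedIndexes)",
}

# Matches lines such as "... INFO     -- Error 625 during restore of  0/T74F1"
RESTORE_ERROR = re.compile(r"Error (\d+) during restore of\s+(\S+)")


def find_restore_errors(log_lines):
    """Return (code, target, hint) for every restore error found in the lines."""
    results = []
    for line in log_lines:
        match = RESTORE_ERROR.search(line)
        if match:
            code = int(match.group(1))
            # Fall back to telling the user which perror command to run.
            hint = KNOWN_NDB_ERRORS.get(code, f"unknown; run: perror --ndb {code}")
            results.append((code, match.group(2), hint))
    return results


if __name__ == "__main__":
    sample = [
        "RESTORE table: 74 749470 rows applied",
        "2010-01-28 12:20:00 [ndbd] INFO     -- Error 625 during restore of  0/T74F1",
        "2010-01-28 12:20:00 [ndbd] INFO     -- RESTORE (Line: 1194) 0x00000008",
    ]
    for code, target, hint in find_restore_errors(sample):
        print(code, target, hint)
```

Only lines matching the restore-error pattern are reported; other log noise such as the `RESTORE table:` progress lines is ignored.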

Doubled IndexMemory to 2GB. Performed a rolling restart of the non-threaded ndbd nodes to ensure the change took effect, then did a rolling restart with ndbmtd, and everything seems to be up and running!
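A rolling restart replaces one data node at a time so the other node keeps the cluster available. A minimal Python sketch that only builds the ordered plan for the two data nodes in this report (IDs 1 and 2); it uses the standard `ndb_mgm -e "<id> STOP"` and `ndb_mgm -e "<id> STATUS"` management client commands, while how the new binary is launched on each host is left as a manual placeholder step (an assumption, since it depends on the local setup).

```python
def rolling_restart_plan(node_ids, binary):
    """Return the ordered steps for restarting data nodes one at a time.

    The plan is only built, never executed. The "start" step is a
    hypothetical manual placeholder: launching the binary is host-specific.
    """
    steps = []
    for node_id in node_ids:
        # Stop exactly one data node; the remaining node keeps serving.
        steps.append(f'ndb_mgm -e "{node_id} STOP"')
        # Manual placeholder: start the chosen binary on that node's host.
        steps.append(f"start {binary} on the host of node {node_id} (manual step)")
        # Check that the node reports as started before moving to the next one.
        steps.append(f'ndb_mgm -e "{node_id} STATUS"')
    return steps


if __name__ == "__main__":
    for step in rolling_restart_plan([1, 2], "ndbmtd"):
        print(step)
```

Building the plan as data rather than running it keeps the sketch safe to try; the important property is that node 2 is not touched until node 1 is back up.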

Thanks. Sorry for the bogus bug.

Marking as Not a Bug.

Reopening the bug because I just tried to go from ndbmtd back to ndbd with no config changes, and node #1 crashed. Will attach an ndb_error_report.

I did upgrade to 7.0.9.

Crash on node 1 when going from ndbmtd to ndbd.

Attachment: ndb_error_report_20100212170145.tar.bz2 (application/octet-stream, text), 130.42 KiB.

Is restarting the data node (ndb[mt]d) with --initial a workaround?

No, it's not. I just tried it on node #1 (node #2 is still running ndbmtd), and node #1 crashed 3 times. Will post traces in a minute.

The only solution at the moment is to shut down the cluster completely and restart everything with --initial.

Current byte-offset of file-pointer is: 1566

Time: Wednesday 17 February 2010 - 09:09:05
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 3791) 0x00000008
Program: /usr/sbin/ndbd
Pid: 23668
Version: mysql-5.1.39 ndb-7.0.9b
Trace: /var/lib/mysql/cluster/ndb_1_trace.log.1
***EOM***

Time: Wednesday 17 February 2010 - 09:16:01
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 3791) 0x00000008
Program: /usr/sbin/ndbd
Pid: 23711
Version: mysql-5.1.39 ndb-7.0.9b
Trace: /var/lib/mysql/cluster/ndb_1_trace.log.2
***EOM***

Time: Wednesday 17 February 2010 - 09:23:00
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 3791) 0x00000008
Program: /usr/sbin/ndbd
Pid: 23754
Version: mysql-5.1.39 ndb-7.0.9b
Trace: /var/lib/mysql/cluster/ndb_1_trace.log.3
***EOM***
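All three entries above fail at the same spot (error 2341 in DBDICT, Dbdict.cpp line 3791), which points to a deterministic failure rather than a transient one. A short Python sketch, assuming the `Key: value` layout and `***EOM***` terminator shown in the pasted error log, that splits the log into entries and counts how often each failure location repeats; the function names are illustrative, not part of any NDB tool.

```python
from collections import Counter


def parse_error_log(text):
    """Split an ndb_N_error.log dump into dicts of its 'Key: value' fields.

    Assumes each entry ends with a '***EOM***' line, as in the excerpt in
    this report. Lines without ': ' are ignored.
    """
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            if current:
                entries.append(current)
            current = {}
        elif ": " in line:
            # Split on the first ': ' only, so values like
            # 'DBDICT (Line: 3791) 0x00000008' stay intact.
            key, value = line.split(": ", 1)
            current[key] = value
    return entries


def failure_locations(entries):
    """Count entries by (error code, error object) to spot repeated crashes."""
    return Counter((e.get("Error"), e.get("Error object")) for e in entries)


if __name__ == "__main__":
    sample = """Time: Wednesday 17 February 2010 - 09:09:05
Error: 2341
Error object: DBDICT (Line: 3791) 0x00000008
Pid: 23668
***EOM***
Time: Wednesday 17 February 2010 - 09:16:01
Error: 2341
Error object: DBDICT (Line: 3791) 0x00000008
Pid: 23711
***EOM***
"""
    print(failure_locations(parse_error_log(sample)))
```

Grouping on both the error code and the error object matters: error 2341 is a generic "failed ndbrequire", so the block and line number are what identify the actual crash site.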

Newest crash report, from doing an --initial restart on 1 node.

Attachment: ndb_error_report_20100217100613.tar.bz2 (application/octet-stream, text), 168.42 KiB.

The latest error tarball contains error 904,
which means:
sh> perror --ndb 904
NDB error code 904: Out of fragment records (increase MaxNoOfOrderedIndexes): Permanent error: Insufficient space

The error is probably caused by the following:

- You created tables while running the multi-threaded version (ndbmtd).
  This creates tables with *more* fragments than "normal" ndbd would,
  but the config is also "altered" on the fly, so that more fragment records are
  allocated.

- Then you start ndbd, where all those fragments exist on disk,
  but the config isn't altered on the fly, so there are not enough fragment records.

---

This is clearly not optimal behavior,
but it should be easy to avoid by
increasing MaxNoOfOrderedIndexes as suggested.
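The explanation above can be illustrated with a toy capacity model. This is a hypothetical sketch, not NDB's actual allocation logic: it assumes each table gets one fragment per LQH (local query handler) instance on a data node, that ndbd runs one LQH instance while ndbmtd runs several (4 is an assumed value), and that only ndbmtd scales its fragment-record pool up on the fly. `configured_records` stands in for the pool derived from settings such as MaxNoOfOrderedIndexes; it is not a real parameter name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNodeBinary:
    name: str
    lqh_instances: int  # assumed number of LQH workers in one data node


NDBD = DataNodeBinary("ndbd", 1)
NDBMTD = DataNodeBinary("ndbmtd", 4)  # illustrative value, not read from config


def fragments_on_disk(tables, created_with):
    """Fragments one node holds for tables created under `created_with`.

    Toy assumption: one fragment per table per LQH instance.
    """
    return tables * created_with.lqh_instances


def fragment_records_allocated(configured_records, running):
    """Fragment records the starting binary reserves.

    Toy assumption: ndbmtd scales the configured pool by its LQH count
    (the config "altered on the fly"); ndbd uses it unchanged.
    """
    return configured_records * running.lqh_instances


def can_restart(tables, created_with, running, configured_records):
    """True if the restarting binary can hold every fragment found on disk."""
    needed = fragments_on_disk(tables, created_with)
    return needed <= fragment_records_allocated(configured_records, running)


if __name__ == "__main__":
    for created, running in [(NDBD, NDBMTD), (NDBMTD, NDBD)]:
        ok = can_restart(100, created, running, configured_records=150)
        print(f"created with {created.name}, restarted as {running.name}: {ok}")
```

With 100 tables and a pool of 150, the ndbd to ndbmtd direction passes (100 fragments, 600 records) and the ndbmtd to ndbd direction fails (400 fragments, 150 records), which mirrors the asymmetry seen in this report. Raising `configured_records` to 400 makes both directions pass, the model's analogue of increasing MaxNoOfOrderedIndexes.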

So, according to the above, if I create tables with ndbd, then switch to ndbmtd and back, everything should be fine.

But if I create tables with ndbmtd and then switch to ndbd, I have to increase MaxNoOfOrderedIndexes before I can do that?

Is this caveat documented somewhere? The docs say 'seamless transition' between threaded and non-threaded. This isn't seamless.

Hi Matthew,

1) My guess is based on the assumption that you created new tables while running
an ndbmtd-only cluster, prior to trying to switch back. Can you confirm that
this is indeed true?

2) If this happens to be true (i.e. we have found the cause of the problem), then
we will either consider how to change the behavior or document it.
But neither of those is fruitful until we know the real cause.

3) Regardless of the cause, I'm sorry that you've been forced to face this problem.
But may I ask why you're changing back and forth? Maybe we can
create a set of tests that captures your use case.

/Jonas

1) Yes. My original setup was created on ndbd and I tried to switch to ndbmtd. That didn't work, so I --initial'd the entire cluster and started over with ndbmtd.

3) I'm doing this for benchmarking, really. I'm testing the usefulness, speed, and overall benefit of the Dolphin Supersockets cards in conjunction with multi-threaded (ndbmtd) vs. non-threaded (ndbd) data nodes, and with two nodes vs. four.

I have 8 test cases:
  twoNode/threaded/ethernet
  twoNode/non-t/ethernet
  twoNode/threaded/supersocket
  twoNode/non-t/supersocket

Then the same 4 again, but with 4 nodes instead of 2.

I'm using a combination of dbt2, sysbench, and a customer's home-grown test built on a crappy schema design.
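The 8 cluster configurations, crossed with the three workloads, form a simple test matrix. A Python sketch that enumerates it; the `fourNode` label and the `customer` workload name are made up here, since the report spells out only the two-node cases and does not name the customer test.

```python
from itertools import product

NODE_COUNTS = (2, 4)
BINARIES = ("threaded", "non-t")  # ndbmtd vs. ndbd, labeled as in the list above
INTERCONNECTS = ("ethernet", "supersocket")
# "customer" is a hypothetical name for the customer's home-grown test.
WORKLOADS = ("dbt2", "sysbench", "customer")


def test_matrix():
    """Yield (configuration, workload) pairs, grouped by configuration."""
    for nodes, binary, link in product(NODE_COUNTS, BINARIES, INTERCONNECTS):
        # "fourNode" is an assumed label for the 4-node repeats.
        config = f"{'twoNode' if nodes == 2 else 'fourNode'}/{binary}/{link}"
        for workload in WORKLOADS:
            yield config, workload


if __name__ == "__main__":
    for config, workload in test_matrix():
        print(config, workload)
```

Keeping the workload loop innermost means each cluster configuration is set up once and then reused for all three workloads, so an --initial start would only be needed when the configuration changes, not for every individual run.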

It didn't seem 'fair' to the test results to do an --initial for each run and reload the data each time, but because of this issue, I'm thinking that may be the better way to do it, and I'll simply note that in my final paper.