Bug #34698 Node restart in cluster with thousands of tables is very slow
Submitted: 20 Feb 2008 18:28 Modified: 24 Jun 23:46
Reporter: Jeff Wang
Status: Analyzing
Category:Server: Cluster Severity:S3 (Non-critical)
Version:mysql-5.1 OS:Any
Assigned to: Hartmut Holzgraefe Target Version:
Tags: 5.1.24
Triage: Triaged: D4 (Minor) / R6 (Needs Assessment) / E6 (Needs Assessment)

[20 Feb 2008 18:28] Jeff Wang
Description:
Hi,

I set up a fresh cluster with 2 data nodes and created about 3500 tables in
it as a stress test.  I did not insert any data.  I then tried to restart
one of the nodes (#3), and it took 10+ minutes just to shut it down.
According to the NDB logs, the surviving node (#2) spends most of its time
doing this:

Didnt find any LCP for node: 3 tab: 1014 frag: 1
startNextCopyFragment
prepare to handover bucket: 1
switchover complete bucket 1 state: 2handover
changing file from 557056 to 557056
checkTakeOverInMasterStartNodeFailure ffffff00
start_resend(1, empty bucket -> active
REMOVING lcp: 9 from table: 0 frag: 0 node: 3
REMOVING lcp: 9 from table: 0 frag: 1 node: 3
REMOVING lcp: 9 from table: 1 frag: 0 node: 3
REMOVING lcp: 9 from table: 1 frag: 1 node: 3

......
......

REMOVING lcp: 9 from table: 4793 frag: 0 node: 3
REMOVING lcp: 9 from table: 4793 frag: 1 node: 3

Restarting the node takes even longer: 40+ minutes.

It seems that just having a lot of tables in the system severely degrades
node restart time, even though I haven't inserted any data.
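
To put a number on how much of the surviving node's log is taken up by these messages, a small script along the following lines could summarize the "REMOVING lcp" lines. This is only a sketch: the regular expression is derived from the excerpt above, and the function name summarize_removals is made up for illustration, not part of any NDB tooling.

```python
import re
from collections import Counter

# Pattern taken from the log excerpt in this report, e.g.
# "REMOVING lcp: 9 from table: 4793 frag: 1 node: 3"
REMOVE_RE = re.compile(
    r"REMOVING lcp: (\d+) from table: (\d+) frag: (\d+) node: (\d+)"
)


def summarize_removals(lines):
    """Return (number of distinct tables, removal count per node).

    Hypothetical helper: lines that do not match the pattern are ignored.
    """
    tables = set()
    per_node = Counter()
    for line in lines:
        match = REMOVE_RE.search(line)
        if match:
            _lcp, table, _frag, node = map(int, match.groups())
            tables.add(table)
            per_node[node] += 1
    return len(tables), dict(per_node)
```

Feeding it the lines of the node log (for example via open(path) on whatever file the data node writes to) gives a rough count of how many tables and fragments were processed during the shutdown.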

How to repeat:
As described above.
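
A possible way to generate the tables is sketched below. The report does not say what the table definitions were, so the single-column schema, the table name prefix, and the function name are all assumptions; only the count of about 3500 empty NDB tables comes from the description.

```python
def make_create_statements(count=3500, prefix="stress_t"):
    """Yield CREATE TABLE statements for empty NDB tables.

    The schema is an assumption: the original report does not
    specify the columns used in the stress test.
    """
    for i in range(count):
        yield (
            f"CREATE TABLE {prefix}{i:04d} "
            "(id INT NOT NULL PRIMARY KEY) ENGINE=NDBCLUSTER;"
        )


if __name__ == "__main__":
    # Print the statements so they can be piped into a mysql client.
    for stmt in make_create_statements():
        print(stmt)
```

Saved under a name of your choice (say make_tables.py), the output can be piped into the mysql client against a test database on an SQL node. Then restart one data node and time the shutdown and the restart separately.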
[18 May 16:02] Jonathan Miller
Please try a later version.
[19 Jun 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".