Bug #16772 Starting node joins cluster too early
Submitted: 25 Jan 2006 9:19 Modified: 27 Mar 2006 12:45
Reporter: Tomas Ulin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:4.1-> OS:
Assigned to: Jonas Oreland CPU Architecture:Any

[25 Jan 2006 9:19] Tomas Ulin
Description:
master node goes down
LCP takeover
node restarts and is allowed to rejoin too early, before node failure handling completes

2006-01-24 16:48:12 [MgmSrvr] INFO     -- Node 6: Table 84: TabCopyStatus: 0 TabUpdateStatus: 0 TabLcpStatus: 1
2006-01-24 16:48:12 [MgmSrvr] INFO     -- Node 6:  Fragment 0: noLcpReplicas==1  0(on 4)=185(Ongoing)
2006-01-24 16:48:12 [MgmSrvr] INFO     -- Node 6:  Fragment 1: noLcpReplicas==2  0(on 5)=185(Ongoing) 1(on 6)=185(Ongoing)
2006-01-24 16:48:12 [MgmSrvr] INFO     -- Node 6:  Fragment 2: noLcpReplicas==1  0(on 4)=185(Ongoing)
2006-01-24 16:48:12 [MgmSrvr] INFO     -- Node 6:  Fragment 3: noLcpReplicas==2  0(on 6)=185(Ongoing) 1(on 5)=185(Ongoing)
2006-01-24 16:48:36 [MgmSrvr] WARNING  -- Node 5: Failure handling of node 3 has not completed in 1 min. - state = 3
2006-01-24 16:48:37 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 3 has not completed in 1 min. - state = 3
2006-01-24 16:48:37 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 3 has not completed in 1 min. - state = 3
2006-01-24 16:48:49 [MgmSrvr] INFO     -- Mgmt server state: nodeid 3 reserved for ip 172.30.16.2, m_reserved_nodes 000000000000da8a.
2006-01-24 16:48:50 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2006-01-24 16:48:50 [MgmSrvr] INFO     -- Mgmt server state: nodeid 3 freed, m_reserved_nodes 000000000000da82.
2006-01-24 16:49:02 [MgmSrvr] INFO     -- Node 3: Start phase 1 completed 
2006-01-24 16:49:36 [MgmSrvr] WARNING  -- Node 5: Failure handling of node 3 has not completed in 2 min. - state = 3
2006-01-24 16:49:37 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 3 has not completed in 2 min. - state = 3
2006-01-24 16:49:37 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 3 has not completed in 2 min. - state = 3
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 5: Communication to Node 3 opened
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 5: Node 3 Connected
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 3: Node 5 Connected
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 6: Communication to Node 3 opened
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 4: Communication to Node 3 opened
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 6: Node 3 Connected
2006-01-24 16:50:00 [MgmSrvr] INFO     -- Node 3: Node 6 Connected
2006-01-24 16:50:01 [MgmSrvr] INFO     -- Node 4: Node 3 Connected
2006-01-24 16:50:01 [MgmSrvr] INFO     -- Node 3: Node 4 Connected
2006-01-24 16:51:20 [MgmSrvr] INFO     -- Node 6: Local checkpoint 187 started. Keep GCI = 88536 oldest restorable GCI = 88627

How to repeat:
see above

ater dump:

2006-01-25 09:11:45 [MgmSrvr] INFO     -- Node 4: Printing nodeStatus of all nodes
2006-01-25 09:11:45 [MgmSrvr] INFO     -- Node 4: Node = 3 has status = 5
2006-01-25 09:11:45 [MgmSrvr] INFO     -- Node 4: Node = 4 has status = 1
2006-01-25 09:11:45 [MgmSrvr] INFO     -- Node 4: Node = 5 has status = 1
2006-01-25 09:11:45 [MgmSrvr] INFO     -- Node 4: Node = 6 has status = 1

...

2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 1 (MGM) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 2 (MGM) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 3 (DB) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 4 (DB) does nothing
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 5 (DB) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 6 (DB) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 7 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 8 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 9 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 10 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 11 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 12 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 13 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 14 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 15 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 16 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 17 (API) is connected
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 18 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 19 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 20 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 21 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 22 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 23 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 24 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 25 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 26 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 27 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 28 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 29 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 30 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 31 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 32 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 33 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 34 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 35 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 36 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 37 (API) is trying to connect
2006-01-25 09:11:55 [MgmSrvr] INFO     -- Node 4: Connection to 38 (API) is trying to connect

....

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 4: ctoStatus = 0

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 4: Node 3: ZINIT(1)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 4: Node 4: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 4: Node 5: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 4: Node 6: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: creadyDistCom = 1, cpresident = 6

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: cpresidentAlive = 1, cpresidentCand = 3

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: ctoStatus = 0

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: Node 3: ZINIT(1)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: Node 4: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: Node 5: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 5: Node 6: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: creadyDistCom = 1, cpresident = 6

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: cpresidentAlive = 1, cpresidentCand = 3

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: ctoStatus = 0

2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: Node 3: ZINIT(1)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: Node 4: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: Node 5: ZRUNNING(3)
2006-01-25 09:12:00 [MgmSrvr] INFO     -- Node 6: Node 6: ZRUNNING(3)
[17 Mar 2006 9:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/3917
[24 Mar 2006 12:31] Jonas Oreland
pushed into 5.1.8
[27 Mar 2006 10:12] Jonas Oreland
pushed into 5.0.20
[27 Mar 2006 12:04] Jonas Oreland
pushed into 4.1.19
[27 Mar 2006 12:45] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 4.1.19, 5.0.20, and 5.1.8 changelogs. Closed.