Bug #13266 Online adding of MySQLD causes cluster to crash
Submitted: 16 Sep 2005 14:54
Modified: 13 Jun 2006 23:30
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.1
OS: Linux
Assigned to: Tomas Ulin
CPU Architecture: Any

[16 Sep 2005 14:54] Jonathan Miller
Description:
I had added 3 new mysqld nodes to config.ini and restarted ndb_mgmd.
2 of the 6 data nodes had been restarted when I started one of the mysqld processes. The cluster crashed after that mysqld connected.

Date/Time: Friday 16 September 2005 - 16:22:35
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: SimulatedBlock.cpp
Object of reference: SUMA (Line: 356) 0x0000000a
ProgramName: /home/ndbdev/jmiller/builds/libexec/ndbd
ProcessID: 3989
TraceFile: /space/run/ndb_7_trace.log.1
Version 5.1.2 (a_drop5p4)
***EOM***

--------------- Signal ----------------
r.bn: 257 "SUMA", r.proc: 7, r.sigId: 107695226 gsn: 593 "SUB_GCP_COMPLETE_REP" prio: 1
s.bn: 246 "DBDIH", s.proc: 7, s.sigId: 107695223 length: 3 trace: 2 #sec: 0 fragInf: 0
 gci: f813
 H'0000f813 H'00f60007 H'00000001
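
The three data words in the signal dump can be read by hand. The sketch below assumes that the first word is the gci (the dump prints gci: f813) and that the second word is a block reference with the block number in the upper 16 bits and the node id in the lower 16 bits. That layout is an assumption, although it agrees with s.bn 246 and s.proc 7 in the dump. The third word is left uninterpreted. The function name decode_block_ref is hypothetical.

```python
def decode_block_ref(word):
    """Split a 32-bit reference into (block number, node id).

    Assumed layout: block number in the upper 16 bits,
    node id in the lower 16 bits.
    """
    return word >> 16, word & 0xFFFF


# The three signal data words from the trace above.
words = [0x0000F813, 0x00F60007, 0x00000001]

gci = words[0]
block, node = decode_block_ref(words[1])
print(f"gci={gci:#x} block={block} node={node}")
```

Under that assumption the second word decodes to block 246 (DBDIH) on node 7, which is the sender shown in the dump.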

2005-09-16 16:22:31 [MgmSrvr] INFO     -- Mgmt server state: nodeid 27 reserved for ip 10.100.1.95, m_reserved_nodes 0000000008000002.
2005-09-16 16:22:31 [MgmSrvr] INFO     -- Node 4: Node 27 Connected
2005-09-16 16:22:31 [MgmSrvr] INFO     -- Node 4: Node 27: API version 5.1.2
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 4: Node 7 Disconnected
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 4: Communication to Node 7 closed
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 5: Node 7 Disconnected
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 1: Node 7 Connected
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 4: Node 8 Disconnected
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 5: Node 8 Disconnected
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 1: Node 8 Connected
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 4: Node 9 Disconnected
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 4: Node 6 Disconnected
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 4: Communication to Node 6 closed
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 4: Communication to Node 7 closed
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 4: Communication to Node 8 closed
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 4: Communication to Node 9 closed
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 5: Node 9 Disconnected
2005-09-16 16:22:36 [MgmSrvr] ALERT    -- Node 5: Node 6 Disconnected
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 5: Communication to Node 6 closed
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 5: Communication to Node 7 closed
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 5: Communication to Node 8 closed
2005-09-16 16:22:36 [MgmSrvr] INFO     -- Node 5: Communication to Node 9 closed
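
The m_reserved_nodes value logged when node 27 connected looks like a hexadecimal bitmask. A minimal sketch for reading it, assuming that bit N set means node id N is currently reserved (an assumption based on the logged values, not on the server source). The helper name reserved_node_ids is hypothetical.

```python
def reserved_node_ids(mask_hex):
    """Return the bit positions set in a hexadecimal mask string.

    Assumption: bit N set means node id N is reserved.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Value logged when node id 27 was reserved for 10.100.1.95.
print(reserved_node_ids("0000000008000002"))
```

For the value above this yields node ids 1 and 27, which matches the "nodeid 27 reserved" message on the same log line.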

How to repeat:
see above

Suggested fix:
cluster does not crash :-)
[9 Jun 2006 8:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7431
[12 Jun 2006 11:41] Tomas Ulin
pushed to 5.1.12
[13 Jun 2006 23:30] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.1.12 changelog. Closed.
[7 Dec 2006 12:46] bengt nilsson
Hi,

I'm not sure whether the fix was supposed to solve the 'Invalid SUB_GCP_COMPLETE_REP' error I describe below, but in any case the problem still occurs in 5.1.12.

I have 12 nodes in total: 4 management servers, 4 data nodes and 4 sql nodes.

The data nodes and the sql nodes reside on the same 4 physical servers. These servers use Linux HA in two pairs. The data nodes run outside the Linux HA service but the sql nodes are included in it. That means that normally only two sql nodes are connected.

If one of the data/sql servers loses its LAN connection, both the data node and the sql node disappear from the cluster. When the LAN connection comes back, the data node rejoins the cluster without problems but the sql node does not. It starts and stops continuously with the error 'Invalid SUB_GCP_COMPLETE_REP'.

From the management server log (management nodes=1-4, data=5-8, sql=9-12), where node 9 is trying to connect:

2006-12-06 14:16:49 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 reserved for ip 172.28.246.11, m_reserved_nodes 0000000000000202. 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 9: mysqld --server-id=1 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 7: Node 9 Connected 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 8: Node 9 Connected 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 6: Node 9 Connected 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 5: Node 9 Connected 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 5: Node 9: API version 5.1.12 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 6: Node 9: API version 5.1.12 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 7: Node 9: API version 5.1.12 
2006-12-06 14:16:49 [MgmSrvr] INFO -- Node 8: Node 9: API version 5.1.12 
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 5: Node 9 Disconnected 
2006-12-06 14:16:50 [MgmSrvr] INFO -- Node 5: Communication to Node 9 closed
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 6: Node 9 Disconnected
2006-12-06 14:16:50 [MgmSrvr] INFO -- Node 6: Communication to Node 9 closed
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 7: Node 9 Disconnected
2006-12-06 14:16:50 [MgmSrvr] INFO -- Node 7: Communication to Node 9 closed
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 8: Node 9 Disconnected
2006-12-06 14:16:50 [MgmSrvr] INFO -- Node 8: Communication to Node 9 closed
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 8: Node 9 Disconnected
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 7: Node 9 Disconnected 
2006-12-06 14:16:50 [MgmSrvr] ALERT -- Node 6: Node 9 Disconnected 
2006-12-06 14:16:50 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 freed, m_reserved_nodes 0000000000000002. 
2006-12-06 14:16:50 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 reserved for ip 172.28.246.11, m_reserved_nodes 0000000000000202. 
2006-12-06 14:16:50 [MgmSrvr] INFO -- Node 9: mysqld --server-id=1 
2006-12-06 14:16:53 [MgmSrvr] INFO -- Node 8: Communication to Node 9 opened
2006-12-06 14:16:53 [MgmSrvr] INFO -- Node 8: Node 9 Connected
2006-12-06 14:16:53 [MgmSrvr] INFO -- Node 8: Node 9: API version 5.1.12
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 5: Communication to Node 9 opened
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 5: Node 9 Connected
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 5: Node 9: API version 5.1.12
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 7: Communication to Node 9 opened
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 6: Communication to Node 9 opened
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 7: Node 9 Connected
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 6: Node 9 Connected 
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 6: Node 9: API version 5.1.12 
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 7: Node 9: API version 5.1.12 
2006-12-06 14:16:54 [MgmSrvr] ALERT -- Node 5: Node 9 Disconnected 
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 5: Communication to Node 9 closed
2006-12-06 14:16:54 [MgmSrvr] ALERT -- Node 6: Node 9 Disconnected
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 6: Communication to Node 9 closed
2006-12-06 14:16:54 [MgmSrvr] ALERT -- Node 7: Node 9 Disconnected
2006-12-06 14:16:54 [MgmSrvr] ALERT -- Node 7: Node 9 Disconnected
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 7: Communication to Node 9 closed
2006-12-06 14:16:54 [MgmSrvr] ALERT -- Node 8: Node 9 Disconnected
2006-12-06 14:16:54 [MgmSrvr] INFO -- Node 8: Communication to Node 9 closed
2006-12-06 14:16:54 [MgmSrvr] ALERT -- Node 6: Node 9 Disconnected
2006-12-06 14:16:55 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 freed, m_reserved_nodes 0000000000000002. 
2006-12-06 14:16:55 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 reserved for ip 172.28.246.11, m_reserved_nodes 0000000000000202. 
2006-12-06 14:16:55 [MgmSrvr] INFO -- Node 9: mysqld --server-id=1 
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 5: Communication to Node 9 opened
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 5: Node 9 Connected
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 5: Node 9: API version 5.1.12
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 6: Communication to Node 9 opened
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 7: Communication to Node 9 opened
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 6: Node 9 Connected
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 6: Node 9: API version 5.1.12
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 7: Node 9 Connected
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 7: Node 9: API version 5.1.12
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 8: Communication to Node 9 opened
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 8: Node 9 Connected
2006-12-06 14:16:58 [MgmSrvr] INFO -- Node 8: Node 9: API version 5.1.12 
2006-12-06 14:16:59 [MgmSrvr] ALERT -- Node 5: Node 9 Disconnected 
2006-12-06 14:16:59 [MgmSrvr] INFO -- Node 5: Communication to Node 9 closed
2006-12-06 14:16:59 [MgmSrvr] ALERT -- Node 6: Node 9 Disconnected
2006-12-06 14:16:59 [MgmSrvr] INFO -- Node 6: Communication to Node 9 closed
2006-12-06 14:16:59 [MgmSrvr] ALERT -- Node 7: Node 9 Disconnected
2006-12-06 14:16:59 [MgmSrvr] INFO -- Node 7: Communication to Node 9 closed
2006-12-06 14:16:59 [MgmSrvr] ALERT -- Node 8: Node 9 Disconnected
2006-12-06 14:16:59 [MgmSrvr] INFO -- Node 8: Communication to Node 9 closed
2006-12-06 14:16:59 [MgmSrvr] ALERT -- Node 6: Node 9 Disconnected
2006-12-06 14:16:59 [MgmSrvr] ALERT -- Node 8: Node 9 Disconnected 
2006-12-06 14:16:59 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 freed, m_reserved_nodes 0000000000000002. 
2006-12-06 14:16:59 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 reserved for ip 172.28.246.11, m_reserved_nodes 0000000000000202. 

and so on 
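
To quantify the reconnect loop, the reserve/free messages for one node id can be counted in the management log. This is only a sketch: the regular expression is written against the message wording in the excerpt above, and the function name count_reserve_cycles is hypothetical. It scans the whole text rather than splitting on newlines, so it still works when several entries are run together on one line.

```python
import re

# Matches e.g. "nodeid 9 reserved for ip ..." and "nodeid 9 freed, ...".
EVENT = re.compile(r"nodeid (\d+) (reserved|freed)")


def count_reserve_cycles(log_text, node_id):
    """Return (reserved, freed) counts for the given node id."""
    reserved = freed = 0
    for match in EVENT.finditer(log_text):
        if int(match.group(1)) != node_id:
            continue
        if match.group(2) == "reserved":
            reserved += 1
        else:
            freed += 1
    return reserved, freed


# Three entries copied from the excerpt above.
sample = (
    "2006-12-06 14:16:49 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 "
    "reserved for ip 172.28.246.11, m_reserved_nodes 0000000000000202.\n"
    "2006-12-06 14:16:50 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 "
    "freed, m_reserved_nodes 0000000000000002.\n"
    "2006-12-06 14:16:50 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 "
    "reserved for ip 172.28.246.11, m_reserved_nodes 0000000000000202.\n"
)
print(count_reserve_cycles(sample, 9))
```

A reserved count that keeps growing while the node never stays connected indicates the start/stop loop described above.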

/Regards Bengt