Bug #57164 INVALID SUB_GCP_COMPLETE_REP during restart after adding nodes
Submitted: 1 Oct 2010 12:41
Modified: 4 Oct 2010 15:26
Reporter: Thomas Nielsen
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.1.5, 7.1.8
OS: Linux (rhel5, sles10, sles11)
Assigned to: Jonas Oreland
CPU Architecture: Any

[1 Oct 2010 12:41] Thomas Nielsen
Description:
mysqld stops with INVALID SUB_GCP_COMPLETE_REP during restart after adding nodes.

How to repeat:
Seen occasionally in automated testing of add-node functionality with 7.1.5 and also 7.1.8. Hard to reproduce, but the basic steps are:

- create and start a cluster (1 mgmd, 2 ndbd,
- add two nodes to cfg
- restart mgmd and original ndbds
- restart mysqld => fails with INVALID SUB_GCP_COMPLETE_REP

See attached mysqld.err file
[4 Oct 2010 6:09] Jonas Oreland
Log analysis:
1) cluster-log is missing (ndb_error_report is preferred...)
2) first .err file doesn't say much... looks like it is crashing during start
3) second .err file contains the following:
- create node group
- create node group
- drop node group
- create node group
- crash
  i.e. "How to repeat" is missing information
[4 Oct 2010 6:18] Jonas Oreland
config.ini and my.cnf would also be good
[4 Oct 2010 6:25] Thomas Nielsen
Beat me to updating the report - the second log is indeed doing create nodegroup, drop nodegroup, create nodegroup and then crashing. The nodegroups are created as follows:

create nodegroup 5,6   (=> nodegroup 1)
create nodegroup 7,8   (=> nodegroup 2)
drop nodegroup 2
drop nodegroup 1
create nodegroup 6,7

Note that the last nodegroup (6,7) overlaps both of the original two nodegroups created for the new nodes: it takes one node from each.
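
To make the overlap concrete, here is a small Python sketch that checks the node sets from the sequence above. The node IDs are taken from the report; the variable and function names are made up for illustration, and nothing here talks to a real cluster.

```python
# Node sets from the report; names are illustrative only.
first = {5, 6}      # create nodegroup 5,6  (=> nodegroup 1)
second = {7, 8}     # create nodegroup 7,8  (=> nodegroup 2)
recreated = {6, 7}  # create nodegroup 6,7, issued after both drops


def shared_nodes(a, b):
    """Return the node IDs that appear in both nodegroups, sorted."""
    return sorted(a & b)


print(shared_nodes(recreated, first))   # [6]
print(shared_nodes(recreated, second))  # [7]
```

The re-created nodegroup reuses one node from each of the two dropped nodegroups, which is the overlap described above.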
[4 Oct 2010 8:54] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.20 (revid:jonas@mysql.com-20101004085252-kyd65j2xhnawal0o) (version source revid:jonas@mysql.com-20101004085252-kyd65j2xhnawal0o) (merge vers: 5.1.47-ndb-7.0.20) (pib:21)
[4 Oct 2010 8:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/119794

3823 Jonas Oreland	2010-10-04
      ndb - bug#57164 - serialize first SUB_START wrt CREATE/DROP nodegroup by wrapping first SUB_START in schema-transaction
[4 Oct 2010 9:00] Jonas Oreland
pushed to 7.0.20 and 7.1.9

When a mysqld (ndbapi) starts and begins subscribing to replication events
  (event-api), it saves the number of nodegroups (in fact, the number of
  buckets), which is used internally in event-api.
  If a create/drop nodegroup was executing at the same time, this count
  could become incorrect.

  The patch wraps the first SUB_START_REQ in a schema-transaction,
    which makes sure that no create/drop nodegroup is running.
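
The idea behind the fix can be pictured with a toy model in Python. This is not NDB code: `ToyCluster`, `schema_transaction`, `first_sub_start` and `churn` are hypothetical names, and a plain lock stands in for the schema-transaction. The sketch only shows why taking the same lock around the first subscription start means the count is never saved while a create/drop nodegroup is half done.

```python
import threading
from contextlib import contextmanager


class ToyCluster:
    """Toy stand-in for a cluster whose nodegroup count can change."""

    def __init__(self, nodegroups):
        self.nodegroups = nodegroups
        self.change_in_progress = False
        self._schema_lock = threading.Lock()  # stands in for a schema-transaction

    @contextmanager
    def schema_transaction(self):
        # Only one holder at a time, so schema operations are serialized.
        with self._schema_lock:
            yield

    def _change_nodegroups(self, delta):
        with self.schema_transaction():
            self.change_in_progress = True
            self.nodegroups += delta
            self.change_in_progress = False

    def create_nodegroup(self):
        self._change_nodegroups(+1)

    def drop_nodegroup(self):
        self._change_nodegroups(-1)


def first_sub_start(cluster):
    """Save the nodegroup count for a new subscriber inside a transaction."""
    with cluster.schema_transaction():
        return cluster.nodegroups, cluster.change_in_progress


def churn(cluster, rounds):
    """Repeatedly create and drop a nodegroup, as in the failing test."""
    for _ in range(rounds):
        cluster.create_nodegroup()
        cluster.drop_nodegroup()


if __name__ == "__main__":
    cluster = ToyCluster(nodegroups=1)
    worker = threading.Thread(target=churn, args=(cluster, 1000))
    worker.start()
    snapshots = [first_sub_start(cluster) for _ in range(1000)]
    worker.join()
    # True only if some snapshot was taken in the middle of a create/drop.
    print(any(in_progress for _, in_progress in snapshots))
```

Because `first_sub_start` and the create/drop operations all go through `schema_transaction`, a snapshot can only be taken between complete operations. In the real patch the serialization is done by the NDB schema-transaction rather than a local lock.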
[4 Oct 2010 15:26] Jon Stephens
Documented bugfix in the NDB-7.0.20 and 7.1.9 changelogs as follows:

        Successive CREATE NODEGROUP and DROP NODEGROUP commands could
        cause mysqld processes to crash.

Closed.