MySQL Bugs: #42973: race condition if allocating node ids from different ndb

Bug #42973	race condition if allocating node ids from different ndb_mgmd simultaneously
Submitted: 18 Feb 2009 14:15
Modified: 18 Feb 2009 20:04
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: *
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

Description:
If two (or more) ndb_mgmd processes are used, and several applications (or mysqld
processes) are started whose connect-strings list the ndb_mgmd processes in
different orders, there is a race condition in which two applications can be
allocated the *same* node ID. When such an application later connects, it does
not get fully connected, and various odd errors follow as a consequence.
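To make the failure mode concrete, here is a simplified, hypothetical model of the race; it is not the actual ndb_mgmd code. It assumes that each management server allocates a node ID in two separate steps, a check against some shared record of allocated IDs followed by a commit, and that two applications reach different servers because their connect-strings list them in different orders. The names `SharedRegistry`, `Mgmd`, `check`, `commit` and `run` are invented for illustration.

```python
class SharedRegistry:
    """Hypothetical stand-in for the cluster-wide record of allocated node IDs."""

    def __init__(self) -> None:
        self.allocated: dict[int, str] = {}  # node_id -> owning application


class Mgmd:
    """Hypothetical model of one ndb_mgmd handing out API node IDs.

    Allocation is split into a separate check step and commit step,
    which is what opens the race window.
    """

    def __init__(self, name: str, registry: SharedRegistry, candidates: list[int]) -> None:
        self.name = name
        self.registry = registry
        self.candidates = candidates

    def check(self) -> int:
        # Check: pick the first candidate ID not yet recorded as allocated.
        for node_id in self.candidates:
            if node_id not in self.registry.allocated:
                return node_id
        raise RuntimeError(f"{self.name}: no free node ID")

    def commit(self, node_id: int, owner: str) -> None:
        # Commit: record the allocation without re-checking, so a concurrent
        # commit of the same ID by another ndb_mgmd silently overwrites it.
        self.registry.allocated[node_id] = owner


def run(interleaved: bool) -> tuple[int, int]:
    registry = SharedRegistry()
    candidates = [10, 11, 12]
    # app1's connect-string lists mgmd-A first; app2's lists mgmd-B first.
    mgmd_a = Mgmd("mgmd-A", registry, candidates)
    mgmd_b = Mgmd("mgmd-B", registry, candidates)
    if interleaved:
        # Both checks happen before either commit: the race window is hit.
        id1 = mgmd_a.check()
        id2 = mgmd_b.check()
        mgmd_a.commit(id1, "app1")
        mgmd_b.commit(id2, "app2")
    else:
        # One allocation completes before the other starts.
        id1 = mgmd_a.check()
        mgmd_a.commit(id1, "app1")
        id2 = mgmd_b.check()
        mgmd_b.commit(id2, "app2")
    return id1, id2


print(run(interleaved=False))  # (10, 11)
print(run(interleaved=True))   # (10, 10)
```

Only the interleaved ordering produces a duplicate, which is why the problem shows up rarely and needs many repetitions to reproduce.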

How to repeat:
See above. Repeat many times in a loop; the race window is very small.

Suggested fix:
Store the allocation inside the kernel, including the timeout (which is supplied by ndb_mgmd).
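A minimal sketch of the suggested fix, under stated assumptions: a single authoritative table (standing in for the kernel) records node ID reservations, and check-and-reserve happens as one atomic step, so two management servers can no longer hand out the same ID. The reservation expires after a caller-supplied timeout (mirroring "timeout given in ndb_mgmd") unless it is confirmed when the node actually connects. The class and method names (`KernelNodeIdTable`, `try_reserve`, `confirm`, `allocate`) are hypothetical and do not describe the real NDB kernel interface; `now` is passed explicitly to keep the example deterministic, where a real implementation would read a monotonic clock.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reservation:
    owner: str
    expires_at: Optional[float]  # None once the reservation has been confirmed


class KernelNodeIdTable:
    """Hypothetical single authoritative table of node ID reservations.

    Every ndb_mgmd asks this one table, so check-and-reserve is atomic.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[int, Reservation] = {}

    def try_reserve(self, node_id: int, owner: str, now: float, timeout: float) -> bool:
        # Check and record under one lock, closing the check/commit window.
        with self._lock:
            entry = self._entries.get(node_id)
            if entry is not None:
                held = entry.expires_at is None or entry.expires_at > now
                if held and entry.owner != owner:
                    return False  # another owner holds a live reservation
                if entry.expires_at is None:
                    return True  # already confirmed for this owner
            # Free, expired, or our own pending reservation: (re)start the timeout.
            self._entries[node_id] = Reservation(owner, now + timeout)
            return True

    def confirm(self, node_id: int, owner: str, now: float) -> bool:
        # Called once the node has actually connected: make the reservation permanent.
        with self._lock:
            entry = self._entries.get(node_id)
            if entry is None or entry.owner != owner:
                return False
            if entry.expires_at is not None and entry.expires_at <= now:
                return False  # reservation lapsed before the node connected
            entry.expires_at = None
            return True

    def allocate(self, candidates: list[int], owner: str, now: float, timeout: float) -> Optional[int]:
        # Try candidate IDs in order; each attempt is individually atomic.
        for node_id in candidates:
            if self.try_reserve(node_id, owner, now, timeout):
                return node_id
        return None


table = KernelNodeIdTable()
candidates = [10, 11, 12]
print(table.allocate(candidates, "app1", now=0.0, timeout=5.0))  # 10
print(table.allocate(candidates, "app2", now=0.1, timeout=5.0))  # 11
```

The timeout matters because an application may reserve an ID and then never connect; without expiry, that ID would stay blocked indefinitely.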

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66768

2841 Jonas Oreland	2009-02-18
      ndb - bug#42973 - fix parallel nodeid allocations with multiple ndb_mgmd

Pushed into 5.1.32-ndb-6.2.17 (revid:jonas@mysql.com-20090218143240-wjprrmehfp18x33j) (version source revid:jonas@mysql.com-20090218142958-iwgv1qidu5ohf2b5) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)

Pushed into 5.1.32-ndb-6.3.23 (revid:jonas@mysql.com-20090218143350-aidjdqu0m7yu3phl) (version source revid:jonas@mysql.com-20090218143350-aidjdqu0m7yu3phl) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)

Pushed into 5.1.32-ndb-6.4.3 (revid:jonas@mysql.com-20090218150143-qqyitqae3639g9me) (version source revid:jonas@mysql.com-20090218150104-yli5h2ldr1g10t3b) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)

Documented bugfix in the NDB 6.2.17, 6.3.23, and 6.4.3 changelogs as follows:

        When using multiple management servers and starting several API
        nodes (possibly including one or more SQL nodes) whose
        connectstrings listed the management servers in different order,
        it was possible for 2 API nodes to be assigned the same node ID.
        When this happened it was possible for an API node not to get
        fully connected, consequently producing a number of errors whose
        cause was not easily recognizable.

Also updated the 4.1/5.0 Cluster Limitations and 5.1/NDB-6.x Limitations Resolved sections of the Manual.