Bug #42973 race condition if allocating node ids from different ndb_mgmd simultaneously
Submitted: 18 Feb 2009 14:15 Modified: 18 Feb 2009 20:04
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine   Severity: S3 (Non-critical)
Version: *   OS: Any
Assigned to: Jonas Oreland

[18 Feb 2009 14:15] Jonas Oreland
Description:
When using 2 (or more) ndb_mgmd and starting several applications (or mysqld)
that list the ndb_mgmd in a different order in their connect-strings,
there is a race condition in which 2 applications can get the *same* node id.
When such an app later connects, it will not get fully connected,
with various weird error cases as a consequence.
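
The race can be illustrated with a small model. This is only a sketch, not the ndb_mgmd code: the class MgmServer, the function connect, the host names mgmd_a/mgmd_b and the node ids 4 and 5 are all hypothetical. It assumes each management server decides from its own local view of free ids, and it collapses the (in reality very small) window before the other server learns of an allocation into "never".

```python
# Hypothetical model of the race; not the actual ndb_mgmd implementation.

class MgmServer:
    """A management server that tracks free node ids only in its own view."""

    def __init__(self, name, api_node_ids):
        self.name = name
        self.free = sorted(api_node_ids)  # ids this server believes are free

    def alloc_node_id(self):
        # Hands out the lowest id it believes is free, without asking its peer.
        return self.free.pop(0)


def connect(connectstring, servers):
    """A client asks the first management server listed in its connect-string."""
    first = connectstring.split(",")[0]
    return servers[first].alloc_node_id()


def simulate():
    api_ids = [4, 5]  # assumed API node ids from the cluster configuration
    servers = {
        "mgmd_a": MgmServer("mgmd_a", api_ids),
        "mgmd_b": MgmServer("mgmd_b", api_ids),
    }
    # Same two servers, listed in opposite order by the two applications.
    id1 = connect("mgmd_a,mgmd_b", servers)
    id2 = connect("mgmd_b,mgmd_a", servers)
    return id1, id2
```

In this model simulate() hands the same node id to both applications, because neither server sees the other's allocation before making its own.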

How to repeat:
See above.
Loop many times; the race window is very small.

Suggested fix:
Store the allocation inside the kernel, including the timeout (which is given in ndb_mgmd).
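
A minimal sketch of the idea behind the suggested fix, assuming a single authoritative allocator that every ndb_mgmd would ask. The class NodeIdAllocator and its methods alloc, confirm and release are hypothetical names for illustration and do not reflect the committed patch. A reservation that is not confirmed within the timeout becomes available again.

```python
import math
import threading
import time


class NodeIdAllocator:
    """Hypothetical single point of truth for node id reservations."""

    def __init__(self, node_ids, clock=time.monotonic):
        self._lock = threading.Lock()
        self._clock = clock
        self._ids = sorted(node_ids)
        self._expiry = {}  # node id -> time at which the reservation lapses

    def alloc(self, timeout):
        """Reserve the lowest available id for `timeout` seconds, or return None."""
        with self._lock:
            now = self._clock()
            for node_id in self._ids:
                expiry = self._expiry.get(node_id)
                if expiry is None or expiry <= now:
                    self._expiry[node_id] = now + timeout
                    return node_id
            return None

    def confirm(self, node_id):
        """The node connected in time: keep the id until it is released."""
        with self._lock:
            self._expiry[node_id] = math.inf

    def release(self, node_id):
        """The node disconnected: make the id available again."""
        with self._lock:
            self._expiry.pop(node_id, None)
```

Because every request goes through one lock-protected table, two concurrent requests cannot receive the same id, regardless of which management server forwarded them.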
[18 Feb 2009 14:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66768

2841 Jonas Oreland	2009-02-18
      ndb - bug#42973 - fix parallel nodeid allocations with multiple ndb_mgmd
[18 Feb 2009 15:03] Bugs System
Pushed into 5.1.32-ndb-6.2.17 (revid:jonas@mysql.com-20090218143240-wjprrmehfp18x33j) (version source revid:jonas@mysql.com-20090218142958-iwgv1qidu5ohf2b5) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)
[18 Feb 2009 15:04] Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:jonas@mysql.com-20090218143350-aidjdqu0m7yu3phl) (version source revid:jonas@mysql.com-20090218143350-aidjdqu0m7yu3phl) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)
[18 Feb 2009 15:05] Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:jonas@mysql.com-20090218150143-qqyitqae3639g9me) (version source revid:jonas@mysql.com-20090218150104-yli5h2ldr1g10t3b) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)
[18 Feb 2009 20:04] Jon Stephens
Documented bugfix in the NDB 6.2.17, 6.3.23, and 6.4.3 changelogs as follows:

        When using multiple management servers and starting several API
        nodes (possibly including one or more SQL nodes) whose
        connectstrings listed the management servers in different order,
        it was possible for 2 API nodes to be assigned the same node ID.
        When this happened it was possible for an API node not to get
        fully connected, consequently producing a number of errors whose
        cause was not easily recognizable.

Also updated 4.1/5.0 Cluster Limitations and 5.1/NDB-6.x Limitations Resolved sections of the Manual.