Bug #37688 Race-condition between NODE_FAILREP and TAKE_OVERTCCONF
Submitted: 27 Jun 2008 8:14
Modified: 21 Sep 2009 13:26
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: In all versions

[27 Jun 2008 8:14] Jonas Oreland
Description:
If TAKE_OVERTCCONF (from the master) arrives *before* a node has received
NODE_FAILREP for the failed node, there is a theoretical race condition.
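
To make the ordering problem concrete, here is a toy sketch of a receiver that tolerates either arrival order by holding TAKE_OVERTCCONF until NODE_FAILREP for the same node has been processed. This is only an illustration: it is not the attached patch, and it is not the route-facility that the later commit introduced. The class, method names and log format are hypothetical and do not correspond to NDB kernel code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of a data node (hypothetical; not NDB kernel structures)."""
    failed_seen: set = field(default_factory=set)  # nodes whose NODE_FAILREP was processed
    deferred: set = field(default_factory=set)     # TAKE_OVERTCCONF held back, keyed by failed node
    log: list = field(default_factory=list)        # order in which signals were acted on

    def on_node_failrep(self, failed_id: int) -> None:
        # Record the failure first, then release any confirmation that arrived early.
        self.failed_seen.add(failed_id)
        self.log.append(("NODE_FAILREP", failed_id))
        if failed_id in self.deferred:
            self.deferred.discard(failed_id)
            self._complete_takeover(failed_id)

    def on_take_overtcconf(self, failed_id: int) -> None:
        # The racy case: the confirmation overtook the failure report.
        if failed_id not in self.failed_seen:
            self.deferred.add(failed_id)
            self.log.append(("DEFERRED", failed_id))
            return
        self._complete_takeover(failed_id)

    def _complete_takeover(self, failed_id: int) -> None:
        self.log.append(("TAKE_OVERTCCONF", failed_id))


# Out-of-order arrival: the confirmation is held until the failure report is seen.
racy = ToyNode()
racy.on_take_overtcconf(3)
racy.on_node_failrep(3)

# In-order arrival: nothing needs to be deferred.
ordered = ToyNode()
ordered.on_node_failrep(3)
ordered.on_take_overtcconf(3)
```

In both runs the takeover confirmation is acted on only after the failure report, which is the invariant the race would otherwise violate.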

This bug was introduced when fixing a series of cascading master failures.
It causes the following tests to fail (4-node):
* testNodeRestart -n Bug25364 T1
* testNodeRestart -n Bug28717 T1

Note: they fail because a number of signals (such as NODE_FAILREP) are
delayed using error inserts

How to repeat:
see above

Suggested fix:
attached patch is a start, but not complete enough
[27 Jun 2008 8:15] Jonas Oreland
half-baked fix

Attachment: bug37688.patch (text/x-patch), 11.34 KiB.

[13 Mar 2009 8:47] Jonas Oreland
seems to be more likely when using ndbmtd
(which makes sense)
[21 Sep 2009 8:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83838

3055 Jonas Oreland	2009-09-21
      ndb - bug#37688 - fix race condition between TAKE_OVERTCCONF and NODE_FAILREP by introducing a generic route-facility
[21 Sep 2009 8:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83840

3055 Jonas Oreland	2009-09-21
      ndb - bug#37688 - fix race condition between TAKE_OVERTCCONF and NODE_FAILREP by introducing a generic route-facility
[21 Sep 2009 9:29] Jonas Oreland
pushed to 6.3.27, 7.0.8 and 7.1
(i.e. not in 6.2)
[21 Sep 2009 13:26] Jon Stephens
Documented bugfix in the NDB-6.3.27 and 7.0.8 changelogs as follows:

        When a data node received a TAKE_OVERTCCONF signal from the
        master before that node had received a NODE_FAILREP, a race
        condition could in theory result.

Closed.
[30 Sep 2009 8:13] Bugs System
Pushed into 5.1.37-ndb-6.3.28 (revid:jonas@mysql.com-20090930070741-13u316s7s2l7e1ej) (version source revid:jonas@mysql.com-20090921083042-9xciok3hzbzc1i53) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[30 Sep 2009 8:14] Bugs System
Pushed into 5.1.37-ndb-7.0.9 (revid:jonas@mysql.com-20090930075942-1q6asjcp0gaeynmj) (version source revid:jonas@mysql.com-20090921084240-0a0kzk1djvu8m93j) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[30 Sep 2009 8:15] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:jonas@mysql.com-20090930080049-1c8a8cio9qgvhq35) (version source revid:jonas@mysql.com-20090921084935-uq21e1hs9alc9b81) (merge vers: 5.1.35-ndb-7.1.0) (pib:11)