Bug #57522 Rare race-condition in GCP take-over
Submitted: 18 Oct 2010 12:56 Modified: 25 Oct 2010 18:53
Reporter: Jonas Oreland
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Jonas Oreland CPU Architecture:Any

[18 Oct 2010 12:56] Jonas Oreland
Description:
During GCP take-over, one of the nodes may not have received
SUB_GCP_COMPLETE_REP, and will therefore report back
GCP_COMMITTING while the others report GCP_PREPARING.

This is due to the asynchronous nature of SUB_GCP_COMPLETE_REP (the last step of a GCP).

"Rare" because this code has been unaltered since 6.3.2 and this is the first time the problem has been observed.

How to repeat:
New test program.

Suggested fix:
Make SUB_GCP_COMPLETE_REP synchronous (i.e. wait for a reply),
so that this race can no longer leave some nodes in GCP_PREPARING
while others are in GCP_COMMITTING.
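
To illustrate the difference between the asynchronous and synchronous variants, here is a minimal toy sketch. It is not NDB code: the Node class, the inbox list, the `slow` parameter and the function names are all hypothetical, and the real state machine is collapsed into a single assumed step in which processing SUB_GCP_COMPLETE_REP moves a node from GCP_COMMITTING to GCP_PREPARING.

```python
from enum import Enum


class GcpState(Enum):
    PREPARING = "GCP_PREPARING"
    COMMITTING = "GCP_COMMITTING"


class Node:
    """Toy data node (hypothetical); not the real NDB state machine."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.state = GcpState.COMMITTING  # still finishing the current GCP
        self.inbox = []                   # signals sent but not yet processed

    def deliver(self):
        # Process queued signals. Assumption of this sketch: completing the
        # GCP lets the node move straight on to GCP_PREPARING.
        for signal in self.inbox:
            if signal == "SUB_GCP_COMPLETE_REP":
                self.state = GcpState.PREPARING
        self.inbox.clear()


def complete_gcp(nodes, synchronous, slow=()):
    """Send SUB_GCP_COMPLETE_REP to every node.

    `slow` lists node ids that have not processed the signal yet when an
    asynchronous sender moves on. A synchronous sender waits for every
    node, so `slow` has no effect in that case.
    """
    for node in nodes:
        node.inbox.append("SUB_GCP_COMPLETE_REP")
    for node in nodes:
        if synchronous or node.node_id not in slow:
            node.deliver()


def takeover_states(nodes):
    # The set of states a new master would collect during take-over.
    return {node.state for node in nodes}


if __name__ == "__main__":
    for synchronous in (False, True):
        nodes = [Node(i) for i in (1, 2, 3)]
        complete_gcp(nodes, synchronous=synchronous, slow={2})
        print(synchronous, sorted(s.value for s in takeover_states(nodes)))
```

In the asynchronous run, node 2 is still in GCP_COMMITTING when the states are collected, so the new master sees a mix of states. In the synchronous run, all nodes have processed the signal before anything else happens, so only one state is reported.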
[19 Oct 2010 18:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/121188

3309 Jonas Oreland	2010-10-19
      ndb - bug#57522 - make last step of micro gcp syncronous to avoid rare race with master take-over
[19 Oct 2010 18:56] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:jonas@mysql.com-20101019182617-aftkg9ejc7153i8l) (version source revid:jonas@mysql.com-20101019182617-aftkg9ejc7153i8l) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[19 Oct 2010 18:57] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:jonas@mysql.com-20101019184234-t8n5ru7xn90so3md) (version source revid:jonas@mysql.com-20101019184234-t8n5ru7xn90so3md) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[19 Oct 2010 18:58] Jonas Oreland
pushed to 6.3.39, 7.0.20 and 7.1.9
[20 Oct 2010 13:15] Jon Stephens
Documented as follows in the NDB-6.3.39, 7.0.20, and 7.1.9 changelogs:

        During GCP takeover, it was possible for a data node not to
        receive a SUB_GCP_COMPLETE_REP signal, with the result that it
        would report itself as GCP_COMMITTING while the other data nodes
        reported GCP_PREPARING.

Closed.
[23 Oct 2010 9:13] Jonas Oreland
Autotest shows a new problem; will need to investigate.
Setting status back to In progress.
[25 Oct 2010 10:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/121789

3322 Jonas Oreland	2010-10-25
      ndb - bug#57522 - post autotest fix 1. Fix upgrade which was made "upside-down"
[25 Oct 2010 18:36] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:jonas@mysql.com-20101025103440-wr9euk241kw0tzmq) (version source revid:jonas@mysql.com-20101025103440-wr9euk241kw0tzmq) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[25 Oct 2010 18:38] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:jonas@mysql.com-20101025183508-tr5thq2ogue0ilmm) (version source revid:jonas@mysql.com-20101025183508-tr5thq2ogue0ilmm) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[25 Oct 2010 18:53] Jonas Oreland
Fixed the bug introduced by the bug fix.
No extra docs needed.
The bug (in the bug fix) was never in a released version.