Bug #56044 GcpStop after master failure
Submitted: 17 Aug 2010 8:22 Modified: 19 Aug 2010 11:38
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine   Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3   OS: Any
Assigned to: Jonas Oreland   CPU Architecture: Any

[17 Aug 2010 8:22] Jonas Oreland
Description:
GcpStop with the following symptoms:

1) master failure
2) new master crashes with GcpStop
3) printout refers to
   c_COPY_GCIREQ_Counter = [SignalCounter: m_count=1 0000000000000010]

   NOTE: c_COPY_GCIREQ_Counter is not clear, i.e. the new master is still blocking on COPY_GCI
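
For readers unfamiliar with the printout in 3), here is a small sketch of how it might be decoded. It assumes the hex string after m_count is a node bitmask in which bit n stands for node id n; that reading is an assumption, not something stated in this report. The helper name waiting_nodes is hypothetical and is not part of NDB.

```python
def waiting_nodes(printout: str) -> list:
    """Return the bit positions set in a SignalCounter-style printout.

    Assumption: the last token is a hex bitmask where bit n = node id n.
    """
    # Take the last whitespace-separated token and drop the closing bracket.
    mask_hex = printout.strip().rstrip("]").split()[-1]
    mask = int(mask_hex, 16)
    # Collect every bit position that is set in the mask.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


line = "[SignalCounter: m_count=1 0000000000000010]"
print(waiting_nodes(line))  # [4]
```

Under that assumption, the single set bit agrees with m_count=1: exactly one node has not yet been cleared from the counter.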

---

The problem is a race condition: it is triggered only when the write to the
sysfile completes very shortly after the old master has failed.

How to repeat:
new error codes

Suggested fix:
fix race
[17 Aug 2010 10:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115902

3253 Jonas Oreland	2010-08-17
      ndb - bug#56044 - fix race condition in COPY_GCI during master failure
[17 Aug 2010 10:13] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.47-ndb-6.3.37 (revid:jonas@mysql.com-20100817100741-d9ds0hf3hrtmjgth) (version source revid:jonas@mysql.com-20100817100741-d9ds0hf3hrtmjgth) (merge vers: 5.1.47-ndb-6.3.37) (pib:20)
[17 Aug 2010 10:13] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.18 (revid:jonas@mysql.com-20100817101101-x2r5szheuk3rysn6) (version source revid:jonas@mysql.com-20100817101101-x2r5szheuk3rysn6) (merge vers: 5.1.47-ndb-7.0.18) (pib:20)
[17 Aug 2010 10:17] Jonas Oreland
pushed to 6.3.37, 7.0.18 and 7.1.7
[19 Aug 2010 11:38] Jon Stephens
Documented bugfix as follows in the NDB-6.3.37, 7.0.18, and 7.1.7 changelogs:

      Following a failure of the master data node, the new master sometimes
      experienced a race condition which caused the node to terminate with a
      GcpStop error.

Closed.