Bug #48584 Race condition in LCP master take-over
Submitted: 5 Nov 2009 20:44
Modified: 6 Nov 2009 2:54
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[5 Nov 2009 20:44] Jonas Oreland
Description:
During an LCP master takeover, the new master may not have received
COPY_GCI(LCP) although other participants have (i.e. some nodes have
received a message in the LCP protocol, but the newly elected master has not
received this message).

In that case the new master can use an uninitialized variable, causing it to
crash with an ndbrequire failure.

How to repeat:
Run

testNodeRestart -n RestartNodeDuringLCP

repeatedly until the crash occurs.
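
Because the race only shows up occasionally, the rerun step can be scripted. The following is a minimal sketch, not part of the original report: repeat_until_failure is a hypothetical helper name, and the sketch assumes that testNodeRestart exits with a non-zero status when the test fails and that a test cluster is already running and reachable.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NDB test suite): rerun a command
# until it fails or until a cap on the number of runs is reached.
# Usage: repeat_until_failure MAX_RUNS command [args...]
repeat_until_failure() {
    max_runs=$1
    shift
    run=1
    while [ "$run" -le "$max_runs" ]; do
        echo "run $run: $*"
        # Assumption: the command signals failure with a non-zero exit status.
        if ! "$@"; then
            echo "command failed on run $run"
            return 1
        fi
        run=$((run + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}
```

With the helper defined, the test case from this report could be looped as follows (the cap of 100 runs is arbitrary):

    repeat_until_failure 100 testNodeRestart -n RestartNodeDuringLCP

The helper returns 1 as soon as a run fails, so the node logs from that run can be inspected before anything is overwritten.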

Suggested fix:
Restart the protocol at the correct position, so that the variables are
initialized properly.
[5 Nov 2009 20:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/89523

3158 Jonas Oreland	2009-11-05
      ndb - bug#48584 - fix master lcp take-over bug
[5 Nov 2009 21:50] Jonas Oreland
Pushed to 6.3.29 and 7.0.10.
[6 Nov 2009 2:54] Jon Stephens
Documented bugfix in the NDB-6.3.29 and 7.0.10 changelogs as follows:

        During an LCP master takeover, when the newly elected master did
        not receive a COPY_GCI LCP protocol message but other nodes
        participating in the local checkpoint had received one, the new
        master could use an uninitialized variable, which caused it to
        crash.

Closed.