Bug #41297 Stale locks during node-failure-handling
Submitted: 8 Dec 2008 11:11
Modified: 10 Dec 2008 23:37
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: *
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[8 Dec 2008 11:11] Jonas Oreland
Description:
During testing of bug#41295, a race condition in TC take-over (node-failure handling) was discovered that could lead to operations (locks) not
being taken over and subsequently becoming stale.

This could cause a subsequent node restart to fail, and the application
to run endlessly into lock conflicts with an operation that would never go away
unless the node was restarted.
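
To illustrate the symptom only, here is a toy Python model of a take-over that misses an operation. It is not NDB code, and the report does not describe the exact interleaving. The sketch assumes one plausible ordering, in which an operation becomes visible after the new coordinator has already collected the list of operations to take over. All names (`LockTable`, `take_over`, `ops_reported`, `late_ops`) are hypothetical.

```python
# Toy model only: not NDB code. All names here are hypothetical.

class LockTable:
    """Row locks held on a surviving data node, keyed by row."""

    def __init__(self):
        self.owner_by_row = {}

    def acquire(self, row, op_id):
        if row in self.owner_by_row:
            owner = self.owner_by_row[row]
            raise RuntimeError(f"lock conflict on {row!r} with op {owner}")
        self.owner_by_row[row] = op_id

    def release_op(self, op_id):
        # Drop every lock owned by the given operation.
        for row in [r for r, o in self.owner_by_row.items() if o == op_id]:
            del self.owner_by_row[row]


def take_over(locks, ops_reported, late_ops):
    """Assumed interleaving: operations are collected first, then more
    operations show up before the take-over finishes. Only the collected
    operations are aborted; the late ones are never taken over."""
    snapshot = list(ops_reported)     # step 1: collect operations to take over
    ops_reported.extend(late_ops)     # step 2 (the race): late arrivals
    for op_id in snapshot:            # step 3: abort what was collected
        locks.release_op(op_id)
    return [op for op in ops_reported if op not in snapshot]


locks = LockTable()
locks.acquire("row-1", op_id=1)
locks.acquire("row-2", op_id=2)

stale = take_over(locks, ops_reported=[1], late_ops=[2])
print("stale operations:", stale)

try:
    locks.acquire("row-2", op_id=3)   # a new transaction wants the same row
except RuntimeError as exc:
    print("application sees:", exc)
```

In this model the missed operation keeps its lock forever, so every later attempt on the same row conflicts with it, which matches the behaviour described above.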

How to repeat:
Run the new test program (added for bug#41295).
[8 Dec 2008 12:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60898

2764 Jonas Oreland	2008-12-08
      ndb - bug#41295 bug#41296 bug#41297
[8 Dec 2008 14:00] Bugs System
Pushed into 5.1.30-ndb-6.2.17  (revid:jonas@mysql.com-20081208123555-23afeiagk2vputc1) (version source revid:jonas@mysql.com-20081208123555-23afeiagk2vputc1) (pib:5)
[8 Dec 2008 14:01] Bugs System
Pushed into 5.1.30-ndb-6.3.20  (revid:jonas@mysql.com-20081208123555-23afeiagk2vputc1) (version source revid:jonas@mysql.com-20081208133911-5ef2zriejdniqgkd) (pib:5)
[8 Dec 2008 14:02] Bugs System
Pushed into 5.1.30-ndb-6.4.0  (revid:jonas@mysql.com-20081208123555-23afeiagk2vputc1) (version source revid:jonas@mysql.com-20081208135815-5pzw01ax9hrbbw3j) (pib:5)
[8 Dec 2008 14:14] Jonas Oreland
Note: 6.3.20 *might* be incorrect; check with Tomas.
[10 Dec 2008 23:37] Jon Stephens
Documented bugfix in the NDB-6.2.17 and NDB-6.3.21 changelogs as follows:

        A race condition in transaction coordinator takeovers (part of
        node failure handling) could lead to operations (locks) not
        being taken over and subsequently getting stale. This could lead
        to subsequent failures of node restarts, and to applications
        getting into an endless lock conflict with operations that would
        not complete until the node was restarted.

(Fix appears in 6.3.21 rather than 6.3.20 per email from Jörg.)
[12 Dec 2008 23:28] Bugs System
Pushed into 6.0.9-alpha  (revid:jonas@mysql.com-20081208123555-23afeiagk2vputc1) (version source revid:tomas.ulin@sun.com-20081209185954-9svcixh2p5hsfi6w) (pib:5)