Bug #40697 Transactions can be pending longer than needed at node-failure
Submitted: 13 Nov 2008 14:01 Modified: 14 Nov 2008 9:36
Reporter: Jonas Oreland
Status: Closed
Category: Server: Cluster    Severity: S3 (Non-critical)
Version: *    OS: Any
Assigned to: Jonas Oreland    Target Version:

[13 Nov 2008 14:01] Jonas Oreland
Description:
If an ndbapi application (or mysqld, of course) has an outstanding transaction with a
transaction coordinator that fails (node failure), then the surviving nodes will process the
failure and then inform the waiting ndbapi.

Currently, the surviving nodes will not inform the ndbapi until the failure has been
processed in *all* protocols, even though the ndbapi is only interested in the
failure handling of the transaction protocol (TC-take-over).

How to repeat:
.

Suggested fix:
Inform ndbapi when the TC-take-over has completed,
so that it can carry on quicker.
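
To illustrate the difference, here is a toy model, not NDB code. It assumes each
failure-handling protocol completes at some point in time after the node failure.
The protocol names other than the TC take-over and all of the timings are
hypothetical, chosen only to show when the ndbapi would be informed before and
after the suggested fix.

```python
# Toy model of when the waiting ndbapi is informed of a TC failure.
# Not NDB code: function names, protocol names and timings are hypothetical.

def notify_time_before_fix(completion_ms):
    """ndbapi is informed only after *all* protocols have processed the failure."""
    return max(completion_ms.values())


def notify_time_after_fix(completion_ms):
    """ndbapi is informed as soon as the TC take-over has completed."""
    return completion_ms["tc_takeover"]


# Hypothetical completion times (ms after the node failure) per protocol.
completion_ms = {
    "tc_takeover": 120,   # transaction protocol (the only one ndbapi cares about)
    "protocol_b": 450,    # placeholder for some other failure-handling protocol
    "protocol_c": 900,    # placeholder for the slowest protocol
}

# Time the transaction no longer has to spend pending.
saved_ms = notify_time_before_fix(completion_ms) - notify_time_after_fix(completion_ms)
```

In this sketch the pending time is bounded by the TC take-over alone rather than
by the slowest protocol, which is the point of the suggested fix.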
[13 Nov 2008 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58633

2733 Jonas Oreland	2008-11-13
      ndb - bug#40697 - transaction can wait longer needed during node-failure-handling
[13 Nov 2008 14:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58635

2734 Jonas Oreland	2008-11-13
      ndb - bug#40697 - transaction can wait longer needed during node-failure-handling
[13 Nov 2008 14:33] Bugs System
Pushed into 5.1.29-ndb-6.2.17  (revid:jonas@mysql.com-20081113131556-ka7b01usk25cqbug)
(version source revid:jonas@mysql.com-20081113131556-ka7b01usk25cqbug) (pib:5)
[13 Nov 2008 14:34] Bugs System
Pushed into 5.1.29-ndb-6.3.19  (revid:jonas@mysql.com-20081113131556-ka7b01usk25cqbug)
(version source revid:jonas@mysql.com-20081113132059-k9ud1vk9xxbzsp1v) (pib:5)
[13 Nov 2008 14:35] Bugs System
Pushed into 5.1.29-ndb-6.4.0  (revid:jonas@mysql.com-20081113131556-ka7b01usk25cqbug)
(version source revid:jonas@mysql.com-20081113133629-doa4bx0p1a061bvx) (pib:5)
[14 Nov 2008 9:36] Jon Stephens
Documented bugfix in the NDB-6.2.17 and NDB-6.3.19 changelogs as follows:

        Transaction failures took longer to handle than was necessary.

        When a data node acting as transaction coordinator (TC) failed,
        the surviving data nodes did not inform the API node initiating
        the transaction of this until the failure had been processed by
        all protocols, when the API node needed only to know about
        failure handling by the transaction protocol -- that is, it
        needed to be informed only about the TC takeover process. Now,
        API nodes (including MySQL servers acting as cluster SQL nodes)
        are informed as soon as the TC takeover is complete, so that they
        can carry on operating more quickly.
[13 Dec 2008 0:29] Bugs System
Pushed into 6.0.9-alpha  (revid:jonas@mysql.com-20081113131556-ka7b01usk25cqbug) (version
source revid:tomas.ulin@sun.com-20081209185954-9svcixh2p5hsfi6w) (pib:5)