Bug #47329 Endless 4006 in startTransaction()/getScanOperation() + node-failure scenario
Submitted: 15 Sep 2009 13:56 Modified: 16 Sep 2009 13:38
Reporter: Jonas Oreland
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-6.2 OS:Any
Assigned to: Jonas Oreland CPU Architecture:Any

[15 Sep 2009 13:56] Jonas Oreland
Description:
Using a pattern like this

0 loop:
1  p = startTransaction();
2  o = p->getNdbScanOperation();
3  if (o == 0 && p->getNdbError().status == NdbError::TemporaryError)
4  {
5    p->close();
6    goto loop;
7  }

can, in some rare scenarios, end up in an infinite loop
  (if the API node discovers a node failure between steps 1 and 2).

This happens because getNdbScanOperation() detects that the node p is
  connected to is dead, but startTransaction()/closeTransaction() do not,
  so the same transaction is started and closed endlessly.

How to repeat:
"testIndex -n NFNR3 T6 T13" reproduces this semi-consistently in autotest.

Suggested fix:
In closeTransaction() (step 5 above), check whether the sequence number is
correct and, if it is not, remove the transaction from the pool of transactions
allocated from ndbd.
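
The failure mode and the suggested fix can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyTransaction, ToyApiNode, canScan(), onNodeFailure(), attemptsUntilScan() and the check_seq flag are hypothetical names made up for illustration, and none of this is the actual NDB API code or the committed patch. It assumes that closed transaction objects are cached for reuse and that each one remembers the node sequence number it was allocated under.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, not the NDB API classes.
struct ToyTransaction {
    int node_id;           // data node this object is bound to
    std::uint32_t seq_no;  // node sequence number seen at allocation time
};

class ToyApiNode {
public:
    // Step 1: reuse a cached object if one exists, otherwise allocate a new one.
    std::unique_ptr<ToyTransaction> startTransaction() {
        if (!idle_.empty()) {
            std::unique_ptr<ToyTransaction> t = std::move(idle_.back());
            idle_.pop_back();
            return t;
        }
        return std::unique_ptr<ToyTransaction>(new ToyTransaction{1, node_seq_});
    }

    // Step 2 stand-in: only this call notices that the node has failed.
    bool canScan(const ToyTransaction& t) const { return t.seq_no == node_seq_; }

    // Step 5: with check_seq set, a stale object is released instead of cached.
    void closeTransaction(std::unique_ptr<ToyTransaction> t, bool check_seq) {
        assert(t != nullptr);
        if (check_seq && t->seq_no != node_seq_) {
            return;  // t is destroyed here, so it can never be handed out again
        }
        idle_.push_back(std::move(t));
    }

    // The API node learns about a node failure: bump the sequence number.
    void onNodeFailure() { ++node_seq_; }

private:
    std::uint32_t node_seq_ = 0;
    std::vector<std::unique_ptr<ToyTransaction>> idle_;
};

// Runs the retry loop from the description. Returns the attempt on which the
// scan could be started, or -1 if max_attempts was exhausted.
int attemptsUntilScan(bool check_seq, int max_attempts) {
    ToyApiNode api;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        std::unique_ptr<ToyTransaction> t = api.startTransaction();
        if (attempt == 1) {
            api.onNodeFailure();  // failure lands between steps 1 and 2
        }
        if (api.canScan(*t)) {
            return attempt;
        }
        api.closeTransaction(std::move(t), check_seq);
    }
    return -1;
}
```

In this model, attemptsUntilScan(false, N) never succeeds for any N, because the same stale object is handed back on every iteration, while attemptsUntilScan(true, N) succeeds on the second attempt once the stale object has been dropped.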
[15 Sep 2009 16:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83313

2999 Jonas Oreland	2009-09-15
      ndb - bug#47329 - release unusable NdbTransaction-object on closeTransaction
[15 Sep 2009 17:02] Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090915165710-mro7wyb4547aqje1) (version source revid:jonas@mysql.com-20090915165710-mro7wyb4547aqje1) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[15 Sep 2009 17:02] Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090915165909-j9hf2cmeavek8t5l) (version source revid:jonas@mysql.com-20090915165909-j9hf2cmeavek8t5l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[15 Sep 2009 17:03] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:jonas@mysql.com-20090915170148-jh1wiy6upn3yipjl) (version source revid:jonas@mysql.com-20090915170148-jh1wiy6upn3yipjl) (merge vers: 5.1.35-ndb-7.1.0) (pib:11)
[15 Sep 2009 17:06] Jonas Oreland
also pushed to 6.2.19
[15 Sep 2009 20:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83380

3008 Martin Skold	2009-09-15 [merge]
      Merge
      added:
        storage/ndb/test/run-test/conf-ndb07.cnf
      modified:
        storage/ndb/src/common/portlib/NdbDir.cpp
        storage/ndb/src/kernel/blocks/ERROR_codes.txt
        storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp
        storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
        storage/ndb/src/ndbapi/Ndb.cpp
        storage/ndb/test/ndbapi/testSystemRestart.cpp
        storage/ndb/test/run-test/Makefile.am
        storage/ndb/test/src/UtilTransactions.cpp
[15 Sep 2009 21:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83385

3010 Martin Skold	2009-09-15 [merge]
      Merge
      added:
        storage/ndb/test/run-test/conf-ndb07.cnf
      modified:
        storage/ndb/src/common/portlib/NdbDir.cpp
        storage/ndb/src/kernel/blocks/ERROR_codes.txt
        storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp
        storage/ndb/src/ndbapi/Ndb.cpp
        storage/ndb/test/ndbapi/testSystemRestart.cpp
        storage/ndb/test/run-test/Makefile.am
        storage/ndb/test/src/UtilTransactions.cpp
[15 Sep 2009 21:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83388

3051 Martin Skold	2009-09-15 [merge]
      Merge
      added:
        storage/ndb/test/run-test/conf-ndb07.cnf
      modified:
        storage/ndb/src/kernel/blocks/ERROR_codes.txt
        storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp
        storage/ndb/src/ndbapi/Ndb.cpp
        storage/ndb/test/ndbapi/testSystemRestart.cpp
        storage/ndb/test/run-test/Makefile.am
        storage/ndb/test/src/UtilTransactions.cpp
[15 Sep 2009 21:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83390

2996 Martin Skold	2009-09-15 [merge]
      Merge
      modified:
        storage/ndb/src/ndbapi/Ndb.cpp
        storage/ndb/test/src/UtilTransactions.cpp
[16 Sep 2009 13:38] Jon Stephens
Documented bugfix in the NDB-6.2.19, 6.3.27, and 7.0.8 changelogs, as follows:

        In some circumstances, if an API node encountered a data node
        failure between the creation of a transaction and the start of a
        scan using that transaction, then any subsequent calls to
        startTransaction() and closeTransaction() could cause the same
        transaction to be started and closed repeatedly.

Closed.