Bug #41233 Using few large TS data files can cause tmp deadlock error under load
Submitted: 4 Dec 2008 15:43    Modified: 24 Apr 2009 20:26
Reporter: Jonathan Miller
Status: Verified
Category: MySQL Cluster: Disk Data    Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3          OS: Linux
Assigned to:                          CPU Architecture: Any

[4 Dec 2008 15:43] Jonathan Miller
Description:
When first testing Disk Data (DD) with NDB Atomics (an NDB API performance application), I received test failures due to deadlock messages.

The program packaged 900 random updates into a single transaction against a single table of 10,000,000 rows.
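
For reference, the update pattern under test looks roughly like the following NDB API sketch. This is only an illustration: the table and column names (ACCOUNT, ACCOUNT_ID, BALANCE) are placeholders rather than the actual NDBAtomics schema, and error handling is abbreviated.

#include <NdbApi.hpp>
#include <cstdio>
#include <cstdlib>

// Pack batch_size random-PK updates into one transaction and commit them
// in a single round trip to the data nodes.
int run_batch(Ndb *ndb, int batch_size, int num_rows)
{
  NdbTransaction *trans = ndb->startTransaction();
  if (trans == NULL) {
    fprintf(stderr, "startTransaction: %s\n", ndb->getNdbError().message);
    return 1;
  }

  for (int i = 0; i < batch_size; i++) {
    NdbOperation *op = trans->getNdbOperation("ACCOUNT");
    if (op == NULL || op->updateTuple() != 0) {
      fprintf(stderr, "define op: %s\n", trans->getNdbError().message);
      ndb->closeTransaction(trans);
      return 1;
    }
    op->equal("ACCOUNT_ID", rand() % num_rows);   // random PK; duplicates not filtered here
    op->setValue("BALANCE", rand() % 1000);
  }

  if (trans->execute(NdbTransaction::Commit) != 0) {
    fprintf(stderr, "execute: %d %s\n",
            trans->getNdbError().code, trans->getNdbError().message);
    ndb->closeTransaction(trans);
    return 1;
  }

  ndb->closeTransaction(trans);
  return 0;
}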

The tablespace (TS) consisted of 2 data files of 2 GB each.

To get around this, Tomas suggested creating many small data files, allowing for additional reader threads.

I created 40 data files of 50 MB each. This worked around the temporary deadlock errors I had been getting.
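
For reference, the 40 x 50 MB layout can be created through the NDB API dictionary roughly as sketched below (the equivalent CREATE/ALTER TABLESPACE SQL in the loader script works just as well). The tablespace name ts_1, logfile group lg_1 and file names are placeholders, and a connected, initialised Ndb object is assumed.

#include <NdbApi.hpp>
#include <cstdio>

// Sketch only: create tablespace "ts_1" backed by 40 data files of 50 MB
// each.  Assumes a logfile group named "lg_1" already exists.
int create_small_datafiles(Ndb *ndb)
{
  NdbDictionary::Dictionary *dict = ndb->getDictionary();

  NdbDictionary::Tablespace ts;
  ts.setName("ts_1");
  ts.setExtentSize(1024 * 1024);              // 1 MB extents
  ts.setDefaultLogfileGroup("lg_1");
  if (dict->createTablespace(ts) != 0) {
    fprintf(stderr, "createTablespace: %s\n", dict->getNdbError().message);
    return 1;
  }

  for (int i = 1; i <= 40; i++) {
    char path[32];
    snprintf(path, sizeof(path), "data_%02d.dat", i);

    NdbDictionary::Datafile df;
    df.setPath(path);
    df.setSize(50ULL * 1024 * 1024);          // 50 MB per file
    df.setTablespace("ts_1");
    if (dict->createDatafile(df) != 0) {
      fprintf(stderr, "createDatafile %s: %s\n", path,
              dict->getNdbError().message);
      return 1;
    }
  }
  return 0;
}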

How to repeat:
Easy to repeat with ACRT after modifying the loader script to create 2 data files of 2 GB each and then using NDBAtomics to run the load.

Suggested fix:
Allow additional reader threads per data file, or ensure deadlocks cannot happen when using large files.
[18 Dec 2008 8:37] Jonas Oreland
Jeb,

This has been fixed in 6.4
(and won't be backported).

Can you retest with 6.4?

/Jonas
[13 Mar 2009 9:03] Jonas Oreland
Don't know what to do.
[24 Apr 2009 20:26] Jonathan Miller
Hi,

I was able to repeat the test today and received the following during "Random Updates":

Error in Atomics.cpp, line: 460, code: 266, msg: Time-out in NDB, probably caused by deadlock.

I did have DiskIOThreadPool=2 set.

The code at line 460:

        //Execute to retrieve the data, but do not commit so we keep
        // our lock on that row
        if (m_pMyTrans->execute( NdbTransaction::NoCommit) == -1 ){
line 460->APIERROR(m_pMyTrans->getNdbError());
          pMyNdb->closeTransaction(m_pMyTrans);
          return 1;
        }
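
Error 266 is reported by the NDB API as a temporary error, so independent of the data-file layout the client could also work around it by closing the transaction, backing off, and re-running the batch. A minimal sketch of that pattern (not the NDBAtomics code; DefineBatchFn is a hypothetical callback standing in for the batch-building code):

#include <NdbApi.hpp>
#include <unistd.h>

typedef int (*DefineBatchFn)(NdbTransaction *trans);

// Retry the whole batch on temporary errors such as 266 (time-out /
// probable deadlock); give up on permanent errors or after max_retries.
int run_with_retry(Ndb *ndb, DefineBatchFn define_batch, int max_retries)
{
  for (int attempt = 0; attempt <= max_retries; attempt++) {
    NdbTransaction *trans = ndb->startTransaction();
    if (trans == NULL)
      return -1;

    if (define_batch(trans) == 0 &&
        trans->execute(NdbTransaction::Commit) == 0) {
      ndb->closeTransaction(trans);
      return 0;                                    // committed
    }

    NdbError::Status status = trans->getNdbError().status;
    ndb->closeTransaction(trans);
    if (status != NdbError::TemporaryError)
      return -1;                                   // permanent error: give up

    usleep(50 * 1000 * (attempt + 1));             // simple linear back-off
  }
  return -1;                                       // retries exhausted
}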

Please note that the code does ensure that, during the "Random Updates" phase, we are not reusing the same PK within a batch of update operations:

for (int h = 0; h < this->GetGroupNumber(); h++){
  accountIdTableCounter = h;
  // If nAccountID is already used by this batch, flag the collision,
  // generate new data, and check again.
  while (idFound){
    idFound = false;
    for (int i = 0; i < accountIdTableCounter; i++){
      if (accountIdTable[i] == nAccountID){
        idFound = true;
        this->MakeData();
        break;
      }
    }
  }

I will break this down into an easy test case next week if okayed by my manager.
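
For the stripped-down test case, the same uniqueness guarantee could be expressed more directly with a std::set, for example (pick_random_id() is a placeholder for however NDBAtomics generates nAccountID):

#include <set>
#include <cstdlib>

static int pick_random_id(int table_size) { return rand() % table_size; }

// Build a batch of unique primary keys: std::set::insert silently ignores
// duplicates, so each PK appears at most once per batch, just like the
// idFound loop above.
void build_unique_batch(std::set<int> &batch, int batch_size, int table_size)
{
  batch.clear();
  while ((int)batch.size() < batch_size)
    batch.insert(pick_random_id(table_size));
}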