Bug #20446 NDB API: scans with updates by multiple processes/threads can skip some records
Submitted: 14 Jun 2006 1:39 Modified: 2 Nov 2006 6:03
Reporter: Anatoly Pidruchny (Candidate Quality Contributor)
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 5.0.22-max OS: Any (all)
Assigned to: Pekka Nousiainen CPU Architecture: Any

[14 Jun 2006 1:39] Anatoly Pidruchny
Description:
When multiple processes or threads run the same ordered scan in parallel with exclusive locks and update the retrieved records, a scan can sometimes skip a number of records. The skipped records are not updated as a result.

How to repeat:
Please see the attached test program. The cluster configuration file config.ini is also provided. The program was tested on a 64-bit Linux system and on an HP-UX 11.11 system. It reproduced the problem in both cases.

The program creates and populates the table mytablename in the test_db_1 database, then creates two threads, passing them two different Ndb objects that represent two different connections to the test_db_1 database. The threads repeat the following operations in parallel:

1. Start a transaction;
2. Define an NdbIndexScanOperation for the primary key of the mytablename table;
3. Call readTuples with the NdbOperation::LM_Exclusive parameter to request exclusive locks, and with the NdbScanOperation::SF_OrderBy option to request an ordered result set;
4. Define the attributes to be retrieved with getValue;
5. Execute the transaction;
6. Call NdbIndexScanOperation::nextResult(true);
7. Create a separate update transaction;
8. In a loop, for each record retrieved with nextResult(true), do steps 8.1 and 8.2:
8.1. Create an update operation using NdbIndexScanOperation::updateCurrentTuple(updateTrans). This call requests that the record be taken over by the update transaction;
8.2. Set the value of an attribute in the update operation;
9. Execute the update transaction;
10. Close the update transaction;
11. Close the NdbIndexScanOperation;
12. Close the transaction used for the scan.

The program shows that approximately every fourth scan starts not from the first record, as every scan is supposed to, but from the 124th (on HP-UX) or 125th (on 64-bit Linux) record.

The example output of the program is:

64-bit Linux:

Populating the table
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 123
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 123
Thread: 1095068000 - updating records 124 through 241
Thread: 1084578144 - updating records 0 through 119
Thread: 1095068000 - updating records 0 through 123
Thread: 1095068000 - updating records 0 through 123
Thread: 1095068000 - updating records 0 through 119
...

HP-UX 11.11:

Populating the table
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 118
Thread: 8 - updating records 123 through 234
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 118
Thread: 8 - updating records 123 through 234
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 118
Thread: 8 - updating records 123 through 234
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 118
Thread: 8 - updating records 123 through 234
Thread: 7 - updating records 0 through 122
Thread: 7 - updating records 0 through 118
Thread: 8 - updating records 123 through 234
Thread: 8 - updating records 0 through 122
Thread: 8 - updating records 0 through 118
Thread: 7 - updating records 123 through 234
Thread: 8 - updating records 0 through 122
Thread: 8 - updating records 0 through 122
...
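
The affected scans in output like the above can be picked out mechanically. The following is only a minimal sketch and is not part of the attached test program: it assumes the line format shown in the sample output, and it assumes that record numbers are zero-based, so that a correct scan always reports 0 as its first record. The names ScanRange, parseScanLine and scansNotFromStart are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One parsed line of the test program's output (hypothetical helper type).
struct ScanRange {
    unsigned long thread;  // thread id as printed
    long first;            // first record updated by the scan
    long last;             // last record updated by the scan
};

// Parse a line of the form
//   "Thread: <id> - updating records <first> through <last>"
// Returns false for any other line (for example "Populating the table").
bool parseScanLine(const std::string& line, ScanRange& out) {
    ScanRange r = {0, 0, 0};
    int n = std::sscanf(line.c_str(),
                        "Thread: %lu - updating records %ld through %ld",
                        &r.thread, &r.first, &r.last);
    if (n != 3) {
        return false;
    }
    out = r;
    return true;
}

// Collect the scans whose first record is not 0, that is, the scans that
// skipped the leading records of the table.
std::vector<ScanRange> scansNotFromStart(const std::vector<std::string>& lines) {
    std::vector<ScanRange> bad;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        ScanRange r;
        if (parseScanLine(lines[i], r) && r.first != 0) {
            bad.push_back(r);
        }
    }
    return bad;
}
```

Feeding the program's output line by line into scansNotFromStart returns one entry per scan that did not start at record 0. For a scan flagged this way, the value of first is also the number of leading records that scan skipped.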
[14 Jun 2006 1:40] Anatoly Pidruchny
The test program.

Attachment: scan_update_rec_skipped.cpp (application/octet-stream, text), 7.49 KiB.

[14 Jun 2006 1:40] Anatoly Pidruchny
Cluster configuration file

Attachment: config.ini (application/octet-stream, text), 448 bytes.

[27 Jun 2006 14:44] Jonas Oreland
Hi

thx for an excellent bug report.

i looked at api code, and ndbapi code.
and it seems like there is a bug in the ordered index scan with exclusive locks.

i'll reassign this to the guy who wrote it...

/Jonas
[9 Oct 2006 10:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13329

ChangeSet@1.2265, 2006-10-09 12:35:11+02:00, pekka@orca.ndb.mysql.com +5 -0
  ndb - bug#20446: test case and cleanups (not fix)
[9 Oct 2006 14:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13335

ChangeSet@1.2266, 2006-10-09 16:13:32+02:00, pekka@orca.ndb.mysql.com +6 -0
  ndb - bug#20446: fix + other TUX scan improvements
[1 Nov 2006 14:38] Jonas Oreland
pushed into 5.0.29
[1 Nov 2006 14:52] Jonas Oreland
pushed into 5.1.13
[2 Nov 2006 6:03] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to our source repository for that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix for 5.0.29 & 5.1.13.

Nice description of issue - thanks!
[4 Nov 2006 3:12] Jon Stephens
*Fix for 5.0 documented in 5.0.30 Release Notes.*