Bug #80733 Acc table scan may scan same row twice
Submitted: 14 Mar 2016 18:21 Modified: 12 May 2016 5:39
Reporter: Mauritz Sundell
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine  Severity: S3 (Non-critical)
Version: 7.4.10  OS: Any
Assigned to:  CPU Architecture: Any

[14 Mar 2016 18:21] Mauritz Sundell
Description:
Normally, a table scan that uses neither an ordered index nor any disk data columns will use an ACC scan.
This can also happen when scanning a unique index without ordering.

If the table shrinks (rows deleted) after the scan has started and then grows again (rows inserted), the same row, one that was neither deleted nor inserted, can be scanned twice.

The bug existed in 7.0.6 and probably before that too.

A row being scanned twice will normally not result in an error, unless some action triggered for each row fails when executed twice.

Some SQL statements that could possibly fail or behave unexpectedly (not double-checked in the code, nor verified by testing):
* insert ... select ... from t - could complain about a duplicate key
* delete from t - could complain that a row is already deleted
* update t set col = col + 1 - could update a row twice
* alter table t ...
* with foreign keys, delete or update cascade actions

An ACC scan is a scan of the fragment's linear hash table of row references.
While the scan walks the buckets from start to end, the hash table can both expand and shrink.
There is one scan bit per scan for each element in the table.
At scan start, the scan bits of all elements in the table are garbage.
So before the scan starts on a bucket, the scan bits of all its elements are cleared.
Also, at expand or shrink, the scan bits of the involved buckets are cleared if those buckets are unscanned.
This implies that buckets above the top bucket at scan start have valid scan bits, since they must be the result of an expand.
The faulty code also assumed that unscanned buckets at or below the original top bucket have uninitialized or cleared scan bits, and that it is OK to clear the scan bits of such a bucket when the scan enters it.
But that is wrong, as the following sequence shows (a code sketch of it follows below):
1) Start a scan, but read only a few rows.
2) Shrink the table by deleting some rows; some unscanned elements are moved to a lower bucket, and the top bucket ends up below the original one.
3) Let the scan proceed and scan some bucket containing the elements moved in 2); their scan bits are now set.
4) Expand the table again by inserting some new rows, so that it grows back to its original size and the moved elements return to the original top bucket with valid, set scan bits.
5) When the scan reaches that top bucket, it has valid scan bits, some of them set, but the scan clears all of them before scanning, and some rows are scanned twice!
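
A minimal, self-contained C++ model of this sequence (an illustrative sketch with made-up names and a simplified bucket layout, not the actual Dbacc code; the printed count shows the double scan):

#include <cstdio>
#include <vector>

struct Element { int row; bool scanBit; };
typedef std::vector<Element> Bucket;

int main()
{
  // Table with buckets 0..3 (top bucket 3); the scan walks them in order.
  std::vector<Bucket> table(4);
  table[0] = { {1, false} };
  table[1] = { {10, false} };
  table[3] = { {30, false}, {31, false} };

  int twice = 0;
  std::vector<bool> seen(100, false);

  auto scanBucket = [&](int b) {
    for (auto &e : table[b]) e.scanBit = false;  // faulty unconditional clear
    for (auto &e : table[b]) {
      if (e.scanBit) continue;                   // scan only unscanned elements
      e.scanBit = true;
      if (seen[e.row]) ++twice;
      seen[e.row] = true;
    }
  };

  scanBucket(0);                        // 1) read only a few rows

  // 2) shrink: top bucket 3 merges into bucket 1, scan bits move along,
  //    and the top bucket is now below the original one.
  for (auto &e : table[3]) table[1].push_back(e);
  table[3].clear();

  scanBucket(1);                        // 3) scans the moved rows 30 and 31
  scanBucket(2);

  // 4) expand back to the original size: rows 30 and 31 return to
  //    bucket 3 with their scan bits still validly set.
  table[3] = { table[1][1], table[1][2] };
  table[1].resize(1);

  scanBucket(3);                        // 5) the clear wipes the set bits
  std::printf("rows scanned twice: %d\n", twice);  // prints 2
}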

How to repeat:
Follow the steps at the end of the description above.
I found no easy way to reproduce this, but added some error inserts and an NDB API test program.

Suggested fix:
Let Scanrec::startNoOfBuckets be the least top bucket seen so far, instead of the top bucket when the scan started.

@@ -5638,6 +5685,10 @@ void Dbacc::execSHRINKCHECK2(Signal* signal)
     fragrecptr.p->dirRangeFull = ZFALSE;
   }

+  if (mergeSourceBucket == scanPtr.p->startNoOfBuckets)
+  {
+    scanPtr.p->startNoOfBuckets --;
+  }
   shrink_adjust_reduced_hash_value(mergeDestBucket);

   /*--------------------------------------------------------------------------*/
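
Applied to the toy model above, the fix corresponds to tracking the least top bucket seen so far and clearing scan bits only for buckets at or below it; buckets above it must have been recreated by an expand and therefore carry valid bits. A self-contained sketch of the fixed behaviour (again with made-up names mirroring the model, not the actual Dbacc code):

#include <cstdio>
#include <vector>

struct Element { int row; bool scanBit; };
typedef std::vector<Element> Bucket;

int main()
{
  std::vector<Bucket> table(4);
  table[0] = { {1, false} };
  table[1] = { {10, false} };
  table[3] = { {30, false}, {31, false} };

  int startNoOfBuckets = 3;             // top bucket when the scan started
  int twice = 0;
  std::vector<bool> seen(100, false);

  auto scanBucket = [&](int b) {
    if (b <= startNoOfBuckets)          // only these buckets hold garbage bits
      for (auto &e : table[b]) e.scanBit = false;
    for (auto &e : table[b]) {
      if (e.scanBit) continue;
      e.scanBit = true;
      if (seen[e.row]) ++twice;
      seen[e.row] = true;
    }
  };

  scanBucket(0);

  // Shrink: bucket 3 merges into bucket 1.  The merge source equals
  // startNoOfBuckets, so lower it -- mirroring the patch above.
  for (auto &e : table[3]) table[1].push_back(e);
  table[3].clear();
  if (3 == startNoOfBuckets)
    startNoOfBuckets--;

  scanBucket(1);
  scanBucket(2);

  // Expand back: rows 30 and 31 return to bucket 3 with set scan bits.
  table[3] = { table[1][1], table[1][2] };
  table[1].resize(1);

  scanBucket(3);                        // 3 > startNoOfBuckets: bits trusted
  std::printf("rows scanned twice: %d\n", twice);  // now prints 0
}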
[14 Mar 2016 18:25] Mauritz Sundell
Posted by developer:
 
This bug should be fixed in 7.2 and up.

Even though an ACC scan does not guarantee that a deleted row and an inserted row with the same key are not both scanned, it is severe that rows which are neither deleted nor inserted during the scan can be scanned twice!
[18 Mar 2016 15:24] Mauritz Sundell
Posted by developer:
 
Another related bug.

Suppose one has scanned some buckets and then lets the table expand.
If the split bucket is below the current bucket, the new top bucket should be treated as scanned.
But if the table then shrinks again, the scan bits of the top bucket are cleared before the merge, and the merge bucket is marked for rescan.
When the scan continues, it may scan some rows twice (a code sketch follows below).
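
A self-contained sketch of this second sequence under the same simplified model as above (made-up names, not the actual Dbacc code):

#include <cstdio>
#include <vector>

struct Element { int row; bool scanBit; };
typedef std::vector<Element> Bucket;

int main()
{
  std::vector<Bucket> table(4);         // buckets 0..3, top bucket 3
  table[0] = { {1, false}, {2, false} };

  int twice = 0;
  std::vector<bool> seen(100, false);

  auto scan = [&](int b, bool clearFirst) {
    if (clearFirst)
      for (auto &e : table[b]) e.scanBit = false;
    for (auto &e : table[b]) {
      if (e.scanBit) continue;
      e.scanBit = true;
      if (seen[e.row]) ++twice;
      seen[e.row] = true;
    }
  };

  scan(0, true);                        // bucket 0 fully scanned, bits set

  // Expand: bucket 0 (below the current scan position) splits; row 2
  // moves to the new top bucket 4.  Bucket 0 was already scanned, so the
  // moved element keeps its set bit: bucket 4 should count as scanned.
  table.push_back({ table[0][1] });
  table[0].pop_back();

  // Shrink again: the faulty path clears the top bucket's scan bits
  // before the merge and marks the merge bucket (0) for rescan.
  for (auto &e : table[4]) e.scanBit = false;   // wrong: the bit was valid
  for (auto &e : table[4]) table[0].push_back(e);
  table.pop_back();

  scan(0, false);                       // rescan of bucket 0: row 2 again
  std::printf("rows scanned twice: %d\n", twice);  // prints 1
}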
[12 May 2016 5:39] Jon Stephens
Documented fix in the NDB 7.5.2 changelog as follows:

    A table scan using neither an ordered index nor any Disk Data
    columns normally uses an ACC scan. If this happened while
    scanning a unique but unordered index which shrank (due to rows
    being deleted) after the scan started and then grew again (rows
    inserted), a single row that had been neither deleted nor
    inserted could be scanned twice.

Closed.