| Bug #44015 | Abort of insert+delete can lead to committed read scan reading inconsistent data | ||
|---|---|---|---|
| Submitted: | 1 Apr 2009 17:04 | Modified: | 15 Apr 2009 2:41 | 
| Reporter: | Frazer Clement | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) | 
| Version: | 6.2+ | OS: | Any | 
| Assigned to: | Jonas Oreland | CPU Architecture: | Any | 
   [1 Apr 2009 17:04]
   Frazer Clement        
  Patch to add testcases to testNdbApi
Attachment: 62-weird-assert.patch (text/x-patch), 4.71 KiB.
   [3 Apr 2009 7:47]
   Jonas Oreland        
  Note: split into 2, update subject on this.
  When aborting an insert+delete, the insert and the delete are aborted "separately" (since TC does not know that the operations are on the same row). The abort of the insert comes first. If a committed-read scan (tup scan or index scan) then examines the row after the insert has been aborted but before the delete has been aborted, it can in some cases find the row in an inconsistent state.
  NOTE: backup and LCP both do committed-read tup scans!
  This problem was already fixed for the ordered index by always also aborting all operations *after* the operation being asked to abort. The fix for this bug is to generalize that code and also apply it to the actual data row.
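  The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DBTUP/TC code: the `Row` struct, the `ScanResult` enum, and the functions `committedReadScan`, `abortSingle`, `abortFrom` and `scanBetweenAborts` are all hypothetical names invented for illustration. The model assumes a scan treats a row as having a committed version whenever the oldest pending operation is not an insert.

```cpp
// Toy model only: these are NOT the real NDB kernel data structures.
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Insert, Delete };

struct Row {
    bool hasCommittedVersion = false;  // the row did not exist before the transaction
    std::vector<Op> pending;           // uncommitted operations, oldest first
};

enum class ScanResult { NoRow, CommittedRow, Inconsistent };

// Assumed scan rule: only a row whose oldest pending operation is an
// insert may legitimately lack a committed version.
ScanResult committedReadScan(const Row& row) {
    const bool expectsCommitted =
        !row.pending.empty() && row.pending.front() != Op::Insert;
    if (expectsCommitted && !row.hasCommittedVersion) {
        return ScanResult::Inconsistent;
    }
    return row.hasCommittedVersion ? ScanResult::CommittedRow : ScanResult::NoRow;
}

// Old behaviour in this model: abort exactly one operation.
void abortSingle(Row& row, std::size_t index) {
    row.pending.erase(row.pending.begin() + static_cast<std::ptrdiff_t>(index));
}

// Generalized behaviour: abort the operation and every operation after it.
void abortFrom(Row& row, std::size_t index) {
    row.pending.erase(row.pending.begin() + static_cast<std::ptrdiff_t>(index),
                      row.pending.end());
}

// Runs a committed-read scan right after the insert has been aborted.
ScanResult scanBetweenAborts(bool generalizedAbort) {
    Row row;
    row.pending = {Op::Insert, Op::Delete};
    if (generalizedAbort) {
        abortFrom(row, 0);    // the delete is aborted together with the insert
    } else {
        abortSingle(row, 0);  // the delete is still pending when the scan runs
    }
    return committedReadScan(row);
}
```

  In this model, `scanBetweenAborts(false)` reports an inconsistent row because the pending delete implies a committed version that never existed, while `scanBetweenAborts(true)` leaves no window for the scan to observe.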
   [3 Apr 2009 7:48]
   Jonas Oreland        
  Extra clarification: pk operations and scans using any kind of lock are not affected, since they are serialized in ACC.
   [3 Apr 2009 8:26]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71291 2897 Jonas Oreland 2009-04-03 ndb - bug#44015 - fix abort of insert+delete, so that committed read scan can't get inbetween
   [3 Apr 2009 19:55]
   Bugs System        
  Pushed into 5.1.32-ndb-6.3.24 (revid:jonas@mysql.com-20090403100824-h0lvd8lr4frk17dc) (version source revid:jonas@mysql.com-20090403100824-h0lvd8lr4frk17dc) (merge vers: 5.1.32-ndb-6.3.24) (pib:6)
   [3 Apr 2009 19:56]
   Bugs System        
  Pushed into 5.1.32-ndb-7.0.5 (revid:jonas@mysql.com-20090403125707-ma9xedfo4t8oip3z) (version source revid:jonas@mysql.com-20090403125707-ma9xedfo4t8oip3z) (merge vers: 5.1.32-ndb-7.0.5) (pib:6)
   [3 Apr 2009 19:57]
   Bugs System        
  Pushed into 5.1.32-ndb-6.2.18 (revid:jonas@mysql.com-20090403082438-zxbfx8pofugzjlf5) (version source revid:jonas@mysql.com-20090403082438-zxbfx8pofugzjlf5) (merge vers: 5.1.32-ndb-6.2.18) (pib:6)
   [4 Apr 2009 13:45]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71384 2898 Jonas Oreland 2009-04-04 ndb - bug#44015 - apparently all tux triggers should fire first...
   [4 Apr 2009 20:51]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71387 2898 Jonas Oreland 2009-04-04 ndb - bug#44015 - apparently all tux triggers should fire first...
   [4 Apr 2009 20:55]
   Bugs System        
  Pushed into 5.1.32-ndb-6.2.18 (revid:jonas@mysql.com-20090404205024-foid3jeg2n1xxw1w) (version source revid:jonas@mysql.com-20090404205024-foid3jeg2n1xxw1w) (merge vers: 5.1.32-ndb-6.2.18) (pib:6)
   [4 Apr 2009 20:56]
   Bugs System        
  Pushed into 5.1.32-ndb-6.3.24 (revid:jonas@mysql.com-20090404205223-at44d5n9y4uovmzc) (version source revid:jonas@mysql.com-20090404205223-at44d5n9y4uovmzc) (merge vers: 5.1.32-ndb-6.3.24) (pib:6)
   [4 Apr 2009 20:56]
   Bugs System        
  Pushed into 5.1.32-ndb-7.0.5 (revid:jonas@mysql.com-20090404205352-va9m5fufgc20ho8h) (version source revid:jonas@mysql.com-20090404205352-va9m5fufgc20ho8h) (merge vers: 5.1.32-ndb-7.0.5) (pib:6)
   [15 Apr 2009 2:41]
   Jon Stephens        
  Documented bugfix in the NDB-6.2.18, 6.3.24, and 7.0.5 changelogs as follows:
        When aborting an operation involving both an insert and a delete, the
        insert and delete were aborted separately. This was because the
        transaction coordinator did not know that the operations affected the
        same row, and, in the case of a committed-read (tuple or index) scan,
        the abort of the insert was performed first, then the row was examined
        after the insert was aborted but before the delete was aborted. In some
        cases, this would leave the row in an inconsistent state. This could
        occur when a local checkpoint was performed during a backup. This issue
        did not affect primary key operations or scans that used locks (these
        are serialized).
        After this fix, for ordered indexes, all operations that follow the
        operation to be aborted are now also aborted.
 
Description:
Attached patch adds 2 testcases to testNdbApi. Executing them against 6.2/6.3/6.4 generally results in assertion failures in TUP and LQH, indicating some sort of data corruption.

/* testNdbApi -n WeirdAssertFail
 * Generates phrase "here2" on 6.3 which is
 * output by DbtupExecQuery::handleReadReq()
 * detecting that the record's tuple checksum
 * is incorrect.
 * Later can generate assertion failure in
 * prepare_read
 *   ndbassert(src_len >= (dynstart - src_data));
 * resulting in node failure
 */

/* testNdbApi -n WeirdAssertFail2
 * Results in assertion failure in DbtupCommit::execTUP_DEALLOCREQ()
 *   ndbassert(ptr->m_header_bits & Tuple_header::FREE);
 * Also, sometimes an ndbrequire failure in LQH::execACCKEYREF
 *   if (unlikely(! (tcPtr->seqNoReplica == 0 ||
 *                   errCode != ZTUPLE_ALREADY_EXIST ||
 *                   (tcPtr->operation == ZREAD &&
 *                    (tcPtr->dirtyOp || tcPtr->opSimple)))))
 *   {
 *     ...
 *     ndbrequire(false);
 *
 * Results in node failure
 */

How to repeat:
Run the testcases using standard Hugo tables against a 2-node cluster at 6.2, 6.3 or 6.4. The testcases experiment with the theme of inserting and deleting the same rows in a single transaction, then aborting the transaction. The point of the assertion/ndbrequire failures varies; they may not always occur, or may not occur until after some other NDBAPI errors (e.g. Out of Redo log etc.).