Bug #43650 Falcon SELECTs return "can't find record" in READ COMMITTED isolation level
Submitted: 14 Mar 2009 11:28 Modified: 26 May 2010 17:47
Reporter: Philip Stoev
Status: Unsupported
Category: MySQL Server: Falcon storage engine Severity: S3 (Non-critical)
Version: 6.0-falcon-team OS: Any
Assigned to: Kevin Lewis CPU Architecture: Any
Tags: F_ERROR HANDLING

[14 Mar 2009 11:28] Philip Stoev
Description:
When executing a SELECT in READ COMMITTED mode, the SELECT may fail with a "can't find record" error. This is likely because a record ceased to exist between the first pass of the SELECT operation and the actual retrieval of the record. Depending on the SELECTs and the concurrent updates, a SELECT may never be able to complete at all.

Discussions within the team indicated that this is a genuine bug. SELECTs should be atomic and thus this error message is not acceptable.

How to repeat:
$ perl runall.pl \
  --engine=Falcon \
  --mysqld=--loose-falcon-lock-wait-timeout=5 \
  --mysqld=--loose-innodb-lock-wait-timeout=5 \
  --rows=10 \
  --basedir=/build/bzr/6.0-falcon-team \
  --queries=100000000 \
  --duration=900 \
  --gendata=conf/combinations.zz \
  --grammar=conf/rr.yy \
  --mysqld=--transaction-isolation=READ-COMMITTED
[15 Mar 2009 14:50] Kevin Lewis
As discussed in a separate email, a read-committed verb should ignore a record number from a preselected record list if the record was more recently deleted and committed.
[3 Apr 2009 8:08] Philip Stoev
I observed this error once on a falcon_compare_self scenario - no ALTER, 1 connection, REPEATABLE READ isolation. Maybe it is not just concurrent transactions that would cause records to be lost.
[23 Jun 2009 22:06] Kevin Lewis
I tracked this down to a situation where the server is selecting records from a non-indexed field and sorting them in a read-committed transaction. After sorting, the server calls StorageInterface::rnd_pos(), passing in a record number to be read and returned. But by this time, another transaction has deleted the record that was previously selected and sorted, so Falcon reports "can't find record". If we are only interested in what is committed, then any record that is deleted while the server is sorting should simply not be read.
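
The race described above can be illustrated with a small standalone C++ sketch. It is not Falcon or server code: the Table alias and the functions collectSorted, readByPosition and missingAfterConcurrentDelete are hypothetical names made up for this illustration, and a plain std::map stands in for the committed state of a table.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a table: record number -> committed value.
// A missing entry means the committed version of that record is deleted.
using Table = std::map<int, std::string>;

// Pass 1: scan the table, collect record numbers, and sort them by value,
// roughly what a sort on a non-indexed column has to do.
std::vector<int> collectSorted(const Table& table) {
    std::vector<int> recordNumbers;
    for (const auto& entry : table)
        recordNumbers.push_back(entry.first);
    std::sort(recordNumbers.begin(), recordNumbers.end(),
              [&table](int a, int b) { return table.at(a) < table.at(b); });
    return recordNumbers;
}

// Pass 2: positioned read of a single record number, loosely analogous to
// rnd_pos(). Returns nullptr when no committed version exists any more.
const std::string* readByPosition(const Table& table, int recordNumber) {
    auto it = table.find(recordNumber);
    return it == table.end() ? nullptr : &it->second;
}

// Runs both passes with a delete committed in between and returns the
// record numbers that pass 2 could no longer find.
std::vector<int> missingAfterConcurrentDelete(Table table, int deletedNumber) {
    assert(table.count(deletedNumber) == 1);  // the record exists during pass 1
    const std::vector<int> sorted = collectSorted(table);
    table.erase(deletedNumber);  // another transaction deletes and commits
    std::vector<int> missing;
    for (int recordNumber : sorted)
        if (readByPosition(table, recordNumber) == nullptr)
            missing.push_back(recordNumber);
    return missing;
}
```

Calling missingAfterConcurrentDelete on a three-row table with one of its record numbers returns exactly that record number: the sorted list still contains it, but the positioned read finds nothing.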
[24 Jun 2009 22:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77093

2745 Kevin Lewis	2009-06-24
      Bug#43650 - The server calls StorageInterface::rnd_pos() when it has a list of records previously read and sorted and it wants to read them a final time.  If the isolation mode is read-committed, the call to fetchVersion will return NULL if the currently committed version is deleted.  This means it was deleted during the transaction.  Since the transaction was able to find a visible version of the record before, that visible version MUST still be around.  So the only reason that fetchVersion would return NULL is if the current visible version is deleted.  In repeatable-read, fetchVersion would return that previous visible record.
      
      So if fetchVersion or fetchForUpdate returns NULL to StorageDatabase::fetch(), instead of returning 
         StorageErrorRecordNotFound - HA_ERR_KEY_NOT_FOUND - "can't find record",
      it should return 
         StorageErrorRecordDeleted - HA_ERR_RECORD_DELETED
      so that the server will ignore this record and go on to the next record.
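
The change in return code can be sketched roughly as follows. This is a simplified standalone illustration, not the committed patch: the Record struct, the FetchResult enum, the map-based fetchVersion and the readSorted loop are all hypothetical stand-ins, and only the idea of reporting "deleted" instead of "not found" when no visible version exists is taken from the commit message.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Record { int value; };

// Hypothetical result codes, named after the two storage errors in the
// commit message (StorageErrorRecordNotFound, StorageErrorRecordDeleted).
enum class FetchResult { Ok, RecordNotFound, RecordDeleted };

// Hypothetical stand-in for fetchVersion(): returns the committed version
// of a record, or nullptr if the committed version is deleted.
const Record* fetchVersion(const std::map<int, Record>& committed,
                           int recordNumber) {
    auto it = committed.find(recordNumber);
    return it == committed.end() ? nullptr : &it->second;
}

// Sketch of the fetch path after the fix: a null version is reported as
// "record deleted" rather than "record not found".
FetchResult fetch(const std::map<int, Record>& committed, int recordNumber,
                  Record* out) {
    assert(out != nullptr);
    const Record* version = fetchVersion(committed, recordNumber);
    if (version == nullptr)
        return FetchResult::RecordDeleted;  // was RecordNotFound before the fix
    *out = *version;
    return FetchResult::Ok;
}

// Hypothetical caller: walks a previously sorted list of record numbers,
// skips deleted records, and fails on any other error.
bool readSorted(const std::map<int, Record>& committed,
                const std::vector<int>& sortedNumbers,
                std::vector<int>* values) {
    for (int recordNumber : sortedNumbers) {
        Record record{};
        const FetchResult result = fetch(committed, recordNumber, &record);
        if (result == FetchResult::RecordDeleted)
            continue;  // ignore this record and go on to the next one
        if (result != FetchResult::Ok)
            return false;
        values->push_back(record.value);
    }
    return true;
}
```

With this shape, a record number that was deleted between the sort and the final read is silently dropped from the result instead of aborting the whole SELECT.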
[24 Jun 2009 22:24] Kevin Lewis
Code reviewed by Ann Harrison