MySQL Bugs: #31310: Locked rows silently skipped in read-committed isolation level

Bug #31310	Locked rows silently skipped in read-committed isolation level
Submitted:	1 Oct 2007 9:46	Modified:	31 Oct 2007 0:02
Reporter:	Domas Mituzas	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S1 (Critical)
Version:	5.1-bk,5.1.22,5.1.20	OS:	Any
Assigned to:	Gleb Shchepa	CPU Architecture:	Any
Tags:	bfsm_2007_10_18, corruption, regression

Description:
In read-committed isolation, locked rows are skipped by update statements, rather than being blocked on. This leads to wrong data on table...

Verified at 20071001 5.1-bk 
This is regression, 5.0 works properly. 

How to repeat:
SESSION 1:
mysql> create table ist1 (a int primary key, b int) engine=InnoDB;
Query OK, 0 rows affected (0.18 sec)

mysql> insert into ist1 values (1,2);
Query OK, 1 row affected (0.14 sec)

mysql> begin;
Query OK, 0 rows affected (0.30 sec)

mysql> update ist1 set b=12 where a=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

SESSION 2:

mysql> set session transaction isolation level read committed;
Query OK, 0 rows affected (0.12 sec)

mysql> begin;
Query OK, 0 rows affected (0.23 sec)

mysql> update ist1 set b=21 where a=1;
Query OK, 0 rows affected (0.32 sec)
Rows matched: 0  Changed: 0  Warnings: 0

mysql> select * from ist1;
+---+------+
| a | b    |
+---+------+
| 1 |    2 | 
+---+------+
1 row in set (0.00 sec)

Suggested fix:
wait for lock, instead of skipping the row.

This is a critical bug.

In 5.1, READ COMMITTED removes most of the gap locking and locking of nonmatching rows if there is a WHERE condition. But it should not skip locked rows!

Also verified with 5.1.22.

I can repeat on the latest 5.1.23/InnoDB-zip.

As asked by Jeffrey, repeated on 5.1.20

The idea in a 'semi-consistent read' is that if a row is locked, then InnoDB returns the latest committed version, so that MySQL can decide if the row matches the WHERE condition in the UPDATE.

sql_update.cc:
"
  while (!(error=info.read_record(&info)) && !thd->killed)
  {
    if (!(select && select->skip_record()))
    {
      if (table->file->was_semi_consistent_read())
        continue;  /* repeat the read of the same row if it still exists */
"

If the row matches the WHERE condition, MySQL reads the same row again, and InnoDB this time locks it, or waits for a lock on the row.

The bug here is that MySQL's read_multi_range_next() keeps its own information about the cursor position, and refuses to read the row again because it thinks that the cursor is already at the end of the potential result set.

Breakpoint 3, QUICK_RANGE_SELECT::get_next (this=0x198c7a0)
    at opt_range.cc:8262
8262      if (in_ror_merged_scan)
Current language:  auto; currently c++
(gdb) next
8273        if (in_range)
(gdb)
8276          result= file->read_multi_range_next(&mrange);
(gdb) step
handler::read_multi_range_next (this=0x19b2d28, found_range_p=0x440c8280)
    at handler.cc:3130
3130      DBUG_ENTER("handler::read_multi_range_next");
(gdb) next
3133      DBUG_ASSERT(multi_range_curr < multi_range_end);
(gdb)
3138        if (multi_range_curr->range_flag != (UNIQUE_RANGE | EQ_RANGE))
(gdb)
3152          result= HA_ERR_END_OF_FILE;
(gdb) next
3156        for (multi_range_curr++;
(gdb)
3135      do
(gdb)
3173      *found_range_p= multi_range_curr;
(gdb)
3174      DBUG_PRINT("exit",("handler::read_multi_range_next: result %d", result));
(gdb)
3175      DBUG_RETURN(result);
(gdb)
3176    }
(gdb)
QUICK_RANGE_SELECT::get_next (this=0x198c7a0) at opt_range.cc:8277
8277          if (result != HA_ERR_END_OF_FILE)
(gdb)
8281        uint count= min(multi_range_length, ranges.elements -
(gdb)
8283        if (count == 0)
(gdb)
8286          in_range= FALSE;
(gdb)
8287          if (in_ror_merged_scan)
(gdb)
8289          DBUG_RETURN(HA_ERR_END_OF_FILE);
(gdb)

How to fix this: MySQL code should be aware that it may need read the same row twice, even in a unique equality search on an index.

A quick fix is to disable InnoDB's semi-consistent read optimization in the unique equality search. I do not know yet if the same bug exists for a non-unique range search.

Classifying this as a MySQL Server bug.

We want to preserve the 'semi-consistent' optimization of UPDATE ... WHERE ...

MySQL code must be aware it may need to re-read the same row twice, also in a unique equality search on an index. opt_range.cc and other relevant files should be checked for this assumption.

Or, could we fix this inside InnoDB? That is very hard because it may happen that the last committed row version that we returned to MySQL is already obsolete. We really need to re-read the row to know if it matches the WHERE condition.

This code in handler.cc assumes that the same row cannot be read twice:

int handler::read_multi_range_next(KEY_MULTI_RANGE **found_range_p)
{
  int result;
  DBUG_ENTER("handler::read_multi_range_next");

  /* We should not be called after the last call returned EOF. */
  DBUG_ASSERT(multi_range_curr < multi_range_end);

  do
  {
    /* Save a call if there can be only one row in range. */
    if (multi_range_curr->range_flag != (UNIQUE_RANGE | EQ_RANGE))
    {
      result= read_range_next();

      /* On success or non-EOF errors jump to the end. */
      if (result != HA_ERR_END_OF_FILE)
        break;
    }
    else
    {
      /*
        We need to set this for the last range only, but checking this
        condition is more expensive than just setting the result code.
      */
      result= HA_ERR_END_OF_FILE;
    }

> If the row matches the WHERE condition, MySQL reads the same row again, 
> and InnoDB this time locks it, or waits for a lock on the row.

A question from someone seeing all this for the first time: why is the API designed so we need to "read the same row again" and not have a separate call 

  handler->lock_row() 

that would mean "the last returned row has passed the WHERE clause, please lock it"?  The "read the same row again" solution seems like making one API call serve two different purposes?

Whoever merges this into 5.2 will need to check how this can be made to work with DS-MRR, or verify that DS-MRR is disabled for such UPDATE queries

Sergey,

"
A question from someone seeing all this for the first time: why is the API designed so we
need to "read the same row again" and not have a separate call 

  handler->lock_row() 

that would mean "the last returned row has passed the WHERE clause, please lock it"?  The
"read the same row again" solution seems like making one API call serve two different
purposes?
"

the returned row is the last committed version. It may have changed by the time we decide to lock the row. Therefore, it is not enough just to lock the cursor row. The row also has to be refetched so that the MySQL interpreter can test the WHERE conditions.

Regards,

Heikki

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35009

ChangeSet@1.2580, 2007-10-05 23:27:12+05:00, gshchepa@gleb.loc +3 -0
  Fixed bug #31310.
  Locked rows of the InnoDB storage was silently skipped in the read-committed
  isolation level.
  
  QUICK_RANGE_SELECT lacks second (blocking) read of record that was read
  semi-consistently and just skip it.
  
  The handler::read_multi_range_next method has been modified
  to retry previous range if the previous read was semi-consistent.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35095

ChangeSet@1.2580, 2007-10-08 17:09:38+05:00, gshchepa@gleb.loc +3 -0
  Fixed bug #31310.
  Locked rows of the InnoDB storage was silently skipped in the read-committed
  isolation level.
  
  QUICK_RANGE_SELECT lacks second (blocking) read of record that was read
  semi-consistently and just skip it.
  
  The handler::read_multi_range_next method has been modified
  to retry reading of the previous record after the semi-consistent
  read.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35139

ChangeSet@1.2587, 2007-10-09 01:07:15+05:00, gshchepa@gleb.loc +3 -0
  Fixed bug #31310.
  Locked rows of the InnoDB storage was silently skipped in the read-committed
  isolation level.
  
  QUICK_RANGE_SELECT for unique ranges lacks second (blocking) read
  of the record that was read semi-consistently and just skip it.
  
  The handler::read_multi_range_next method has been modified
  to retry previous unique range if the previous read was
  semi-consistent.

We use ha_innobase::index_read() with previously used key to re-read the record after the semi-consistent read of this record.

Pushed into 5.1.23-beta

Pushed into 6.0.4-alpha

Noted in 5.1.23, 6.0.4 changelogs.

For InnoDB tables with READ COMMITTED isolation level, UPDATE
statements skipped rows locked by another transaction, rather than
waiting for the locks to be released.

Bug #33282 is set as duplicate of these bug here.