MySQL Bugs: #41029: MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table)

Bug #41029	MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table)
Submitted:	25 Nov 2008 19:37	Modified:	22 Nov 2010 1:31
Reporter:	Mikhail Izioumtchenko	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	6.0-bzr	OS:	Any
Assigned to:	Olav Sandstå	CPU Architecture:	Any
Tags:	mrr, optimizer_switch

Description:
in the test case below, SELECT FOR UPDATE in session 1 results
in two record (no gap) locks, when MRR is enabled, in gap locks,
when MRR is disabled, resulting in different behavior of session 2.
It also means that session 2 behaves differently in 5.1 and 6.0+MRR.
The fact is, MRR or no MRR, this kind of statement must lock the gap
if transaction isolation level is REPEATABLE READ. 

How to repeat:
The test was provided by Paul DuBois, with my minor modifications.
Session 1:

set global binlog_format=row;
set autocommit=0;
use test;
drop table if exists t;
create table t (dummy int primary key, a int unique, b int) engine innodb;
insert into t values(1,1,1),(3,3,3),(5,5,5);
commit;
set session transaction isolation level repeatable read;
select @@tx_isolation;
start transaction;
select * from t where a > 2 for update;

now we start session 2 and execute:

use test;
set autocommit=0;
start transaction;
select 1;
insert into t values(2,2,2);
select 2;
insert into t values(0,0,0);
select 3;
select @@tx_isolation;
select * from t;

in 5.1, this will timeout on the first insert, and will never insert
the (2,2,2) record.
in 6.0, it will not timeout and will insert the (2,2,2) record.
And, if you add 
set optimizer_use_mrr=disable;
to the beginning of session 1, session 2 will timeout on the first insert even in 6.0

in both 5.1 and 6.0 cases the latest launchpad code was used.

Suggested fix:
None at the moment. Here's an explanation, however:
MRR tricks InnoDB into thinking the search is unique so no gap locks
are necessary, as seen from the following stack:

(1:28:51 PM) michael: #0  sel_set_rec_lock (rec=0xb556809d "\200", index=0xb62b8468,
    offsets=0xaf4ff1e0, mode=5, type=1024, thr=0xb62ac0e8) at row/row0sel.c:841

### in 5.1 or with no MRR sel_set_rec_lock is called with type=0

#1  0x0846086d in row_search_for_mysql (buf=0xa3a1d78 "ÿ", mode=2,
    prebuilt=0xb62b6468, match_mode=1, direction=0) at row/row0sel.c:4028
#2  0x083ed9a8 in ha_innobase::index_read (this=0xa3a1be0, buf=0xa3a1d78 "ÿ",
    key_ptr=0xaf450021 "\003", key_len=4, find_flag=HA_READ_KEY_EXACT)
    at handler/ha_innodb.cc:4481

### note HA_READ_KEY_EXACT resulting in unique == 1 in row_search_for_mysql()
### in turn leading to type=1024 passed to sel_set_rec_lock()
#3  0x083ee1b8 in ha_innobase::rnd_pos (this=0xa3a1be0, buf=0xa3a1d78 "ÿ",
    pos=0xaf450021 "\003") at handler/ha_innodb.cc:4896
#4  0x082fc725 in DsMrr_impl::dsmrr_next (this=0xa3a1d58, h=0xa3a1be0,
    range_info=0xaf4ff914) at handler.cc:4494
#5  0x083f2631 in ha_innobase::multi_range_read_next (this=0xa3a1be0,
    range_info=0xaf4ff914) at handler/ha_innodb.cc:8539
#6  0x082eaf48 in QUICK_RANGE_SELECT::get_next (this=0xa3a4ca8)
    at opt_range.cc:8596
#7  0x082f35aa in rr_quick (info=0xa3d2de4) at records.cc:322
#8  0x08280b5e in join_init_read_record (tab=0xa3d2da0) at sql_select.cc:14557
#9  0x0827f242 in sub_select (join=0xa3c9118, join_tab=0xa3d2da0,
    end_of_records=false) at sql_select.cc:13723
#10 0x0827eeb1 in do_select (join=0xa3c9118, fields=0xa3a6ad4, table=0x0,
    procedure=0x0) at sql_select.cc:13470
#11 0x0826a139 in JOIN::exec (this=0xa3c9118) at sql_select.cc:2835
#12 0x0826a61a in mysql_select (thd=0xa3a5648, rref_pointer_array=0xa3a6b44,
    tables=0xa3c8a78, wild_num=1, fields=@0xa3a6ad4, conds=0xa3c8fb8,
    og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0,
    select_options=2149337600, result=0xa3c9108, unit=0xa3a67b0,
    select_lex=0xa3a6a40) at sql_select.cc:3025

some suggestions/comments from Heikki:

(1:33:33 PM) Heikki: SELECT ... LOCK IN SHARE MODE, ... FOR UPDATE
(1:34:42 PM) Heikki: MRR can break REPEATABLE READ
(1:35:08 PM) Heikki: It must not be used for UPDATE or DELETE in REPEATABLE READ
(1:35:52 PM) Heikki: MRR forgets that a serializable database must lock EVERYTHING it reads
(1:36:17 PM) Heikki: MRR thinks it is allowed to peek at rows, without locking
(1:36:32 PM) Heikki: That is not allowed in serializable execution

With 6.0.9-debug I've got a crash on INSERT (2,2,2) from the second session (even with MRR disabled):

Version: '6.0.9-alpha-debug'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
/home2/openxs/dbs/6.0/libexec/mysqld(my_print_stacktrace+0x29)[0x890486c]
/home2/openxs/dbs/6.0/libexec/mysqld(handle_segfault+0x271)[0x82f43b7]
[0xffffe420]
/home2/openxs/dbs/6.0/libexec/mysqld[0x872a067]
/home2/openxs/dbs/6.0/libexec/mysqld(row_ins+0x8d)[0x872a261]
/home2/openxs/dbs/6.0/libexec/mysqld(row_ins_step+0x144)[0x872a3f5]
/home2/openxs/dbs/6.0/libexec/mysqld(row_insert_for_mysql+0x201)[0x872ba8f]
/home2/openxs/dbs/6.0/libexec/mysqld(_ZN11ha_innobase9write_rowEPh+0x3ae)[0x86cd59e]
/home2/openxs/dbs/6.0/libexec/mysqld(_ZN7handler12ha_write_rowEPh+0x57)[0x8427bd9]
/home2/openxs/dbs/6.0/libexec/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x98c)[0x8396bc8]
/home2/openxs/dbs/6.0/libexec/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0xad5)[0x83950bb]
/home2/openxs/dbs/6.0/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x2d51)[0x8307551]
/home2/openxs/dbs/6.0/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x1d0)[0x830e1fc]
/home2/openxs/dbs/6.0/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x84f)[0x8302f4d]
/home2/openxs/dbs/6.0/libexec/mysqld(_Z10do_commandP3THD+0x229)[0x83024f5]
/home2/openxs/dbs/6.0/libexec/mysqld(handle_one_connection+0x125)[0x8300105]
/lib/tls/libpthread.so.0[0x40050aa7]
/lib/tls/libc.so.6(__clone+0x5e)[0x40211c2e]
081030 14:43:52 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388572
read_buffer_size=131072
max_used_connections=2
max_threads=151
thread_count=2
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337809 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x9b99388
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x46bcf030 thread_stack 0x30c00
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x9bab890 = insert into t values(2,2,2)
thd->thread_id=2
thd->killed=NOT_KILLED

Crash looks related to InnoDB. Please, check.

another known problematic area is engine condition pushdown.
Could you repeat your test with

engine_condition_pushdown=OFF;

and see if there's still a crash.
What does -debug in 6.0.9-debug mean?

Thank you for the bug report. Verified as described:

miguel@hegel:~/dbs$ 6.0/bin/mysql -uroot --prompt='conn 2 >' test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 6.0.9-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

conn 2 >use test;
Database changed
conn 2 >set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

conn 2 >start transaction;
Query OK, 0 rows affected (0.00 sec)

conn 2 >select 1;
+---+
| 1 |
+---+
| 1 | 
+---+
1 row in set (0.00 sec)

conn 2 >insert into t values(2,2,2);
Query OK, 1 row affected (0.00 sec)

conn 2 >select 2;
+---+
| 2 |
+---+
| 2 | 
+---+
1 row in set (0.00 sec)

conn 2 >insert into t values(0,0,0);
Query OK, 1 row affected (0.00 sec)

conn 2 >select 3;
+---+
| 3 |
+---+
| 3 | 
+---+
1 row in set (0.00 sec)

conn 2 >select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ | 
+-----------------+
1 row in set (0.00 sec)

conn 2 >select * from t;
+-------+------+------+
| dummy | a    | b    |
+-------+------+------+
|     0 |    0 |    0 | 
|     1 |    1 |    1 | 
|     2 |    2 |    2 | 
|     3 |    3 |    3 | 
|     5 |    5 |    5 | 
+-------+------+------+
5 rows in set (0.00 sec)

conn 2 >exit
Bye
miguel@hegel:~/dbs$ 5.1/bin/mysql -uroot --prompt='conn 2 >' test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.31-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

conn 2 >use test;
Database changed
conn 2 >set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

conn 2 >start transaction;
Query OK, 0 rows affected (0.00 sec)

conn 2 >select 1;
+---+
| 1 |
+---+
| 1 | 
+---+
1 row in set (0.00 sec)

conn 2 >insert into t values(2,2,2);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
conn 2 >select 2;
+---+
| 2 |
+---+
| 2 | 
+---+
1 row in set (0.00 sec)

conn 2 >insert into t values(0,0,0);
Query OK, 1 row affected (0.00 sec)

conn 2 >select 3;
+---+
| 3 |
+---+
| 3 | 
+---+
1 row in set (0.00 sec)

conn 2 >select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ | 
+-----------------+
1 row in set (0.00 sec)

conn 2 >select * from t;
+-------+------+------+
| dummy | a    | b    |
+-------+------+------+
|     0 |    0 |    0 | 
|     1 |    1 |    1 | 
|     3 |    3 |    3 | 
|     5 |    5 |    5 | 
+-------+------+------+
4 rows in set (0.00 sec)

conn 2 >

goes away / comes back when disabling/enabling MRR in InnoDB

When MRR is used a second handler object is created. This handler object is a "clone" of the first handler object and the MRR code use this for executing some of the operations. 

This second handler object is created in DsMRR_impl::dsmrr_init() (in handler.cc) by "cloning" the initial handler object:

   if (!(new_h2= h->clone(thd->mem_root)) || 
        new_h2->ha_external_lock(thd, F_RDLCK))

The important thing about the above code is that on this cloned handler object a call to ha_external_lock() is done with a request to use a read lock (F_RDLCK).

Later, this handler (h2) object is used in DsMrr_impl::dsmrr_fill_buffer()
for doing the initial search for the first record in the range interval. 
Since the call to ha_external_lock() only requested a read lock the
index search done by InnoDB will be done without setting an exclusive lock
on the search interval. This allows the second transaction to succeed to insert
data within the "read interval" of the first transaction.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108878

3182 Olav Sandstaa	2010-05-21
      Fix for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table)
      
      This problem occurs due to MRR is creating a second handler
      object. This handler object is a "clone" of the first/initial handler
      object and the MRR code use this second handler object for executing
      some of the operations.
      
      The second handler object is created in DsMRR_impl::dsmrr_init() (in
      handler.cc) by "cloning" the initial handler object:
      
         if (!(new_h2= h->clone(thd->mem_root)) || 
              new_h2->ha_external_lock(thd, F_RDLCK))
      
      The important thing about the above code is that on this cloned
      handler object a call to ha_external_lock() is done with a request to
      use a read lock (F_RDLCK). When this handler object (h2) is used in
      DsMrr_impl::dsmrr_fill_buffer() for doing the initial search for the
      first record in the range interval, the index search done by InnoDB
      will be done without setting any exclusive lock on the search
      interval. This allows a second transaction to succeed to insert data
      within the "read interval" of the first transaction.
      
      The fix for this problem is to use the same lock type when calling
      new_h2->ha_external_lock() as was done on the original handler
      object. To do this we store (in handler::ha_external_lock()) the
      original lock type in a new member variable in the handler object
      (m_lock_type) and use this lock type when "cloning" the handler object
      in DsMRR_impl::dsmrr_init().
      
      Note to reviewers (and myself):
      
      1. The included test case should be will be included in the innodb_mrr
         test before the fix is pushed but due to innodb_mrr test being unstable
         I have included as a separate test case.
      
      2. The final commit/push should also remove the test and result files for
         this bug from the optimizer_unfixed_bugs suite.
     @ mysql-test/r/bug41029.result
        Result file for test of Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table)
     @ mysql-test/t/bug41029-master.opt
        Option file for reducing the default lock timeout in order to reduce the time for running the test.
     @ mysql-test/t/bug41029.test
        Test for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table)
     @ sql/handler.cc
        When MRR is used a second handler object is cloned based on the original 
        handler object. This second handler object is used for the initial navigation 
        of the underlying storage engine's data. When this second handler object
        was created a call to ha_external_lock() was done with F_RDLCK as the
        lock type. Since F_RDLCK was always used this made InnoDB
        not put gap lock on the search interval and allowed concurrent transactions
        to insert data into the search interval of transasactions running SELECT 
        FOR UPDATE statements in REPEATABLE READ mode.
        
        To ensure the second handler object created when MRR is used is getting the
        same lock type as the initial handler the following changes are implemented:
        
        1. Extend handler::ha_external_lock() to save the lock type in a member variable
           (m_lock_type).
        2. Extend DsMRR_impl::dsmrr_init() so that it uses the same lock type when
           cloning a second handler object as used for the initial handler object
           (instead of always using F_RDLCK).
     @ sql/handler.h
        Add a new member variable to the handler class to store the lock type set by
        the call to handler::ha_external_lock(). This lock type is needed by the MRR
        implementation when creating a second handler object by cloning the initial
        handler.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110319

3186 Olav Sandstaa	2010-06-07
      Fix for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table)
            
      This problem occurs due to MRR is creating a second handler
      object. This handler object is a "clone" of the first/initial handler
      object and the MRR code use this second handler object for executing
      some of the operations.
            
      The second handler object is created in DsMRR_impl::dsmrr_init() (in
      handler.cc) by "cloning" the initial handler object:
            
           if (!(new_h2= h->clone(thd->mem_root)) || 
               new_h2->ha_external_lock(thd, F_RDLCK))
            
      The important thing about the above code is that on this cloned
      handler object a call to ha_external_lock() is done with a request to
      use a read lock (F_RDLCK). When this handler object (h2) is used in
      DsMrr_impl::dsmrr_fill_buffer() for doing the initial search for the
      first record in the range interval, the index search done by InnoDB
      will be done without setting any exclusive lock on the search
      interval. This allows a second transaction to succeed to insert data
      within the "read interval" of the first transaction.
            
      The fix for this problem is to use the same lock type when calling
      new_h2->ha_external_lock() as was done on the original handler
      object. To do this we store (in handler::ha_external_lock()) the
      original lock type in a new member variable in the handler object
      (m_lock_type) and use this lock type when "cloning" the handler object
      in DsMRR_impl::dsmrr_init().
     @ mysql-test/r/innodb_mrr.result
        Result file for test of Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table).
     @ mysql-test/suite/optimizer_unfixed_bugs/r/bug41029.result
        Test case moved to innodb_mrr test.
     @ mysql-test/suite/optimizer_unfixed_bugs/t/bug41029-master.opt
        Test case moved to innodb_mrr test.
     @ mysql-test/suite/optimizer_unfixed_bugs/t/bug41029.test
        Test case moved to innodb_mrr test.
     @ mysql-test/t/innodb_mrr-master.opt
        Add an option file for the innodb_mrr test. For testing of Bug#41029 
        we reduce the default InnoDB lock timeout in order to reduce the time
        for running the test.
     @ mysql-test/t/innodb_mrr.test
        Test for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table).
     @ sql/handler.cc
        When MRR is used, a second handler object is cloned based on the original 
        handler object. This second handler object is used for the initial navigation 
        of the underlying storage engine's data. When this second handler object
        was created a call to ha_external_lock() was done with F_RDLCK as the
        lock type. Since F_RDLCK was always used, this made InnoDB
        not put gap lock on the search interval and allowed concurrent transactions
        to insert data into the search interval of transasactions running SELECT 
        FOR UPDATE statements in REPEATABLE READ mode.
                
        To ensure the second handler object created when MRR is used is getting the
        same lock type as the initial handler, the following changes are implemented:
                
         1. Extend handler::ha_external_lock() to save the lock type in a member variable
            (m_lock_type).
         2. Extend DsMRR_impl::dsmrr_init() so that it uses the same lock type when
            cloning a second handler object as used for the initial handler object
            (instead of always using F_RDLCK).
     @ sql/handler.h
        Add a new member variable to the handler class to store the lock type set by
        the call to handler::ha_external_lock(). This lock type is needed by the MRR
        implementation when creating a second handler object by cloning the initial
        handler.

Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)

Bug is not in any released 5.6.x version. No changelog entry needed.