| Bug #41029 | MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table) | ||
|---|---|---|---|
| Submitted: | 25 Nov 2008 19:37 | Modified: | 22 Nov 2010 1:31 |
| Reporter: | Mikhail Izioumtchenko | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 6.0-bzr | OS: | Any |
| Assigned to: | Olav Sandstå | CPU Architecture: | Any |
| Tags: | mrr, optimizer_switch | ||
[26 Nov 2008 9:08]
Valeriy Kravchuk
With 6.0.9-debug I've got a crash on INSERT (2,2,2) from the second session (even with MRR disabled): Version: '6.0.9-alpha-debug' socket: '/tmp/mysql.sock' port: 3306 Source distribution /home2/openxs/dbs/6.0/libexec/mysqld(my_print_stacktrace+0x29)[0x890486c] /home2/openxs/dbs/6.0/libexec/mysqld(handle_segfault+0x271)[0x82f43b7] [0xffffe420] /home2/openxs/dbs/6.0/libexec/mysqld[0x872a067] /home2/openxs/dbs/6.0/libexec/mysqld(row_ins+0x8d)[0x872a261] /home2/openxs/dbs/6.0/libexec/mysqld(row_ins_step+0x144)[0x872a3f5] /home2/openxs/dbs/6.0/libexec/mysqld(row_insert_for_mysql+0x201)[0x872ba8f] /home2/openxs/dbs/6.0/libexec/mysqld(_ZN11ha_innobase9write_rowEPh+0x3ae)[0x86cd59e] /home2/openxs/dbs/6.0/libexec/mysqld(_ZN7handler12ha_write_rowEPh+0x57)[0x8427bd9] /home2/openxs/dbs/6.0/libexec/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x98c)[0x8396bc8] /home2/openxs/dbs/6.0/libexec/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0xad5)[0x83950bb] /home2/openxs/dbs/6.0/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x2d51)[0x8307551] /home2/openxs/dbs/6.0/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x1d0)[0x830e1fc] /home2/openxs/dbs/6.0/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x84f)[0x8302f4d] /home2/openxs/dbs/6.0/libexec/mysqld(_Z10do_commandP3THD+0x229)[0x83024f5] /home2/openxs/dbs/6.0/libexec/mysqld(handle_one_connection+0x125)[0x8300105] /lib/tls/libpthread.so.0[0x40050aa7] /lib/tls/libc.so.6(__clone+0x5e)[0x40211c2e] 081030 14:43:52 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388572 read_buffer_size=131072 max_used_connections=2 max_threads=151 thread_count=2 connection_count=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337809 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x9b99388 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x46bcf030 thread_stack 0x30c00 Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x9bab890 = insert into t values(2,2,2) thd->thread_id=2 thd->killed=NOT_KILLED Crash looks related to InnoDB. Please, check.
[26 Nov 2008 13:35]
Mikhail Izioumtchenko
another known problematic area is engine condition pushdown. Could you repeat your test with engine_condition_pushdown=OFF; and see if there's still a crash. What does -debug in 6.0.9-debug mean?
[5 Dec 2008 14:53]
MySQL Verification Team
Thank you for the bug report. Verified as described: miguel@hegel:~/dbs$ 6.0/bin/mysql -uroot --prompt='conn 2 >' test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 6.0.9-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. conn 2 >use test; Database changed conn 2 >set autocommit=0; Query OK, 0 rows affected (0.00 sec) conn 2 >start transaction; Query OK, 0 rows affected (0.00 sec) conn 2 >select 1; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.00 sec) conn 2 >insert into t values(2,2,2); Query OK, 1 row affected (0.00 sec) conn 2 >select 2; +---+ | 2 | +---+ | 2 | +---+ 1 row in set (0.00 sec) conn 2 >insert into t values(0,0,0); Query OK, 1 row affected (0.00 sec) conn 2 >select 3; +---+ | 3 | +---+ | 3 | +---+ 1 row in set (0.00 sec) conn 2 >select @@tx_isolation; +-----------------+ | @@tx_isolation | +-----------------+ | REPEATABLE-READ | +-----------------+ 1 row in set (0.00 sec) conn 2 >select * from t; +-------+------+------+ | dummy | a | b | +-------+------+------+ | 0 | 0 | 0 | | 1 | 1 | 1 | | 2 | 2 | 2 | | 3 | 3 | 3 | | 5 | 5 | 5 | +-------+------+------+ 5 rows in set (0.00 sec) conn 2 >exit Bye miguel@hegel:~/dbs$ 5.1/bin/mysql -uroot --prompt='conn 2 >' test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.1.31-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. conn 2 >use test; Database changed conn 2 >set autocommit=0; Query OK, 0 rows affected (0.00 sec) conn 2 >start transaction; Query OK, 0 rows affected (0.00 sec) conn 2 >select 1; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.00 sec) conn 2 >insert into t values(2,2,2); ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction conn 2 >select 2; +---+ | 2 | +---+ | 2 | +---+ 1 row in set (0.00 sec) conn 2 >insert into t values(0,0,0); Query OK, 1 row affected (0.00 sec) conn 2 >select 3; +---+ | 3 | +---+ | 3 | +---+ 1 row in set (0.00 sec) conn 2 >select @@tx_isolation; +-----------------+ | @@tx_isolation | +-----------------+ | REPEATABLE-READ | +-----------------+ 1 row in set (0.00 sec) conn 2 >select * from t; +-------+------+------+ | dummy | a | b | +-------+------+------+ | 0 | 0 | 0 | | 1 | 1 | 1 | | 3 | 3 | 3 | | 5 | 5 | 5 | +-------+------+------+ 4 rows in set (0.00 sec) conn 2 >
[8 Oct 2009 14:10]
Guilhem Bichot
goes away / comes back when disabling/enabling MRR in InnoDB
[21 May 2010 13:11]
Olav Sandstå
When MRR is used a second handler object is created. This handler object is a "clone" of the first handler object and the MRR code use this for executing some of the operations.
This second handler object is created in DsMRR_impl::dsmrr_init() (in handler.cc) by "cloning" the initial handler object:
if (!(new_h2= h->clone(thd->mem_root)) ||
new_h2->ha_external_lock(thd, F_RDLCK))
The important thing about the above code is that on this cloned handler object a call to ha_external_lock() is done with a request to use a read lock (F_RDLCK).
Later, this handler (h2) object is used in DsMrr_impl::dsmrr_fill_buffer()
for doing the initial search for the first record in the range interval.
Since the call to ha_external_lock() only requested a read lock the
index search done by InnoDB will be done without setting an exclusive lock
on the search interval. This allows the second transaction to succeed to insert
data within the "read interval" of the first transaction.
[21 May 2010 14:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/108878 3182 Olav Sandstaa 2010-05-21 Fix for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table) This problem occurs due to MRR is creating a second handler object. This handler object is a "clone" of the first/initial handler object and the MRR code use this second handler object for executing some of the operations. The second handler object is created in DsMRR_impl::dsmrr_init() (in handler.cc) by "cloning" the initial handler object: if (!(new_h2= h->clone(thd->mem_root)) || new_h2->ha_external_lock(thd, F_RDLCK)) The important thing about the above code is that on this cloned handler object a call to ha_external_lock() is done with a request to use a read lock (F_RDLCK). When this handler object (h2) is used in DsMrr_impl::dsmrr_fill_buffer() for doing the initial search for the first record in the range interval, the index search done by InnoDB will be done without setting any exclusive lock on the search interval. This allows a second transaction to succeed to insert data within the "read interval" of the first transaction. The fix for this problem is to use the same lock type when calling new_h2->ha_external_lock() as was done on the original handler object. To do this we store (in handler::ha_external_lock()) the original lock type in a new member variable in the handler object (m_lock_type) and use this lock type when "cloning" the handler object in DsMRR_impl::dsmrr_init(). Note to reviewers (and myself): 1. The included test case should be will be included in the innodb_mrr test before the fix is pushed but due to innodb_mrr test being unstable I have included as a separate test case. 2. The final commit/push should also remove the test and result files for this bug from the optimizer_unfixed_bugs suite. @ mysql-test/r/bug41029.result Result file for test of Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table) @ mysql-test/t/bug41029-master.opt Option file for reducing the default lock timeout in order to reduce the time for running the test. @ mysql-test/t/bug41029.test Test for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table) @ sql/handler.cc When MRR is used a second handler object is cloned based on the original handler object. This second handler object is used for the initial navigation of the underlying storage engine's data. When this second handler object was created a call to ha_external_lock() was done with F_RDLCK as the lock type. Since F_RDLCK was always used this made InnoDB not put gap lock on the search interval and allowed concurrent transactions to insert data into the search interval of transasactions running SELECT FOR UPDATE statements in REPEATABLE READ mode. To ensure the second handler object created when MRR is used is getting the same lock type as the initial handler the following changes are implemented: 1. Extend handler::ha_external_lock() to save the lock type in a member variable (m_lock_type). 2. Extend DsMRR_impl::dsmrr_init() so that it uses the same lock type when cloning a second handler object as used for the initial handler object (instead of always using F_RDLCK). @ sql/handler.h Add a new member variable to the handler class to store the lock type set by the call to handler::ha_external_lock(). This lock type is needed by the MRR implementation when creating a second handler object by cloning the initial handler.
[7 Jun 2010 9:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110319 3186 Olav Sandstaa 2010-06-07 Fix for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table) This problem occurs due to MRR is creating a second handler object. This handler object is a "clone" of the first/initial handler object and the MRR code use this second handler object for executing some of the operations. The second handler object is created in DsMRR_impl::dsmrr_init() (in handler.cc) by "cloning" the initial handler object: if (!(new_h2= h->clone(thd->mem_root)) || new_h2->ha_external_lock(thd, F_RDLCK)) The important thing about the above code is that on this cloned handler object a call to ha_external_lock() is done with a request to use a read lock (F_RDLCK). When this handler object (h2) is used in DsMrr_impl::dsmrr_fill_buffer() for doing the initial search for the first record in the range interval, the index search done by InnoDB will be done without setting any exclusive lock on the search interval. This allows a second transaction to succeed to insert data within the "read interval" of the first transaction. The fix for this problem is to use the same lock type when calling new_h2->ha_external_lock() as was done on the original handler object. To do this we store (in handler::ha_external_lock()) the original lock type in a new member variable in the handler object (m_lock_type) and use this lock type when "cloning" the handler object in DsMRR_impl::dsmrr_init(). @ mysql-test/r/innodb_mrr.result Result file for test of Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table). @ mysql-test/suite/optimizer_unfixed_bugs/r/bug41029.result Test case moved to innodb_mrr test. @ mysql-test/suite/optimizer_unfixed_bugs/t/bug41029-master.opt Test case moved to innodb_mrr test. @ mysql-test/suite/optimizer_unfixed_bugs/t/bug41029.test Test case moved to innodb_mrr test. @ mysql-test/t/innodb_mrr-master.opt Add an option file for the innodb_mrr test. For testing of Bug#41029 we reduce the default InnoDB lock timeout in order to reduce the time for running the test. @ mysql-test/t/innodb_mrr.test Test for Bug#41029 MRR: SELECT FOR UPDATE fails to lock gaps (InnoDB table). @ sql/handler.cc When MRR is used, a second handler object is cloned based on the original handler object. This second handler object is used for the initial navigation of the underlying storage engine's data. When this second handler object was created a call to ha_external_lock() was done with F_RDLCK as the lock type. Since F_RDLCK was always used, this made InnoDB not put gap lock on the search interval and allowed concurrent transactions to insert data into the search interval of transasactions running SELECT FOR UPDATE statements in REPEATABLE READ mode. To ensure the second handler object created when MRR is used is getting the same lock type as the initial handler, the following changes are implemented: 1. Extend handler::ha_external_lock() to save the lock type in a member variable (m_lock_type). 2. Extend DsMRR_impl::dsmrr_init() so that it uses the same lock type when cloning a second handler object as used for the initial handler object (instead of always using F_RDLCK). @ sql/handler.h Add a new member variable to the handler class to store the lock type set by the call to handler::ha_external_lock(). This lock type is needed by the MRR implementation when creating a second handler object by cloning the initial handler.
[16 Aug 2010 6:38]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:14]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[22 Nov 2010 1:31]
Paul DuBois
Bug is not in any released 5.6.x version. No changelog entry needed.

Description: in the test case below, SELECT FOR UPDATE in session 1 results in two record (no gap) locks, when MRR is enabled, in gap locks, when MRR is disabled, resulting in different behavior of session 2. It also means that session 2 behaves differently in 5.1 and 6.0+MRR. The fact is, MRR or no MRR, this kind of statement must lock the gap if transaction isolation level is REPEATABLE READ. How to repeat: The test was provided by Paul DuBois, with my minor modifications. Session 1: set global binlog_format=row; set autocommit=0; use test; drop table if exists t; create table t (dummy int primary key, a int unique, b int) engine innodb; insert into t values(1,1,1),(3,3,3),(5,5,5); commit; set session transaction isolation level repeatable read; select @@tx_isolation; start transaction; select * from t where a > 2 for update; now we start session 2 and execute: use test; set autocommit=0; start transaction; select 1; insert into t values(2,2,2); select 2; insert into t values(0,0,0); select 3; select @@tx_isolation; select * from t; in 5.1, this will timeout on the first insert, and will never insert the (2,2,2) record. in 6.0, it will not timeout and will insert the (2,2,2) record. And, if you add set optimizer_use_mrr=disable; to the beginning of session 1, session 2 will timeout on the first insert even in 6.0 in both 5.1 and 6.0 cases the latest launchpad code was used. Suggested fix: None at the moment. Here's an explanation, however: MRR tricks InnoDB into thinking the search is unique so no gap locks are necessary, as seen from the following stack: (1:28:51 PM) michael: #0 sel_set_rec_lock (rec=0xb556809d "\200", index=0xb62b8468, offsets=0xaf4ff1e0, mode=5, type=1024, thr=0xb62ac0e8) at row/row0sel.c:841 ### in 5.1 or with no MRR sel_set_rec_lock is called with type=0 #1 0x0846086d in row_search_for_mysql (buf=0xa3a1d78 "ÿ", mode=2, prebuilt=0xb62b6468, match_mode=1, direction=0) at row/row0sel.c:4028 #2 0x083ed9a8 in ha_innobase::index_read (this=0xa3a1be0, buf=0xa3a1d78 "ÿ", key_ptr=0xaf450021 "\003", key_len=4, find_flag=HA_READ_KEY_EXACT) at handler/ha_innodb.cc:4481 ### note HA_READ_KEY_EXACT resulting in unique == 1 in row_search_for_mysql() ### in turn leading to type=1024 passed to sel_set_rec_lock() #3 0x083ee1b8 in ha_innobase::rnd_pos (this=0xa3a1be0, buf=0xa3a1d78 "ÿ", pos=0xaf450021 "\003") at handler/ha_innodb.cc:4896 #4 0x082fc725 in DsMrr_impl::dsmrr_next (this=0xa3a1d58, h=0xa3a1be0, range_info=0xaf4ff914) at handler.cc:4494 #5 0x083f2631 in ha_innobase::multi_range_read_next (this=0xa3a1be0, range_info=0xaf4ff914) at handler/ha_innodb.cc:8539 #6 0x082eaf48 in QUICK_RANGE_SELECT::get_next (this=0xa3a4ca8) at opt_range.cc:8596 #7 0x082f35aa in rr_quick (info=0xa3d2de4) at records.cc:322 #8 0x08280b5e in join_init_read_record (tab=0xa3d2da0) at sql_select.cc:14557 #9 0x0827f242 in sub_select (join=0xa3c9118, join_tab=0xa3d2da0, end_of_records=false) at sql_select.cc:13723 #10 0x0827eeb1 in do_select (join=0xa3c9118, fields=0xa3a6ad4, table=0x0, procedure=0x0) at sql_select.cc:13470 #11 0x0826a139 in JOIN::exec (this=0xa3c9118) at sql_select.cc:2835 #12 0x0826a61a in mysql_select (thd=0xa3a5648, rref_pointer_array=0xa3a6b44, tables=0xa3c8a78, wild_num=1, fields=@0xa3a6ad4, conds=0xa3c8fb8, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2149337600, result=0xa3c9108, unit=0xa3a67b0, select_lex=0xa3a6a40) at sql_select.cc:3025 some suggestions/comments from Heikki: (1:33:33 PM) Heikki: SELECT ... LOCK IN SHARE MODE, ... FOR UPDATE (1:34:42 PM) Heikki: MRR can break REPEATABLE READ (1:35:08 PM) Heikki: It must not be used for UPDATE or DELETE in REPEATABLE READ (1:35:52 PM) Heikki: MRR forgets that a serializable database must lock EVERYTHING it reads (1:36:17 PM) Heikki: MRR thinks it is allowed to peek at rows, without locking (1:36:32 PM) Heikki: That is not allowed in serializable execution