Description:
Initially the bug was found on slave during replication. The bug caused "Lock wait timeout", as a result slave thread stopped. But the following test case does not use replication at all because it's not so important if the sequence of transactions and statements is executed by slave thread or master, what is important is the sequence itself and records layout inside of B+-tree. As slave and master can have different data layout inside of B+-tree, there can be a situation when the same sequence of transactions and statements does not cause the lock on master, but causes the lock on slaves, and vice-versa.
The isolation level is READ COMMITTED, what means there should not be gap locks except certain cases.
The sequence is the following:
1) Set S-lock on some record R1 (which can be caused by foreign checking, for example).
2) Cause page split for the page which contains R1. In the current test right-split is caused, what means that the new page is created to the right of the old page. R1 must be located so that during split process it is moved to the new right page at the first position, i.e., just after infinum.
See:
---------------
rec_t*
btr_page_split_and_insert(...) {
...
} else if (btr_page_get_split_rec_to_right(cursor, &split_rec)) {
direction = FSP_UP;
hint_page_no = page_no + 1;
} else if (btr_page_get_split_rec_to_left(cursor, &split_rec)) {
...
left_block = block;
right_block = new_block;
if (!dict_table_is_locking_disabled(cursor->index->table)) {
lock_update_split_right(right_block, left_block);
}
....
}
----------------
3) If the above condition is true, then R1 S-lock will be copied to the supremum of left-half page, which will cause gap lock.
See
---------------
void
lock_update_split_right(...)
{
...
/* Inherit the locks to the supremum of left page from the successor
of the infimum on right page */
lock_rec_inherit_to_gap(left_block, right_block,
PAGE_HEAP_NO_SUPREMUM, heap_no);
...
}
---------------
There are several questions which have not yet been covered:
1) Is it possible to repeat the case with non-XA transactions? What is about the same but on slave? The XA transactions in the following test case were used to simplify work on the test, because the initial case used XA-transactions too.
2) This test shows only right-split case, the same issue might also be for left-split case, it needs to be checked.
See also https://bugs.launchpad.net/percona-server/+bug/1735555.
How to repeat:
Just copy the following test case in mysql-test/suite/innodb/t and run it with mtr.
=====================================
--source include/have_innodb.inc
--eval SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
CREATE TABLE t1 (t1_pk DECIMAL(20,0) PRIMARY KEY , t1_blob BLOB) ENGINE=InnoDB;
--echo # Initial filling
INSERT INTO t1 VALUES (10000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (20000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (30000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (40000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (50000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (60000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (70000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (80000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (90000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (100000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (110000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (120000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (130000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (140000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (150000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (160000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (170000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (180000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (190000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (200000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (210000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (220000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (230000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (240000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (250000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (260000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (270000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (280000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (290000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (300000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (310000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (320000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (330000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (340000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (350000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (360000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (370000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (380000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (390000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (400000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (410000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (420002, REPEAT("a", 1082));
--echo # Insert records somewhere in the middle to cause page split
INSERT INTO t1 VALUES (18100, REPEAT("a", 1082));
INSERT INTO t1 VALUES (18200, REPEAT("a", 1082));
INSERT INTO t1 VALUES (18300, REPEAT("a", 1082));
INSERT INTO t1 VALUES (18400, REPEAT("a", 1082));
INSERT INTO t1 VALUES (18500, REPEAT("a", 1082));
INSERT INTO t1 VALUES (18600, REPEAT("a", 1082));
INSERT INTO t1 VALUES (18700, REPEAT("a", 1082));
-- echo # The page is full now, the next insert will cause page split
--connect(con2,localhost,root)
CREATE TABLE t2 (t2_pk INT PRIMARY KEY, t1_pk DECIMAL(20,0),
FOREIGN KEY (t1_pk) REFERENCES t1 (t1_pk)) ENGINE=InnoDB;
XA START '2';
--echo # Create S-lock on t1 record with pk = 30000, this record will be the
--echo # first record on the right-half page during future splitting,
--echo # pay attention, this is not a gap lock,
--echo # so this lock is permitted for READ COMMITTED
INSERT INTO t2 VALUES (1, 30000);
XA END '2';
XA PREPARE '2';
--connect (con1,localhost,root)
XA START '1';
--echo # Page right-split is here, the first record of new right-half page
--echo # is the record with pk = 30000, which was previosly S-locked,
--echo # the S-lock from this page is copied to the supremum of left-half page
--echo # causing gap-lock which is forbidden for READ COMMITED
INSERT INTO t1 VALUES (18800, REPEAT("a", 1082));
XA END '1';
XA PREPARE '1';
--connection default
XA START 'default';
--echo # Trying to insert something into the locked gap,
--echo # getting lock wait timeout here despite READ COMMITTED
INSERT INTO t1 VALUES (29900, REPEAT("a", 1082));
XA END 'default';
XA PREPARE 'default';
XA COMMIT 'default'
--connection con1
XA COMMIT '1';
--connection con2
XA COMMIT '2';
--connection con0
DROP TABLE t2;
DROP TABLE t1;
=============================
Suggested fix:
I'm not sure if my suggestion is correct because I don't fully understand the following logic:
=====================
static
void
lock_rec_inherit_to_gap()
{
...
if (!lock_rec_get_insert_intention(lock)
&& !((srv_locks_unsafe_for_binlog
|| lock->trx->isolation_level
<= TRX_ISO_READ_COMMITTED)
&& lock_get_mode(lock) ==
(lock->trx->duplicates ? LOCK_S : LOCK_X))) {
lock_rec_add_to_queue(
LOCK_REC | LOCK_GAP | lock_get_mode(lock),
heir_block, heir_heap_no, lock->index,
lock->trx, FALSE);
}
...
}
===================
What was the intention for the following condition:
lock->trx->isolation_level <= TRX_ISO_READ_COMMITTED
Why not:
lock->trx->isolation_level < TRX_ISO_READ_COMMITTED