Bug #112723 InnoDB SELECT ... ORDER BY ... DESC FOR UPDATE exist extra lock
Submitted: 14 Oct 2023 3:30 Modified: 16 Oct 2023 13:03
Reporter: Zhenye Wu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[14 Oct 2023 3:30] Zhenye Wu
Description:
Let's say table t contains an index on (key_part1, key_part2), when using following query, InnoDB engine adds extra lock which doesn't match the WHERE condition.

SELECT * FROM t
  WHERE key_part1 = constant
  ORDER BY key_part2 DESC FOR UPDATE;

How to repeat:
1. Create table and prepare data

CREATE TABLE `order_by_lock_test_tab` (`id` BIGINT, `collection_id` INT, `revision` INT, PRIMARY KEY (`id`), KEY `idx_collection_id_revision` (`collection_id`, `revision`));
INSERT INTO `order_by_lock_test_tab` (`id`, `collection_id`, `revision`) VALUES (1, 10, 1), (2, 11, 1), (3, 11, 2), (4, 12, 1), (5, 13, 1);

2. Verify that ORDER BY ... ASC query locks added are expected

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM `order_by_lock_test_tab` WHERE `collection_id`=11 ORDER BY `revision` ASC FOR UPDATE;
+----+---------------+----------+
| id | collection_id | revision |
+----+---------------+----------+
|  2 |            11 |        1 |
|  3 |            11 |        2 |
+----+---------------+----------+
2 rows in set (0.00 sec)

mysql> SELECT OBJECT_NAME, INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_DATA, LOCK_STATUS FROM performance_schema.data_locks;
+------------------------+----------------------------+-----------+---------------+-----------+-------------+
| OBJECT_NAME            | INDEX_NAME                 | LOCK_TYPE | LOCK_MODE     | LOCK_DATA | LOCK_STATUS |
+------------------------+----------------------------+-----------+---------------+-----------+-------------+
| order_by_lock_test_tab | NULL                       | TABLE     | IX            | NULL      | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X             | 11, 1, 2  | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X             | 11, 2, 3  | GRANTED     |
| order_by_lock_test_tab | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 2         | GRANTED     |
| order_by_lock_test_tab | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 3         | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X,GAP         | 12, 1, 4  | GRANTED     |
+------------------------+----------------------------+-----------+---------------+-----------+-------------+
6 rows in set (0.00 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

3. Verify that ORDER BY ... DESC cause extra record lock

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM `order_by_lock_test_tab` WHERE `collection_id`=11 ORDER BY `revision` DESC FOR UPDATE;
+----+---------------+----------+
| id | collection_id | revision |
+----+---------------+----------+
|  3 |            11 |        2 |
|  2 |            11 |        1 |
+----+---------------+----------+
2 rows in set (0.00 sec)

mysql> SELECT OBJECT_NAME, INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_DATA, LOCK_STATUS FROM performance_schema.data_locks;
+------------------------+----------------------------+-----------+---------------+-----------+-------------+
| OBJECT_NAME            | INDEX_NAME                 | LOCK_TYPE | LOCK_MODE     | LOCK_DATA | LOCK_STATUS |
+------------------------+----------------------------+-----------+---------------+-----------+-------------+
| order_by_lock_test_tab | NULL                       | TABLE     | IX            | NULL      | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X,GAP         | 12, 1, 4  | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X             | 10, 1, 1  | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X             | 11, 1, 2  | GRANTED     |
| order_by_lock_test_tab | idx_collection_id_revision | RECORD    | X             | 11, 2, 3  | GRANTED     |
| order_by_lock_test_tab | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 1         | GRANTED     |
| order_by_lock_test_tab | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 2         | GRANTED     |
| order_by_lock_test_tab | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 3         | GRANTED     |
+------------------------+----------------------------+-----------+---------------+-----------+-------------+
8 rows in set (0.00 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

As you can see the data_locks table query result in step 2 and 3, if we use ORDER BY ... DESC, it will cause extra next-key lock on `idx_collection_id_revision` index data (10, 1, 1) and record lock on `PRIMARY` index data 1.

Suggested fix:
From the source code, we can see that ORDER BY ... DESC query corresponding reverse RefIterator(https://github.com/mysql/mysql-server/blob/mysql-cluster-8.0.34/sql/iterators/ref_row_iter...) doesn't push down the condition to InnoDB engine.

At InnoDB engine side, if there exist ICP and condition not matched, it will add gap lock and return for normal index scan when select lock type isn't none.

However, for ORDER BY ... DESC query, there is no ICP, InnoDB engine will add next-key lock, then return record (10, 1, 1) to upper plan execution layer, which finds the returned record doesn't match condition(https://github.com/mysql/mysql-server/blob/mysql-cluster-8.0.34/sql/iterators/ref_row_iter...) and stop index reverse reading.

Maybe we can have `ha_index_prev_same` method in reverse RefIterator and push down condition, so that there won't exist extra lock?
[16 Oct 2023 12:20] MySQL Verification Team
Hi Mr. Wu,

Thank you for your bug report.

We have managed to repeat the behaviour that you reported. However, we have also added EXPLAIN and EXPLAIN ANALYZE before each of SELECT ..... FOR UPDATE.

Here it is what we have got:

--------------------------------------

+----+-------------+-------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys              | key                        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_collection_id_revision | idx_collection_id_revision | 5       | const |    2 |   100.00 | Using index |
+----+-------------+-------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------------+
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Covering index lookup on t1 using idx_collection_id_revision (collection_id=11)  (cost=0.451 rows=2) (actual time=0.0301..0.0425 rows=2 loops=1)
 |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
+----+---------------+----------+
| id | collection_id | revision |
+----+---------------+----------+
|  2 |            11 |        1 |
|  3 |            11 |        2 |
+----+---------------+----------+
+-------------+----------------------------+-----------+---------------+-----------+-------------+
| OBJECT_NAME | INDEX_NAME                 | LOCK_TYPE | LOCK_MODE     | LOCK_DATA | LOCK_STATUS |
+-------------+----------------------------+-----------+---------------+-----------+-------------+
| t1          | NULL                       | TABLE     | IX            | NULL      | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X             | 11, 1, 2  | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X             | 11, 2, 3  | GRANTED     |
| t1          | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 2         | GRANTED     |
| t1          | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 3         | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X,GAP         | 12, 1, 4  | GRANTED     |
+-------------+----------------------------+-----------+---------------+-----------+-------------+
+----+-------------+-------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys              | key                        | key_len | ref   | rows | filtered | Extra                            |
+----+-------------+-------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+----------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_collection_id_revision | idx_collection_id_revision | 5       | const |    2 |   100.00 | Backward index scan; Using index |
+----+-------------+-------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+----------------------------------+
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Covering index lookup on t1 using idx_collection_id_revision (collection_id=11) (reverse)  (cost=0.451 rows=2) (actual time=0.0051..0.00651 rows=2 loops=1)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
+----+---------------+----------+
| id | collection_id | revision |
+----+---------------+----------+
|  3 |            11 |        2 |
|  2 |            11 |        1 |
+----+---------------+----------+
+-------------+----------------------------+-----------+---------------+-----------+-------------+
| OBJECT_NAME | INDEX_NAME                 | LOCK_TYPE | LOCK_MODE     | LOCK_DATA | LOCK_STATUS |
+-------------+----------------------------+-----------+---------------+-----------+-------------+
| t1          | NULL                       | TABLE     | IX            | NULL      | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X,GAP         | 12, 1, 4  | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X             | 10, 1, 1  | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X             | 11, 1, 2  | GRANTED     |
| t1          | idx_collection_id_revision | RECORD    | X             | 11, 2, 3  | GRANTED     |
| t1          | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 1         | GRANTED     |
| t1          | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 2         | GRANTED     |
| t1          | PRIMARY                    | RECORD    | X,REC_NOT_GAP | 3         | GRANTED     |
+-------------+----------------------------+-----------+---------------+-----------+-------------+
--------------------------------------

This fully explains the behaviour, by watching index usage by both ORDER ASC and ORDER DESC.

Sorting in the ascending order by ASC is simple using index as it is physically ordered, that is in ascending order. Hence, it is just using index.

However, sorting on DESC is going bottom to top. it is doing  "Backward index scan" ...... hence, extra X lock on 10, 1, 1 is required because optimiser does not know that it is first tuple in the index (since it is scanning from bottom to top)  and extra REC_NOT_GAP lock on 1 is necessary for the same reason, as it is a PRIMARY for the same row.

Hence, this is expected behaviour.
[16 Oct 2023 12:55] MySQL Verification Team
Hi,

After analysing further your report, seems that one lock could be avoided.

Hence, we are considering that there is indeed a possible improvement.

Verified as reported.
[16 Oct 2023 13:03] Zhenye Wu
Hi, thanks for the update.

Please kindly take consideration to see whether we can improve it, so that we can avoid potential unexpected deadlock issue due to extra lock caused by ORDER BY DESC clause.

For example

session 1
BEGIN;
SELECT * FROM `order_by_lock_test_tab` WHERE `collection_id`=11 ORDER BY `revision` DESC FOR UPDATE;
INSERT INTO `order_by_lock_test_tab` (`id`, `collection_id`, `revision`) VALUES (100, 11, 3);

session 2
BEGIN;
SELECT * FROM `order_by_lock_test_tab` WHERE `collection_id`=10 ORDER BY `revision` DESC FOR UPDATE;
INSERT INTO `order_by_lock_test_tab` (`id`, `collection_id`, `revision`) VALUES (101, 10, 2);

The deadlock will happens in following steps.
1. session 2 execute SELECT, which obtains
   - gap lock on idx_collection_id_revision record (11, 1, 2)
   - next-key lock on idx_collection_id_revision record (10, 1, 1)
2. session 1 execute SELECT, which obtains
   - gap lock on idx_collection_id_revision record (12, 1, 4)
   - next-key lock on idx_collection_id_revision record (11, 2, 3)
   - next-key lock on idx_collection_id_revision record (11, 1, 2)
   and wait to obtain next-key lock on idx_collection_id_revision record (10, 1, 1)
3. session 2 execute INSERT, and wait to obtain INSERT intention lock on idx_collection_id_revision gap (11, 1, 2)

Then deadlock occurs. If we can avoid session 1's SELECT query extra next-key lock on idx_collection_id_revision record (10, 1, 1), then we can avoid such unexpected deadlock issue.
[16 Oct 2023 13:11] MySQL Verification Team
Hi Mr. Wu,

Yes, it is true that extra locking can contribute to deadlocking .....

However, it is far more important that within the transaction, from BEGIN until COMMIT (or ROLLBACK), there is nothing but SQL statements and checks on the output of those. 

There should be no waits on any other processing, user input, collecting data from other sources and similar.

Just shooting down all transaction statements is the best way to avoid lock waits and deadlocks.