Bug #112850 Deadlock happens in RC but not in higher RR isolation level
Submitted: 27 Oct 2023 4:33 Modified: 27 Oct 2023 10:10
Reporter: Wen He (OCA)
Status: Verified
Category: MySQL Server: Locking    Severity: S3 (Non-critical)
Version: 8.0    OS: Any
Assigned to:    CPU Architecture: Any
Tags: deadlock, Isolation level, lock

[27 Oct 2023 4:33] Wen He
Description:
A deadlock happens at the READ COMMITTED (RC) isolation level, but it does not happen at the stricter REPEATABLE READ (RR) level.

How to repeat:
-- Example 1
set global transaction_isolation='READ-COMMITTED';
drop table if exists t1;
create table t1 (id int primary key);
insert into t1 values (1),(2),(3),(4),(5),(6),(7),(8),(10);

-- trx1
begin;
select * from t1 where id = 10 for update;

-- trx2
begin;
select * from t1 where id >= 7 and id <= 8 for update;

-- trx1
select * from t1 where id = 7 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

-- Example 2
set global transaction_isolation='REPEATABLE-READ';
drop table if exists t1;
create table t1 (id int primary key);
insert into t1 values (1),(2),(3),(4),(5),(6),(7),(8),(10);

-- trx1
begin;
select * from t1 where id = 10 for update;

-- trx2
begin;
select * from t1 where id >= 7 and id <= 8 for update;

-- trx1
select * from t1 where id = 7 for update;   -- Lock Wait
[27 Oct 2023 10:10] MySQL Verification Team
Hi Mr. He,

Thank you very much for your bug report.

We repeated it and got a deadlock too, but only in READ-COMMITTED.

Verified as reported.
[30 Oct 2023 8:31] Jakub Lopuszanski
Hi, Wen He,
IIUC, the thing you find surprising here is that a stricter isolation level actually does less locking.
(I.e. `select * from t1 where id >= 7 and id <= 8 for update` locks the row with id=10 in READ COMMITTED but not in REPEATABLE READ)
Your intuition here is in general correct, and thank you for reporting this, because this kind of counter-intuitive behaviour might indeed indicate a bug and should be looked into.
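
(Side note, not from the original report: one way to observe this difference is to re-run the two examples above and, while trx2's SELECT is running — it blocks in the READ COMMITTED case — inspect the locks from a separate, third session, e.g.:)
```
-- Inspect InnoDB lock requests on t1 from a third session.
SELECT ENGINE_TRANSACTION_ID, INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA
  FROM performance_schema.data_locks
 WHERE OBJECT_NAME = 't1';
-- Under READ COMMITTED you should see trx2 WAITING for a record lock on the row
-- with id = 10 (held by trx1), while under REPEATABLE READ trx2 finishes
-- immediately and holds only a gap lock before id = 10, not a lock on the row.
```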

However, in this particular case, the reason REPEATABLE READ can avoid taking locks is that there's a special heuristic (added through the fix for Bug#29508068 UNNECESSARY NEXT-KEY LOCK TAKEN) which lets InnoDB avoid taking locks on rows (and/or the gaps before them) in cases where I (the author of the patch) could prove, to my and the reviewers' satisfaction, that the lock wasn't needed for correctness.
This heuristic can be found in row_compare_row_to_range():
https://github.com/mysql/mysql-server/blob/mysql-cluster-8.0.35/storage/innobase/row/row0s... 
And it so happens that READ COMMITTED is quite a peculiar mode, in which we support the so-called "semi-consistent read" (Ctrl+F for "semi-consistent" at https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html for some explanation).
InnoDB doesn't use semi-consistent reads in REPEATABLE READ, but it does use them in READ COMMITTED. The existence of this mechanism has to be taken into account while proving that avoiding a lock on a record is the right thing to do.
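
To make this more concrete, here is a minimal sketch of a semi-consistent read, along the lines of the example on that documentation page (the table t2, its data and the session labels are made up for this illustration, not taken from the report):
```
-- Semi-consistent read sketch under READ COMMITTED (hypothetical table t2).
SET GLOBAL transaction_isolation = 'READ-COMMITTED';
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (a INT PRIMARY KEY, b INT);   -- no index on b, so the UPDATEs scan all rows
INSERT INTO t2 VALUES (1, 1), (2, 2), (3, 3);

-- session A
BEGIN;
UPDATE t2 SET b = 30 WHERE b = 3;   -- ends up holding an X record lock on the row (3, 3)

-- session B
BEGIN;
UPDATE t2 SET b = 20 WHERE b = 2;
-- Session B's scan reaches the row locked by session A. With a semi-consistent
-- read InnoDB returns the latest committed version of that row, MySQL sees that
-- it does not match "b = 2", and the row is skipped without waiting. Under
-- REPEATABLE READ semi-consistent reads are not used and the UPDATE would block.
```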
I am not saying this is impossible to do in principle - perhaps someone like you can propose such a proof. But at the time the patch was developed, we considered it not an optimal use of developers' time to try to cover even more cases - there's a trade-off at play in row_compare_row_to_range(): the more cases we try to cover, the more complicated the proof, and the higher the risk that the system will work in a wrong way. So, the current solution is conservative:
```
...
  if (!set_also_gap_locks || trx->skip_gap_locks() ||
      (unique_search && !rec_get_deleted_flag(rec, comp)) ||
      dict_index_is_spatial(index) ||
      (index == clust_index && mode == PAGE_CUR_GE && direction == 0 &&
       dtuple_get_n_fields_cmp(search_tuple) ==
           dict_index_get_n_unique(index) &&
       0 == cmp_dtuple_rec(search_tuple, rec, index, offsets))) {
    row_to_range_relation.gap_can_intersect_range = false;
    return (row_to_range_relation);
  }
...
  /* While I believe that we handle semi-consistent reads correctly, the proof
  is quite complicated and lingers on the fact that semi-consistent reads are
  used only if we don't use gap locks. And fortunately, we've already checked
  above that trx->skip_gap_locks() is false, so we don't have to go through the
  whole reasoning about what exactly happens in case the row which is at the
  end of the range got locked by another transaction, removed, purged, and while
  we were doing semi-consistent read on it. */
  ut_ad(!trx->skip_gap_locks());
```
In READ COMMITTED trx->skip_gap_locks() is true, so we return quickly; this is why the assert holds, and why the only optimization used is to avoid the gap lock before the row, while the lock on the row itself is still taken.

So, the way I see it, it is not that READ COMMITTED does something wrong; it just doesn't have the same optimization which was developed for REPEATABLE READ, i.e. it is not a bug, it is a missing feature. A feature which might be risky to implement and would certainly require a proof of correctness.

In case you or anyone else would like to contribute a patch, here are some pointers:
1. https://github.com/mysql/mysql-server/blob/mysql-cluster-8.0.35/mysql-test/suite/innodb/t/... contains a test which currently runs in REPEATABLE READ and demonstrates the behaviour of locking at the end of a range. One should create another test for READ COMMITTED and verify that the behaviour is as it should be.
2. Learn about semi-consistent reads in sufficient detail to be able to prove correctness in the case of concurrent access to the same row, like the scenarios described in the comment. Ideally, write tests which cover scenarios in which other threads delete or undelete the row so that it stops/starts belonging to the result set, etc.; a rough sketch of one such scenario is given after this list.
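
As a starting point only, here is a rough, hypothetical sketch of one such timeline under READ COMMITTED. It reuses the table from the report and is not a complete mysql-test case; the exact assertions to make at each step still need to be worked out:
```
-- Hypothetical scenario sketch: the row at the end of the range is being deleted.
SET GLOBAL transaction_isolation = 'READ-COMMITTED';
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INT PRIMARY KEY);
INSERT INTO t1 VALUES (1),(2),(3),(4),(5),(6),(7),(8),(10);

-- trx1: delete the row just past the end of the range
BEGIN;
DELETE FROM t1 WHERE id = 10;

-- trx2: range scan that has to look at id = 10 to find the end of the range
BEGIN;
SELECT * FROM t1 WHERE id >= 7 AND id <= 8 FOR UPDATE;

-- Questions a test should pin down: does trx2 wait here, and is the result
-- still correct both when trx1 commits (the delete becomes visible and the
-- row is eventually purged) and when trx1 rolls back (the row "comes back")?
```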
[30 Oct 2023 12:22] MySQL Verification Team
Thank you, Jakub, for your valuable contribution.