Bug #113751 Non repeatable results in REPEATABLE-READ isolation level
Submitted: 25 Jan 7:03 Modified: 26 Jan 10:40
Reporter: Shlomi Noach (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: DML, locking, REPEATABLE-READ, transaction

[25 Jan 7:03] Shlomi Noach
Description:
It is possible for a transaction to issue the same query twice and get different results in a `REPEATABLE-READ` isolation level.

How to repeat:
# Connection1
-- ---
set transaction_isolation='REPEATABLE-READ';
drop table if exists some_table;
create table some_table (
  id bigint,
  col varchar(10),
  primary key (id),
  unique index(col)
);
insert into some_table(id, col) values (3, null), (4, 5);

start transaction;
select col from some_table where col in (5) or id in (3)  for update;
-- ---

# Connection2
-- ---
set transaction_isolation='REPEATABLE-READ';
start transaction;
-- This will block, and then return an empty result:
select col from some_table where col in (('5')) for update;
-- Identical SELECT will now have 1 result:
select col from some_table where col in (('5')) for update;
rollback;
-- ---

# Connection1
-- ---
delete from some_table where col in (5) or id in (3);
insert into some_table(id, col) values (3, 5);
commit; -- Will unblock Conn1 query and reveal the bug
-- ---

You will see that Connection2 results with:

> select col from some_table where col in (('5')) for update;
Empty set (8.86 sec)

> -- Identical SELECT will print different result:
> select col from some_table where col in (('5')) for update;
+------+
| col  |
+------+
| 5    |
+------+
1 row in set (0.00 sec)

Suggested fix:
-
[25 Jan 10:57] MySQL Verification Team
Hi Mr. Noach,

Thank you for our bug report.

These are the results that we get:

Connection 1:

set transaction_isolation='REPEATABLE-READ';
Query OK, 0 rows affected (0.00 sec)

drop table if exists some_table;
Query OK, 0 rows affected, 1 warning (0.00 sec)

create table some_table (
   id bigint,
   col varchar(10),
   primary key (id),
  unique index(col)
);
Query OK, 0 rows affected (0.01 sec)

insert into some_table(id, col) values (3, null), (4, "5");
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

start transaction;
Query OK, 0 rows affected (0.00 sec)

select col from some_table where col in (5) or id in (3)  for update;
+------+
| col  |
+------+
| NULL |
| 5    |
+------+

Connection 2:

set transaction_isolation='REPEATABLE-READ';
Query OK, 0 rows affected (0.00 sec)

start transaction;
Query OK, 0 rows affected (0.00 sec)

select col from some_table where col in (('5')) for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
start transaction;
Query OK, 0 rows affected (0.00 sec)

select col from some_table where col in (('5')) for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
rollback;
Query OK, 0 rows affected (0.00 sec)

select col from some_table where col in (('5')) for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
rollback;
Query OK, 0 rows affected (0.00 sec)

We did not manage to repeat your results with our 8.0.36 production binaries, downloaded from dev.mysql.com.

We tried without re-starting transaction, without rollback and we still get "lock wait timeout". In any possible variant.

Connection 1:
-------------

delete from some_table where col in (5) or id in (3);
Query OK, 2 rows affected (0.00 sec)

insert into some_table(id, col) values (3, 5);
Query OK, 1 row affected (0.00 sec)

commit;
Query OK, 0 rows affected (0.00 sec)

Everything is OK here ......

Connection 2:
--------------

mysql> select col from some_table where col in (('5')) for update;
+------+
| col  |
+------+
| 5    |
+------+
1 row in set (0.01 sec)

mysql> select col from some_table where col in (('5')) for update;
+------+
| col  |
+------+
| 5    |
+------+
1 row in set (0.00 sec)

Again, we are not able to repeat your results.

Can't repeat.
[25 Jan 11:14] Shlomi Noach
I see. I suspect your lock wait timeout is pretty low. Could you try:

- Increasing lock wait timeout
- And running Connection1's delete+insert+commit all within the wait timeout margins, before Connection2 times out?
[25 Jan 11:22] MySQL Verification Team
Hi,

Actually, no .......

Our innodb_lock_wait_timeout is at 100.

And yes, we ran Connection1's delete+insert+commit all immediately one after the other (copy / paste) ...  We ran them all within a single second.

We also repeated the same procedure once again, with the same result.

Can't repeat.
[25 Jan 11:36] Shlomi Noach
That is strange indeed. This reproduces for my right now on 8.0.32 on my laptop. I can test later on on 8.0.36. Otherwise I don't know why this would not reproduce for you.
[25 Jan 21:56] Jean-François Gagné
My result with 8.0.36 and scripting that makes timing somehow reproducible.

As reported by Shlomi, the 1st select of t2 (t2a) returns nothing, but the 2nd (t2b) returns one row.

dbdeployer deploy single mysql_8.0.36

./use <<< "
  create database test_jfg;
  create table test_jfg.some_table (
    id bigint,
    col varchar(10),
    primary key (id),
    unique index(col) );
  insert into test_jfg.some_table(id, col) values (3, null), (4, 5);"

(
./use -N test_jfg <<< "
  start transaction;
  select 't1', now(), col from some_table where col in (5) or id in (3)  for update;
  do sleep(3);
  delete from some_table where col in (5) or id in (3);
  insert into some_table(id, col) values (3, 5);
  select 't1 done', now();
  commit;" &

./use -N test_jfg <<< "
  do sleep(1);
  set transaction_isolation='REPEATABLE-READ';
  start transaction;
  select 't2 starting', now();
  select 't2a', now(), col from some_table where col in (('5')) for update;
  select 't2b', now(), col from some_table where col in (('5')) for update;
  rollback;"
)
t1      2024-01-25 21:55:41     NULL
t1      2024-01-25 21:55:41     5
t2 starting     2024-01-25 21:55:42
t1 done 2024-01-25 21:55:44
t2b     2024-01-25 21:55:44     5
[25 Jan 22:15] Jean-François Gagné
Interestingly, when I change the query from "where col in (('5'))" to "where col in (5)", I get a result closer to what is expected.

./use <<< "
  drop database if exists test_jfg;
  create database test_jfg;
  create table test_jfg.some_table (
    id bigint,
    col varchar(10),
    primary key (id),
    unique index(col) );
  insert into test_jfg.some_table(id, col) values (3, null), (4, 5);"

(
./use -N test_jfg <<< "
  start transaction;
  select 't1', now(), col from some_table where col in (5) or id in (3)  for update;
  do sleep(3);
  delete from some_table where col in (5) or id in (3);
  insert into some_table(id, col) values (3, 5);
  select 't1 done', now();
  commit;" &

./use -N test_jfg <<< "
  do sleep(1);
  set transaction_isolation='REPEATABLE-READ';
  start transaction;
  select 't2 starting', now();
  select 't2a', now(), col from some_table where col in (5) for update;
  select 't2b', now(), col from some_table where col in (5) for update;
  rollback;"
)
t1      2024-01-25 22:13:55     NULL
t1      2024-01-25 22:13:55     5
t2 starting     2024-01-25 22:13:56
t1 done 2024-01-25 22:13:58
t2a     2024-01-25 22:13:56     5
t2b     2024-01-25 22:13:58     5
[26 Jan 10:40] MySQL Verification Team
Hi All,

When I used the method as provided by Monsieur Gagne, I managed to reproduce the results as reported:

t1      2024-01-26 11:48:14     NULL
t1      2024-01-26 11:48:14     5
t2 starting     2024-01-26 11:48:16
t1 done 2024-01-26 11:48:19
t2b     2024-01-26 11:48:19     5

The other problem of using 5 instead of '5' should not be considered a bug, since using the wrong constant for the column type leads to more complex problems.

Verified as reported.
[26 Jan 15:48] Richard Burnison
By modifying Jean-François Gagné's test case a bit, the same problem also shows up on an `UPDATE` statement. In the following script, a `version` column is added to the table. I believe that the expected results for t2 in all cases would be `col='5',version=1` followed by `col='5',version=2`, however, the bug yields `col='5',version=0` followed by `col='5',version=1`.

```
mysql -h 127.0.0.1 --port 35801 -u root <<< "
  drop database if exists test_jfg;
  create database test_jfg;
  create table test_jfg.some_table (
    id bigint,
    col varchar(10),
    version int default 0,
    primary key (id),
    unique index(col) );
  insert into test_jfg.some_table(id, col) values (3, null), (4, 5);"

(
mysql -h 127.0.0.1 --port 35801 -u root test_jfg <<< "
  start transaction;
  select 't1', now(), col from some_table where col in ('5') or id in (3)  for update;
  do sleep(3);
  delete from some_table where col in ('5') or id in (3);
  insert into some_table(id, col) values (3, '5');
  select 't1 done', now();
  commit;" &

mysql -h 127.0.0.1 --port 35801 -u root test_jfg <<< "
  do sleep(1);
  set transaction_isolation='REPEATABLE-READ';
  start transaction;
  select 't2 starting', now();

  update some_table set version = 1 where col in ('5');
  select 't2a', now(), col, version from some_table;

  update some_table set version = 2 where col in ('5');
  select 't2b', now(), col, version from some_table;

  rollback;"
)
```

In this case, the expected result is 

```
t1      now()   col
t1      2024-01-26 10:33:44     NULL
t1      2024-01-26 10:33:44     5
t2 starting     now()
t2 starting     2024-01-26 10:33:45
t1 done now()
t1 done 2024-01-26 10:33:47
t2a     now()   col     version
t2a     2024-01-26 10:33:47     5       1
t2b     now()   col     version
t2b     2024-01-26 10:33:47     5       2
```

However, the actual result in MySQL 8.0.36 is

```
t1      now()   col
t1      2024-01-26 10:34:34     NULL
t1      2024-01-26 10:34:34     5
t2 starting     now()
t2 starting     2024-01-26 10:34:35
t1 done now()
t1 done 2024-01-26 10:34:37
t2a     now()   col     version
t2a     2024-01-26 10:34:37     5       0
t2b     now()   col     version
t2b     2024-01-26 10:34:37     5       2
```

From my analysis of the code, it appears the problem stems from the lack of a gap lock on `col='5'`, which is allowing the `DELETE` and `INSERT` to insert `('5',3)` before the delete-marked record being found in transaction 2, `('5',4)`.  In this fashion, changing the test case to use `id=5` also yields the expected result:

```
mysql -h 127.0.0.1 --port 35801 -u root <<< "
  drop database if exists test_jfg;
  create database test_jfg;
  create table test_jfg.some_table (
    id bigint,
    col varchar(10),
    version int default 0,
    primary key (id),
    unique index(col) );
  insert into test_jfg.some_table(id, col) values (3, null), (4, 5);"

(
mysql -h 127.0.0.1 --port 35801 -u root test_jfg <<< "
  start transaction;
  select 't1', now(), col from some_table where col in ('5') or id in (3)  for update;
  do sleep(3);
  delete from some_table where col in ('5') or id in (3);
  insert into some_table(id, col) values (5, '5');
  select 't1 done', now();
  commit;" &

mysql -h 127.0.0.1 --port 35801 -u root test_jfg <<< "
  do sleep(1);
  set transaction_isolation='REPEATABLE-READ';
  start transaction;
  select 't2 starting', now();

  update some_table set version = 1 where col in ('5');
  select 't2a', now(), col, version from some_table;

  update some_table set version = 2 where col in ('5');
  select 't2b', now(), col, version from some_table;

  rollback;"
)
```

shows the expected results,

```
t1      now()   col
t1      2024-01-26 10:37:48     NULL
t1      2024-01-26 10:37:48     5
t2 starting     now()
t2 starting     2024-01-26 10:37:49
t1 done now()
t1 done 2024-01-26 10:37:51
t2a     now()   col     version
t2a     2024-01-26 10:37:51     5       1
t2b     now()   col     version
t2b     2024-01-26 10:37:52     5       2
```

When stepping through a debugger, I'm led to believe that this is happening because of the `goto next_rec` in https://github.com/mysql/mysql-server/blob/49ef33f7edadef3ae04665e73d1babd40179a4f1/storag... skipping the delete-marked record and advancing the cursor _ahead_ to `('5',5)` rather than backwards to `('5',3)`.
[29 Jan 10:53] MySQL Verification Team
Thank you, Mr. Burnison ......

We managed to repeat your test case too ..........

We shall copy it to your internal bugs database.