MySQL Bugs: #115941: [RR]query data not locked when used as sub-query in update statement

Bug #115941	[RR]query data not locked when used as sub-query in update statement
Submitted:	28 Aug 2024 5:47	Modified:	10 Sep 2024 23:57
Reporter:	Jesse Xi	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.7.26	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
when using REPEATABLE-READ isolation level, query like  `update t3 set a=(select c from t1 where a=3);` will set a to data commit by other txn.

How to repeat:
Step 1: prepare data 
CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL
);
insert into t1 values( 1, 2);

CREATE TABLE `t2` (
  `a` int(11) DEFAULT NULL
);
insert into t2 values(3);

Step 2: reproduce

mysql> begin;
Query OK, 0 rows affected (0.05 sec)

mysql> select a from t2;
+------+
| a    |
+------+
|    3 |
+------+
1 row in set (0.05 sec)

mysql>  //Create another connection, and run this sql: `update t2 set a = 4;`

mysql> select a from t2;
+------+
| a    |
+------+
|    3 |
+------+
1 row in set (0.04 sec)

mysql> select * from t1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+
1 row in set (0.04 sec)

mysql> update t1 set b = (select a from t2);
Query OK, 1 row affected (0.06 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from t2;
+------+
| a    |
+------+
|    3 |
+------+
1 row in set (0.05 sec)

mysql> select * from t1;
+------+------+
| a    | b    |
+------+------+
|    1 |    4 |
+------+------+
1 row in set (0.05 sec)

mysql> commit;
Query OK, 0 rows affected (0.05 sec)

mysql> select * from t2;
+------+
| a    |
+------+
|    4 |
+------+
1 row in set (0.04 sec)

mysql> 
 

Expected: 
t1.b should be '3', but actually it been updated to commited data '4'

Hi,
Thanks for your report. I am sure we already have this report but I'm having issue finding it now so when I do I'll set this one as duplicate but for now it is verified.

Posted by developer:
 
I am sorry to say that, but this is not really a bug, this is just how REPEATABLE READ works in InnoDB.
This is perhaps a matter of writing a better documentation.
The current documentation at https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html (and any other version, not just 8.0) says:

> REPEATABLE READ
> 
> This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other. See Section 17.7.2.3, “Consistent Nonlocking Reads”.
> 
> For locking reads (SELECT with FOR UPDATE or FOR SHARE), UPDATE, and DELETE statements, locking depends on whether the statement uses a unique index with a unique search condition, or a range-type search condition.
> 
>     For a unique index with a unique search condition, InnoDB locks only the index record found, not the gap before it.
> 
>     For other search conditions, InnoDB locks the index range scanned, using gap locks or next-key locks to block insertions by other sessions into the gaps covered by the range. For information about gap locks and next-key locks, see Section 17.7.1, “InnoDB Locking”. 

In particular one (correct) reading of it is that if you issue anything other than plain (nonlocking) SELECT statements within the same transaction, we don't promise you anything.
InnoDB's REPEATABLE READ is nowhere precisely defined, except in the code, and frankly our documentation always followed what the code does, not the other way around.
The REPEATABLE READ is meant to be a good enough, but faster, approximation of SERIALIZABLE which works for 90% of use cases, by handling:
- non-locking SELECT via read view, which give illusion of being serialized at the moment of read view's creation
- any locking statement like UPDATE, INSERT, DELETE or SELECT ... FOR .. by temporarily handling it as if it was SERIALIZABLE, i.e. by obtaining locks on most recent versions of rows and using them
This works great for auto-commit, single statement transactions, which do not mix these two realities.

In particular the current title of this bug report, "[RR]query data not locked when used as sub-query in update statement" is missleading.
The behaviour which you observe is happening precisely because RR *does* lock the data in the sub-query of update statement, which is why `update t1 set b = (select a from t2);` sees a==4 in the sub-query - it has locked the t2's row and looked at the most recent version of it, which has a==4, as opposed to avoiding locking by using a read-view which would see a==3.

This behaviour is not something we can, or want to fix.
What we can do is to improve documentation to help people avoid this pit-fall.

I'd propose to add following text to our documentation:
"It is not recommended to mix locking statements (UPDATE, INSERT, DELETE or SELECT ... FOR ..) with non-locking SELECTS in a single REPEATABLE READ transaction, because more often than not, what you really want in such case is SERIALIZABLE. This is because non-locking SELECTs will present the state of the database from a read-view which consists of effects of transactions which committed before the read-view was created and current transaction's own writes, while the locking statements will see and modify the most recent state of the DB and use locking. In general these are two different states of tables, and hence not consistent with each other and mixing them is difficult to reason about."