MySQL Bugs: #105042: Absence of expected locks result in different table states under RC

Bug #105042	Absence of expected locks result in different table states under RC
Submitted:	26 Sep 2021 7:24	Modified:	28 Sep 2021 14:37
Reporter:	Dai Dinary	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	lock, transaction

Description:
When two transactions are committed concurrently under the Read Committed isolation level, 
an UPDATE is not blocked even it conflicts with the UPDATE of another concurrent transaction from the user's point of view, 
resulting in a different result from the counterpart under the Repeatable Read isolation level.

How to repeat:
create table t(a int, b int);
insert into t values(null, 1), (null, 2), (null, 3);

/* s1 */ begin; 
/* s1 */ update t set a = 10; 
/* s2 */ begin; 
/* s2 */ update t set b = 20 where a;
/* s1 */ commit;
/* s2 */ commit;

==================== Read Committed ==========================

the UPDATE of s2 is NOT blocked and the final table state is

mysql> select * from t;
+------+------+
| a    | b    |
+------+------+
|   10 |    1 |
|   10 |    2 |
|   10 |    3 |
+------+------+

=================== Repeatable Read ==========================

the UPDATE of s2 is blocked and the final table state is

mysql> select * from t;
+------+------+
| a    | b    |
+------+------+
|   10 |   20 |
|   10 |   20 |
|   10 |   20 |
+------+------+

This may be caused by the "semi-consistent" read described in MySQL manual. But I think it is not in line with expectations from the user's point of view.

I notice the example and explanation of "semi-consistent" read in MySQL reference manual and think the following expression is incorrect:

"If the row matches (must be updated), MySQL reads the row again and this time InnoDB either locks it or waits for a lock on it."

Actually, if the transaction uses the latest committed version to judge whether a target row matches the WHERE condition, it cannot claim that the row matches "must" be updated. Because there may be another uncommitted transaction modifying the same target row to make it unmatch the WHERE condition.

In addition, I think the example used to explain the "semi-consistent" read is not representative enough. Specifically, modifications in transaction A do not affect the judgment of the WHERE clause in transaction B, and vice versa. Consider the following example:

/* init */ CREATE TABLE t (a INT) ENGINE = InnoDB;
/* init */ INSERT INTO t VALUES (1),(2);

Transaction A:
/* s1 */ begin;
/* s1 */ update t set a = 2 where a = 1;
/* s1 */ commit;

Transaction B:
/* s2 */ begin;
/* s2 */ update t set a = 1 where a = 2;
/* s2 */ commit;

From the user's point of view, the final state of table should be [(1),(1)] or [(2),(2)] after they are executed serially or concurrently.

But if we use "semi-consistent" read as described above, and these two transactions are submitted in the following order

/* s1 */ begin;
/* s1 */ update t set a = 2 where a = 1;
/* s2 */ begin;
/* s2 */ update t set a = 1 where a = 2;
/* s1 */ commit;
/* s2 */ commit;

the final state of the table will be [(2), (1)], which violates the intuition of users.

Hello Dai Dinary,
The https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html says:

> Using READ COMMITTED has additional effects: 
> [...]
> For UPDATE statements, if a row is already locked, InnoDB performs a “semi-consistent” read, 
> returning the latest committed version to MySQL so that MySQL can determine whether the row 
> matches the WHERE condition of the UPDATE. If the row matches (must be updated), MySQL reads 
> the row again and this time InnoDB either locks it or waits for a lock on it. 

I believe this explains the behaviour you see. 

If you believe otherwise, please elaborate on what you've expected to see (in the light of above documentation) and how it's different from observations.

Hi Dai Dinary,
I wrote my previous comment before refreshing the page, and thus wasn't aware yet of your comment.

I agree the example in the docs is not representative and could be made more interesting/representative by involving updates which change the row from non-matching to matching the criteria of the second transaction. And your bug report indeed presents such interesting case.

I even suspect current documentation is a bit wrong as it lists "x-lock(2,3)" as a step of the second transaction which AFAIU can't happen as the first transaction still holds an x-lock on this row due to "x-lock(2,3); update(2,3) to (2,5); retain x-lock". But this is minor wording/semantical issue, because the code indeed tries to perform x-lock(2,3) then realises it has to wait, and aborts the wait by unlocking, so one could argue that this is mostly-correct description of what happens.

However, I don't think any of this as a bug in implementation. Rather it looks like request for improvement of the documentation.

You can't expect serializability from RC. If one needs the database to behave "intuitively" then one needs to use SERIALIZABLE, which is precisely the isolation level which promises the behaviour to be explainable as a sequence of individual transactions happening in one serial order or another.
The whole purpose of RC is to gain speed by relaxing this requirement, and this is what semi-consistent read gives you: it doesn't take time to wait on locks, at the expense of results more difficult to interpret as a series of events.

Hi Mr. Dinary,

Thank you for your interesting bug report.

We have to agree fully with our colleague and therefore we suggest this report is verified as a Documentation bug.

We hope that you agree with this proposal ......

Thank you for the detailed explanation and I agree with it. Thank you!

By the way, RR is not serializable either, why is this behavior not allowed under RR as well?

Hi Mr. Dinary,

You are correct, RR is not designed for a full serialisability. What you are looking for is SERIALIZABLE MVCC mode. RR is designed mostly for consistency of the individual views and for speed. READ-COMMITTED has the advantage of lesser row locking.

However, isolation level is truly chosen mostly by what is it that your application requires, as defined by the application design.

We do not recommend you to try that, since it will slowdown all of your concurrent DML  transactions.

This report is now verified as a documentation bug.