MySQL Bugs: #116067: Upsert not visible within own transaction when querying by unique index

Bug #116067	Upsert not visible within own transaction when querying by unique index
Submitted:	11 Sep 2024 21:51	Modified:	13 Sep 2024 5:56
Reporter:	Aaron Staley	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	8.0.28	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When executing queries in REPEATABLE_READ isolation, I expect to be able to always see my own writes as covered at https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html.  ("The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.").

I am finding that selects after an upsert (INSERT INTO .. ON DUPLICATE KEY UPDATE) of a record with a randomized primary key but fixed unique key fail to return the just upserted record.   See how to repeat, specifically the "fails to see just upserted record".

How to repeat:
-- Running with isolation REPEATABLE_READ

-- In conn1
-- Create the table
CREATE TABLE simple_table (
    id VARCHAR(20) PRIMARY KEY,
    unique_col VARCHAR(20),
    status VARCHAR(20),
    UNIQUE INDEX idx_unique_col (unique_col)
);

-- Start txn1
START TRANSACTION;

SELECT * FROM simple_table WHERE unique_col = 'unique1';

-- Switch to conn 2
START TRANSACTION;

INSERT INTO simple_table (id, unique_col, status)
VALUES ('id1', 'unique1', 'in_progress')
ON DUPLICATE KEY UPDATE
    unique_col = VALUES(unique_col);

 COMMIT;

-- Resume conn 1 (txn 1)

INSERT INTO simple_table (id, unique_col, status)
VALUES ('id2', 'unique1', 'in_progress')
ON DUPLICATE KEY UPDATE
    unique_col = VALUES(unique_col);

 -- Fails to see just upserted record
SELECT * FROM simple_table WHERE unique_col = 'unique1'

COMMIT;

-- now visible
SELECT * FROM simple_table WHERE unique_col = 'unique1';

Drop Table simple_table;

Suggested fix:
I'm not sure if this is a mysql bug or a documentation bug.

1. If it's a bug, it suggests MVCC is not properly handling the semantics around INSERT INTO.. ON DUPLICATE KEY UPDATE (upsert).  After executing the upsert, it is expected that a record exists with my unique value and yet the select in the same txn fails to show it.   I expect subsequent selects to actually show in the current transaction.

2. Or if this is expected, documentation should note this behavior. I couldn't find anything saying that INSERT INTO ON DUPLICATE KEY is an exception to read after write consistency in either https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html nor https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

I studied this a bit more and I think this is a MVCC bug where upserts (INSERT INTO.. ON DUPLICATE KEY) are no-ops with respect to the ground-truth of the database.

If I modify some other column in the ON DUPLICATE KEY UPDATE, the row is visible.  But when no columns are modified (with respect to committed state of db), read-after-write consistency is off.

This behavior is fine for a pure UPDATE function (the snapshot can't see rows before-hand anyway), but not for an upsert, because if the transaction cannot see the rows, from its POV, there should be an insert.  i.e. a record should be there after execution.

Hello Aaron,
I am not sure what it is, but it is definitely an interesting behaviour and edge-case! 
Thank you for reporting it.

Some observations:
1. this "upsert" is not "really modifying" anything, because the new value it tries to assign is the same as the old one, therefore there's no "physical change" to the row
2. perhaps it adds to the complexity of this edge-case that the column you "try" to update is the one which is the reason for uniqueness conflict. Anyway, in my testing replacing that part with status=VALUES(status), doesn't change the merit of the report: the read-view doesn't see the row
3. this particular "no-op" upsert seems conceptually equivalent to INSERT IGNORE. Anyway, I've tried replacing the upsert with `INSERT IGNORE INTO simple_table (id, unique_col, status) VALUES ('id1', 'unique1', 'in_progress')` and it doesn't change the merit of the report: the read-view doesn't see the row
4. it is also *conceputally* equivalent to `REPLACE INTO simple_table (id, unique_col, status) VALUES ('id1', 'unique1', 'in_progress')`. Surprisingly (given above findings), this time the conn1 can see the row! 

I do understand why you, or your app can expect the row to be visible, given it just did the INSERT.
I do understand your preferred reading of the documentation.

Still, the reality is: the documentation tries to capture what the code does, and what the code does is it checks who last modified the row, and it so happens that conn1 did not, in fact, authored this row.
In particular, if you change the scenario to use something like:
```
INSERT INTO simple_table (id, unique_col, status)
VALUES ('id1', 'unique1', 'in_progress')
ON DUPLICATE KEY UPDATE
    status = 'more progress!';
```
Then conn1 will see the row:
```
mysql> SELECT * FROM simple_table WHERE unique_col = 'unique1';
+-----+------------+----------------+
| id  | unique_col | status         |
+-----+------------+----------------+
| id1 | unique1    | more progress! |
+-----+------------+----------------+
```
because now there is no doubt that it has authored a new version of it, so it is "a change".

So, the way I see it, this bug report is about a grey area of: what exactly constitutes "a change"?
There are clear cases like "in_progress" -> "more progress!", where it is clearly "a change", and InnoDB will show you your "change".
There are cases like "unique1" -> "unique1", which might (INSERT ON DUPLICATE, INSERT IGNORE) or might not (REPLACE INTO) be recognized by InnoDB as not really requiring a physical change of the row, so InnoDB takse a short cut and doesn't touch anything, and then there is no "change".

I think this (admittedly messy) situation is one more reason for my recommendation from other bug reports:
Do not mix non-locking SELECTs with locking operations such as: INSERT,UPDATE,DELETE, SELECT..FOR UPDATE/SHARE.
I think we should update the documentation to be more frank about this.
REPEATABLE READ is good at pretending to be like SERIALIZABLE as long as you stick to either of these groups of statements, but terrible at mixing them.
Note that the problem would not appear if you:
a) used SERIALIZABLE instead of REPEATABLE READ
b) used SELECT...FOR UPDATE instead of SELECT
c) split the transaction into separate parts, so that one uses UPDATE and the other uses SELECT

Thanks for the thoughts!

> 4. it is also *conceputally* equivalent to `REPLACE INTO simple_table (id, unique_col, status) VALUES ('id1', 'unique1', 'in_progress')`. Surprisingly (given above findings), this time the conn1 can see the row! 

REPLACE INTO isn't quite the same as an op as by definition it is a DELETE plus an INSERT.  That is it has to modify the underlying row in the db.  (For this reason I also cannot use REPLACE INTO within my application).

Agreed on the workaround being to change the isolation level (which I can't do as this is part of a large system), add a "dummy column" to force a write (viable I suppose) or ensure anywhere I read in this transaction I use FOR SHARE to read committed state.

Personally, I think this is a bug in the MVCC.  It seems to determine row visibility based on "was the underlying row modified by this transaction", when IMO the more reasonable interpretation of read-after-write consistency should be "from the POV of this transaction was the underlying row modified".