MySQL Bugs: #63870: Repeatable-read isolation violated in UPDATE

Bug #63870	Repeatable-read isolation violated in UPDATE
Submitted:	29 Dec 2011 4:30	Modified:	8 Jul 2013 11:12
Reporter:	th nb	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.1.42-log, 5.5.20	OS:	Linux ( 2.6.39.1 kernel i686)
Assigned to:		CPU Architecture:	Any
Tags:	ISOLATION, REPEATABLE-READ, transaction

Description:
When running in a transaction with isolation-level Repeatable-read (the default), the MySQL guarantee is that all actions within that transaction are as if the transaction has its own 'private snapshot' of a database.

And for the most part that is true and works great.

However, when issuing an UPDATE of the form:

UPDATE table set column = column + 1 WHERE other-column = 'value', the newly set value reads the last commited value of [column] when another later-started transaction has updated and committed the same row.

This shouldn't be: see the "How to repeat" for exact sample case

I stumbled upon this issue in a MySQL forum and then independently verified it.
cf:  http://forums.mysql.com/read.php?97,222388,222388

How to repeat:
CREATE TABLE char_encode (
   glyph CHAR(1) NOT NULL,
   codepoint TINYINT(3) NOT NULL
) ENGINE=InnoDB

INSERT INTO char_encode VALUES ('a', 97), ('b', 98);

SESSION-1>SHOW SESSION VARIABLE LIKE 'tx_isolation'
 tx_isolation | REPEATABLE-READ

SESSION-1>START TRANSACTION;

SESSION-1>SELECT * FROM char_encode;

  a  | 97
  b  | 98

SESSION-2>SHOW SESSION VARIABLE LIKE 'tx_isolation'
 tx_isolation | REPEATABLE-READ

SESSION-2>START TRANSACTION;

SESSION-2>SELECT * FROM char_encode;
  a  | 97
  b  | 98

SESSION-2>UPDATE char_encode SET codepoint = 100 WHERE glyph = 'a';
1 Rows affected;
SESSION-2>SELECT * FROM char_encode;
  a  | 100
  b  | 98
//ok, as expected, a tx can always see its OWN changes
SESSION-2>COMMIT;

SESSION-1>SELECT * FROM char_encode WHERE glyph = 'a';
   a  | 97  //Perfect! as expected; even though session-2 committed, its not part of my snapshot

SESSION-1>UPDATE char_encode SET codepoint = codepoint + 1 WHERE glyph = 'a';
1 Rows affected;

SESSION-1>SELECT * FROM char_encode WHERE glyph = 'a';
   a | 101 //Huh?? It was just 97 an instance ago in my snapshot ==> 97 + 1 = 98 last time I checked

Suggested fix:
Just a wild guess:
maybe there is some optimized code to deal with UPDATEs that set a column's value with an expression that references a column value from the same row (as in SET column = column + 1), but this code doesn't respect the isolation-level?

Verified just as described with 5.5.20 on Mac OS X also:

...
mysql> SELECT * FROM char_encode WHERE glyph = 'a';
+-------+-----------+
| glyph | codepoint |
+-------+-----------+
| a     |        97 |
+-------+-----------+
1 row in set (0.03 sec)

mysql> UPDATE char_encode SET codepoint = codepoint + 1 WHERE glyph
    -> = 'a';
Query OK, 1 row affected (0.07 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM char_encode WHERE glyph = 'a';
+-------+-----------+
| glyph | codepoint |
+-------+-----------+
| a     |       101 |
+-------+-----------+
1 row in set (0.00 sec)

Looks like an serious bug.

This is not a bug, but works as the design. Even under repeatable read, "Phantom reads" are still possible. This behavior is about the same as all other major databases. See http://en.wikipedia.org/wiki/Isolation_%28database_systems%29 for "Phantom reads".

Surely everyone can agree with the following:

Take the two statements included in the testcase:

SESSION-1>SELECT * FROM char_encode WHERE glyph = 'a';
   a  | 97  //Perfect! as expected; even though session-2 committed, its not part of my
snapshot

SESSION-1>UPDATE char_encode SET codepoint = codepoint + 1 WHERE glyph = 'a';
1 Rows affected;

and replace them with the following statements:
SESSION-1>SELECT codepoint FROM char_encode WHERE glyph = 'a' INTO @x; //@x is just a custom session variable
SESSION-1>SELECT @x;
+------+
| @x   |
+------+
|   97 |
+------+
//as expected with session-1 snapshot
SESSION-1>UPDATE char_encode SET codepoint = @x + 1 WHERE glyph = 'a';
1 row affected
SESSION-1>SELECT * FROM char_encode;
   a | 98

Those two sets of statememts are SEMANTICALLY equivalent (anyone disagree with that?) and yet they produce two different results.  

Since I'm unable to tell who is who within the bug application, if someone from the mysql team claims that this behavior is a feature not a bug, OK, I'll just accept that.  But the previous comment from Calvin Sun just muddies the water.

Sorry, but phantom reads is about seeing NEW (phantom) rows, inserted and committed by other transactions, while in our case we have concurrent UPDATE of the row that we already read once in frames of transaction with repeatable read isolation level. Our next SELECT gets the same values as we had already seen, while UPDATE for some unclear reason gets value that had never been there in this row for OUR transaction. 

IMHO this is a bug.

Guys, this seems like a pretty serious issue here.
Is this going into any target version?

Hello!

Thanks for your report!

This is not a bug but an intended and documented behavior.

Our manual explicitly says that DML statements (UPDATE, DELETE, INSERT) and locking-reads (SELECT ... IN SHARE MODE, SELECT ... FOR UPDATE) do not necessarily use the same consistent view/snapshot in REPEATABLE READ mode as ordinary SELECTs.

Let me quote our manual page: http://dev.mysql.com/doc/refman/5.1/en/innodb-consistent-read.html

"
Suppose that you are running in the default REPEATABLE READ isolation level. When you issue a consistent read (that is, an ordinary SELECT statement), InnoDB gives your transaction a timepoint according to which your query sees the database. If another transaction deletes a row and commits after your timepoint was assigned, you do not see the row as having been deleted. Inserts and updates are treated similarly.

Note

The snapshot of the database state applies to SELECT statements within a transaction, not necessarily to DML statements. If you insert or modify some rows and then commit that transaction, a DELETE or UPDATE statement issued from another concurrent REPEATABLE READ transaction could affect those just-committed rows, even though the session could not query them. If a transaction does update or delete rows committed by a different transaction, those changes do become visible to the current transaction. For example, you might encounter a situation like the following:

SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz'; -- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz'; -- Deletes several rows recently committed by other transaction.

SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc'; -- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc'; -- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba'; -- Returns 10: this txn can now see the rows it just updated.
"

Thus behavior which is described in your first comment is expected.

Indeed, this also makes the following two code snippets, which you have mentioned later:

SELECT * FROM char_encode WHERE glyph = 'a';
UPDATE char_encode SET codepoint = codepoint + 1 WHERE glyph = 'a';

and

SELECT codepoint FROM char_encode WHERE glyph = 'a' INTO @x;
UPDATE char_encode SET codepoint = @x + 1 WHERE glyph = 'a';

not equivalent.

Taking into account the above I am closing this report as "Not a bug".

Best regards,
Dmitry Lenev