Bug #112446 Fractured read at SERIALIZABLE
Submitted: 24 Sep 2023 23:24 Modified: 2 Oct 2023 16:15
Reporter: Kyle Kingsbury Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:8.0.34 OS:Debian (Debian 11)
Assigned to: CPU Architecture:x86 (x86-64)
Tags: ISOLATION, safety, SERIALIZABLE

[24 Sep 2023 23:24] Kyle Kingsbury
Description:
At SERIALIZABLE isolation MySQL 8.0.34 allows histories in which a transaction performs multiple reads and observes some, but not all, of a previous transaction's effects. This anomaly is known as "fractured read", and is a violation of prefix consistency--not to mention repeatable read, snapshot isolation, and serializability.

For example, consider the following schema:

create table if not exists txn0 (
  `system` int not null,
  id int not null,
  `value` int not null,
  primary key (`system`, id)
);
create index system_idx_0 on txn0 (`system`)

We create two additional tables txn1 and txn2 with identical schemas.

Next, we perform a randomly generated series of transactions whose internal micro-operations are of three forms:

1. Insert a row into one of these tables, 
2. Perform a read of a single row by system and id.
3. Select all rows matching some generated predicate within a specific system.

For each system, we run approximately 40 transactions using two separate, concurrent threads in a JVM program, each with their own JDBC client. 30 total threads performing ~700 txns/sec is sufficient to reproduce this issue within a few seconds of overall runtime. Using this approach, we reliably find dependency graph cycles like the following:

http://jepsen.io.s3.amazonaws.com/analyses/mysql-8.0.34/serializable-g-single.svg

G-single #0
Let:
  T1 = {:index 49790, :time 39498753347, :type :ok, :process 5, :f :txn, :value [[:rp :true {12 5, 16 3, 19 2}]]}
  T2 = {:index 49787, :time 39498382826, :type :ok, :process 4, :f :txn, :value [[:rp [:= 0] {}] [:insert 16 3] [:insert 13 4] [:insert 12 5]]}

Then:
  - T1 < T2, because T1 performed a predicate read [:rp :true {12 5, 16 3, 19 2}] of key 13 and selected version :elle/unborn, which was overwritten by T2's write of 4, which also changed whether the predicate would have matched. Note that key 13's version order was [:elle/unborn 4].
  - However, T2 < T1, because T2 set key 12 to 5, which was in the version set of predicate read [:rp :true {12 5, 16 3, 19 2}]. Note that the version order of key 12 was [:elle/unborn 5]: a contradiction!

This is a fractured read anomaly observed in system 982.

The `[:rp :true {12 5 ...}]` micro-operation denotes a predicate read of every row (`:true` is the trivial predicate `WHERE TRUE`) in a given system, and that the results of those reads were that key 12's value was 5, key 16's value was 3, and key 19's value was 2. The exact SQL queries involved were:

[select * from txn0 where `system` = ? and TRUE 982]
[select * from txn1 where `system` = ? and TRUE 982]
[select * from txn2 where `system` = ? and TRUE 982]

The `[:insert 16 3]` micro-operation denotes an INSERT statement for row 16 of value 3. The corresponding SQL statement was:

[insert into txn1 (`system`, id, `value`) values (?, ?, ?) 982 16 3]

T2 also performed a predicate read of every row in system 982 whose value was 0, but this does not appear to be important for this anomaly. What matters is that T2 inserted keys 12, 13, and 16, and T1 observed keys 12, 16, and 19--but *not* 13.

The full logs of this test run, including several additional G-single anomalies, are here: http://jepsen.io.s3.amazonaws.com/analyses/mysql-8.0.34/serializable-g-single.zip. This particular anomaly is in independent/982.

Early testing suggests this issue requires multiple tables: it hasn't shown up with single-table tests so far. It does not appear sensitive to the `system` index; we can remove that and get similar outcomes.

The MySQL options we apply on top of the defaults from the official Debian packages are:

[mysqld]
bind-address = 192.168.122.101
max_connections = 4096

log-bin
server_id = 1
binlog-format = MIXED
replica-preserve-commit-order	= ON

How to repeat:
An early Jepsen test suite to reproduce this issue is available at https://github.com/jepsen-io/mysql. On git commit 1da8600d902bc91a13cbd1b5bf5ccd458e449155, you should be able to run:

lein run test --db mysql --nodes n1 -w closed-predicate --concurrency 30n --rate 1000 --time-limit 120 -i serializable --nemesis none --key-count 40 --insert-only --log-sql

See https://github.com/jepsen-io/jepsen#readme for details on setting up a Jepsen environment.

This relies on some work-in-progress versions of underlying Jepsen libraries; I've pushed these to Clojars so this code should work today, but it may break as we evolve those libraries towards proper releases.

If you don't want to run Jepsen, you should be able to reproduce this issue by creating 2-3 tables with the above schema, spamming them with transactions that insert multiple keys, and concurrently performing predicate reads over all tables.

Suggested fix:
I'm sorry, but I haven't the faintest idea why MySQL does this or how to fix it! I imagine whatever fix gets written should make `SERIALIZABLE` prevent this behavior, though. :-)
[25 Sep 2023 11:40] MySQL Verification Team
Hi Mr. Kingsbury,

Thank you very much for your bug report.

Your bug looks very interesting to us and we would like to attempt to repeat it. We need to fully repeat what you experience, in order to be able to fix it.

Hence, we need a fully repeatable test case. It should consist of the set of the SQL statements that will always lead to the fractured read at the Serializable isolation level.

We are patiently waiting on your full feedback.
[25 Sep 2023 15:09] Kyle Kingsbury
Sure. Here's a pair of transactions that exhibited fractured read on the very first write to a system. These are the same transactions I outlined above, but I've filled in the parameterized values and added start/commit commands for you.

Schema:

create table if not exists txn0 (`system` int not null,
                            id int not null,
                           `value` int not null,
                            primary key (`system`, id));
create table if not exists txn1 (`system` int not null,
                            id int not null,
                           `value` int not null,
                            primary key (`system`, id));
create table if not exists txn2 (`system` int not null,
                            id int not null,
                           `value` int not null,
                            primary key (`system`, id));

T1:

set transaction isolation level serializable;
start transaction;
insert into txn0 (`system`, id, `value`) values (18, 0, 0);
insert into txn2 (`system`, id, `value`) values (18, 1, 1);
commit;

T2:

set transaction isolation level serializable;
start transaction;
select * from txn2 where `system` = 0 and TRUE;
select * from txn0 where `system` = 0 and TRUE;
select * from txn1 where `system` = 0 and TRUE;
commit;

This gives the following cycle:

https://s3.amazonaws.com/jepsen.io/analyses/mysql-8.0.34/serializable-g-single-2.svg

Here's the precise sequence of events from a test run with only two threads total. Note that this interleaves with parts of additional transactions performed by the same threads--I've included them just in case they're relevant.

2023-09-25 09:35:25,156{GMT}    INFO    [jepsen worker 1] jepsen.mysql.closed-predicate: next.jdbc/execute! ["insert into txn0 (`system`, id, `value`) values (?, ?, ?)" 18 0 0]
2023-09-25 09:35:25,156{GMT}    INFO    [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn0 where `system` = ? and TRUE" 18]
2023-09-25 09:35:25,157{GMT}    INFO    [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn2 where `system` = ? and TRUE" 18]
2023-09-25 09:35:25,158{GMT}    INFO    [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn1 where `system` = ? and TRUE" 18]
2023-09-25 09:35:25,159{GMT}    INFO    [jepsen worker 0] jepsen.util: 0        :ok     :txn    [18 [[:rp :true {}]]]
2023-09-25 09:35:25,159{GMT}    INFO    [jepsen worker 0] jepsen.util: 0        :invoke :txn    [18 [[:rp :true nil]]]
2023-09-25 09:35:25,164{GMT}    INFO    [jepsen worker 1] jepsen.mysql.closed-predicate: next.jdbc/execute! ["insert into txn2 (`system`, id, `value`) values (?, ?, ?)" 18 1 1]
2023-09-25 09:35:25,164{GMT}    INFO    [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn0 where `system` = ? and TRUE" 18]
2023-09-25 09:35:25,165{GMT}    INFO    [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn2 where `system` = ? and TRUE" 18]
2023-09-25 09:35:25,165{GMT}    INFO    [jepsen worker 1] jepsen.util: 1        :ok     :txn    [18 [[:insert 0 0] [:insert 1 1]]]
2023-09-25 09:35:25,166{GMT}    INFO    [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn1 where `system` = ? and TRUE" 18]
2023-09-25 09:35:25,166{GMT}    INFO    [jepsen worker 1] jepsen.util: 1        :invoke :txn    [18 [[:rp :true nil]]]
2023-09-25 09:35:25,167{GMT}    INFO    [jepsen worker 0] jepsen.util: 0        :ok     :txn    [18 [[:rp :true {1 1}]]]
[28 Sep 2023 9:55] MySQL Verification Team
Hi Mr. Kingsbury,

This is a very interesting bug report.

We have only a question regarding the output, that is the sequence of events.

We do not use Java tools for the bug reports in the Server. 

Can you duplicate the sequence of events with query logs or similar ????

Many thanks in advance.
[28 Sep 2023 16:08] Kyle Kingsbury
Sure. Here's a 10-second failing test run with two threads. The MySQL query log is in `n1/query.log`.

https://s3.amazonaws.com/jepsen.io/analyses/mysql-8.0.34/serializable-g-single-3.zip
[29 Sep 2023 9:51] MySQL Verification Team
Hi Mr. Kingsbury,

Sorry, but we can not download files from the Amazon.

Please, just send us a full test case as a set of SQL statements that always lead to the problem that you report.

This forum accepts only test cases of that type.

Waiting on your feedback.
[29 Sep 2023 13:28] Kyle Kingsbury
I have already provided the SQL statements you requested in three increasing levels of detail. You requested the query log, but you're... not willing to download a file from AWS, and your bug tracker doesn't have any way to upload files. Is it just AWS that you're unwilling to download files from? I could put it on a different domain...
[29 Sep 2023 13:33] MySQL Verification Team
Hi Mr. Kingsbury,

That seems to be our fault, for not explaining to you how to upload files to our site.

Below the header of the bug report you can notice 7 (seven) tabs.

Click on the "Files" tab and follow the instructions that you find there.

Good luck !!!!!!
[29 Sep 2023 13:38] Kyle Kingsbury
serializable-g-single-3.zip

Attachment: serializable-g-single-3.zip (application/zip, text), 1.24 MiB.

[29 Sep 2023 13:45] MySQL Verification Team
Hi Mr. Kingsbury,

Thank you for the upload.

Would you be so kind as to explain to us how to run the entire test case.

Your ZIP archive contains many files and subdirectories and we are not cognisant of the purpose of each one of them.

Also, please confirm that the all commands for the test case are those that you submitted on 25th of  September at 15:09 o'clock.

Many thanks in advance.
[29 Sep 2023 13:56] Kyle Kingsbury
Aha! It's not on the reply page the email links--separate tab altogether. I was searching for "attach", "upload", etc.
[29 Sep 2023 14:00] MySQL Verification Team
Hi Mr. Kingsbury,

Let us know when you finish with the upload.

Let us also know if you need any further assistance from us.
[29 Sep 2023 14:21] Kyle Kingsbury
> Would you be so kind as to explain to us how to run the entire test case.

Sure. Here's more detail on the repro instructions I initially provided:

1. Set up a Jepsen test cluster with at least one DB node: https://github.com/jepsen-io/jepsen#setting-up-a-jepsen-environment. There are pre-built AWS Marketplace images that ought to make this fairly easy: you can deploy a single instance and use `jepsen-lxc-start` to spool up a few LXC containers on that instance as DB nodes. Or you can deploy a few Debian Bookworm machines yourself, install a few packages, and set up SSH keys, as detailed in the README.

2. On the control node, clone https://github.com/jepsen-io/mysql

3. In that repository, check out commit 588b761861fd55d6711091c01a23b61ac9b6a014

4. In that repository, run:

lein run test --db mysql --nodes n1 -w closed-predicate --concurrency 2n --rate 1000 --time-limit 10 -i serializable --nemesis none --repro-112446

... where `n1` is the name of a DB node in the cluster. See `lein run test --help` for a full list of options.

> Your ZIP archive contains many files and subdirectories and we are not cognisant of the purpose of each one of them.

You'll find the query log in `n1/query.log`. `results.edn` includes data structures that describe any errors the test finds; `elle/` contains textual and visual representations of serializability violations found. There are also performance plots, various logs, data structures used for post-hoc analysis, etc--we can go into any of these in detail if you like, but I don't think it should be necessary--you understand why the pair of transactions I've described is illegal, right? All you need to do is see it happen for yourself. ;-)

> Also, please confirm that the all commands for the test case are those that you submitted on 25th of  September at 15:09 o'clock.

They are not: you asked me to provide a query log, so I re-ran the test with the query log enabled. The transactions are essentially unchanged, though. I've added the --repro-112446 option to generate smaller, simpler transactions that make good examples for purposes of this bug.
[2 Oct 2023 11:56] MySQL Verification Team
Sorry Mr. Kingsbury,

We do not support Jepsen test cluster.

We also only accept repeatable test cases with our official binaries only.

A test case should be in the form of a set of SQL queries that always lead to the problem that you are reporting.

Can' repeat.
[2 Oct 2023 16:15] Kyle Kingsbury
Well, this has been disheartening! I'm really disappointed that y'all aren't interested in this kind of thing.

I've since managed to track down an interaction between the test's transactional-vs-non-transactional decision procedure and the implementation of multi-table predicate reads; I believe it's responsible for the observed anomaly. I'm going to go ahead and close this bug.
[3 Oct 2023 10:07] MySQL Verification Team
Hi Mr. Kingsbury,

We are very much interested in observing the fractured reads as you report.

But, we use our own tools and we need a repeatable test case that we can reproduce with our own tools and our own diagnostic and monitoring software.

This is simply the procedure that was and still is very successful in discovering what is a bug and what is not a bug.