Bug #112446 | Fractured read at SERIALIZABLE | ||
---|---|---|---|
Submitted: | 24 Sep 2023 23:24 | Modified: | 2 Oct 2023 16:15 |
Reporter: | Kyle Kingsbury | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 8.0.34 | OS: | Debian (Debian 11) |
Assigned to: | CPU Architecture: | x86 (x86-64) | |
Tags: | ISOLATION, safety, SERIALIZABLE |
[24 Sep 2023 23:24]
Kyle Kingsbury
[25 Sep 2023 11:40]
MySQL Verification Team
Hi Mr. Kingsbury, Thank you very much for your bug report. Your bug looks very interesting to us and we would like to attempt to repeat it. We need to fully repeat what you experience, in order to be able to fix it. Hence, we need a fully repeatable test case. It should consist of the set of the SQL statements that will always lead to the fractured read at the Serializable isolation level. We are patiently waiting on your full feedback.
[25 Sep 2023 15:09]
Kyle Kingsbury
Sure. Here's a pair of transactions that exhibited fractured read on the very first write to a system. These are the same transactions I outlined above, but I've filled in the parameterized values and added start/commit commands for you. Schema: create table if not exists txn0 (`system` int not null, id int not null, `value` int not null, primary key (`system`, id)); create table if not exists txn1 (`system` int not null, id int not null, `value` int not null, primary key (`system`, id)); create table if not exists txn2 (`system` int not null, id int not null, `value` int not null, primary key (`system`, id)); T1: set transaction isolation level serializable; start transaction; insert into txn0 (`system`, id, `value`) values (18, 0, 0); insert into txn2 (`system`, id, `value`) values (18, 1, 1); commit; T2: set transaction isolation level serializable; start transaction; select * from txn2 where `system` = 0 and TRUE; select * from txn0 where `system` = 0 and TRUE; select * from txn1 where `system` = 0 and TRUE; commit; This gives the following cycle: https://s3.amazonaws.com/jepsen.io/analyses/mysql-8.0.34/serializable-g-single-2.svg Here's the precise sequence of events from a test run with only two threads total. Note that this interleaves with parts of additional transactions performed by the same threads--I've included them just in case they're relevant. 2023-09-25 09:35:25,156{GMT} INFO [jepsen worker 1] jepsen.mysql.closed-predicate: next.jdbc/execute! ["insert into txn0 (`system`, id, `value`) values (?, ?, ?)" 18 0 0] 2023-09-25 09:35:25,156{GMT} INFO [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn0 where `system` = ? and TRUE" 18] 2023-09-25 09:35:25,157{GMT} INFO [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn2 where `system` = ? and TRUE" 18] 2023-09-25 09:35:25,158{GMT} INFO [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn1 where `system` = ? and TRUE" 18] 2023-09-25 09:35:25,159{GMT} INFO [jepsen worker 0] jepsen.util: 0 :ok :txn [18 [[:rp :true {}]]] 2023-09-25 09:35:25,159{GMT} INFO [jepsen worker 0] jepsen.util: 0 :invoke :txn [18 [[:rp :true nil]]] 2023-09-25 09:35:25,164{GMT} INFO [jepsen worker 1] jepsen.mysql.closed-predicate: next.jdbc/execute! ["insert into txn2 (`system`, id, `value`) values (?, ?, ?)" 18 1 1] 2023-09-25 09:35:25,164{GMT} INFO [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn0 where `system` = ? and TRUE" 18] 2023-09-25 09:35:25,165{GMT} INFO [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn2 where `system` = ? and TRUE" 18] 2023-09-25 09:35:25,165{GMT} INFO [jepsen worker 1] jepsen.util: 1 :ok :txn [18 [[:insert 0 0] [:insert 1 1]]] 2023-09-25 09:35:25,166{GMT} INFO [jepsen worker 0] jepsen.mysql.closed-predicate: next.jdbc/execute! ["select * from txn1 where `system` = ? and TRUE" 18] 2023-09-25 09:35:25,166{GMT} INFO [jepsen worker 1] jepsen.util: 1 :invoke :txn [18 [[:rp :true nil]]] 2023-09-25 09:35:25,167{GMT} INFO [jepsen worker 0] jepsen.util: 0 :ok :txn [18 [[:rp :true {1 1}]]]
[28 Sep 2023 9:55]
MySQL Verification Team
Hi Mr. Kingsbury, This is a very interesting bug report. We have only a question regarding the output, that is the sequence of events. We do not use Java tools for the bug reports in the Server. Can you duplicate the sequence of events with query logs or similar ???? Many thanks in advance.
[28 Sep 2023 16:08]
Kyle Kingsbury
Sure. Here's a 10-second failing test run with two threads. The MySQL query log is in `n1/query.log`. https://s3.amazonaws.com/jepsen.io/analyses/mysql-8.0.34/serializable-g-single-3.zip
[29 Sep 2023 9:51]
MySQL Verification Team
Hi Mr. Kingsbury, Sorry, but we can not download files from the Amazon. Please, just send us a full test case as a set of SQL statements that always lead to the problem that you report. This forum accepts only test cases of that type. Waiting on your feedback.
[29 Sep 2023 13:28]
Kyle Kingsbury
I have already provided the SQL statements you requested in three increasing levels of detail. You requested the query log, but you're... not willing to download a file from AWS, and your bug tracker doesn't have any way to upload files. Is it just AWS that you're unwilling to download files from? I could put it on a different domain...
[29 Sep 2023 13:33]
MySQL Verification Team
Hi Mr. Kingsbury, That seems to be our fault, for not explaining to you how to upload files to our site. Below the header of the bug report you can notice 7 (seven) tabs. Click on the "Files" tab and follow the instructions that you find there. Good luck !!!!!!
[29 Sep 2023 13:38]
Kyle Kingsbury
serializable-g-single-3.zip
Attachment: serializable-g-single-3.zip (application/zip, text), 1.24 MiB.
[29 Sep 2023 13:45]
MySQL Verification Team
Hi Mr. Kingsbury, Thank you for the upload. Would you be so kind as to explain to us how to run the entire test case. Your ZIP archive contains many files and subdirectories and we are not cognisant of the purpose of each one of them. Also, please confirm that the all commands for the test case are those that you submitted on 25th of September at 15:09 o'clock. Many thanks in advance.
[29 Sep 2023 13:56]
Kyle Kingsbury
Aha! It's not on the reply page the email links--separate tab altogether. I was searching for "attach", "upload", etc.
[29 Sep 2023 14:00]
MySQL Verification Team
Hi Mr. Kingsbury, Let us know when you finish with the upload. Let us also know if you need any further assistance from us.
[29 Sep 2023 14:21]
Kyle Kingsbury
> Would you be so kind as to explain to us how to run the entire test case. Sure. Here's more detail on the repro instructions I initially provided: 1. Set up a Jepsen test cluster with at least one DB node: https://github.com/jepsen-io/jepsen#setting-up-a-jepsen-environment. There are pre-built AWS Marketplace images that ought to make this fairly easy: you can deploy a single instance and use `jepsen-lxc-start` to spool up a few LXC containers on that instance as DB nodes. Or you can deploy a few Debian Bookworm machines yourself, install a few packages, and set up SSH keys, as detailed in the README. 2. On the control node, clone https://github.com/jepsen-io/mysql 3. In that repository, check out commit 588b761861fd55d6711091c01a23b61ac9b6a014 4. In that repository, run: lein run test --db mysql --nodes n1 -w closed-predicate --concurrency 2n --rate 1000 --time-limit 10 -i serializable --nemesis none --repro-112446 ... where `n1` is the name of a DB node in the cluster. See `lein run test --help` for a full list of options. > Your ZIP archive contains many files and subdirectories and we are not cognisant of the purpose of each one of them. You'll find the query log in `n1/query.log`. `results.edn` includes data structures that describe any errors the test finds; `elle/` contains textual and visual representations of serializability violations found. There are also performance plots, various logs, data structures used for post-hoc analysis, etc--we can go into any of these in detail if you like, but I don't think it should be necessary--you understand why the pair of transactions I've described is illegal, right? All you need to do is see it happen for yourself. ;-) > Also, please confirm that the all commands for the test case are those that you submitted on 25th of September at 15:09 o'clock. They are not: you asked me to provide a query log, so I re-ran the test with the query log enabled. The transactions are essentially unchanged, though. I've added the --repro-112446 option to generate smaller, simpler transactions that make good examples for purposes of this bug.
[2 Oct 2023 11:56]
MySQL Verification Team
Sorry Mr. Kingsbury, We do not support Jepsen test cluster. We also only accept repeatable test cases with our official binaries only. A test case should be in the form of a set of SQL queries that always lead to the problem that you are reporting. Can' repeat.
[2 Oct 2023 16:15]
Kyle Kingsbury
Well, this has been disheartening! I'm really disappointed that y'all aren't interested in this kind of thing. I've since managed to track down an interaction between the test's transactional-vs-non-transactional decision procedure and the implementation of multi-table predicate reads; I believe it's responsible for the observed anomaly. I'm going to go ahead and close this bug.
[3 Oct 2023 10:07]
MySQL Verification Team
Hi Mr. Kingsbury, We are very much interested in observing the fractured reads as you report. But, we use our own tools and we need a repeatable test case that we can reproduce with our own tools and our own diagnostic and monitoring software. This is simply the procedure that was and still is very successful in discovering what is a bug and what is not a bug.