Bug #115608 Inplace ALTER TABLE might cause lost rows if concurrent purge
Submitted: 16 Jul 2024 13:10 Modified: 16 Jan 16:53
Reporter: Dmitry Lenev (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[16 Jul 2024 13:10] Dmitry Lenev
Description:
Inplace ALTER TABLE might sometimes lead to lost rows from a table (which uses InnoDB SE) if concurrent purge happens on the table being ALTERed.

This problem has been observed in the wild under load which involved lots of concurrent INSERTs and UPDATES to table being ALTERed, but recipe for reproducing this situation was not available.

However, while investigating bug #115511 "Inplace ALTER TABLE might fail with duplicate key error if concurrent insertions" I was able to come up with a test case using debug server and sync point facility.

How to repeat:
Apply the following patch to MySQL Server code (I have used 8.0.38 source from the Github) to introduce new sync point:

diff --git a/storage/innobase/ddl/ddl0par-scan.cc b/storage/innobase/ddl/ddl0par-scan.cc
index f7f948a0f74..f334a7d2f5c 100644
--- a/storage/innobase/ddl/ddl0par-scan.cc
+++ b/storage/innobase/ddl/ddl0par-scan.cc
@@ -237,6 +237,7 @@ dberr_t Parallel_cursor::scan(Builders &builders) noexcept {
             thread_ctx->get_state() != Parallel_reader::State::THREAD) {
           thread_ctx->savepoint();
           latches_released = true;
+          DEBUG_SYNC_C("ddl_bulk_inserter_latches_released");
         }
         return DB_SUCCESS;
       });

Build debug version of server and run the following test case for MTR framework:

CREATE TABLE t1 (pk CHAR(5) PRIMARY KEY);
INSERT INTO t1 VALUES ('aaaaa'), ('bbbbb'), ('bbbcc'), ('ccccc'), ('ddddd'), ('eeeee');

set global innodb_purge_stop_now=ON;
DELETE FROM t1 WHERE pk = 'bbbcc';

--connect (con1, localhost, root,,)
SET DEBUG='+d,ddl_buf_add_two';
SET DEBUG_SYNC='ddl_bulk_inserter_latches_released SIGNAL latches_released WAIT_FOR go';
--send ALTER TABLE t1 ENGINE=InnoDB, ALGORITHM=INPLACE

--connection default
SET DEBUG_SYNC='now WAIT_FOR latches_released';
SET GLOBAL innodb_purge_run_now=ON;
--source include/wait_innodb_all_purged.inc
SET DEBUG_SYNC='now SIGNAL go';

--connection con1
--reap
SELECT * FROM t1;
# Returns:
# pk
# aaaaa
# bbbbb
# ccccc
# eeeee

SET DEBUG='-d,ddl_buf_add_two';

--connection default
DROP TABLE t1;

########################################

Notice that SELECT in the above test case misses the row 'ddddd' even though it was not deleted and supposed to be present in table after it is ALTERed.

Suggested fix:
Fix implementation of parallel ALTER TABLE INPLACE in InnoDB engine not to miss the rows or process the same rows twice (see bug #115511).

Particularly, PCursor::restore_position() and PCursor::resume() implementations look wrong to me in cases when they are used in the context of pausing/resume table scanning during parallel ALTER TABLE.
[16 Jul 2024 13:35] MySQL Verification Team
Hi Mr. Lenev,

Thank you for your bug report.

We already had a debug mysqld binary built with the same DBUG_SYNC as in the previous bug, so we just ran the test.

We repeated your results:

--------------------------------------------------
ogging: ./mysql-test-run.pl  t/dmitry_115608.test
mysqld(4334,0x7ff850649fc0) malloc: nano zone abandoned due to inability to reserve vm space.
MySQL Version 8.0.38
Checking supported features
 - Binaries are debug compiled
mysqltest(4336,0x7ff850649fc0) malloc: nano zone abandoned due to inability to reserve vm space.
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '...../mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] main.dmitry_115608                        [ fail ]  Found warnings/errors in error log file!
        Test ended at 2024-07-16 16:29:08
include/load_error_log.inc
line

Mysqltest client output from logfile
----------- MYSQLTEST OUTPUT START -----------

CREATE TABLE t1 (pk CHAR(5) PRIMARY KEY);
INSERT INTO t1 VALUES ('aaaaa'), ('bbbbb'), ('bbbcc'), ('ccccc'), ('ddddd'), ('eeeee');
set global innodb_purge_stop_now=ON;
DELETE FROM t1 WHERE pk = 'bbbcc';
SET DEBUG='+d,ddl_buf_add_two';
SET DEBUG_SYNC='ddl_bulk_inserter_latches_released SIGNAL latches_released WAIT_FOR go';
ALTER TABLE t1 ENGINE=InnoDB, ALGORITHM=INPLACE;
SET DEBUG_SYNC='now WAIT_FOR latches_released';
SET GLOBAL innodb_purge_run_now=ON;
SET DEBUG_SYNC='now SIGNAL go';
SELECT * FROM t1;
pk
aaaaa
bbbbb
ccccc
eeeee
SET DEBUG='-d,ddl_buf_add_two';
DROP TABLE t1;

------------ MYSQLTEST OUTPUT END -----------

 - the logfile can be found in ......./log/main.dmitry_115608/dmitry_115608.log'

[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 0.000 of 58 seconds executing testcases

Completed: Failed 1/2 tests, 50.00% were successful.

Failing test(s): main.dmitry_115608

--------------------------------------------------

This bug is now verified for 8.0, 8.4 and 9.0

Thank you for your contribution.

Many regards .......
[18 Jul 2024 21:54] Dmitry Lenev
Tentative fix for the lost rows problem.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: ps-9144-bug115608-contrib.patch (text/x-patch), 10.55 KiB.

[18 Jul 2024 21:57] Dmitry Lenev
Hello!

I am adding a patch with tentative fix to the problem described in this bug above. It would be nice if this area/code got some attention from the MySQL team as it seems that there are more bugs lurking there.
[18 Jul 2024 21:58] Dmitry Lenev
Forgot to add that the submitted patch is against MySQL Server 8.0.38 sources.
[19 Jul 2024 9:45] MySQL Verification Team
Thank you, Mr. Lenev, for your contribution.
[12 Dec 2024 4:13] Rahul Sisondia
Posted by developer:
 
Thank you Dmitry for the contribution along with beautiful test case.

Updates:  This bug is fixed in upcoming 8.0.41 release.
[12 Dec 2024 11:02] MySQL Verification Team
Thank you, Rahul.

It is our opinion that this bug report can be closed now ......
[16 Jan 16:53] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Server 8.0.41, 8.4.4, and 9.2.0 releases, and here's the proposed changelog entry from the documentation team for review:

An ALTER TABLE operation that rebuilt an InnoDB table using the INPLACE
algorithm potentially led to losing a row of data if a concurrent purge
happened on the altered table that contained a clustered or spatial index.

Our thanks to Dmitry Lenev and the team at Percona for
contributing to this fix.

Thank you for the bug report.
[16 Jan 17:06] MySQL Verification Team
Thank you, Philip.