| Bug #113812 | alter table tablename engine=innodb cause data loss | ||
|---|---|---|---|
| Submitted: | 31 Jan 2024 2:06 | Modified: | 5 Feb 2024 9:12 |
| Reporter: | chengkang Liao | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: DDL | Severity: | S2 (Serious) |
| Version: | 8.0.33, 8.0.36 | OS: | Linux |
| Assigned to: | CPU Architecture: | x86 | |
| Tags: | 8.0.33, ALTER TABLE tablename engine=innodb | ||
[31 Jan 2024 2:09]
chengkang Liao
my.cnf and Numbering tool
Attachment: mysql_random_data_load_0.1.12_Linux_x86_64.tar.gz (application/x-gzip, text), 3.18 MiB.
[31 Jan 2024 5:17]
LeYuan Zhong
I have repeatted the bug . my test log as follow: count before delete data +------+ | 2192 | +------+ delete data... count after delete table 1209 alter table... count after alter table 1208 --wrong
[31 Jan 2024 9:43]
chengkang Liao
Test 8.0.26 found no data loss. Test 8.0.27 is missing. Check the update of change log about innodb_ddl_buffer_size. Adjusting the time when innodb_ddl_buffer_size appears to be 5M will delay the occurrence of lost data.
[5 Feb 2024 9:12]
MySQL Verification Team
Hello chengkang Liao, Thank you for the report and test case. Verified as described. regards, Umesh
[25 Apr 2024 17:59]
LeYuan Zhong
When user call "alter table tablename engine=innodb", MySQL will rebuild the table online. MySQL scan the original table, add record to a key_buffer. when key_buffer is full, page latch will be release and use PCursor to store cursor. PCursor::savepoint() call m_pcur->move_to_prev_on_page() before store position to aviod storing "supremum". PCursor::resume() call m_pcur->move_to_next_on_page() after restore position. PCursor::resume() use restore_position() to restore from persistent cursor. When record store in persistent cursor has been purge, restore_position() will set cursor to a previous record. On this situation, PCursor::resume() will call page_cur_move_to_next() to move cursor to its right position. So, PCursor::resume() may move cursor to next record twice and it cause cursor skip a record. Here is a example: (.) a (X) b (.) c <--cursor (.) d (.) e 1. Cursor point to record 'c' and copy 'c' to builder and builder try to add it to key_buffer, but key_buffer is full. Start bulk insert. 2. To release page latch, PCursor::savepoint() will be called. Cursor move to record 'b' and store 'b' to persistent cursor. 3. During bulk insert, purge thread remove record 'b'. 4. Bulk insert is done, go on record scan. PCursor::resume() will be called. 5. First, PCursor::resume() calls restore_position(). restore_position() makes cursor restore to record 'a', then move to record 'c'. 6. Second, PCursor::resume() calls m_pcur->move_to_next_on_page(). Cursor move to recode 'd'. 7. Builder contiunes to add record 'c' to key_buffer. after that, cursor move to next record, which is record 'e'. 8. Record 'd' is skipped
[25 Apr 2024 18:14]
LeYuan Zhong
PCursor::savepoint() use m_pcur->store_position() to store cursor. m_pcur->store_position() will store previous record when cursor pointing to "supremum". so, it is no need to call m_pcur->move_to_prev_on_page() in m_pcur->move_to_prev_on_page(). Thus, calling m_pcur->move_to_next_on_page() in PCursor::resume() is unnecessary too.
[25 Apr 2024 18:24]
LeYuan Zhong
When record store in persistent cursor has been purge, restore_position() will set cursor to a previous record. On this situation, PCursor::resume() will call page_cur_move_to_next() to move cursor to its right position. "PCursor::resume() will call" should be "restore_position() will call"
[25 Apr 2024 18:41]
LeYuan Zhong
Fix bug #113812 alter table tablename engine=innodb cause data loss (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: fix_bug_#113812.txt (text/plain), 796 bytes.
[26 Apr 2024 19:27]
OCA Admin
Contribution submitted via Github - Fix bug #113812 alter table tablename engine=innodb cause data loss (*) Contribution by LeYuan Zhong (Github dgzhongly, mysql-server/pull/540#issuecomment-2078405060): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_1841482866.txt (text/plain), 3.14 KiB.
[9 May 2024 12:40]
LeYuan Zhong
How to reproduce:
*In order to increase the peace in which this issue happens. Modify a/storage/innobase/ddl/ddl0builder.cc
diff --git a/storage/innobase/ddl/ddl0builder.cc b/storage/innobase/ddl/ddl0builder.cc
index 59edd0df340..b4d35703c63 100644
--- a/storage/innobase/ddl/ddl0builder.cc
+++ b/storage/innobase/ddl/ddl0builder.cc
@@ -1488,6 +1488,8 @@ dberr_t Builder::bulk_add_row(Cursor &cursor, Row &row, size_t thread_id,
err = insert_direct(cursor, thread_id);
+ std::this_thread::sleep_for(std::chrono::seconds(3));
+
key_buffer->clear();
if (err != DB_SUCCESS) {
diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
index ad616805904..6c12c8a15ef 100644
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -3117,6 +3117,8 @@ void srv_purge_coordinator_thread() {
srv_purge_coordinator_suspend(slot, rseg_history_len);
}
+ std::this_thread::sleep_for(std::chrono::seconds(1));
+
if (srv_purge_should_exit(n_total_purged)) {
ut_a(!slot->suspended);
break;
# create a table
CREATE TABLE `test` (
`id` int NOT NULL AUTO_INCREMENT,
`c1` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`id`)
);
# create a procedure
delimiter $$
CREATE PROCEDURE load_data(IN n int)
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i <= n) DO
insert into test (c1) values(repeat('a',1000));
SET i = i + 1;
END WHILE;
END $$
delimiter ;
# insert data
call load_data(1000);
Reproduce:
#make sure innodb_ddl_buffer_size is defautl value
set session innodb_ddl_buffer_size=1048576;
#excute quickly
select count(*) from test;
delete from test where id=502;
select count(*) from test;
alter table test engine=innodb;
select count(*) from test;
[13 Jul 2024 12:28]
LeYuan Zhong
hello. I can still reproduce the problem on MySQL 9.0.0 When will it be fixed?

Description: Hi Team, After I delete the table, I execute 'alter table table table name engine = innodb;' on the table. Data will be lost when delete and alter table repeatedly. How to repeat: liaock@mysql3306.sock 08:49: [db1]> show create table his_schedule_job_run\G *************************** 1. row *************************** Table: his_schedule_job_run Create Table: CREATE TABLE `his_schedule_job_run` ( `id` varchar(36) NOT NULL, `job_id` varchar(36) DEFAULT NULL, `job_type` varchar(2) DEFAULT NULL, `job_group` varchar(45) DEFAULT NULL, `job_obiect` varchar(256) DEFAULT NULL, `shell_path` varchar(256) DEFAULT NULL, `cron_expression` varchar(128) DEFAULT NULL, `run_start_time` datetime DEFAULT NULL, `run_end_time` datetime DEFAULT NULL, `run_result` varchar(2) DEFAULT NULL, `run_message` varchar(4000) DEFAULT NULL, PRIMARY KEY (`id`) USING BTREE, KEY `fk_reference_02` (`job_id`) USING BTREE, KEY `fk_reference_03` (`run_start_time`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COMMENT='logtable' 1 row in set (0.00 sec) simulated dataļ¼ [root@testdb03 ~]# cat /server/scripts/gen_del.sh #!/bin/bash for i in {1..10} do echo "call db1.his_schedule_job_run $i time" >> /tmp/his_schedule_job_run.log /opt/mysql_random_data_load -uliaock -pliaock -P3306 --max-threads=2 db1 his_schedule_job_run 1000 /usr/local/mysql/bin/mysql -uliaock -pliaock -S /data/socket/mysql3306.sock -e "select count(*) from db1.his_schedule_job_run;" >> /tmp/his_schedule_job_run.log /usr/local/mysql/bin/mysql -uliaock -pliaock -S /data/socket/mysql3306.sock -e "delete from db1.his_schedule_job_run where run_start_time < curdate() - interval 7 day;" echo "call $i time alter before" >> /tmp/his_schedule_job_run.log /usr/local/mysql/bin/mysql -uliaock -pliaock -S /data/socket/mysql3306.sock -e "select count(*) from db1.his_schedule_job_run;" >> /tmp/his_schedule_job_run.log /usr/local/mysql/bin/mysql -uliaock -pliaock -h172.16.0.2 -P3306 -e "select count(*) from db1.his_schedule_job_run;" >> /tmp/his_schedule_job_run.log /usr/local/mysql/bin/mysql -uliaock -pliaock -S /data/socket/mysql3306.sock -e "alter table db1.his_schedule_job_run engine = innodb;" echo "call $i time alter after" >> /tmp/his_schedule_job_run.log /usr/local/mysql/bin/mysql -uliaock -pliaock -S /data/socket/mysql3306.sock -e "select count(*) from db1.his_schedule_job_run;" >> /tmp/his_schedule_job_run.log /usr/local/mysql/bin/mysql -uliaock -pliaock -h172.16.0.2 -P3306 -e "select count(*) from db1.his_schedule_job_run;" >> /tmp/his_schedule_job_run.log echo "call db1.his_schedule_job_run $i time" >> /tmp/his_schedule_job_run.log done After multiple scripts are executed, there will be less data in the table. cat /tmp/his_schedule_job_run.log call db1.his_schedule_job_run 1 time call 1 time delete before count(*) 1000 call 1 time alter before and delete after count(*) 25 call 1 time alter after count(*) 25 call db1.his_schedule_job_run 2 time call 2 time delete before count(*) 1025 call 2 time alter before and delete after count(*) 58 call 2 time alter after count(*) 58 call db1.his_schedule_job_run 3 time call 3 time delete before count(*) 1058 call 3 time alter before and delete after count(*) 83 call 3 time alter after count(*) 83 call db1.his_schedule_job_run 4 time call 4 time delete before count(*) 1083 call 4 time alter before and delete after count(*) 113 call 4 time alter after count(*) 113 call db1.his_schedule_job_run 5 time call 5 time delete before count(*) 1113 call 5 time alter before and delete after count(*) 129 call 5 time alter after count(*) 129 call db1.his_schedule_job_run 6 time call 6 time delete before count(*) 1129 call 6 time alter before and delete after count(*) 146 call 6 time alter after count(*) 146 call db1.his_schedule_job_run 7 time call 7 time delete before count(*) 1146 call 7 time alter before and delete after count(*) 169 call 7 time alter after count(*) 169 call db1.his_schedule_job_run 8 time call 8 time delete before count(*) 1169 call 8 time alter before and delete after count(*) 191 call 8 time alter after count(*) 191 call db1.his_schedule_job_run 9 time call 9 time delete before count(*) 1191 call 9 time alter before and delete after count(*) 212 call 9 time alter after count(*) 212 call db1.his_schedule_job_run 10 time call 10 time delete before count(*) 1212 call 10 time alter before and delete after count(*) 227 call 10 time alter after count(*) 227 call db1.his_schedule_job_run 1 time call 1 time delete before count(*) 1227 call 1 time alter before and delete after count(*) 249 call 1 time alter after count(*) 249 call db1.his_schedule_job_run 2 time call 2 time delete before count(*) 1249 call 2 time alter before and delete after count(*) 275 call 2 time alter after count(*) 275 call db1.his_schedule_job_run 3 time call 3 time delete before count(*) 1275 call 3 time alter before and delete after count(*) 284 call 3 time alter after count(*) 284 call db1.his_schedule_job_run 4 time call 4 time delete before count(*) 1284 call 4 time alter before and delete after count(*) 303 call 4 time alter after count(*) 303 call db1.his_schedule_job_run 5 time call 5 time delete before count(*) 1303 call 5 time alter before and delete after count(*) 332 call 5 time alter after count(*) 332 call db1.his_schedule_job_run 6 time call 6 time delete before count(*) 1332 call 6 time alter before and delete after count(*) 355 call 6 time alter after count(*) 355 call db1.his_schedule_job_run 7 time call 7 time delete before count(*) 1355 call 7 time alter before and delete after count(*) 375 call 7 time alter after count(*) 375 call db1.his_schedule_job_run 8 time call 8 time delete before count(*) 1375 call 8 time alter before and delete after count(*) 398 call 8 time alter after count(*) 398 call db1.his_schedule_job_run 9 time call 9 time delete before count(*) 1398 call 9 time alter before and delete after count(*) 421 call 9 time alter after count(*) 421 call db1.his_schedule_job_run 10 time call 10 time delete before count(*) 1421 call 10 time alter before and delete after count(*) 447 call 10 time alter after count(*) 447 call db1.his_schedule_job_run 1 time call 1 time delete before count(*) 1447 call 1 time alter before and delete after count(*) 461 call 1 time alter after count(*) 461 call db1.his_schedule_job_run 2 time call 2 time delete before count(*) 1461 call 2 time alter before and delete after count(*) 482 call 2 time alter after count(*) 482 call db1.his_schedule_job_run 3 time call 3 time delete before count(*) 1482 call 3 time alter before and delete after count(*) 496 call 3 time alter after count(*) 496 call db1.his_schedule_job_run 4 time call 4 time delete before count(*) 1496 call 4 time alter before and delete after count(*) 512 call 4 time alter after count(*) 512 call db1.his_schedule_job_run 5 time call 5 time delete before count(*) 1512 call 5 time alter before and delete after count(*) 531 call 5 time alter after count(*) 531 call db1.his_schedule_job_run 6 time call 6 time delete before count(*) 1531 call 6 time alter before and delete after count(*) 554 call 6 time alter after count(*) 554 call db1.his_schedule_job_run 7 time call 7 time delete before count(*) 1554 call 7 time alter before and delete after count(*) 577 call 7 time alter after count(*) 577 call db1.his_schedule_job_run 8 time call 8 time delete before count(*) 1577 call 8 time alter before and delete after count(*) 595 call 8 time alter after count(*) 595 call db1.his_schedule_job_run 9 time call 9 time delete before count(*) 1595 call 9 time alter before and delete after count(*) 616 call 9 time alter after count(*) 616 call db1.his_schedule_job_run 10 time call 10 time delete before count(*) 1616 call 10 time alter before and delete after count(*) 627 call 10 time alter after count(*) 627 call db1.his_schedule_job_run 1 time call 1 time delete before count(*) 1627 call 1 time alter before and delete after count(*) 646 call 1 time alter after count(*) 646 call db1.his_schedule_job_run 2 time call 2 time delete before count(*) 1646 call 2 time alter before and delete after count(*) 666 call 2 time alter after count(*) 666 call db1.his_schedule_job_run 3 time call 3 time delete before count(*) 1666 call 3 time alter before and delete after count(*) 687 call 3 time alter after count(*) 687 call db1.his_schedule_job_run 4 time call 4 time delete before count(*) 1687 call 4 time alter before and delete after count(*) 701 call 4 time alter after count(*) 701 call db1.his_schedule_job_run 5 time call 5 time delete before count(*) 1701 call 5 time alter before and delete after count(*) 722 call 5 time alter after count(*) 722 call db1.his_schedule_job_run 6 time call 6 time delete before count(*) 1722 call 6 time alter before and delete after count(*) 743 call 6 time alter after count(*) 743 call db1.his_schedule_job_run 7 time call 7 time delete before count(*) 1743 call 7 time alter before and delete after count(*) 766 call 7 time alter after count(*) 766 call db1.his_schedule_job_run 8 time call 8 time delete before count(*) 1766 call 8 time alter before and delete after count(*) 796 call 8 time alter after count(*) 796 call db1.his_schedule_job_run 9 time call 9 time delete before count(*) 1796 call 9 time alter before and delete after count(*) 816 call 9 time alter after count(*) 816 call db1.his_schedule_job_run 10 time call 10 time delete before count(*) 1816 call 10 time alter before and delete after count(*) 838 call 10 time alter after count(*) 838 call db1.his_schedule_job_run 1 time call 1 time delete before count(*) 1838 call 1 time alter before and delete after count(*) 854 call 1 time alter after count(*) 854 call db1.his_schedule_job_run 2 time call 2 time delete before count(*) 1854 call 2 time alter before and delete after count(*) 872 call 2 time alter after count(*) 872 call db1.his_schedule_job_run 3 time call 3 time delete before count(*) 1872 call 3 time alter before and delete after ---There are differences between the data before and after alter. count(*) 897 call 3 time alter after ---There are differences between the data before and after alter. count(*) 896 call db1.his_schedule_job_run 4 time call 4 time delete before count(*) 1896 call 4 time alter before and delete after count(*) 914 call 4 time alter after count(*) 914 call db1.his_schedule_job_run 5 time call 5 time delete before count(*) 1914 call 5 time alter before and delete after ---There are differences between the data before and after alter. count(*) 932 call 5 time alter after ---There are differences between the data before and after alter. count(*) 931 call db1.his_schedule_job_run 6 time call 6 time delete before count(*) 1931 call 6 time alter before and delete after count(*) 951 call 6 time alter after count(*) 951 call db1.his_schedule_job_run 7 time call 7 time delete before count(*) 1951 call 7 time alter before and delete after count(*) 970 call 7 time alter after count(*) 970 call db1.his_schedule_job_run 8 time call 8 time delete before count(*) 1970 call 8 time alter before and delete after count(*) 988 call 8 time alter after count(*) 988 call db1.his_schedule_job_run 9 time call 9 time delete before count(*) 1988 call 9 time alter before and delete after count(*) 1014 call 9 time alter after count(*) 1014 call db1.his_schedule_job_run 10 time call 10 time delete before count(*) 2014 call 10 time alter before and delete after count(*) 1040 call 10 time alter after count(*) 1040 Also, if there is a replication environment, it will be easier for the slave library to see that the data lost in the master library is still on the slave library.