| Bug #117135 | When executing online ddl rebuilt table, concurrent delete operations may cause row_log memory to be read out of bounds. | ||
|---|---|---|---|
| Submitted: | 8 Jan 11:54 | Modified: | 10 Jan 12:03 |
| Reporter: | fan devin (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: DDL | Severity: | S2 (Serious) |
| Version: | 8.0 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | Online DDL | ||
[8 Jan 16:46]
fan devin
There are some errors in the previous comments. If you want to repeat the problem, you should add ut_a (mrec < mrec_end) after mrec += extra_size at storage/innobase/row/row0log.cc:2467. Otherwise, it is difficult to cause a segfault, because although the read address is out of bounds, it may be valid allocated elsewhere.
[9 Jan 16:17]
MySQL Verification Team
Hi Mr. devin, Thank you for your bug report. However, we can not repeat it. Simply, innodb_sort_buffer_size is a read-only variable. You should send us a mysql-test/ script or stand-alone test case that we can reproduce. Hence , your test case is not complete. Can't repeat.
[10 Jan 8:38]
fan devin
I rewrote a test case that can be reproduced. use: mtr innodb-table-online-delete-release --valgrind --record
Attachment: test_case.zip (application/zip, text), 1.02 KiB.
[10 Jan 8:41]
fan devin
I used the above script to reproduce the problem in 8.0.33. core stack: ==31530== Thread 40 connection: ==31530== Invalid read of size 1 ==31530== at 0x26CA300: rec_init_offsets_comp_ordinary(unsigned char const*, bool, dict_index_t const*, unsigned long*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x25FC777: row_log_table_apply_op(que_thr_t*, unsigned long, unsigned long, ddl::Dup*, dberr_t*, mem_block_info_t*, mem_block_info_t*, unsigned char const*, unsigned char const*, unsigned long*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x25FFEAB: row_log_table_apply_ops(que_thr_t*, ddl::Dup*, Alter_stage*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x26022A7: row_log_table_apply(que_thr_t*, dict_table_t*, TABLE*, Alter_stage*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x24B5179: bool ha_innobase::inplace_alter_table_impl<dd::Table>(TABLE*, Alter_inplace_info*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x24B53EE: ha_innobase::inplace_alter_table(TABLE*, Alter_inplace_info*, dd::Table const*, dd::Table*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xF51546: mysql_inplace_alter_table(THD*, dd::Schema const&, dd::Schema const&, dd::Table const*, dd::Table*, Table_ref*, TABLE*, TABLE*, Alter_inplace_info*, enum_alter_inplace_result, Alter_table_ctx*, std::set<std::string, std::less<std::string>, Stateless_allocator<std::string, histograms::Histogram_psi_key_alloc, My_free_functor> >&, FOREIGN_KEY*, unsigned int, Foreign_key_parents_invalidator*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xF66B7D: mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, Table_ref*, Alter_info*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x13B6B97: Sql_cmd_alter_table::execute(THD*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEA3FE1: mysql_execute_command(THD*, bool) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEA8C83: dispatch_sql_command(THD*, Parser_state*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEA9F4C: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEAC17E: do_command(THD*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x1017B77: handle_connection (in /usr/local/mysql/bin/mysqld) ==31530== by 0x2BF2A64: pfs_spawn_thread (in /usr/local/mysql/bin/mysqld) ==31530== by 0x5960EA4: start_thread (in /usr/lib64/libpthread-2.17.so) ==31530== Address 0x4d2f002 is not stack'd, malloc'd or (recently) free'd ==31530== 2025-01-10T08:20:54Z UTC - mysqld got signal 11 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. ... Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (14ad4a70): alter table my_table engine=innodb Connection ID (thread ID): 9 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. Writing a core file ==31530== ==31530== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==31530== at 0x5965AA1: pthread_kill (in /usr/lib64/libpthread-2.17.so) ==31530== by 0x102777C: handle_fatal_signal (in /usr/local/mysql/bin/mysqld) ==31530== by 0x596862F: ??? (in /usr/lib64/libpthread-2.17.so) ==31530== by 0x26CA2FF: rec_init_offsets_comp_ordinary(unsigned char const*, bool, dict_index_t const*, unsigned long*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x25FC777: row_log_table_apply_op(que_thr_t*, unsigned long, unsigned long, ddl::Dup*, dberr_t*, mem_block_info_t*, mem_block_info_t*, unsigned char const*, unsigned char const*, unsigned long*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x25FFEAB: row_log_table_apply_ops(que_thr_t*, ddl::Dup*, Alter_stage*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x26022A7: row_log_table_apply(que_thr_t*, dict_table_t*, TABLE*, Alter_stage*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x24B5179: bool ha_innobase::inplace_alter_table_impl<dd::Table>(TABLE*, Alter_inplace_info*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x24B53EE: ha_innobase::inplace_alter_table(TABLE*, Alter_inplace_info*, dd::Table const*, dd::Table*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xF51546: mysql_inplace_alter_table(THD*, dd::Schema const&, dd::Schema const&, dd::Table const*, dd::Table*, Table_ref*, TABLE*, TABLE*, Alter_inplace_info*, enum_alter_inplace_result, Alter_table_ctx*, std::set<std::string, std::less<std::string>, Stateless_allocator<std::string, histograms::Histogram_psi_key_alloc, My_free_functor> >&, FOREIGN_KEY*, unsigned int, Foreign_key_parents_invalidator*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xF66B7D: mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, Table_ref*, Alter_info*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0x13B6B97: Sql_cmd_alter_table::execute(THD*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEA3FE1: mysql_execute_command(THD*, bool) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEA8C83: dispatch_sql_command(THD*, Parser_state*) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEA9F4C: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in /usr/local/mysql/bin/mysqld) ==31530== by 0xEAC17E: do_command(THD*) (in /usr/local/mysql/bin/mysqld)
[10 Jan 8:47]
fan devin
Note: you need to start mtr with --valgrind and innodb-table-online-delete-release-master.opt configurations, otherwise it will be very difficult to reproduce the problem.
[10 Jan 12:03]
MySQL Verification Team
Hi Mr. devin, Thank you for your bug report. After much effort, we have managed to repeat the bug. Thank you very much. Our stacktrace is very, very similar to yours. This is now a verified bug report for version 8.0 and all higher versions.

Description: In the static const ddl::mrec_t *row_log_table_apply_op function, for the row_log of the ROW_T_INSERT and ROW_T_UPDATE types, the extra_info spans two blocks is avoided (returning nullptr), but for the row_log of the ROW_T_DELETE type, the pointer out-of-bounds problem is not judged, which may lead to Read out of bounds memory. // source code switch (*mrec++) { default: *error = DB_CORRUPTION; ut_d(ut_error); ut_o(return (nullptr)); case ROW_T_INSERT: extra_size = *mrec++; if (extra_size >= 0x80) { /* Read another byte of extra_size. */ extra_size = (extra_size & 0x7f) << 8; extra_size |= *mrec++; } mrec += extra_size; // insert op check here if (mrec > mrec_end) { return (nullptr); } ... ... ... case ROW_T_DELETE: /* 1 (extra_size) + at least 1 (payload) */ if (mrec + 2 >= mrec_end) { return (nullptr); } extra_size = *mrec++; ut_ad(mrec < mrec_end); /* We assume extra_size < 0x100 for the PRIMARY KEY prefix. For fixed-length PRIMARY key columns, it is 0. */ mrec += extra_size; //delete op not check mrec out of bound // will use illegal address in rec_deserialize_init_offsets rec_offs_set_n_fields(offsets, new_index->n_uniq + 2); rec_deserialize_init_offsets(mrec, new_index, offsets); next_mrec = mrec + rec_offs_data_size(offsets); if (log->table->n_v_cols) { if (next_mrec + 2 > mrec_end) { return (nullptr); } next_mrec += mach_read_from_2(next_mrec); } How to repeat: mtr case: start --mysqld=--innodb-sort-buffer-size=65536 --source include/have_debug_sync.inc use test; CREATE TABLE my_table ( ID int NOT NULL AUTO_INCREMENT, NAME varchar(200) NOT NULL, TIME timestamp DEFAULT CURRENT_TIMESTAMP, NAME2 varchar(64) NOT NULL, NAME3 varchar(64) NOT NULL, value int NOT NULL, PRIMARY KEY (ID, NAME, TIME, NAME2, NAME3) ); DELIMITER $$; CREATE PROCEDURE InsertData() BEGIN DECLARE i INT DEFAULT 0; WHILE i < 10 DO INSERT INTO my_table (NAME, TIME, NAME2, NAME3, value) SELECT CONCAT('Name_', FLOOR(RAND() * 1000)) AS NAME, NOW() AS TIME, CONCAT('Name2_', FLOOR(RAND() * 100)) AS NAME2, CONCAT('Name3_', FLOOR(RAND() * 100)) AS NAME3, FLOOR(RAND() * 100) AS value FROM information_schema.COLUMNS LIMIT 3000; SET i = i + 1; END WHILE; END$$ DELIMITER ;$$ call InsertData; connect (con1,localhost,root,,); connection con1; use test; SET DEBUG_SYNC = 'row_log_table_apply1_before signal rebuild wait_for delete'; --send alter table my_table engine=innodb connection default; delete from my_table; SET DEBUG_SYNC= 'now wait_for rebuild'; SET DEBUG_SYNC= 'now signal delete'; connection con1; --reap connection default; drop table my_table; drop procedure InsertData; disconnect con1; =================== It does not cause a crash every time, because although an out-of-bounds read occurs, the memory read may have been allocated. Even if the wrong value is read, it wiil check out-of-memory in subsequent processing(if (next_mrec > mrec_end) return nullptr). It is best to add ut_a(mrec < mrec_end) after extra_size = *mrec++; in the code, or use valgrind, the exception will reappear quickly. // source code extra_size = *mrec++; ut_ad(mrec < mrec_end); mrec += extra_size; rec_offs_set_n_fields(offsets, new_index->n_uniq + 2); rec_deserialize_init_offsets(mrec, new_index, offsets); next_mrec = mrec + rec_offs_data_size(offsets); if (log->table->n_v_cols) { if (next_mrec + 2 > mrec_end) { return (nullptr); } next_mrec += mach_read_from_2(next_mrec); } // Even if the wrong value is read, after adding it, nullptr will be returned. if (next_mrec > mrec_end) { return (nullptr); } Suggested fix: add mrec out of bounds check in row_log_table_apply_op case ROW_T_DELETE: /* 1 (extra_size) + at least 1 (payload) */ if (mrec + 2 >= mrec_end) { return (nullptr); } extra_size = *mrec++; ut_ad(mrec < mrec_end); /* We assume extra_size < 0x100 for the PRIMARY KEY prefix. For fixed-length PRIMARY key columns, it is 0. */ mrec += extra_size; + if (mrec > mrec_end) { + return (nullptr); + } rec_offs_set_n_fields(offsets, new_index->n_uniq + 2); /* info bits is not recorded, see row_log_table_delete for more information. In fact, the new index has no instantly modified column, so in this function, the last param of rec_deserialize_init_offsets can be true or false(both are ok). */ rec_deserialize_init_offsets(mrec, new_index, offsets, true); next_mrec = mrec + rec_offs_data_size(offsets); if (log->table->n_v_cols) { if (next_mrec + 2 > mrec_end) { return (nullptr); }