| Bug #96224 | The MTR case innodb.innodb_corrupt_bit get a randomly coredump | ||
|---|---|---|---|
| Submitted: | 17 Jul 2019 3:12 | Modified: | 21 Oct 2019 12:21 |
| Reporter: | shangshang yu | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
| Version: | 8.0.16 | OS: | CentOS |
| Assigned to: | CPU Architecture: | x86 | |
| Tags: | core dump, mtr | ||
[17 Jul 2019 10:40]
shangshang yu
change Severity
[18 Jul 2019 9:30]
MySQL Verification Team
Hello Shangshang yu, Thank you for the report. Verified as described with 8.0.16 source build(after patching to reproduce). regards, Umesh
[18 Jul 2019 9:31]
MySQL Verification Team
Test results - 8.0.16
Attachment: 96224_8.0.16.results (application/octet-stream, text), 187.03 KiB.
[12 Aug 2019 14:59]
Daniel Price
Posted by developer: Fixed as of the upcoming 8.0.18 release, and here's the changelog entry: It was possible for a corrupted table to be removed from the table cache before the reference count for the table reached zero.
[21 Oct 2019 12:21]
shangshang yu
I noticed that the solution for tha issue:
commit id: 2fa2c6729cce71e616a82fb3156215f6f8ffcc8f
/* Optionally remove this corrupted table from cache now
if no other thread is still using it. If not, the corrupted bit
will keep it from being used.*/
if (ib_table->get_ref_count() == 0) {
dict_table_remove_from_cache(ib_table);
}
I agree the fix, but I have another question:
Should we take the partition table into consider? The partition table have the same operations as normal table:
Ha_innopart_share::open_one_table_part(...){
...
if (part_table->is_corrupted()) {
dict_table_remove_from_cache(part_table);
part_table = nullptr;
} else if (part_table->discard_after_ddl) {
btr_drop_ahi_for_table(part_table);
dict_table_remove_from_cache(part_table);
part_table = nullptr;
cached = false;
} else {
if (!dd_table_match(part_table, dd_part)) {
dict_set_corrupted(part_table->first_index());
dict_table_remove_from_cache(part_table);
part_table = NULL;
} else {
part_table->acquire_with_lock();
}
}
...
}
[13 May 2022 6:43]
chengqing hu
Hi shangshang, I'm getting an error in 8.0.21, that looks like a problem with the partition table you said at the end. the error log: 2022-03-04T00:52:09.901886+08:00 602854 [ERROR] [MY-013183] [InnoDB] Assertion failure: dict0dict.cc:1888:table->get_ref_count() == 0 thread 140501494646528 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 16:52:09 UTC - mysqld got signal 6 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. Thread pointer: 0x7fc4740197f0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fc90dadbdc0 thread_stack 0x40000 /usr/local/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1f40a3e] /usr/local/mysql/bin/mysqld(handle_fatal_signal+0x323) [0xfd02b3] /lib64/libpthread.so.0(+0xf630) [0x7fd23e3cc630] /lib64/libc.so.6(gsignal+0x37) [0x7fd23c3e3387] /lib64/libc.so.6(abort+0x148) [0x7fd23c3e4a78] /usr/local/mysql/bin/mysqld() [0xd4787a] /usr/local/mysql/bin/mysqld() [0x22e0330] /usr/local/mysql/bin/mysqld(dict_partitioned_table_remove_from_cache(char const*)+0xe9) [0x22e0929] /usr/local/mysql/bin/mysqld() [0x2015046] /usr/local/mysql/bin/mysqld() [0xf31164] /usr/local/mysql/bin/mysqld(mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*)+0x707f) [0xf45b5f] /usr/local/mysql/bin/mysqld(Sql_cmd_alter_table::execute(THD*)+0x481) [0x12c2a61] /usr/local/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x18ac) [0xea28cc] /usr/local/mysql/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x348) [0xea69e8] /usr/local/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x15eb) [0xea844b] /usr/local/mysql/bin/mysqld(do_command(THD*)+0x174) [0xea99c4] /usr/local/mysql/bin/mysqld() [0xfc2058] /usr/local/mysql/bin/mysqld() [0x245661c] /lib64/libpthread.so.0(+0x7ea5) [0x7fd23e3c4ea5] /lib64/libc.so.6(clone+0x6d) [0x7fd23c4ab8dd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7fc474055f68): ALTER TABLE DTF_TRUNK_INFO ADD PARTITION (PARTITION p20220305 VALUES LESS THAN (UNIX_TIMESTAMP('2022-03-06'))) Connection ID (thread ID): 602854 Status: KILL_QUERY
[16 Jun 2022 3:01]
chengqing hu
./mtr my1 result
Attachment: my1.result (application/octet-stream, text), 6.90 KiB.

Description: When execute the innodb.innodb_corrupt_bit, a coredump occurred: (gdb) bt #0 0x00002b9cdeb9a9d1 in pthread_kill () from /lib64/libpthread.so.0 #1 0x00000000045af810 in my_write_core (sig=6) at /root/software/mysql-8.0.16/mysys/stacktrace.cc:284 #2 0x0000000003312551 in handle_fatal_signal (sig=6) at /root/software/mysql-8.0.16/sql/signal_handler.cc:168 #3 <signal handler called> #4 0x00002b9ce0798207 in raise () from /lib64/libc.so.6 #5 0x00002b9ce07998f8 in abort () from /lib64/libc.so.6 #6 0x00000000049ccb50 in ut_dbg_assertion_failed (expr=0x6377ea6 "table->get_ref_count() == 0", file=0x63775c0 "/root/software/mysql-8.0.16/storage/innobase/dict/dict0dict.cc", line=1885) at /root/software/mysql-8.0.16/storage/innobase/ut/ut0dbg.cc:92 #7 0x0000000004ae089f in dict_table_remove_from_cache_low (table=0x2b9d843af4d8, lru_evict=0) at /root/software/mysql-8.0.16/storage/innobase/dict/dict0dict.cc:1885 #8 0x0000000004ae1007 in dict_table_remove_from_cache (table=0x2b9d843af4d8) at /root/software/mysql-8.0.16/storage/innobase/dict/dict0dict.cc:1968 #9 0x00000000046ceb14 in ha_innobase::open (this=0x2b9d84fab318, name=0x2b9d8414e618 "./corrupted/t1", open_flags=2, table_def=0x2b9d84153538) at /root/software/mysql-8.0.16/storage/innobase/handler/ha_innodb.cc:6688 #10 0x00000000034a5744 in handler::ha_open (this=0x2b9d84fab318, table_arg=0x2b9d841640a0, name=0x2b9d8414e618 "./corrupted/t1", mode=2, test_if_locked=2, table_def=0x2b9d84153538) at /root/software/mysql-8.0.16/sql/handler.cc:2656 #11 0x0000000003296031 in open_table_from_share (thd=0x2b9d84025050, share=0x2b9d8414e2b8, alias=0x2b9d84fa4840 "t1", db_stat=39, prgflag=8, ha_open_flags=0, outparam=0x2b9d841640a0, is_create_table=false, table_def_param=0x2b9d84153538) at /root/software/mysql-8.0.16/sql/table.cc:3164 #12 0x0000000003099613 in open_table (thd=0x2b9d84025050, table_list=0x2b9d84fa4848, ot_ctx=0x2b9d13ee3ab0) at /root/software/mysql-8.0.16/sql/sql_base.cc:3345 #13 0x000000000309d0cf in open_and_process_table (thd=0x2b9d84025050, lex=0x2b9d8401d550, tables=0x2b9d84fa4848, counter=0x2b9d13ee587c, prelocking_strategy=0x2b9d13ee5880, has_prelocking_list=false, ot_ctx=0x2b9d13ee3ab0) at /root/software/mysql-8.0.16/sql/sql_base.cc:5001 #14 0x000000000309e51a in open_tables (thd=0x2b9d84025050, start=0x2b9d13ee3b88, counter=0x2b9d13ee587c, flags=0, prelocking_strategy=0x2b9d13ee5880) at /root/software/mysql-8.0.16/sql/sql_base.cc:5644 #15 0x000000000320d295 in mysql_alter_table (thd=0x2b9d84025050, new_db=0x2b9d84fa4de8 "corrupted", new_name=0x2b9d84fa45f0 "t1", create_info=0x2b9d13ee61e0, table_list=0x2b9d84fa4848, alter_info=0x2b9d13ee6090) at /root/software/mysql-8.0.16/sql/sql_table.cc:14839 #16 0x0000000003704756 in Sql_cmd_create_or_drop_index_base::execute (this=0x2b9d84fa4f58, thd=0x2b9d84025050) at /root/software/mysql-8.0.16/sql/sql_cmd_ddl_table.cc:378 #17 0x0000000003151f3c in mysql_execute_command (thd=0x2b9d84025050, first_level=true) at /root/software/mysql-8.0.16/sql/sql_parse.cc:3413 #18 0x0000000003157ce0 in mysql_parse (thd=0x2b9d84025050, parser_state=0x2b9d13ee7b20) at /root/software/mysql-8.0.16/sql/sql_parse.cc:5198 #19 0x000000000314d1ad in dispatch_command (thd=0x2b9d84025050, com_data=0x2b9d13ee8bd0, command=COM_QUERY) at /root/software/mysql-8.0.16/sql/sql_parse.cc:1746 #20 0x000000000314b73a in do_command (thd=0x2b9d84025050) at /root/software/mysql-8.0.16/sql/sql_parse.cc:1264 #21 0x00000000032fdc91 in handle_connection (arg=0x83d16e0) at /root/software/mysql-8.0.16/sql/conn_handler/connection_handler_per_thread.cc:302 #22 0x0000000004cbfe47 in pfs_spawn_thread (arg=0x843cdb0) at /root/software/mysql-8.0.16/storage/perfschema/pfs.cc:2836 #23 0x00002b9cdeb95dd5 in start_thread () from /lib64/libpthread.so.0 #24 0x00002b9ce085fead in clone () from /lib64/libc.so.6 It occurred when execute innodb.innodb_corrupt_bit at line 137. 103 CREATE DATABASE corrupted; 104 use corrupted; 105 106 CREATE TABLE t1 ( 107 a INT NOT NULL PRIMARY KEY, 108 b INT, 109 c CHAR(10), 110 v VARCHAR(100), 111 t TEXT, 112 p POINT NOT NULL); 113 114 CREATE TABLE t2 LIKE t1; 115 CREATE TEMPORARY TABLE t3 LIKE t2; 116 117 CREATE FULLTEXT INDEX ft ON t2(t); 118 CREATE INDEX idx_c ON t3(c); 119 120 INSERT INTO t1 VALUES (1, 2, 'aaa', 'abcdefghij', 'This is a', st_pointfromtext('POINT(0 0)')), 121 (2, 4, 'bb', 'qwerty', 'test case', st_pointfromtext('POINT(1 1)')), 122 (3, 6, 'ccccc', 'poiuy', 'for corrupted index', st_pointfromtext('POINT(2 2)')); 123 124 INSERT INTO t2 SELECT * FROM t1; 125 INSERT INTO t3 SELECT * FROM t1; 126 127 SELECT count(*) FROM t1; 128 SELECT count(*) FROM t2; 129 SELECT count(*) FROM t3; 130 131 --echo # Marking clustered index as corrupted results in ER_NO_SUCH_TABLE 132 set debug = "+d, dict_set_clust_index_corrupted"; 133 CHECK TABLE t1; 134 set debug = "-d, dict_set_clust_index_corrupted"; 135 136 --error ER_NO_SUCH_TABLE 137 CREATE INDEX b ON t1(b); The Reason: The assert ut_a(table->get_ref_count() == 0) want the dict_table_t::n_ref_count == 0, but n_ref_count=1. (gdb) f 7 #7 0x0000000004ae089f in dict_table_remove_from_cache_low (table=0x2b9d843af4d8, lru_evict=0) at /root/software/mysql-8.0.16/storage/innobase/dict/dict0dict.cc:1885 1885 ut_a(table->get_ref_count() == 0); (gdb) p table->n_ref_count $1 = { <std::__atomic_base<unsigned long>> = { static _S_alignment = 8, _M_i = 1 }, <No data fields>} and there are three function can change dict_table_t::n_ref_count. 1. dict_table_t::acquire : ++n_ref_count 2. dict_table_t::acquire_with_lock : ++n_ref_count 3. dict_table_t::release : --n_ref_count There are two thread acquire the dict_sys->mutex: thread-1: At dict0stats_bg.cc:328, the dict_stats_thread acquire the dict_sys->mutex, static void dict_stats_process_entry_from_recalc_pool(THD *thd) { 289 table = dd_table_open_on_id(table_id, thd, &mdl, true, true); ... 328 mutex_enter(&dict_sys->mutex); 329 330 /* Set back bg flag */ 331 table->stats_bg_flag = BG_STAT_NONE; 332 333 mutex_exit(&dict_sys->mutex); 334 335 /* This call can't be moved into dict_sys->mutex protection, 336 since it'll cause deadlock while release mdl lock. */ 337 dd_table_close(table, thd, &mdl, false); } thread-2: At ha_innodb.cc:6684 also acquire the dict_sys->mutex. int ha_innobase::open(const char *name, int, uint open_flags, const dd::Table *table_def) { ... 6683 if (ib_table == NULL) { 6684 mutex_enter(&dict_sys->mutex); 6685 ib_table = dict_table_check_if_in_cache_low(norm_name); 6686 if (ib_table != nullptr) { 6687 if (ib_table->is_corrupted()) { 6688 dict_table_remove_from_cache(ib_table); 6689 ib_table = nullptr; } The thread-1 call dd_table_open_on_id at dict0stats_bg.cc:289 and call dict_table_t::acquire_with_lock make n_ref_count++. If thread-1 acquired the dict_sys->mutex first, it will release n_ref_count when call dd_table_close (dict0stats_bg.cc:337), it is OK. If thread-2 acquired the dict_sys->mutex first, the thread-1 will wait for dict_sys->mutex and cannot release n_ref_count, so the n_ref_count is 1. thread-2 will call dict_table_remove_from_cache at ha_innodb.cc:6688 since the index_corrupt was injected at innodb_corrupt_bit.test:132, and the coredump occurred. How to repeat: In order to let thread-2 acquire the dict_sys->mutex first, let thread-1 sleep a while before acquire the dict_sys->mutex. if (my_strcasecmp(system_charset_info, table->name.m_name, "corrupted/t1") == 0){ os_thread_sleep(1000000); } mutex_enter(&dict_sys->mutex); /* Set back bg flag */ table->stats_bg_flag = BG_STAT_NONE; mutex_exit(&dict_sys->mutex); then execute the case innodb_corrupt_bit can repeat. Suggested fix: 6686 if (ib_table != nullptr) { bool ib_table_not_removed = false; 6687 if (ib_table->is_corrupted()) { if (table->get_ref_count() == 0 && table->n_rec_locks == 0) { 6688 dict_table_remove_from_cache(ib_table); } else { ib_table_not_removed = true; } 6689 ib_table = nullptr; 6690 cached = true; 6740 /* ib_table could be freed, reset the index_mapping */ 6741 if (ib_table == nullptr && m_share->idx_trans_tbl.index_count > 0 && !ib_table_not_removed) { 6742 ut_free(m_share->idx_trans_tbl.index_mapping); 6743 m_share->idx_trans_tbl.index_mapping = nullptr; 6744 m_share->idx_trans_tbl.index_count = 0; 6745 m_share->idx_trans_tbl.array_size = 0; 6746 }