Bug #95176 Querying tables in information_schema while a table is being deleted raises an error
Submitted: 29 Apr 2019 7:03 Modified: 18 Jun 2019 12:15
Reporter: xiao Xiao Email Updates:
Status: Can't repeat Impact on me:
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.13 OS:CentOS
Assigned to: CPU Architecture:Any

[29 Apr 2019 7:03] xiao Xiao
When querying tables in information_schema while another session is deleting a table, an error is raised:
! ERROR 1049 (42000) at line 48: Unknown database ''

I debugged the code and found a race condition:

1) get_schema_tables_result collects the table ids.
2) dd::cache::Storage_adapter::get reads each table record by table id.
It starts a new transaction with READ COMMITTED isolation:
  // Start a DD transaction to get the object.
  Transaction_ro trx(thd, isolation);

3) If the table is deleted between 1) and 2), the race window is hit and the error occurs.

I do not know why MySQL needs to read the record again (after getting the table ids); that second read is what opens the window for this race condition.

The error stack is as follows:
(gdb) bt
#0  Diagnostics_area::set_error_status (this=0x7ffeb0534760, mysql_errno=1049, message_text=0x7ffeb0532ea0 "Unknown database ''", returned_sqlstate=0x5f53b17 "42000")
    at /export/home/duyu/code/taurus-root/src/sql/sql/sql_error.cc:454
#1  0x0000000003070e8b in THD::raise_condition (this=0x7ffdcc0b1b20, sql_errno=1049, sqlstate=0x5f53b17 "42000", level=Sql_condition::SL_ERROR, msg=0x7ffeb0532ea0 "Unknown database ''",
    use_condition_handler=false) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_class.cc:726
#2  0x0000000002f74ffa in my_message_sql (error=1049, str=0x7ffeb0532ea0 "Unknown database ''", MyFlags=0) at /export/home/duyu/code/taurus-root/src/sql/sql/mysqld.cc:3069
#3  0x00000000044edf07 in my_error (nr=1049, MyFlags=0) at /export/home/duyu/code/taurus-root/src/sql/mysys/my_error.cc:247
#4  0x0000000004236a16 in dd::cache::Dictionary_client::get_table_name_by_se_private_id (this=0x7ffdcc00ea50, engine="InnoDB", se_private_id=3268, schema_name=0x7ffeb0533990,
    table_name=0x7ffeb0533980) at /export/home/duyu/code/taurus-root/src/sql/sql/dd/impl/cache/dictionary_client.cc:1759
#5  0x00000000049a6fc3 in dd_table_open_on_id_low (thd=0x7ffdcc0b1b20, mdl=0x7ffeb05346f0, table_id=3268)
    at /export/home/duyu/code/taurus-root/src/sql/storage/innobase/dict/dict0dd.cc:420
#6  0x00000000049a7bb7 in dd_table_open_on_id (table_id=3268, thd=0x7ffdcc0b1b20, mdl=0x7ffeb05346f0, dict_locked=true, check_corruption=false)
    at /export/home/duyu/code/taurus-root/src/sql/storage/innobase/dict/dict0dd.cc:634
#7  0x00000000049b1941 in dd_process_dd_tables_rec_and_mtr_commit (heap=0x7ffdcc0180e8, rec=0x7fffc844219b "", table=0x7ffeb05346f8, dd_tables=0x7fffd8beeb38, mdl=0x7ffeb05346f0,
    mtr=0x7ffeb05341e0) at /export/home/duyu/code/taurus-root/src/sql/storage/innobase/dict/dict0dd.cc:4527
#8  0x00000000046977fa in i_s_innodb_tables_fill_table (thd=0x7ffdcc0b1b20, tables=0x7ffdcc019cc8) at /export/home/duyu/code/taurus-root/src/sql/storage/innobase/handler/i_s.cc:5287
#9  0x000000000317f44e in do_fill_table (thd=0x7ffdcc0b1b20, table_list=0x7ffdcc019cc8, qep_tab=0x7ffdcc01bbf0) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_show.cc:4711
#10 0x000000000317f923 in get_schema_tables_result (join=0x7ffdcc01b260, executed_place=PROCESSED_BY_JOIN_EXEC) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_show.cc:4824
#11 0x000000000315d591 in JOIN::prepare_result (this=0x7ffdcc01b260) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_select.cc:1454
#12 0x0000000003097ff3 in JOIN::exec (this=0x7ffdcc01b260) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_executor.cc:202
#13 0x000000000315bee7 in Sql_cmd_dml::execute_inner (this=0x7ffdcc01a4c8, thd=0x7ffdcc0b1b20) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_select.cc:715
#14 0x000000000315b937 in Sql_cmd_dml::execute (this=0x7ffdcc01a4c8, thd=0x7ffdcc0b1b20) at /export/home/duyu/code/taurus-root/src/sql/sql/sql_select.cc:613


How to repeat:
Steps to reproduce:
50 test cases run in parallel and the error happens randomly.
The test case is as follows (each case uses different tablespaces and tables):

create tablespace `123456789012345678901234567890123456789012345678901234567890127` add datafile 'A027BCDEFGHIJA027BCDEFGHIJA027BCDEFGHIJA027BCDEFGHIJA027BCDEFGHIJA027BCDEFGHIJA027BCDEFGHIJA027BCDEFGHIJabcd>こん.ibd' file_block_size=8192 engine InnoDB;

--  Check that the tablespace exists
select NAME,FLAG,ROW_FORMAT,PAGE_SIZE,ZIP_PAGE_SIZE,SPACE_TYPE,FS_BLOCK_SIZE,FILE_SIZE from information_schema.innodb_tablespaces where binary name='123456789012345678901234567890123456789012345678901234567890127';

--  Check that the file exists
-- exec echo "Check file exist or not.."

--  Start adding tables to the tablespace
--  Create the database
create database CREATE_TABLESPACE_027;

--  Use the database

--  Create the tables
CREATE TABLE TABLE_TEST_01 (c1 INT PRIMARY KEY) TABLESPACE `123456789012345678901234567890123456789012345678901234567890127` ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
CREATE TABLE TABLE_TEST_02 (c1 INT PRIMARY KEY) TABLESPACE `123456789012345678901234567890123456789012345678901234567890127` ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

--  Check the tablespace type of the tables
--  ---------------------------------------------Clean up the environment-------------------------------------------------
--  Drop the database

--  Drop the tablespace
DROP TABLESPACE `123456789012345678901234567890123456789012345678901234567890127`;

You can also reproduce it with gdb: set a breakpoint in

bool Storage_adapter::get(THD *thd, const K &key, enum_tx_isolation isolation,
                          bool bypass_core_registry, const T **object) {

just before

  // Start a DD transaction to get the object.
  Transaction_ro trx(thd, isolation);

While the thread is stopped there, delete the table with this id from another session, then continue.

Suggested fix:
I do not know why MySQL needs to read the record again (after getting the table ids); that second read is what opens the window for this race condition.

This should not raise an error.
[14 May 2019 14:16] MySQL Verification Team
Hi Mr. Xiao,

Thank you for your bug report.

We do need certain clarifications regarding the test case that you have provided.

First of all, would you please let us know which transactions are running in parallel, and which ones are not. This is especially necessary for the statement that is querying Information Schema. This is not visible from your test case.

Second, does your test case still repeat if we omit the Chinese ideograms within table and tablespace names?

Many thanks in advance.
[15 Jun 2019 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[29 Jul 2019 12:38] dave do
Xiao is my colleague, and we find that the issue still exists in 8.0.16.

Let me try to explain the race condition again.

First of all, before the information_schema query starts, there are already several tables belonging to different databases.
Then the query starts: in get_schema_tables_result, every record of mysql.tables is fetched, and the SE_PRIVATE_ID is read from each record in dd_process_dd_tables_rec_and_mtr_commit.
With the SE_PRIVATE_ID, the query drills down into dd::cache::Dictionary_client::get_table_name_by_se_private_id. In this function the table object is acquired successfully, because in the test case the table is not dropped explicitly. It then fetches the schema object by tab_obj->schema_id() and finds that sch_obj is nullptr.
The main reason we get an invalid sch_obj is that it is fetched with Dictionary_client::acquire_uncached, which starts a new Transaction_ro trx (in Storage_adapter::get) at the ISO_READ_COMMITTED level. So if the schema is dropped by another session (not the information_schema session), the error occurs.

To reproduce this issue more easily, you can add a DBUG_EXECUTE_IF that sleeps 10 ms before the Transaction_ro trx starts; this widens the gap and increases the probability of hitting the race.
[29 Jul 2019 12:45] MySQL Verification Team
Hi Mr. Do,

Thank you for your comment. However:

First of all, our questions from the previous comments were not answered at all.

Second, we do require a repeatable test case, that will show the problem, without changing anything in our code.

Last, but not least, you should run your tests ONLY against our latest 8.0 release, which is 8.0.17.