Bug #86607 InnoDB crashed when master thread evict dict_table_t object
Submitted: 7 Jun 2017 9:38 Modified: 7 Nov 2017 17:35
Reporter: jianwei zhao Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:MySQL 5.6, 5.7, 5.6.36, 5.7.18 OS:Red Hat
Assigned to: CPU Architecture:Any

[7 Jun 2017 9:38] jianwei zhao
Description:
Hi, dear friends

I have encountered InnoDB crash on production environment with MySQL 5.6

1.  the core file backstrace as below:

  ut_ad(dict_index_is_clust(index))
  row_merge_drop_indexes 
  dict_table_remove_from_cache_low
  dict_make_room_in_cache () 
  srv_master_evict_from_table_cache () 
  srv_master_do_idle_tasks ()
  srv_master_thread (arg=0x0)
  start_thread (arg=0x2b21c8200700) 
  clone () 

GDB the core file, we find the table->indexes == 0, so within function row_merge_drop_indexes() it crashed.

2. We have tried to replay this crash, but cann't achieve a complete mtr testcase,so give this analysis:

PREPARE:
	create table t(id int primary key, col1 int);
	insert into t values(1, 2);
	insert into t values(2, 2);

SESSION 1:  select * from t;
	In order to cache dict_table_t, increment table->n_ref_count=1;

SESSION 2: alter table t add unique key (col1);
	The statement will failed because of duplicate entry, then will rollback the operation like below:
		row_merge_drop_indexes()
		innobase_rollback_sec_index()
		rollback_inplace_alter_table() 

	Since table->n_ref_count > 1, then table->drop_abort will be set TRUE, it means that there will be lazy drop delay to dict_table_close() or next prepare_inplace_alter_table();

SESSION 3: flush tables
	In order to decrement table->n_ref_count to 0 and evict dict_table_t object.

	When table->n_ref_count == 0, it can be evict from dictionary cache through dict_table_remove_from_cache_low() by master thread.
	Firstly call dict_index_remove_from_cache_low() to remove all indexes.
	Secondly found table->drop_aborted is TRUE, then row_merge_drop_indexes() to remove aborted indexes.
	Since already removed all indexes, the second step crashed.

3.  Search the InnoDB codes, we found a problem on dict_table_close():

As comments described, when table->n_ref_count decrement to 0, dict_table_close() will help dropping aborted indexes, but the function code has comfused, like this:

dict_table_close():
 
   --table->n_ref_count;
   ......
	if (!dict_locked) {
		ibool 	drop_aborted;

		drop_aborted = try_drop
			&& table->drop_aborted
			&& table->n_ref_count == 1
			&& dict_table_get_first_index(table);
		
		mutex_exit(&dict_sys->mutex);
		if (drop_aborted) {
			dict_table_try_drop_aborted(NULL, table_id, 0);
}

Firstly decrement table->n_ref_count, but the drop_aborted state need table->n_ref_count == 1, but when call dict_table_try_drop_aborted(), the code again need the expect table->n_ref_count == 0,  a little contradictory.

so what's the truth?  special intention or just mistake? 

How to repeat:
as above analysis.
[7 Jun 2017 13:48] Sinisa Milivojevic
Hi,

Sorry, we can not accept test cases which contain changes in the values of our variables, fields , pointers , classes etc ....... I can create million different crashes , simply by changing any variable.

What we require is a test case containing only configuration, SQL commands and even operating systems or hardware. 

Waiting for the proper test case.
[7 Jun 2017 15:30] jianwei zhao
Sorry for the uncomplete bug report.

I have tried the test case,  but maybe DBUG_SYNC didn't work in InnoDB Engine, so I just add os_thread_sleep() to create a scenario:

before ALTER  ADD UNIQUE KEY statement begin to rollback, launch a SELECT statement.
so row_merge_drop_indexes() will keep the table->drop_aborted values as TRUE because of table->n_ref_count > 1.
At last when master thread evict this dict_table_t, it crashed.

The MySQL instance crashed every hour on production,  I didn't change any pointer or variable to mock it,  then I will continue to try the real test case to generate bug,
Very sorry to trouble you.
[7 Jun 2017 17:34] Sinisa Milivojevic
Hi!

I can provide you with few more pointers.

First, you need to determine whether rit is an assert() or a crash. Both lead to service being stopped, only first one is intentional.

Second, try running , what  ever it is that you run on some high quality hardware. Commodity hardware, very often have random glitches that leads to situations like your, under very high load, because lots of commodity hardware simply can't handle it.

Third, run debug binaries and try to see what statement was each of the threads running.

Fourth, use the latest release available for the  MySQL version that you are using.

Fifth, use our binaries.

There i no need to  apologise. If what you experience is some new bug, that would lead to the improvement of our product.
[8 Jun 2017 3:24] jianwei zhao
Hi,

Here are all the information about this case:

1. Test on MySQL 5.6.35

2. DBUG version will assert and Release version will crash.

3. the core backtrace:

#3  <signal handler called>
#4  0x00002b65596248a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00002b6559626085 in abort () at abort.c:92
#6  0x00000000010deabe in dict_index_is_clust (index=0x0) at /u01/jianwei.zhao/mysql-server/storage/innobase/include/dict0dict.ic:269
#7  0x00000000010f1efb in row_merge_drop_indexes (trx=0x2b656c027b28, table=0x2b65840323e8, locked=1) at /u01/jianwei.zhao/mysql-server/storage/innobase/row/row0merge.cc:2880
#8  0x00000000012f41ea in dict_table_remove_from_cache_low (table=0x2b65840323e8, lru_evict=1) at /u01/jianwei.zhao/mysql-server/storage/innobase/dict/dict0dict.cc:2109
#9  0x00000000012efbdd in dict_make_room_in_cache (max_tables=400, pct_check=100) at /u01/jianwei.zhao/mysql-server/storage/innobase/dict/dict0dict.cc:1446
#10 0x0000000001197cad in srv_master_evict_from_table_cache (pct_check=100) at /u01/jianwei.zhao/mysql-server/storage/innobase/srv/srv0srv.cc:2012
#11 0x00000000011988ff in srv_master_do_idle_tasks () at /u01/jianwei.zhao/mysql-server/storage/innobase/srv/srv0srv.cc:2207
#12 0x000000000119930f in srv_master_thread (arg=0x0) at /u01/jianwei.zhao/mysql-server/storage/innobase/srv/srv0srv.cc:2355
#13 0x00002b65583f5851 in start_thread (arg=0x2b6560895700) at pthread_create.c:301
#14 0x00002b65596d967d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

4. For this case, I add some sleep() function,  the diff like below:
    
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 41c767a..bfd7102 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -6779,6 +6779,7 @@ no_commit:

 		build_template(true);
 	}
+	os_thread_sleep(5000000);

 	innobase_srv_conc_enter_innodb(prebuilt->trx);

diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index e772208..dea7696 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -4138,6 +4138,7 @@ rollback_inplace_alter_table(
 		(almost) nothing has been or needs to be done. */
 		goto func_exit;
 	}
+	os_thread_sleep(2000000);

 	row_mysql_lock_data_dictionary(ctx->trx);

5. the testcases as below(Maybe not very stable):

connect(con1,localhost,root,,);
connect(con2,localhost,root,,);
let $MYSQLD_DATADIR= `SELECT @@datadir`;

connection con1;
set global table_definition_cache=400;
set global table_open_cache= 1024;

use test;
create table t_bug(id int primary key, col1 int) engine=innodb;
insert into t_bug values(1, 2);
insert into t_bug values(2, 2);
commit;

####
#Sleep 2s to wait INSERT statement increment table->n_ref_count.
####
--send
alter table t_bug add unique key(col1);

connection con2;
use test;
select * from t_bug;

####
#Sleep 5s to wait ALTER statement complete.
####
insert into t_bug values(3, 2);
commit;

####
#So ALTER statement failed, left table->drop_abort=true;
####
connection con1;
--error 1062
--reap

disconnect con2;

connection con1;
select sleep(10);

####
#Flush tables, so t_bug table can be evicted by master thread.
####
flush tables;

####
#Create 1000 tables, try the best to evict t_bug since table_def_size.
####
--disable_query_log
let $loop=1000;
while($loop)
{
  eval create table t_$loop(id int)engine=innodb;
  dec $loop;
}

select sleep(10);

let $loop=1000;
while($loop)
{
  eval drop table t_$loop;
  dec $loop;
}

--enable_query_log
set global table_definition_cache=default;
set global table_open_cache= default;
drop table t_bug;

--exec grep 't_bug' $MYSQLD_DATADIR/../../log/mysqld.1.err

disconnect con1;
[8 Jun 2017 12:11] Sinisa Milivojevic
Hi,

Once again, we can not accept test cases which include changing values in debugger or changing our own code. We can only accept test cases without any changes to the code or to the values of variables during runtime.

Also, we never encountered a stack trace like the one you posted, so I would recommend to you that you follow other advices that I have provided to you.
[9 Jun 2017 2:28] zhai weixiang
I think this bug can be verified by just reviewing the code in function dict_table_remove_from_cache_low 

all code is quoted from latest MySQL8.0 on github:

dict_table_remove_from_cache_low :

1) Firstly it will free all dict_index_t objects of the table 

2158         /* Remove the indexes from the cache */
2159
2160         for (index = UT_LIST_GET_LAST(table->indexes);
2161              index != NULL;
2162              index = UT_LIST_GET_LAST(table->indexes)) {
2163
2164                 dict_index_remove_from_cache_low(table, index, lru_evict);
2165         }

2) But later if dict_table_t::drop_aborted is set, then row_merge_drop_indexes is invoked. 

2190         if (lru_evict && table->drop_aborted) {
2191                 /* Do as dict_table_try_drop_aborted() does. */
2192
2193                 trx_t* trx = trx_allocate_for_background();
2194
2195                 ut_ad(mutex_own(&dict_sys->mutex));
2196                 ut_ad(rw_lock_own(dict_operation_lock, RW_LOCK_X));
2197
2198                 /* Mimic row_mysql_lock_data_dictionary(). */
2199                 trx->dict_operation_lock_mode = RW_X_LATCH;
2200
2201                 trx_set_dict_operation(trx, TRX_DICT_OP_INDEX);
2202
2203                 /* Silence a debug assertion in row_merge_drop_indexes(). */
2204                 ut_d(table->acquire());
2205                 row_merge_drop_indexes(trx, table, TRUE);
2206                 ut_d(table->release());
2207                 ut_ad(table->get_ref_count() == 0);
2208                 trx_commit_for_mysql(trx);
2209                 trx->dict_operation_lock_mode = 0;
2210                 trx_free_for_background(trx);

3) Let's look into function row_merge_drop_indexes. It assumes that dict_index_t exists but actually all index objects of the table has been freed.  

3496
3497         index = dict_table_get_first_index(table);
3498         ut_ad(dict_index_is_clust(index));
3499         ut_ad(dict_index_get_online_status(index) == ONLINE_INDEX_COMPLETE);
[9 Jun 2017 6:43] jianwei zhao
Hi,  Sinisa

I have checked the source codes again,  found two commit:

commit 6dd8e27ba69934e6aefbf8eb00f8cc24b43f3386
Author: Marko Mäkelä <marko.makela@oracle.com>
Date:   Wed Feb 1 16:09:34 2012 +0200

    InnoDB: WL#5526 online ADD INDEX, WL#5548 online DROP INDEX

    === First part: Replace the old ALTER TABLE API (WL#5534).

commit e2e2985238ea30ef9f51fd37cc577da5b61b4819
Author: Marko Mäkelä <marko.makela@oracle.com>
Date:   Mon Feb 6 11:07:16 2012 +0200

    WL#5526 non-functional clean-up after review comments from Sunny Bains

    dict_table_close(): Determine if indexes can be dropped only if !dict_locked.

In the second commit, there is a piece of code changes, as below:

@@ -463,8 +464,6 @@ dict_table_close(
                                        indexes after an aborted online
                                        index creation */
 {
-       ibool           drop_aborted;
-
        if (!dict_locked) {
                mutex_enter(&dict_sys->mutex);
        }
@@ -472,11 +471,6 @@ dict_table_close(
        ut_ad(mutex_own(&dict_sys->mutex));
        ut_a(table->n_ref_count > 0);

-       drop_aborted = try_drop
-               && table->drop_aborted
-               && table->n_ref_count == 1
-               && dict_table_get_first_index(table);
-
        --table->n_ref_count;

        MONITOR_DEC(MONITOR_TABLE_REFERENCE);
@@ -492,7 +486,13 @@ dict_table_close(
 #endif /* UNIV_DEBUG */

        if (!dict_locked) {
-               table_id_t      table_id = table->id;
+               table_id_t      table_id        = table->id;
+               ibool           drop_aborted;
+
+               drop_aborted = try_drop
+                       && table->drop_aborted
+                       && table->n_ref_count == 1
+                       && dict_table_get_first_index(table);

                mutex_exit(&dict_sys->mutex);

I guess it is the reason why caused the InnoDB crash, the judgement logic code has changed position within function dict_table_close();

Also, I must mention a point,  InnoDB has used a lot of sleep() function to test. like this:

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index c3a7e2c..eaba982 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -2676,6 +2676,10 @@ loop:
                return(n_tables + n_tables_dropped);
        }

+       DBUG_EXECUTE_IF("row_drop_tables_in_background_sleep",
+               os_thread_sleep(5000000);
+       );
+

why do you think it is not proper that I used os_thread_sleep() to reproduce this cases?
[9 Jun 2017 13:59] Sinisa Milivojevic
Hi!

First of all, you have reported a bug in the versions 5.6 and 5.7, while you base your analysis on  the source from the version 8.0. You can do that if you are reporting a bug in the version 8.0.

Next, your analysis of the code does not prove anything wrong. What is it that our Makela has done wrong ???

Third, we do add sleep for testing, but it is done by developers who know EXACTLY what they are doing. Next, you are pointing to the macro which is executed only if some conditions are met, while your additions to the code are all unconditional.  We do not consider users as versed in our design and code as our developers. Do you think otherwise ???

Last, but not least, if you are so convinced that 5.6 and 5.7 contained the described bug, why don't you execute the debug binaries of the versions that you are using. If a crash happens on an unchanged code, with debug binary, we will be more then happy to try to repeat the test case. Once again, test case must not involve changing the values of variables in the runtime, nor changing our code.
[9 Jun 2017 14:53] Mark Callaghan
If this is how InnoDB bugs are to be handled by MySQL support in the future then this is great news for MyRocks.
[9 Jun 2017 14:54] Mark Callaghan
Note that InnoDB uses os_thread_sleep in production code. Why is it wrong for the repro case to use it to hit the bug faster. See https://bugs.mysql.com/bug.php?id=68588
[9 Jun 2017 17:47] Jeremy Cole
This is utterly shameful handling of a bug report by a clearly competent user/developer. Is the *goal* here to drive away competent users?
[9 Jun 2017 18:00] Nicolas Noble
Any production-grade server code needs to properly behave with antagonist processes, much like FCC rule 15 where electronic devices need to be able to cope with interference from other sources.

A simple yet good method to simulate kernel's arbitrary preemption in the presence of antagonist processes is to introduce sleeps in critical locations. This is basic industry practice.
[9 Jun 2017 18:56] Miguel Solorzano
Thank you for the feedback and patient. Sorry for the way it was handled the issue and we will discuss we our development team the best way to check this bug report.
[12 Jun 2017 5:39] Umesh Shastry
Thank you for the report and test case.
Observed that 5.6.36 release/debug build is affected.

regards,
Umesh
[12 Jun 2017 5:41] Umesh Shastry
test results

Attachment: 86607_5.6.36.results (application/octet-stream, text), 66.68 KiB.

[12 Jun 2017 5:54] Umesh Shastry
test results

Attachment: 86607_5.7.18.results (application/octet-stream, text), 44.08 KiB.

[16 Oct 2017 6:23] jianwei zhao
Hi, dear friend  Upload the fixed DIFF file that we fixed on our production.

Attachment: fix.diff (application/octet-stream, text), 1.25 KiB.

[16 Oct 2017 6:33] Umesh Shastry
Thank you for attaching patch to the report. 
Please note that in order to submit contributions you must first sign the Oracle Contribution Agreement (OCA). For additional information please check http://www.oracle.com/technetwork/community/oca-486395.html.
If you have any questions, please contact the MySQL community team - https://www.mysql.com/about/contact/?topic=community

Regards,
Umesh
[7 Nov 2017 17:35] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.6.39, 5.7.21, 8.0.4 release, and here's the changelog entry:

 A FLUSH TABLES operation failed to drop an aborted index. While removing
the table from the cache, the clustered index was dropped prior to
checking for the aborted index.