Description:
mysqld crashed with the following assertion failure and stack trace:
[ERROR] [MY-013183] [InnoDB] Assertion failure: btr0sea.cc:1236:old_index != nullptr thread 140387008116480
UTC - mysqld got signal 6 ;
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x4adbc47]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(print_fatal_signal(int)+0x3a2) [0x36bc24d]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(my_server_abort()+0x6b) [0x36bc4ff]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(my_abort()+0xd) [0x4ad2623]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4edabd1]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(btr_search_set_block_not_cached(buf_block_t*)+0x72) [0x4f6f0fc]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(buf_pool_clear_hash_index()+0x317) [0x4f854c1]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(btr_search_disable()+0xfc) [0x4f6c805]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld() [0x4f83406]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(buf_resize_thread()+0x13e) [0x4f8512b]
How to repeat:
In release builds this problem occurs only occasionally. After analyzing the code, I constructed a debug-mode test case that reproduces it quickly.
Apply ahi_crash.diff, compile in debug mode, then run the test case 'innodb.ahi_crash_ut_a_old_index', which is also included in ahi_crash.diff.
The test case relies on a special trick: it uses the innodb_lru_free_buffer_pool variable to clear the buffer pool and simulate pages being freed by the LRU.
I will upload ahi_crash.diff later.
Suggested fix:
The crash happens on this call path:
buf_resize_thread
|-> buf_pool_resize
| |-> btr_search_disable
| | |-> buf_pool_clear_hash_index
| | | |-> btr_search_set_block_not_cached
| | | | |-> ut_a(old_index != nullptr); // crash
The buf_pool_clear_hash_index function behaves as follows (simplified):
buf_pool_clear_hash_index {
  ...
  for (ulong p = 0; p < srv_buf_pool_instances; p++) {
    while (--chunk >= chunks) {
      for (; i--; block++) {
        // step1
        if (block->ahi.index.load() == nullptr) continue;
        mutex_enter(&block->mutex);
        btr_search_set_block_not_cached(block);
        // step2
        //   |-> ut_a(old_index != nullptr); // crash
      }
    }
  }
}
Consider what buf_pool_clear_hash_index does for each block:
1. Iterate over the buffer pool pages. If block->ahi.index.load() == nullptr, skip the page.
2. If block->ahi.index is not null, acquire the block mutex and call btr_search_set_block_not_cached. Inside that function, block->ahi.index is found to be null already, which triggers ut_a(old_index != nullptr).
Between step1 and step2 above, another thread cleared the block's AHI index (for example, when the page was freed by the LRU). The check in step1 is therefore stale by the time step2 runs, and the assertion fails.
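One possible direction for a fix is to make step2 tolerate a block that was cleared concurrently, instead of asserting that it is still cached. The sketch below is a simplified stand-alone model, not InnoDB code: Index, Block, search_ref_count, set_block_not_cached and clear_hash_index are hypothetical stand-ins, and the real fix also has to respect whichever latches protect block->ahi.index in the server.

```cpp
#include <atomic>
#include <vector>

// Hypothetical stand-ins for the InnoDB structures; not the real types.
struct Index {
  std::atomic<int> search_ref_count{0};  // number of blocks cached for this index
};

struct Block {
  std::atomic<Index *> ahi_index{nullptr};
};

// Model of step2. Atomically takes ownership of the pointer, so exactly one
// caller sees a non-null value. Returns true if this call cleared the block,
// false if someone else (e.g. an LRU eviction) had already cleared it.
bool set_block_not_cached(Block &block) {
  Index *old_index = block.ahi_index.exchange(nullptr);
  if (old_index == nullptr) {
    return false;  // cleared concurrently: nothing to do, do not assert
  }
  old_index->search_ref_count.fetch_sub(1);
  return true;
}

// Model of the loop in buf_pool_clear_hash_index. The step1 check is only an
// optimization; correctness no longer depends on it still being true in step2.
int clear_hash_index(std::vector<Block> &blocks) {
  int cleared = 0;
  for (Block &block : blocks) {
    if (block.ahi_index.load() == nullptr) continue;  // step1
    if (set_block_not_cached(block)) ++cleared;       // step2
  }
  return cleared;
}

// Deterministic replay of the bad interleaving: step1 saw a non-null pointer,
// another thread cleared the block, then step2 ran. Returns the final
// reference count of the index.
int replay_race() {
  Index index;
  Block block;
  block.ahi_index.store(&index);
  index.search_ref_count.fetch_add(1);

  set_block_not_cached(block);  // "other thread" clears the block first
  set_block_not_cached(block);  // step2 arrives late and must be a no-op
  return index.search_ref_count.load();
}
```

In this model the second call returns false rather than aborting, and the reference count is decremented exactly once. An alternative with the same effect would be to re-check block->ahi.index after taking the lock and skip the block if it is already null.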