Bug #116793	AHI crash : ut_a(old_index != nullptr)
Submitted:	27 Nov 2024 3:51	Modified:	27 Nov 2024 7:09
Reporter:	Ke Yu (OCA)
Status:	Verified
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	8.0.40	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
My MySQL server crashed with the following stack trace:
[ERROR] [MY-013183] [InnoDB] Assertion failure: btr0sea.cc:1236:old_index != nullptr thread 140387008116480
UTC - mysqld got signal 6 ;
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x4adbc47]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(print_fatal_signal(int)+0x3a2) [0x36bc24d]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(my_server_abort()+0x6b) [0x36bc4ff]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(my_abort()+0xd) [0x4ad2623]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4edabd1]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(btr_search_set_block_not_cached(buf_block_t*)+0x72) [0x4f6f0fc]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(buf_pool_clear_hash_index()+0x317) [0x4f854c1]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(btr_search_disable()+0xfc) [0x4f6c805]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld() [0x4f83406]
/flash4/yk_data/mysql-server/runtime_output_directory/mysqld(buf_resize_thread()+0x13e) [0x4f8512b]

How to repeat:
The problem occurs only occasionally in release builds. Based on my analysis of the code, I constructed a debug-mode test case so that you can reproduce it quickly.
Apply ahi_crash.diff, build in debug mode, then run the test case 'innodb.ahi_crash_ut_a_old_index'.
The test case innodb.ahi_crash_ut_a_old_index is included in ahi_crash.diff.
 
The test case uses the innodb_lru_free_buffer_pool variable to clear the buffer pool, which simulates pages being freed from the LRU list.

I will upload the ahi_crash.diff later.

Suggested fix:
The crash happens on this call path:
buf_resize_thread
|-> buf_pool_resize
|  |-> btr_search_disable
|  |  |-> buf_pool_clear_hash_index
|  |  |  |-> btr_search_set_block_not_cached
|  |  |  |  |-> ut_a(old_index != nullptr); // crash

The buf_pool_clear_hash_index() function works as follows (simplified):
buf_pool_clear_hash_index() {
    ...
    for (ulong p = 0; p < srv_buf_pool_instances; p++) {
        ...
        while (--chunk >= chunks) {
            ...
            for (; i--; block++) {
                // step1: check without holding the block mutex
                if (block->ahi.index.load() == nullptr) continue;

                mutex_enter(&block->mutex);
                // step2: btr_search_set_block_not_cached() reads
                // block->ahi.index again and hits
                // ut_a(old_index != nullptr); // crash
                btr_search_set_block_not_cached(block);
                ...
            }
        }
    }
}

Consider what buf_pool_clear_hash_index() does:
1. Iterate over the buffer pool pages. If block->ahi.index.load() == nullptr, skip the page.
2. If block->ahi.index is not null, acquire the block mutex and call btr_search_set_block_not_cached(), which finds that block->ahi.index has already been set to null, so ut_a(old_index != nullptr) fails.
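The check-then-act window between step1 and step2 can be modeled outside InnoDB. The following C++ sketch is only an illustration: Index and Block are hypothetical stand-ins (not the real dict_index_t or buf_block_t), and a std::condition_variable handshake forces a second thread to clear the index inside the window, standing in for a debug sync point.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for InnoDB types; NOT the real buf_block_t or
// dict_index_t, just enough state to model the race.
struct Index {};

struct Block {
  std::atomic<Index *> ahi_index{nullptr};  // models block->ahi.index
  std::mutex mutex;                         // models block->mutex
};

// Replays the interleaving deterministically: the scanner passes the
// unlocked step1 check, another thread then clears ahi_index under the
// block mutex, and only afterwards does the scanner take the mutex for
// step2. Returns true if step2 observes nullptr, i.e. the situation in
// which ut_a(old_index != nullptr) would fire.
inline bool step2_sees_null() {
  Index idx;
  Block block;
  block.ahi_index.store(&idx);

  std::mutex m;
  std::condition_variable cv;
  bool checked = false;  // scanner has finished step1
  bool cleared = false;  // other thread has cleared the index

  std::thread other([&] {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return checked; });
    {
      std::lock_guard<std::mutex> g(block.mutex);
      block.ahi_index.store(nullptr);  // e.g. the page is freed via the LRU
    }
    cleared = true;
    cv.notify_all();
  });

  // step1: unlocked check, as in buf_pool_clear_hash_index()
  const bool hashed = block.ahi_index.load() != nullptr;

  // Open the window: let the other thread run, then wait until it is done.
  {
    std::lock_guard<std::mutex> lk(m);
    checked = true;
  }
  cv.notify_all();
  {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return cleared; });
  }

  bool saw_null = false;
  if (hashed) {
    // step2: take the block mutex and re-read what the assertion checks
    std::lock_guard<std::mutex> g(block.mutex);
    saw_null = (block.ahi_index.load() == nullptr);
  }
  other.join();
  return saw_null;
}

// Usage: the forced interleaving always reaches step2 with a null index.
inline void demo_race() { assert(step2_sees_null()); }
```

Because the handshake guarantees the clear lands inside the window, step2_sees_null() returns true on every run. In the real server nothing forces this ordering and the window is short, which is consistent with release builds hitting the assertion only occasionally.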

Between step1 and step2, another thread (for example, one freeing the page from the LRU list) cleared the block's AHI index, so the assertion fails and the server crashes.
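One possible mitigation, sketched with the same kind of hypothetical stand-in types, is to treat the unlocked step1 check only as a hint and re-validate under the block mutex, skipping the block instead of asserting. This is not the upstream fix and does not model any other work the real btr_search_set_block_not_cached() performs; the window callback is a hypothetical hook standing in for the concurrent LRU free.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for InnoDB types; NOT the real buf_block_t or
// dict_index_t.
struct Index {};

struct Block {
  std::atomic<Index *> ahi_index{nullptr};  // models block->ahi.index
  std::mutex mutex;                         // models block->mutex
};

// Keeps the cheap unlocked filter (step1) but re-validates under the block
// mutex (step2) instead of asserting. `window` runs between the two steps
// and stands in for whatever another thread may do in that gap.
// Returns true if this call cleared the index, false if the block was not
// hashed or was cleared concurrently.
template <typename Window>
bool clear_if_still_hashed(Block &block, Window window) {
  if (block.ahi_index.load() == nullptr) return false;  // step1: hint only
  window();
  std::lock_guard<std::mutex> guard(block.mutex);
  Index *old_index = block.ahi_index.exchange(nullptr);  // step2
  return old_index != nullptr;  // nullptr: another thread got there first
}

inline bool clear_if_still_hashed(Block &block) {
  return clear_if_still_hashed(block, [] {});
}

// Usage: the normal case, and the interleaving from this report.
inline void demo_mitigation() {
  Index idx;

  Block normal;
  normal.ahi_index.store(&idx);
  assert(clear_if_still_hashed(normal));
  assert(normal.ahi_index.load() == nullptr);

  Block raced;
  raced.ahi_index.store(&idx);
  auto concurrent_free = [&raced] {
    std::lock_guard<std::mutex> g(raced.mutex);
    raced.ahi_index.store(nullptr);  // e.g. the page is freed via the LRU
  };
  assert(!clear_if_still_hashed(raced, concurrent_free));  // skipped, no abort
}
```

In the raced case the function simply returns false instead of aborting. Whether skipping the block is safe in the real server depends on the AHI invariants InnoDB maintains around page eviction.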

I have uploaded ahi_crash.diff. Compile the code in debug mode, then run the ahi_crash_ut_a_old_index test case from the diff to reproduce the problem.

Hello Ke Yu,

Thank you for the report and feedback.
Verified as described.

Regards,
Umesh