Description:
In our daily stress-testing environment, we intermittently hit an assertion failure because buf_fix_count is zero while getting a page. Based on code analysis, I believe this can also happen in later versions (we are running a heavily modified 8.0.24).
The root cause is in cursor restoration. To decide whether the cached block is still valid, the code first checks that the block's page id is unchanged and then checks that the block state is BUF_BLOCK_FILE_PAGE. These two checks are not atomic, so the page id and the state can both change between them.
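The non-atomic check can be replayed deterministically in a single thread by running the other threads' work between the two reads. This is a sketch with hypothetical types (Block, State) and a hypothetical helper (still_valid_racy), not InnoDB code. Each read is correct on its own, yet the pair describes a block that was never valid for the cached page id at any single moment during the check.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, simplified stand-ins for the InnoDB block states.
enum class State { FILE_PAGE, REMOVE_HASH };

struct Block {
  uint64_t id;  // stands in for buf_page_t::id
  State state;  // stands in for the block state
};

// Shape of the pre-fix check: id and state are read at two different
// moments. `between` runs whatever other threads do between the two reads.
bool still_valid_racy(const Block &b, uint64_t expected_id,
                      const std::function<void()> &between) {
  if (b.id != expected_id) return false;  // first read: page id
  between();                              // other threads run here
  return b.state == State::FILE_PAGE;     // second read: state
}

// Replays the problematic order in one thread and returns the check result.
bool torn_check_accepts_reused_block() {
  const uint64_t old_id = 42;
  const uint64_t new_id = 7;
  // Thread 2 has removed the page from page_hash but not yet reset its id.
  Block b{old_id, State::REMOVE_HASH};
  return still_valid_racy(b, old_id, [&] {
    b.id = new_id;               // thread 2 resets the id, thread 3 assigns a new one
    b.state = State::FILE_PAGE;  // thread 3 reuses the block for another page
  });
}
```

Calling torn_check_accepts_reused_block() returns true: the first read saw (old id, BUF_BLOCK_REMOVE_HASH) and the second saw (new id, BUF_BLOCK_FILE_PAGE), and neither snapshot is a valid block for the old id.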
Three threads are involved:
  Thread 1: restores a cursor
  Thread 2: frees a page (buf_LRU_free_page)
  Thread 3: initializes a block for a page read (buf_page_init_for_read)

They interleave as follows (time flows downward):
  Thread 2: removes the page from page_hash; the block state changes to BUF_BLOCK_REMOVE_HASH, but buf_page_t::id is not reset yet.
  Thread 1: run_with_hint -> buffer_fix_block_if_still_valid checks m_page_id == m_block->page.id, which is still true.
  Thread 2: resets the page id to UINT_MAX and adds the block to the free list.
  Thread 3: takes the block from the free list, sets its page id to the new page's id and its state to BUF_BLOCK_FILE_PAGE. The new id is protected by a different page_hash lock, so the lock held by thread 1 does not block this.
  Thread 1: checks buf_block_get_state(m_block) == BUF_BLOCK_FILE_PAGE, which is now true, and increments block->page.buf_fix_count to 1.
  Thread 3: buf_page_init_low resets buf_fix_count to 0.
  Thread 1: runs func(F), then decrements buf_fix_count, which wraps around to (unsigned)-1.
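To make the last part concrete, here is a single-threaded replay of what happens to the counter once thread 1's validity check has passed. PageModel and replay_fix_count are hypothetical names, and the sketch assumes buf_fix_count behaves like a 32-bit unsigned atomic, which matches the (unsigned)-1 value described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of buf_page_t::buf_fix_count as a 32-bit unsigned
// atomic counter; decrementing it at 0 wraps around instead of going negative.
struct PageModel {
  std::atomic<uint32_t> buf_fix_count{0};
};

// Replays the counter operations after thread 1's (wrongly passing) check.
uint32_t replay_fix_count() {
  PageModel page;
  page.buf_fix_count.fetch_add(1);  // thread 1: buffer-fix the block -> 1
  page.buf_fix_count.store(0);      // thread 3: buf_page_init_low resets -> 0
  page.buf_fix_count.fetch_sub(1);  // thread 1: unfix after func(F) -> wraps
  return page.buf_fix_count.load();
}
```

replay_fix_count() returns UINT32_MAX: the reset by thread 3 erases thread 1's fix, so thread 1's matching unfix has nothing left to undo.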
How to repeat:
Read the code.
Suggested fix:
Take the block mutex so that the page id and the state cannot change while we check whether the block is still valid, for example:
--- a/storage/innobase/buf/buf0block_hint.cc
+++ b/storage/innobase/buf/buf0block_hint.cc
@@ -72,10 +72,18 @@ void Block_hint::buffer_fix_block_if_still_valid() {
     rw_lock_s_lock(latch);
     /* If not own buf_pool_mutex, page_hash can be changed. */
     latch = buf_page_hash_lock_s_confirm(latch, pool, m_page_id);
-    if (buf_is_block_in_instance(pool, m_block) &&
-        m_page_id == m_block->page.id &&
-        buf_block_get_state(m_block) == BUF_BLOCK_FILE_PAGE) {
-      buf_block_buf_fix_inc(m_block, __FILE__, __LINE__);
+    if (buf_is_block_in_instance(pool, m_block)) {
+      /* clear() resets m_block, so keep the pointer needed to release the mutex. */
+      buf_block_t *ptr = m_block;
+      buf_page_mutex_enter(m_block);
+      if (m_page_id == m_block->page.id &&
+          buf_block_get_state(m_block) == BUF_BLOCK_FILE_PAGE) {
+        buf_block_buf_fix_inc(m_block, __FILE__, __LINE__);
+      } else {
+        clear();
+      }
+
+      buf_page_mutex_exit(ptr);
     } else {
       clear();
     }
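A minimal model of the idea behind the suggested fix, assuming that every path that changes a block's page id or state (freeing and re-initializing the block) does so while holding the block mutex. Block, fix_if_still_valid, free_and_reuse and stress are hypothetical simplifications, not InnoDB code; in particular, the free and re-init steps, which InnoDB performs in different functions and threads, are collapsed into one critical section here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

enum class State { FILE_PAGE, REMOVE_HASH };

// Hypothetical block: id and state are only read or written under `mutex`.
struct Block {
  std::mutex mutex;
  uint64_t id = 1;
  State state = State::FILE_PAGE;
  std::atomic<uint32_t> buf_fix_count{0};
};

// Fixed check: id and state are read together under the block mutex, and
// the fix count is incremented before the mutex is released.
bool fix_if_still_valid(Block &b, uint64_t expected_id) {
  std::lock_guard<std::mutex> guard(b.mutex);
  if (b.id == expected_id && b.state == State::FILE_PAGE) {
    b.buf_fix_count.fetch_add(1);
    return true;
  }
  return false;
}

// Frees the block and reuses it for `new_id`, but only if it is not fixed.
void free_and_reuse(Block &b, uint64_t new_id) {
  std::lock_guard<std::mutex> guard(b.mutex);
  if (b.buf_fix_count.load() != 0) return;  // a fixed block cannot be freed
  b.state = State::REMOVE_HASH;             // removed from the page hash
  b.id = new_id;                            // re-initialized for another page
  b.state = State::FILE_PAGE;
  b.buf_fix_count.store(0);                 // like the reset in buf_page_init_low
}

// Runs a reader and a recycler concurrently; returns true if no unfix ever
// found the counter already at 0 and the final count is 0.
bool stress(int iterations) {
  Block b;
  std::atomic<bool> underflow{false};
  std::thread reader([&] {
    for (int i = 0; i < iterations; ++i) {
      if (fix_if_still_valid(b, static_cast<uint64_t>(i % 2) + 1)) {
        // ... the caller would use the page here ...
        if (b.buf_fix_count.fetch_sub(1) == 0) underflow = true;
      }
    }
  });
  std::thread recycler([&] {
    for (int i = 0; i < iterations; ++i) {
      free_and_reuse(b, static_cast<uint64_t>(i % 2) + 1);
    }
  });
  reader.join();
  recycler.join();
  return !underflow.load() && b.buf_fix_count.load() == 0;
}
```

In this model the guarantee comes from one rule: the reader increments buf_fix_count only while holding the same mutex the free path holds when it checks the count, so the count cannot be reset underneath a fix that has already been granted.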