Bug #116303 buf_fix_count of page may become negative with small buffer pool
Submitted: 5 Oct 7:15
Modified: 7 Oct 8:39
Reporter: zhai weixiang (OCA)
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any

[5 Oct 7:15] zhai weixiang
Description:
In our daily stress-testing environment, we randomly hit an assertion about buf_fix_count being equal to zero while getting a page. After code analysis, I believe this can also happen on later versions (we are running a heavily modified 8.0.24).

The root cause is that, while restoring a cursor, the code validates a block by first checking that the page id has not changed and then checking that the block state is BUF_BLOCK_FILE_PAGE. The two checks are not atomic, so the page id and state can change in between.

Three threads are involved 

1. restore cursor                              2.free a page(buf_LRU_free_page)               3.buf_page_init_for_read
                                                 remove page from page_hash, state
                                                 changed to BUF_BLOCK_REMOVE_HASH but
                                                 buf_page_t::id is not reset
                                                 -----------------------------------
invoke run_with_hint
-> buffer_fix_block_if_still_valid
m_page_id == m_block->page.id(true)
----------------------------------
                                                 reset page id to UINT_MAX and add
                                                 to free list
                                                 ---------------------------------

                                                                                              Get block from free list
                                                                                              Init page id to new ID and
                                                                                              set page state to BUF_BLOCK_FILE_PAGE
                                                                                              (different id so protected by different hash lock)
                                                                                              ------------------------------------
check 
buf_block_get_state(m_block) == 
BUF_BLOCK_FILE_PAGE(true)
Increase block->page.buf_fix_count to 1
--------------------------------------
                                                                                              reset buf_fix_count to 0 in buf_page_init_low
                                                                                              ---------------------------------------------
func(F)
decrease buf_fix_count to (unsigned)-1
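
The schedule above can be replayed deterministically on a single thread to see the wraparound. This is only a minimal sketch: `ToyBlock`, `ToyState`, and `replay_interleaving` are hypothetical stand-ins, not InnoDB types, and the page ids 42 and 99 are arbitrary. Each statement is labelled with the thread that would perform it in the timeline.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the InnoDB block state and descriptor.
enum class ToyState { FilePage, RemoveHash, NotUsed };

struct ToyBlock {
  std::uint64_t id;
  ToyState state;
  std::uint32_t buf_fix_count;
};

// Replays the three-thread schedule step by step and returns the final
// buf_fix_count of the block.
std::uint32_t replay_interleaving() {
  const std::uint64_t hinted_id = 42;  // page id remembered by the cursor
  ToyBlock block{hinted_id, ToyState::FilePage, 0};

  // Thread 2: page removed from page_hash, id not reset yet.
  block.state = ToyState::RemoveHash;

  // Thread 1: first half of the validity check still sees the old id.
  const bool id_ok = (block.id == hinted_id);

  // Thread 2: id reset, block goes to the free list.
  block.id = UINT64_MAX;
  block.state = ToyState::NotUsed;

  // Thread 3: block taken from the free list and given a new identity.
  block.id = 99;
  block.state = ToyState::FilePage;

  // Thread 1: second half of the check passes against the NEW page.
  const bool state_ok = (block.state == ToyState::FilePage);
  const bool fixed = id_ok && state_ok;
  if (fixed) ++block.buf_fix_count;  // now 1

  // Thread 3: page initialisation resets the counter.
  block.buf_fix_count = 0;

  // Thread 1: done with the block, releases a fix it no longer holds.
  if (fixed) --block.buf_fix_count;  // unsigned wraparound

  return block.buf_fix_count;
}
```

In this toy model `replay_interleaving()` returns 4294967295, the 32-bit equivalent of the `(unsigned)-1` in the last step of the timeline.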

    
    

How to repeat:
read the code

Suggested fix:
Use the block mutex to prevent the page id and state from changing while checking whether the block is still valid, like this:

--- a/storage/innobase/buf/buf0block_hint.cc
+++ b/storage/innobase/buf/buf0block_hint.cc
@@ -72,10 +72,17 @@ void Block_hint::buffer_fix_block_if_still_valid() {
     rw_lock_s_lock(latch);
     /* If not own buf_pool_mutex, page_hash can be changed. */
     latch = buf_page_hash_lock_s_confirm(latch, pool, m_page_id);
-    if (buf_is_block_in_instance(pool, m_block) &&
-        m_page_id == m_block->page.id &&
-        buf_block_get_state(m_block) == BUF_BLOCK_FILE_PAGE) {
-      buf_block_buf_fix_inc(m_block, __FILE__, __LINE__);
+    if (buf_is_block_in_instance(pool, m_block)) {
+       buf_block_t *ptr = m_block;
+       buf_page_mutex_enter(m_block);
+       if (m_page_id == m_block->page.id &&
+               buf_block_get_state(m_block) == BUF_BLOCK_FILE_PAGE) {
+               buf_block_buf_fix_inc(m_block, UT_LOCATION_HERE);
+       } else {
+               clear();
+       }
+
+       buf_page_mutex_exit(ptr);
     } else {
       clear();
     }
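
The idea behind the patch can be illustrated outside InnoDB with a small model. This is a sketch under stated assumptions, not the server code: `HintBlock`, `HintState`, and the four functions are hypothetical, `std::mutex` stands in for the block mutex, and the model assumes that the eviction and re-initialisation paths change the id and state under that same mutex and refuse to evict a fixed block.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins; not the real InnoDB definitions.
enum class HintState { FilePage, NotUsed };

struct HintBlock {
  std::mutex mutex;  // plays the role of the block mutex
  std::uint64_t id = UINT64_MAX;
  HintState state = HintState::NotUsed;
  std::uint32_t buf_fix_count = 0;
};

// Checks id and state and takes the fix in one critical section, so no
// other thread can change the block between the two comparisons.
bool fix_if_still_valid(HintBlock &block, std::uint64_t hinted_id) {
  std::lock_guard<std::mutex> guard(block.mutex);
  if (block.id == hinted_id && block.state == HintState::FilePage) {
    ++block.buf_fix_count;
    return true;
  }
  return false;
}

// Releases a fix previously taken by fix_if_still_valid().
void unfix(HintBlock &block) {
  std::lock_guard<std::mutex> guard(block.mutex);
  assert(block.buf_fix_count > 0);
  --block.buf_fix_count;
}

// Assumed eviction path: refuses to free a block that is still fixed.
bool try_evict(HintBlock &block) {
  std::lock_guard<std::mutex> guard(block.mutex);
  if (block.state != HintState::FilePage || block.buf_fix_count != 0) {
    return false;
  }
  block.id = UINT64_MAX;
  block.state = HintState::NotUsed;
  return true;
}

// Assumed init path: only a free block may receive a new identity.
bool reinit_for_page(HintBlock &block, std::uint64_t new_id) {
  std::lock_guard<std::mutex> guard(block.mutex);
  if (block.state != HintState::NotUsed) return false;
  block.id = new_id;
  block.state = HintState::FilePage;
  block.buf_fix_count = 0;
  return true;
}
```

With both checks inside the critical section, a stale hint either fixes the block it was created for or is rejected; it cannot fix a block that has been re-initialised for another page in between.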
[7 Oct 8:39] zhai weixiang
My colleague has reported https://bugs.mysql.com/bug.php?id=116305 so I'll close this one. Both have the same root cause.
[7 Oct 9:37] MySQL Verification Team
Hi Mr. weixiang,

Thank you for informing us.

We shall take a look at the original bug report.

Closed.