Bug #116956 buf_pool_validate_instance() incorrect
Submitted: 12 Dec 2024 1:47 Modified: 13 Dec 2024 13:51
Reporter: Ke Yu (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.40 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[12 Dec 2024 1:47] Ke Yu
Description:
The code in buf_pool_validate_instance() incorrect.
Line buf0buf.cc:6124 should be "UT_LIST_GET_LEN(buf_pool->free) != n_free) {"

The wrong code is:
static void buf_pool_validate_instance(buf_pool_t *buf_pool) {
...
  if (buf_pool->curr_size == buf_pool->old_size &&
      UT_LIST_GET_LEN(buf_pool->free) > n_free) {    // buf0buf.cc:6124, This '>' should be '!=' 
    ib::fatal(UT_LOCATION_HERE, ER_IB_MSG_85)
        << "Free list len " << UT_LIST_GET_LEN(buf_pool->free)
        << ", free blocks " << n_free << ". Aborting...";
  }
...
}

This is a code suggestion. Because it may appear that UT_LIST_GET_LEN(buf_pool->free) < n_free due to some bugs.
You can find that this code in MySQL 5.7 is the '!='.

static
ibool
buf_pool_validate_instance(
/*=======================*/
	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
{
...
        if (buf_pool->curr_size == buf_pool->old_size
	    && UT_LIST_GET_LEN(buf_pool->free) != n_free) {  // This is correct.

		ib::fatal() << "Free list len "
			<< UT_LIST_GET_LEN(buf_pool->free)
			<< ", free blocks " << n_free << ". Aborting...";
	}
...
}

How to repeat:
From the code.

Suggested fix:
Line buf0buf.cc:6124 of 8.0.40 should be "UT_LIST_GET_LEN(buf_pool->free) != n_free) {"
[12 Dec 2024 1:51] Ke Yu
This is my fix code, based on 8.0.40.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 116956.diff (application/octet-stream, text), 933 bytes.

[12 Dec 2024 4:24] MySQL Verification Team
Hello Ke Yu,

Thank you for the report and contribution.

regards,
Umesh
[13 Dec 2024 13:51] Marcin Babij
Hello Ke!
Thank you for your contribution.
Unfortunately it is not correct.

We have
```
static void buf_pool_validate_instance(buf_pool_t *buf_pool) {
[...]
  mutex_enter(&buf_pool->chunks_mutex);
  mutex_enter(&buf_pool->LRU_list_mutex);
  hash_lock_x_all(buf_pool->page_hash);
  mutex_enter(&buf_pool->zip_mutex);
  mutex_enter(&buf_pool->free_list_mutex);
  mutex_enter(&buf_pool->flush_state_mutex);
```
where we count types of all pages under these latches. We don't take block mutex, but it doesn't matter here.

Consider:
```
void buf_LRU_block_free_non_file_page(buf_block_t *block) {
[...]
    buf_block_set_state(block, BUF_BLOCK_NOT_USED);
    mutex_enter(&buf_pool->free_list_mutex);
    UT_LIST_ADD_FIRST(buf_pool->free, &block->page);
    ut_d(block->page.in_free_list = true);
    ut_ad(!block->page.someone_has_io_responsibility());
    mutex_exit(&buf_pool->free_list_mutex);
  }
}
```
This method seems to be called from others that seem to not have any latches, in particular
```
bool buf_LRU_free_page(buf_page_t *bpage, bool zip) {
[...]
  mutex_exit(&buf_pool->LRU_list_mutex);

  /* Remove possible adaptive hash index on the page.
  The page was declared uninitialized by
  buf_LRU_block_remove_hashed().  We need to flag
  the contents of the page valid (which it still is) in
  order to avoid bogus Valgrind warnings.*/
  UNIV_MEM_VALID(((buf_block_t *)bpage)->frame, UNIV_PAGE_SIZE);
  btr_search_drop_page_hash_index((buf_block_t *)bpage, true);
  UNIV_MEM_INVALID(((buf_block_t *)bpage)->frame, UNIV_PAGE_SIZE);
[...]
  buf_LRU_block_free_hashed_page((buf_block_t *)bpage);
```
doesn't have page hash latch nor LRU mutex, and it seems it will not have any other latches.
So, we can have a block with `BUF_BLOCK_NOT_USED` state without it being added to the free list.
So it can happen that UT_LIST_GET_LEN(buf_pool->free) <= n_free. So, we can only file an error on `>`, not on `!=`.

Another case, now with getting a new block out of the free list:
```
buf_block_t *buf_LRU_get_free_only(buf_pool_t *buf_pool) {
  buf_block_t *block;

  mutex_enter(&buf_pool->free_list_mutex);

  block = reinterpret_cast<buf_block_t *>(UT_LIST_GET_FIRST(buf_pool->free));

  while (block != nullptr) {
[...]
    UT_LIST_REMOVE(buf_pool->free, &block->page);
    mutex_exit(&buf_pool->free_list_mutex);

[...]
      buf_block_set_state(block, BUF_BLOCK_READY_FOR_USE);
[...]
      return (block);
    }
  }

  mutex_exit(&buf_pool->free_list_mutex);

  return (block);  ----------------- here we return nullptr only
}
```
The `buf_LRU_get_free_only` seems to be called from many places, but without any more latches - when we need a page e.g. for hash map, we don't need any latches. We don't have any mutex, when we set the block state, we don't do that under the `free_list_mutex`.
So we can have more blocks in `BUF_BLOCK_READY_FOR_USE` state, than we have on free list, again, we can't use `!=`', only `>` is valid.
This was changed in `wl#8423 InnoDB: Split the buffer pool mutex` where before the patch the state was being updated along with adding to the free list under the big buf pool instance mutex.

Current code seems correct. And it seems hard to fix otherwise without introducing a potential performance regression as the critical sections under the free_list_mutex would be longer.

Best Regards,
Marcin Babij