Bug #100379 Resizing the buffer pool causes serious LRU_list_mutex contention
Submitted: 30 Jul 2020 8:58 Modified: 21 Sep 2020 14:22
Reporter: Baolin Huang Email Updates:
Status: Not a Bug Impact on me:
Category: MySQL Server: InnoDB storage engine Severity: S5 (Performance)
Version: 8.0 OS: Any
Assigned to: CPU Architecture:Any
Tags: buffer pool resize

[30 Jul 2020 8:58] Baolin Huang
When the buffer pool size is decreased, buf_resize_thread and the background flush thread perform a large number of page flushes at the same time.

The code is in buf_flush_LRU_list() (storage/innobase/buf/buf0flu.cc):

static ulint buf_flush_LRU_list(buf_pool_t *buf_pool) {
  scan_depth = UT_LIST_GET_LEN(buf_pool->LRU);
  withdraw_depth = buf_get_withdraw_depth(buf_pool);
  if (withdraw_depth > srv_LRU_scan_depth) {
    /* while withdrawing, the whole withdraw depth is used as the scan depth */
    scan_depth = ut_min(withdraw_depth, scan_depth);
  } else {
    scan_depth = ut_min(static_cast<ulint>(srv_LRU_scan_depth), scan_depth);
  }
  ...

and in the withdraw path (cf. buf_pool_withdraw_blocks() in the stack trace below):

  lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
  if (UT_LIST_GET_LEN(buf_pool->withdraw) < buf_pool->withdraw_target) {
    scan_depth = ut_min(ut_max(buf_pool->withdraw_target - ...
    buf_flush_do_batch(buf_pool, BUF_FLUSH_LRU, scan_depth, 0,
                       &n_flushed); /* flush dirty pages to the free_list */
    buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);

This can cause two problems:
1. Executing buf_flush_do_batch with a large scan_depth raises serious lock contention, mainly on LRU_list_mutex. A pstack is given below.

#8  enter  at storage/innobase/include/ib0mutex.h:766
#9  buf_flush_page_and_try_neighbors  at storage/innobase/buf/buf0flu.cc:1669
#10 in buf_flush_LRU_list_batch  at storage/innobase/buf/buf0flu.cc:1789
#11 buf_do_LRU_batch  at storage/innobase/buf/buf0flu.cc:1847
#12 buf_flush_batch  at storage/innobase/buf/buf0flu.cc:1957
#13 buf_flush_do_batch  at storage/innobase/buf/buf0flu.cc:2091
#14 in buf_pool_withdraw_blocks  at storage/innobase/buf/buf0buf.cc:1808
#15 buf_pool_resize  at storage/innobase/buf/buf0buf.cc:2148

2. The background flush thread spends too much time flushing the LRU list, so BUF_FLUSH_LIST batches cannot run. In extreme cases this can leave the entire instance completely unusable.

How to repeat:
1. Run sysbench against an instance with a 40G buffer pool.
2. While the workload is running, shrink the pool: set global innodb_buffer_pool_size = 20G;
3. Observe the QPS drop.


Suggested fix:
Cap scan_depth so that the LRU flush does not use the whole withdraw_depth as a single batch size.
[30 Jul 2020 12:33] MySQL Verification Team
Hi Mr. Huang,

Thank you for your bug report.

We are quite aware of the impact that resizing of the buffer pool has on the specific mutex contention.

However, we do not see how that can be improved drastically.

Your description of the proposed change is very vague. Please make it much more detailed and, if possible, propose a patch. Actually, a very detailed description of the new algorithm would suffice. Then we could make this a feature request.

We are waiting for your feedback.
[31 Aug 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".