Bug #100379 Resizing the buffer pool causes serious LRU_list_mutex contention
Submitted: 30 Jul 2020 8:58 Modified: 21 Sep 2020 14:22
Reporter: Baolin Huang Email Updates:
Status: Not a Bug Impact on me:
Category: MySQL Server: InnoDB storage engine Severity: S5 (Performance)
Version: 8.0 OS: Any
Assigned to: CPU Architecture:Any
Tags: buffer pool resize

[30 Jul 2020 8:58] Baolin Huang
When the buffer pool size is decreased, buf_resize_thread and the background flush thread perform a large number of page flushes at the same time.

The code is in buf_flush_LRU_list() (storage/innobase/buf/buf0flu.cc):

static ulint buf_flush_LRU_list(buf_pool_t *buf_pool) {
  scan_depth = UT_LIST_GET_LEN(buf_pool->LRU);
  withdraw_depth = buf_get_withdraw_depth(buf_pool);
  if (withdraw_depth > srv_LRU_scan_depth) {
    /* while withdrawing, the whole withdraw depth is used as the scan depth */
    scan_depth = ut_min(withdraw_depth, scan_depth);
  } else {
    scan_depth = ut_min(static_cast<ulint>(srv_LRU_scan_depth), scan_depth);
  }
  ...

and in the withdraw path (cf. buf_pool_withdraw_blocks() in the stack trace below):

  lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
  if (UT_LIST_GET_LEN(buf_pool->withdraw) < buf_pool->withdraw_target) {
    scan_depth = ut_min(ut_max(buf_pool->withdraw_target - ...
    buf_flush_do_batch(buf_pool, BUF_FLUSH_LRU, scan_depth, 0,
                       &n_flushed); /* flush dirty pages to the free_list */
    buf_flush_wait_batch_end(buf_pool, BUF_FLUSH_LRU);

This can cause two problems:
1. Executing buf_flush_do_batch with a large scan_depth raises serious lock contention, mainly on LRU_list_mutex. A pstack is given below.

#8  enter  at storage/innobase/include/ib0mutex.h:766
#9  buf_flush_page_and_try_neighbors  at storage/innobase/buf/buf0flu.cc:1669
#10 in buf_flush_LRU_list_batch  at storage/innobase/buf/buf0flu.cc:1789
#11 buf_do_LRU_batch  at storage/innobase/buf/buf0flu.cc:1847
#12 buf_flush_batch  at storage/innobase/buf/buf0flu.cc:1957
#13 buf_flush_do_batch  at storage/innobase/buf/buf0flu.cc:2091
#14 in buf_pool_withdraw_blocks  at storage/innobase/buf/buf0buf.cc:1808
#15 buf_pool_resize  at storage/innobase/buf/buf0buf.cc:2148

2. The background flush thread spends too much time flushing the LRU list, so BUF_FLUSH_LIST batches cannot run. In extreme cases this can leave the entire instance completely unusable.

How to repeat:
1. Run sysbench against an instance with a 40G buffer pool.
2. While the workload is running, shrink the pool: set global innodb_buffer_pool_size = 20G;
3. Observe the QPS drop.


Suggested fix:
Cap scan_depth so that the LRU flush does not use the whole withdraw_depth as a single batch size.
[30 Jul 2020 12:33] MySQL Verification Team
Hi Mr. Huang,

Thank you for your bug report.

We are quite aware of the impact that resizing of the buffer pool has on the specific mutex contention.

However, we do not see how that can be improved drastically.

Your description of the proposed change is very vague. Please make it much more detailed and, if possible, propose a patch. Actually, a very detailed description of the new algorithm would suffice. Then we could make this a feature request.

We are waiting for your feedback.
[31 Aug 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".