Bug #85205 A follow-up fix for buffer pool mutex split patch might be suboptimal, commit 1
Submitted: 27 Feb 2017 13:09 Modified: 1 Mar 2017 15:22
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.0 OS:Any
Assigned to: CPU Architecture:Any

[27 Feb 2017 13:09] Laurynas Biveinis
Description:
One of the follow-up commits to the buffer pool mutex split patch (bug 75534) at is [1]. It is not entirely clear what problem is it fixing and whether it's done in an optimal way. 

The commit message describes a race condition between one thread fixing a dirty block and another flushing and freeing it. This is indeed a scenario bug 75534 patch must handle. This commit addresses it by adding buffer pool block mutex protection to many, but not all, block->page.buf_fix_count writes. We haven't encountered this particular bug, and the commit message is vague enough, so I'll assume that it was necessary to serialize between 1) all the functions calling buf_fix_inc that this patch added block mutex protection for, and 2) buf_LRU_free_page doing the page eviction. Thus, I'll assume that dirty -> clean transition by flushing is not an issue, only eviction is. Given this:

- Why there is need to add protection to buf_fix_dec calls? These do not serialize with eviction in any way, do they?
- It looks like there is no need to add protection to buf_page_get_zip, because it is already serialized with buf_LRU_free_page by the page hash latch.
- Likewise for buf_page_try_get_func.
- The commit introduces a new block mutex critical section around buf_page_get_io_fix_unlocked call in buf_page_get_gen. I/O fix is not buffer pool fix, the commit message does not describe this change. Why is it necessary? If it is indeed necessary, then there is no need to use _unlocked variant of the function, buf_page_get_io_fix should be called instead.
- If the above is correct, then the commit reduces to a single correct change, in a UNIV_IBUF_DEBUG section of buf_page_get_gen

Again, it's hard to tell without knowing more details about the original bug you fixed, at least the stacktraces.

[1]:

commit 65649a16ce9412f7d9286e7e6ad6c2de5769be25
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date:   Thu Apr 21 13:34:00 2016 +0800

    BUG#23067038 ASSERTION FAILURE: BUF0BUF.CC:2861:BUF_PAGE_IN_FILE(BPAGE)
                 LEADS TO CORRUPT DATA
    
    It's a regression of wl#8423 InnoDB: Split the buffer pool mutex,a
    in which we removed block mutex protection for buf_fix_count.
    
    The assertion happens when one thread fixed a dirty block but the
    other thread flushed the block and moved the page from LRU list to
    free list, because it saw buf_fix_count is 0, other than 1.
    
    The solution is holding block mutex when buf fix and unfix.
    
    Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
    RB: 12456

How to repeat:
Code review
[27 Feb 2017 13:36] Laurynas Biveinis
See also bug 85208
[1 Mar 2017 15:22] Sinisa Milivojevic
Hi!

I have studied carefully the patch and the code and I find your conclusions as justified.

Hence, I verify this bug on the base of the code analysis.