Bug #71411 buf_flush_LRU() does not return correct number in case of compressed pages
Submitted: 17 Jan 2014 16:58 Modified: 27 Jan 2014 16:19
Reporter: Inaam Rana (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[17 Jan 2014 16:58] Inaam Rana
Description:
buf_flush_LRU() returns the number of pages processed. There are two types of processing that can happen. A page can get evicted or a page can get flushed. These two numbers are quite distinct and should not be mixed.

In 5.7 the LRU batch function takes care of counting both numbers separately and report them through innodb_metrics table. However when we are using compressed tables we can end up evicting uncompressed page frames of a compressed page and that will be reported as LRU flushed pages:

static
ulint
buf_do_LRU_batch(
/*=============*/
        buf_pool_t*     buf_pool,       /*!< in: buffer pool instance */
        ulint           max)            /*!< in: desired number of
                                        blocks in the free_list */
{
        ulint   count = 0;

        if (buf_LRU_evict_from_unzip_LRU(buf_pool)) {
                count += buf_free_from_unzip_LRU_list_batch(buf_pool, max);
        }

        if (max > count) {
                count += buf_flush_LRU_list_batch(buf_pool, max - count);
        }

        return(count);
}

How to repeat:
see code.

Suggested fix:
Report and return evicted and flushed pages separately. May be we need to do the same with single page flushing. I'll see if I can come up with a quick patch.
[19 Jan 2014 6:20] Shane Bester
Thanks for the bug report.  See also last comment in bug #69170
[27 Jan 2014 16:19] Inaam Rana
I am going to upload a patch against 5.6.15. It solves the issue reported here but to do that I have to backport revision-id: sunny.bains@oracle.com-20130808223745-7lzp6h40vx1ndc7k from 5.7 as it contained partial fix for this issue. This revid contains other LRU/flushing related stuff as well but all of it makes sense for 5.6 too.

So the patch I am going to  upload will be a union of the above revid backported to 5.6 and the fix for the issue reported in this bug.

Commit comment from 5.7 revid:

    revno: 6228
    revision-id: sunny.bains@oracle.com-20130808223745-7lzp6h40vx1ndc7k
    parent: marc.alff@oracle.com-20130808135008-p5irewph4fm1yu19
    committer: Sunny Bains <Sunny.Bains@Oracle.Com>
    branch nick: trunk
    timestamp: Fri 2013-08-09 08:37:45 +1000
    message:
      WL#7047 - Optimize buffer pool list scans and related batch processing code
    
      Reduce excessive scanning of pages when doing flush list batches. The
      fix is to introduce the concept of "Hazard Pointer", this reduces the
      time complexity of the scan from O(n*n) to O(n).
    
      The concept of hazard pointer is reversed in this work.  Academically a
      hazard pointer is a pointer that the thread working on it will declare as
      such and as long as that thread is not done no other thread is allowed to
      do anything with it.
    
      In this WL we declare the pointer as a hazard pointer and then if any other
      thread attempts to work on it, it is allowed to do so but it has to adjust
      the hazard pointer to the next valid value. We use hazard pointer solely for
      reverse traversal of lists within a buffer pool instance.
    
      Add an event to control the background flush thread. The background flush
      thread wait has been converted to an os event timed wait so that it can be
      signalled by threads that want to kick start a background flush when the
      buffer pool is running low on free/dirty pages.
[27 Jan 2014 16:20] Inaam Rana
Cleanup of LRU/flushing code

Attachment: lru_cleanup-5.6.patch (application/octet-stream, text), 58.60 KiB.

[31 Oct 2014 14:10] Laurynas Biveinis
71411 fix for 5.7

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug71411.patch (application/octet-stream, text), 5.42 KiB.

[31 Oct 2014 14:16] Laurynas Biveinis
Inaam, Sunny, can you take a look at my 5.7 fix attempt?

In 5.7 it does not seem to be critical to propagate the pair of evicted and flushed pages all the way up (no buf_flush_common as in 5.6). So the easiest fix appears to me: 
- keep on adding the different pages together
- but make sure that eviction metrics are registered in buf_free_from_unzip_LRU_list_batch
- add the different pages together in buf_flush_LRU_list_batch processed page result. This looks like a correct change to me - to make evicted pages count in slot->n_flushed_lru - but is quite subtle so I'm not 100% certain. 
- remove unused buf_flush_LRU_lists, clarify some comments.
[31 Oct 2014 14:18] Laurynas Biveinis
The patch passed the public testsuite, release and debug builds, on several linux platforms.
[4 Nov 2014 15:00] Laurynas Biveinis
Bug 71411 fix for 5.7, 2nd attempt

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug71411-2.0.patch (application/octet-stream, text), 9.17 KiB.

[4 Nov 2014 15:04] Laurynas Biveinis
My first fix was too simple. buf_flush_common is gone but its code isn't. To avoid regressing on point 1) of bug 69170, the flushed-evicted pair has to be propagated until buf_flush_end at least. This is done now, and thus the patch is more similar to Inaam's 5.6 patch.

One question though: why does buf_flush_end wake up simulated AIO threads if srv_read_only_mode?

Tested the same as the 1st iteration.
[5 Nov 2014 16:54] Sunny Bains
Laurynas, thanks, I've requested Yasufumi to look at the patch.
[27 Nov 2014 1:11] Yasufumi Kinoshita
Thank you for contribution.
We are currently changing the code around adaptive flushing.
After that, I will look and try to merge your suggestions.
Thank you, again.
[2 Jun 2017 1:58] Laurynas Biveinis
Bug 71411 fix for 8.0.1

Attachment: bug71411-8.0.1.patch (application/octet-stream, text), 7.42 KiB.

[5 Aug 2017 4:49] Laurynas Biveinis
Bug 71411 fix for 8.0.2

Attachment: bug71411-8.0.2.patch (application/octet-stream, text), 7.42 KiB.

[2 Feb 2018 5:26] Laurynas Biveinis
The latest contributed fix applies cleanly and works on 8.0.4 too.
[13 Jun 2018 12:25] Laurynas Biveinis
Bug 71411 fix for 8.0.11

Attachment: bug71411-8.0.11.patch (application/octet-stream, text), 8.50 KiB.