MySQL Bugs: #58966: Too less agressive dirty page flushing in 5.5

Bug #58966	Too less agressive dirty page flushing in 5.5
Submitted:	16 Dec 2010 8:48	Modified:	14 Jan 2012 12:03
Reporter:	Yoshinori Matsunobu (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.5	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
In my test environment, flushing of dirty pages from user threads (preflushing) happened far more often with 5.5 InnoDB than with the 5.1 plugin.
I suspect that InnoDB's background flushing activities (burst flushing when dirty pages exceed innodb_max_dirty_pages_pct, or adaptive flushing) are less aggressive in 5.5 than in the 5.1 plugin.

5.1 plugin's srv/srv0srv.c srv_master_thread()
        for (i = 0; i < 10; i++) {
...
                if (!skip_sleep) {

                        os_thread_sleep(1000000);
                        srv_main_sleeps++;
                }
...
                if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                                  > srv_max_buf_pool_modified_pct)) {

                        /* Try to keep the number of modified pages in the
                        buffer pool under the limit wished by the user */
                        srv_main_thread_op_info =
                                "flushing buffer pool pages";
                        n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST,
                                                          PCT_IO(100),
                                                          IB_ULONGLONG_MAX);

                        /* If we had to do the flush, it may have taken
                        even more than 1 second, and also, there may be more
                        to flush. Do not sleep 1 second during the next
                        iteration of this loop. */

                        skip_sleep = TRUE;
                } else if (srv_adaptive_flushing) {

                        /* Try to keep the rate of flushing of dirty
                        pages such that redo log generation does not
                        produce bursts of IO at checkpoint time. */
                        ulint n_flush = buf_flush_get_desired_flush_rate();

                        if (n_flush) {
                                srv_main_thread_op_info =
                                        "flushing buffer pool pages";
                                n_flush = ut_min(PCT_IO(100), n_flush);
                                n_pages_flushed =
                                        buf_flush_batch(
                                                BUF_FLUSH_LIST,
                                                n_flush,
                                                IB_ULONGLONG_MAX);

                                if (n_flush == PCT_IO(100)) {
                                        skip_sleep = TRUE;
                                }
                        }
                }

5.5 srv/srv0srv.c srv_master_thread()
around srv/srv0srv.c line 2733:
        for (i = 0; i < 10; i++) {
....
                if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                                  > srv_max_buf_pool_modified_pct)) {

                        /* Try to keep the number of modified pages in the
                        buffer pool under the limit wished by the user */

                        srv_main_thread_op_info =
                                "flushing buffer pool pages";
                        n_pages_flushed = buf_flush_list(
                                PCT_IO(100), IB_ULONGLONG_MAX);

                } else if (srv_adaptive_flushing) {

                        /* Try to keep the rate of flushing of dirty
                        pages such that redo log generation does not
                        produce bursts of IO at checkpoint time. */
                        ulint n_flush = buf_flush_get_desired_flush_rate();

                        if (n_flush) {
                                srv_main_thread_op_info =
                                        "flushing buffer pool pages";
                                n_flush = ut_min(PCT_IO(100), n_flush);
                                n_pages_flushed =
                                        buf_flush_list(
                                                n_flush,
                                                IB_ULONGLONG_MAX);
                        }
                }

The code that skips the 1-second sleep was removed in 5.5. Was there a reason to remove it? Background buf_flush_list() is called at most once per second. If redo entries / dirty pages are generated much faster than InnoDB flushes them per second (adaptive flushing), sooner or later the conditions that trigger preflushing are reached. Increasing innodb_io_capacity alone did not help in my case.
I verified that I could avoid massive preflushing by setting innodb_max_dirty_pages_pct lower (e.g. 25) and innodb_io_capacity higher (e.g. 1200), but I would rather not set innodb_max_dirty_pages_pct too low.
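
The argument above can be illustrated with a toy model. This is only a sketch of the reasoning, not InnoDB code: the function name, its parameters, and the assumption that a busy master thread can fit a fixed number of batches into one second are all hypothetical. It compares a loop that runs at most one flush batch per second with a loop that starts another batch immediately after a full batch (the 5.1 plugin's skip_sleep behaviour).

```c
#include <stdio.h>

/* Toy model, NOT InnoDB code. All names and numbers are hypothetical.

Simulates `seconds` of activity and returns the number of dirty pages
still waiting to be flushed.

gen_per_sec           pages dirtied by the workload each second
io_capacity           maximum pages written by one flush batch
busy_batches_per_sec  assumed number of batches that fit into one second
                      when the loop does not sleep between batches
skip_sleep            non-zero: run another batch right after a full one
                      (5.1 plugin style); zero: one batch per second */
static long simulate_backlog(long gen_per_sec, long io_capacity,
                             long busy_batches_per_sec, int skip_sleep,
                             long seconds)
{
        long dirty = 0;

        for (long s = 0; s < seconds; s++) {
                long max_batches = skip_sleep ? busy_batches_per_sec : 1;

                dirty += gen_per_sec;

                for (long b = 0; b < max_batches && dirty > 0; b++) {
                        long n = dirty < io_capacity ? dirty : io_capacity;

                        dirty -= n;

                        if (n < io_capacity) {
                                /* Partial batch: nothing left to flush,
                                so the loop goes back to sleeping. */
                                break;
                        }
                }
        }

        return dirty;
}
```

With these made-up numbers, simulate_backlog(500, 200, 4, 0, 10) returns 3000, while simulate_backlog(500, 200, 4, 1, 10) returns 0: when the workload dirties more pages per second than one batch can write, a strict one-batch-per-second loop lets the backlog grow without bound.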

How to repeat:
Read srv/srv0srv.c srv_master_thread().

This looks like the change made for bug #56933: "the one line fix reinstates an unconditional one second sleep in the outermost master thread loop. It is present in 5.1 but was lost during some early 5.5 work on the master thread. The result of that loss was that the master thread action became skewed towards background and flush loops which in turn resulted in considerably more flushing leading to loss of performance."

How many buffer pools are you using? Some of the tuning was done to reduce excessive flushing when there are many buffer pools. That's at bug #54346.

What settings are you using for innodb_purge_threads and innodb_purge_batch_size? The first thing to do is set innodb_purge_threads to at least 1. If you've already done that then adding another purge thread or adjusting innodb_purge_batch_size may help.

Some earlier discussion of this area was for bug #40603.

Yoshinori,

- The sleep was not taken out in 5.5. In fact, we have tried to make it more accurate. In 5.1, whenever the master thread does some flushing work in the loop, it does not sleep in the next iteration. In 5.5 we measure the time spent during the flushing activity and sleep for 1 - 'time spent during flushing' seconds. We fine-tuned this because we noticed that, with the introduction of native AIO, flushing was happening too quickly. Are you running your tests on Linux with native AIO enabled?

- You can be generous with innodb_io_capacity. I'd say that with a reasonable IO subsystem one should start with 2000 and then fine tune based on throughput.

- As James mentioned, having a separate purge thread can help. Purge processing can become a major drag on the master thread.

- Note that the heuristics used for adaptive flushing did not change between 5.1 and 5.5.
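
The "1 - 'time spent during flushing'" rule in the first point can be sketched as follows. This is a minimal illustration, not the actual InnoDB function: the helper name and the microsecond units are assumptions, and so is the clamp to zero when a flush takes one second or longer.

```c
#include <stdint.h>

#define ONE_SECOND_USEC 1000000ULL

/* Hypothetical helper, NOT taken from the InnoDB source.

Given the time the flush batch took in this iteration, returns how long
the master thread should still sleep so that one iteration lasts about
one second. The clamp to 0 is an assumption: without it, the unsigned
subtraction would wrap around when flushing takes longer than a second. */
static uint64_t remaining_sleep_usec(uint64_t flush_elapsed_usec)
{
        if (flush_elapsed_usec >= ONE_SECOND_USEC) {
                /* Flushing already used the whole second: no sleep. */
                return 0;
        }

        return ONE_SECOND_USEC - flush_elapsed_usec;
}
```

Under these assumptions, a flush that took 250000 microseconds leaves 750000 microseconds of sleep, and a flush that took a full second or more leaves none, so the next iteration starts immediately.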

Thank you for the report.

Please make the suggested changes and answer the questions James and Inaam asked above.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Would there be any value in making this configurable (i.e., a feature request)?

This way folks could choose the 1 full second (5.1) or the new method (1 - "time spent during flushing" seconds).

Perhaps that'd be useful for those without native aio threads, or those who simply saw better performance with the 1 full second.

Also, is it possible that "time spent during flushing" could be >= 1? If so, it would seem this could effectively sleep for 0 seconds.

I think having an option to force the "old" 5.1 behavior is a good idea in any case.