Bug #55004 async fuzzy checkpoint constraint isn't really async
Submitted: 5 Jul 2010 16:09 Modified: 27 Jun 2011 17:28
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S5 (Performance)
Version:5.1.47 OS:Any
Assigned to: Inaam Rana CPU Architecture:Any
Tags: async, checkpoint, Contribution, fuzzy, innodb

[5 Jul 2010 16:09] Mark Callaghan
Description:
The async fuzzy checkpoint constraint in InnoDB isn't really async. It blocks for IO to complete. This can be a serious issue on servers that don't have a lot of concurrent threads. A server that doesn't do much more than replication can suffer too much delay in the SQL slave thread because of this.

This is made worse because SHOW INNODB STATUS provides few details on this.

How to repeat:
> > Domas noticed that the replication thread on a host with 100% buffer
> pool hit rate was blocking on IO, as well as spending CPU cycles
> computing checksums before writing out pages. The callstack was:
> >
> >
> __pwrite_nocancel,os_file_write,os_aio,_fil_io,buf_flush_buffered_write
> s,buf_flush_batch,log_preflush_pool_modified_pages,log_check_margins,ro
> w_ins_index_entry_low,row_ins_index_entry,row_upd_step,row_update_for_m
> ysql,ha_innobase::update_row,handler::ha_update_row,write_record,mysql_
> insert,mysql_execute_command,mysql_parse,Query_log_event::do_apply_even
> t,apply_event_and_update_pos,handle_slave_sql,start_thread,clone,
> >
> > I confirmed that we were nowhere close to the log sync age limit and
> that log_check_margins() was calling log_preflush_pool_modified_pages()
> with sync set to FALSE.
> >
> > Inside log_preflush_pool_modified_pages() however is a call to
> buf_flush_batch() which calls buf_flush_buffered_writes().
> buf_flush_buffered_writes does a synchronous write to the doublewrite
> buffer and an async write to the tablespaces. After the async write
> though it calls buf_flush_sync_datafiles() which waits for all pending
> IOs to complete (even IO from other threads). This means that the
> behavior of log_preflush_pool_modified_pages() is always synchronous
> regardless of the value of the sync flag.

Suggested fix:
It isn't clear to me that the behavior can be fixed for the async limit because of the design of the IO system. However, a background thread can try to enforce the async limit instead of foreground threads. As long as it keeps up, then user threads won't get a delay from this.

A possible fix is in:
http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1/revision/3522
[5 Jul 2010 16:39] Mark Callaghan
There are a few more details in http://www.facebook.com/notes/mysqlfacebook/innodb-fuzzy-checkpoints/408059000932
[7 Jul 2010 0:57] Inaam Rana
Mark,

Thank you for the input.

I agree with you on all accounts. Your suggested fix also looks good. I have not looked at the patch but I assume that you are using the master thread as the background thread for doing the preflushing. I think I should merge this into the adaptive_flushing heuristics i.e.: until we reach modified_async_age we use adaptive_flushing heuristics and after that we do background preflushing to the requisite LSN no matter how many dirty pages we have to flush for that. Meanwhile the user threads should only watch out for the modified_sync age and if that is reached then we can trigger a sync preflushing in one of the user thread as a last resort while forcing other threads to wait for it to finish.

regards,
inaam
[7 Jul 2010 3:31] Mark Callaghan
I use something like this in the main background thread:

                if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                                  > srv_max_buf_pool_modified_pct)) {

                        /* Try to keep the number of modified pages in the
                        buffer pool under the limit wished by the user */

                        srv_main_thread_op_info =
                                "flushing buffer pool pages";
                        my_get_fast_timer(&fast_timer);
                        n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST,
                                                          PCT_IO(100),
                                                          IB_ULONGLONG_MAX);
                        srv_buf_flush_secs +=
                                my_fast_timer_diff_now(&fast_timer, NULL);

                        if (n_pages_flushed != ULINT_UNDEFINED) {
                                srv_n_flushed_max_dirty += n_pages_flushed;
                        } else {
                                n_pages_flushed = 0;
                        }

                } else if (srv_background_checkpoint) {

                        srv_main_thread_op_info =
                                "flushing pages for background checkpoint";

                        /* Flush dirty pages with min LSN values that are
                        almost too old. If this is not done, a user thread may
                        have to block doing it. */

                        n_pages_flushed = log_checkpoint_margin_background(PCT_IO(100));

                }

                /* This isn't an 'else' branch with the code above because when
                the srv_background_checkpoint branch does no work, then this
                block of code should be run. */

                if (!n_pages_flushed && srv_adaptive_flushing) {

                        /* Try to keep the rate of flushing of dirty
                        pages such that redo log generation does not
                        produce bursts of IO at checkpoint time. */
                        ulint n_flush = buf_flush_get_desired_flush_rate();
[27 Jun 2011 17:28] John Russell
Closing without separate changelog entry. Adding the bug numbers under existing changelog entry for page_cleaner thread (for 5.6.2).