MySQL Bugs: #75512: buf_flush_do_batch can be optimized if the lsn

Bug #75512	buf_flush_do_batch can be optimized if the lsn_limit is already satisfied.
Submitted:	15 Jan 2015 7:59	Modified:	10 Mar 2015 3:20
Reporter:	zhai weixiang (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.7.5, 5.7.6	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
In function buf_flush_do_batch, If someone is performing a buffer pool flush operation, it will return false, regardless if the lsn_limit is already satisfied. 

Consider this backtrace:

log_checkpoint_margin-->log_preflush_pool_modified_pages -->buf_flush_lists -->buf_flush_do_batch

If return false from buf_flush_do_batch, then loop again in function log_checkpoint_margin and do one more dirty page pre-flush though it's not necessary.

I slightly changed the code  in function buf_flush_do_batch :

        if (!buf_flush_start(buf_pool, type)) {
                if (opt_bp_flush_optimize
                    && type == BUF_FLUSH_LIST
                    && min_n == ULINT_MAX
                    && buf_pool_oldest_lsn_for_instance(buf_pool) >= lsn_limit) {
                        /* Check if the oldest lsn has exceeded lsn_limit. */
                        srv_bp_optimize_counter++;
                        return (true);

                }
                return(false);
        }

I'll post a simple patch later.

How to repeat:
Read the code.

Suggested fix:
As described above.

Since all sessions need to wait all batch flush end in function log_preflush_pool_modified_pages, I think this change should be safe.

a simple patch for discuess, not tested fully

Attachment: bug75512.diff (application/octet-stream, text), 5.99 KiB.

Thank you for the report.

Verified as described using code analysis.

But regardless of the state of the bug could you, please, provide example scenario when current behavior leads to significant performance loss (if I understood correctly returned false leads to one more iteration in buf_flush_lists) or wrong behavior.

If the LSN age increased quickly and exceeded max_modified_age_sync,  the user threads began to preflush dirty pages.  Thi scenario can benefit from the changes. 

To repeat: Under heavy DML workload, with relaxed setting (for example : sync_binlog = 0, innodb_flush_log_at_trx_commit =2, gtid_mode=off ), also reduce the total size of redo log file.

I've tested the changes on our production environment.  The improvement is not very obvious. Anyway, better than nothing. :-)

should this bug be closed because of some new changes in 5.7.6 ?

In 5.7.6, the user threads doesn't need to do page flushes. Instead, it will only wait for the page cleaner thread.

quoted change log entry :

When the oldest modification LSN is close to the defined maximum (max_modified_age_sync), a synchronous preflush of buffer pool pages is initiated which may result in a “flush wait” scenario for user threads. To smooth throughput, user threads are only required to wait for a target LSN to be reached instead of waiting for an entire flushing batch to finish.