Bug #40603 | Innodb background IO rate limiting kills performance | | |
---|---|---|---|
Submitted: | 9 Nov 2008 18:22 | Modified: | 12 Aug 2009 17:01 |
Reporter: | Mark Callaghan | Status: | Closed |
Category: | MySQL Server: InnoDB storage engine | Severity: | S4 (Feature request) |
Version: | 5.0,5.1 | OS: | Any |
Assigned to: | Inaam Rana | CPU Architecture: | Any |
Tags: | Contribution, innodb, io, limit, rate | | |
[9 Nov 2008 18:22]
Mark Callaghan
[10 Nov 2008 15:48]
Heikki Tuuri
InnoDB does not really sleep 1 second between buffer pool flushes. It sets skip_sleep = TRUE. But the main thread does a lot of things besides doing the buffer pool flushes. Ideally, we should have several main threads, and a way to tune the resources we allocate to the insert buffer merge, etc.

AIO will speed up flushes, but it introduces another problem: the main thread may exhaust the AIO queue by putting too many writes to it.

Assigning this feature request to Inaam, who is our AIO man.

srv0srv.c in 5.1:

    /* ---- We run the following loop approximately once per second
    when there is database activity */

    skip_sleep = FALSE;

    for (i = 0; i < 10; i++) {
        n_ios_old = log_sys->n_log_ios + buf_pool->n_pages_read
            + buf_pool->n_pages_written;
        srv_main_thread_op_info = "sleeping";

        if (!skip_sleep) {

            os_thread_sleep(1000000);
        }

        skip_sleep = FALSE;
    ...
        if (UNIV_UNLIKELY(buf_get_modified_ratio_pct()
                          > srv_max_buf_pool_modified_pct)) {

            /* Try to keep the number of modified pages in the
            buffer pool under the limit wished by the user */

            n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST, 100,
                                              ut_dulint_max);

            /* If we had to do the flush, it may have taken
            even more than 1 second, and also, there may be more
            to flush. Do not sleep 1 second during the next
            iteration of this loop. */

            skip_sleep = TRUE;
        }

        if (srv_activity_count == old_activity_count) {

            /* There is no user activity at the moment, go to
            the background loop */

            goto background_loop;
        }
    }

    /* ---- We perform the following code approximately once per
    10 seconds when there is database activity */

    #ifdef MEM_PERIODIC_CHECK
    /* Check magic numbers of every allocated mem block once in 10
    seconds */
    mem_validate_all_blocks();
    #endif
    /* If there were less than 200 i/os during the 10 second period,
    we assume that there is free disk i/o capacity available, and it
    makes sense to flush 100 pages. */

    n_pend_ios = buf_get_n_pending_ios() + log_sys->n_pending_writes;
    n_ios = log_sys->n_log_ios + buf_pool->n_pages_read
        + buf_pool->n_pages_written;
    if (n_pend_ios < 3 && (n_ios - n_ios_very_old < 200)) {

        srv_main_thread_op_info = "flushing buffer pool pages";
        buf_flush_batch(BUF_FLUSH_LIST, 100, ut_dulint_max);

        srv_main_thread_op_info = "flushing log";
        log_buffer_flush_to_disk();
    }
    ...
    /* Flush a few oldest pages to make a new checkpoint younger */

    if (buf_get_modified_ratio_pct() > 70) {

        /* If there are lots of modified pages in the buffer pool
        (> 70 %), we assume we can afford reserving the disk(s) for
        the time it requires to flush 100 pages */

        n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST, 100,
                                          ut_dulint_max);
    } else {
        /* Otherwise, we only flush a small number of pages so that
        we do not unnecessarily use much disk i/o capacity from
        other work */

        n_pages_flushed = buf_flush_batch(BUF_FLUSH_LIST, 10,
                                          ut_dulint_max);
    }

    srv_main_thread_op_info = "making checkpoint";

    /* Make a new checkpoint about once in 10 seconds */

    log_checkpoint(TRUE, FALSE);

    srv_main_thread_op_info = "reserving kernel mutex";

    mutex_enter(&kernel_mutex);

    /* ---- When there is database activity, we jump from here back to
    the start of loop */

    if (srv_activity_count != old_activity_count) {
        mutex_exit(&kernel_mutex);
        goto loop;
    }
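To make the effect of skip_sleep in the loop quoted above easier to see, here is a rough standalone model of the 10-iteration loop. It is only a sketch: loop_stats and model_one_second_loop are hypothetical names that do not exist in InnoDB, the work elided by "..." is ignored, and the early exit to background_loop is left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the 10-iteration loop quoted from
   srv0srv.c. These names are illustrative and are not part of InnoDB. */
typedef struct {
    int sleeps;         /* number of 1-second sleeps actually taken */
    int flush_batches;  /* number of 100-page flush batches requested */
} loop_stats;

/* over_limit[i] says whether the modified-page ratio exceeded the
   configured maximum when iteration i checked it. */
loop_stats model_one_second_loop(const bool over_limit[10])
{
    loop_stats st = {0, 0};
    bool skip_sleep = false;

    for (int i = 0; i < 10; i++) {
        if (!skip_sleep) {
            st.sleeps++;            /* stands in for os_thread_sleep(1000000) */
        }
        skip_sleep = false;

        if (over_limit[i]) {
            st.flush_batches++;     /* stands in for buf_flush_batch(..., 100, ...) */
            skip_sleep = true;      /* the next iteration does not sleep */
        }
    }
    return st;
}
```

In this model, a buffer pool that stays under the limit gives 10 sleeps and no flush batches, while one that stays over the limit gives a single sleep and 10 back-to-back batches. How long those batches take on real hardware is not something the model tries to capture.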
[6 Jul 2009 23:24]
Mark Callaghan
Heikki, for many workloads I will agree with you: it skips the sleep. And that creates a different problem. It is difficult to understand the rate at which IO occurs in that case. It can call fsync() and do other things much more often than expected. To tune the server, I prefer a system that is more predictable, so that when I configure the server to do 1000 IOPS from the background threads, the server does no more than that.
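One way to picture the kind of predictable ceiling described here is a fixed one-second budget that background work has to draw from. The following is a minimal sketch under that assumption. The io_budget type and its functions are hypothetical, they are not InnoDB code, and this is not a description of how the eventual fix works.

```c
#include <stdint.h>

/* Hypothetical per-second IO budget for background threads.
   Illustrative only; not taken from InnoDB. */
typedef struct {
    uint32_t limit_per_sec;  /* configured ceiling, e.g. 1000 */
    uint32_t used;           /* IOs granted in the current window */
    uint64_t window_start;   /* current one-second window, in whole seconds */
} io_budget;

void io_budget_init(io_budget *b, uint32_t limit_per_sec, uint64_t now_sec)
{
    b->limit_per_sec = limit_per_sec;
    b->used = 0;
    b->window_start = now_sec;
}

/* Ask for n IOs at time now_sec. Returns how many may be issued now
   (between 0 and n), so one window never grants more than limit_per_sec. */
uint32_t io_budget_take(io_budget *b, uint32_t n, uint64_t now_sec)
{
    if (now_sec != b->window_start) {
        /* A new one-second window has started: reset the counter. */
        b->window_start = now_sec;
        b->used = 0;
    }

    uint32_t left = b->limit_per_sec - b->used;
    uint32_t granted = n < left ? n : left;

    b->used += granted;
    return granted;
}
```

A caller that wants to flush 100 pages would pass 100 to io_budget_take and issue only the number granted, deferring the rest to a later window. A fixed window is the simplest choice; it allows a burst at each window boundary, which a token bucket would smooth out.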
[12 Aug 2009 17:01]
Inaam Rana
Fixed in plugin 1.0.4. Documentation and source are available at www.innodb.com