Bug #69170 | buf_flush_LRU is lazy | |
---|---|---|---
Submitted: | 7 May 2013 22:50 | Modified: | 13 May 2013 15:30 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6.11 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[7 May 2013 22:50]
Mark Callaghan
[8 May 2013 4:09]
Inaam Rana
Mark,

Before going into the details of the bug, let me describe the background a bit (yes, there is indeed a bug).

Why do we do the LRU scan in smaller chunks? Because during an LRU batch we are doing two things in the same scan:

(a) If the page is replaceable, we evict it and put it on the free list.
(b) If the page is flushable, we flush it and leave it in the LRU for the next iteration to put it on the free list.

We have this rather 'not so obvious' behavior because whenever we evict or flush a page we release the buffer pool mutex. That means we have to restart the scan from the tail of the LRU. For example, suppose we are working with an HDD and lru_scan_depth is, say, 20,000. If we try to do the batch in a single pass, what will typically happen is:

* Assume all pages are dirty and flushable, i.e. neither IO-fixed nor buffer-fixed.
* For the first page, we IO-fix it, dispatch the IO request, and restart the scan.
* In the new scan we see that the page at the tail is IO-fixed, so we skip it, IO-fix the next one, dispatch the IO request, and restart the scan.
* For the third page we have to skip the first two.

So the whole thing becomes O(n*n).

Now, if we consider very fast storage, it is quite likely that by the time we start the new scan, the page at the tail has already been flushed. This changes the whole picture: we simply evict the page and restart the scan. Hence with larger LRU batches, scans can turn O(n*n) on a slower IO subsystem but will likely stay O(n) on very fast storage.

Given this, your solution to "change buf_flush_start to not check that buf_pool->n_flush[flush_type] > 0" may work for high-end systems but not for HDD-driven systems. In fact, in the SSD case, instead of removing that check it may be better to do the whole batch in one single sweep rather than in smaller chunks.

I hope the above explanation gives you some insight into why the smaller-chunk batch was introduced in the first place.
Having said that, I fully agree there is a bug here. Because the smaller chunk size is 100 (this value is driven by the available space in the dblwr buffer), all of the subsequent calls for the same instance return immediately without doing any work.

Bug Fix:
========
I have already pushed a fix for this. It'll be available in 5.6.12. However, I am not sure whether you'll like the fix. The fix is to wait for the small-chunk LRU batch to end before starting a new one. The idea is that this will work on both slow and fast systems.

This is, however, a short-term fix. What I am currently working on is the ability to scan the LRU in O(n) even when buf_pool::mutex is released. If that goes through, smaller-chunk LRU batch splitting won't be needed at all, and the page_cleaner won't have to wait for the batch to end.

Calling buf_flush_sync_datafiles():
===================================
It is not required, because syncing is now done in the IO helper thread.
[8 May 2013 13:13]
Mark Callaghan
Would this change be useful?

> 1) change buf_flush_common to only call buf_dblwr_flush_buffered_writes when page_count > 0.

Not sure this helps for this benchmark, though.
[9 May 2013 2:59]
Inaam Rana
Mark,

Yes, I think the change you mentioned is correct, but I am not sure it can help in any big way: this gets called once per batch (or per chunk of one). Still, it won't hurt either. In fact, I think it would be better to have a boolean parameter which tells whether or not to wait if there is a currently running batch. In the above case we could pass it as false.
[9 May 2013 14:14]
Inaam Rana
Assigning to Vasil to look into the buf_dblwr_flush_buffered_writes() changes. The core issue reported, i.e. buf_flush_LRU_tail() not working properly, has already been taken care of.
[27 Sep 2013 10:15]
Laurynas Biveinis
Re. Mark's problem 1), merely skipping buf_dblwr_flush_buffered_writes() if page_count == 0 will only be a partial fix in the presence of compressed pages for LRU flushes, because this value originates from buf_do_LRU_batch():

	if (buf_LRU_evict_from_unzip_LRU(buf_pool)) {
		count += buf_free_from_unzip_LRU_list_batch(buf_pool, max);
	}

	if (max > count) {
		count += buf_flush_LRU_list_batch(buf_pool, max - count);
	}

Notice the mixup of evicted and flushed pages. In the case where the LRU flush batch is satisfied purely by unzip_LRU eviction with no flushing, we will still get page_count > 0 in buf_flush_common(). We are exploring a fix, here and elsewhere, of replacing the single-value counters with two-field structs for flushed and evicted pages.
[23 Jan 2014 9:31]
Laurynas Biveinis
Related bug 71411.