Bug #70899 unnecessary buf_flush_list() during recovery
Submitted: 13 Nov 2013 16:39 Modified: 7 May 2014 19:24
Reporter: Inaam Rana (OCA) Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.6.14 OS:Any
Assigned to: CPU Architecture:Any
Triage: Needs Triage: D3 (Medium)

[13 Nov 2013 16:39] Inaam Rana
In InnoDB startup code innobase_start_or_create_for_mysql() after creating rsegs we flush the entire buffer pool. The intent is to force trx_sys page to the disk. We can reach this code path after doing recovery. In this case we can potentially have millions of dirty pages in the buffer pool. This can seriously increase the recovery time.

2554         srv_available_undo_logs = trx_sys_create_rsegs(
2555                 srv_undo_tablespaces, srv_undo_logs);
2557         if (srv_available_undo_logs == ULINT_UNDEFINED) {
2558                 /* Can only happen if force recovery is set. */
2559                 ut_a(srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO
2560                      || srv_read_only_mode);
2561                 srv_undo_logs = ULONG_UNDEFINED;
2562         }
2564         /* Flush the changes made to TRX_SYS_PAGE by trx_sys_create_rsegs()*/
2565         if (!srv_force_recovery && !srv_read_only_mode) {
2566                 bool success = buf_flush_list(ULINT_MAX, LSN_MAX, NULL);
2567                 ut_a(success);
2568                 buf_flush_wait_batch_end(NULL, BUF_FLUSH_LIST);
2569         }

How to repeat:
Crash the server with large enough buffer pool to have millions of dirty pages. Allow recovery to proceed. Try to connect to the server after redo scan and redo apply phase are finished. The server will be inaccessible and a lot of flushing will be happening.

In my case I had a buffer pool size of 64G, redo logs 4G and around 2.3 million dirty pages when I crashed the server. During recovery:

redo scan: nearly 2 minutes
redo apply: nearly 2 minutes
server stuck in flushing: nearly 11 minutes

and the stack during flushing looked like:

(gdb) where
#0  0x00000000009cde99 in ut_fold_binary (page=<value optimized out>) at /home/irana/workspace/mysql-5.6/storage/innobase/include/ut0rnd.ic:208
#1  buf_calc_page_new_checksum (page=<value optimized out>) at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0checksum.cc:99
#2  0x00000000009cf4d3 in buf_flush_init_for_writing (page=0x7f1254794000 "\377r<\006", page_zip_=0x0, newest_lsn=382565654961)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:790
#3  0x00000000009d0451 in buf_flush_write_block_low (buf_pool=0x242f798, bpage=0x7f122ca6d1d0, flush_type=BUF_FLUSH_LIST, sync=false)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:908
#4  buf_flush_page (buf_pool=0x242f798, bpage=0x7f122ca6d1d0, flush_type=BUF_FLUSH_LIST, sync=false)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:1060
#5  0x00000000009d10b6 in buf_flush_try_neighbors (bpage=<value optimized out>, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x7fff9ea9ba50)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:1271
#6  buf_flush_page_and_try_neighbors (bpage=<value optimized out>, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x7fff9ea9ba50)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:1349
#7  0x00000000009d15be in buf_do_flush_list_batch (buf_pool=0x242f798, flush_type=<value optimized out>, min_n=18446744073709551614, lsn_limit=18446744073709551615)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:1596
#8  buf_flush_batch (buf_pool=0x242f798, flush_type=<value optimized out>, min_n=18446744073709551614, lsn_limit=18446744073709551615)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:1660
#9  0x00000000009d1f1e in buf_flush_list (min_n=18446744073709551614, lsn_limit=18446744073709551615, n_processed=0x0)
    at /home/irana/workspace/mysql-5.6/storage/innobase/buf/buf0flu.cc:1897
#10 0x000000000097debd in innobase_start_or_create_for_mysql () at /home/irana/workspace/mysql-5.6/storage/innobase/srv/srv0start.cc:2566
#11 0x00000000008d6a92 in innobase_init (p=<value optimized out>) at /home/irana/workspace/mysql-5.6/storage/innobase/handler/ha_innodb.cc:3375
#12 0x000000000054fca8 in ha_initialize_handlerton (plugin=0x23ca680) at /home/irana/workspace/mysql-5.6/sql/handler.cc:663
#13 0x00000000006b12eb in plugin_initialize (plugin=0x23ca680) at /home/irana/workspace/mysql-5.6/sql/sql_plugin.cc:1126
#14 0x00000000006b5636 in plugin_init (argc=0x126ecec, argv=0x236d718, flags=<value optimized out>) at /home/irana/workspace/mysql-5.6/sql/sql_plugin.cc:1417
#15 0x00000000005480d8 in init_server_components () at /home/irana/workspace/mysql-5.6/sql/mysqld.cc:4849
#16 0x000000000054a86f in mysqld_main (argc=47, argv=0x236d718) at /home/irana/workspace/mysql-5.6/sql/mysqld.cc:5449
#17 0x00007f1cc1219994 in __libc_start_main () from /lib64/libc.so.6
#18 0x0000000000540ce9 in _start ()

Suggested fix:
I think the fix is trivial. Add check for recv_needed_recovery to the if condition.

Also probably a good idea to move this whole buf_flush_list() logic inside trx_sys_create_rsegs(). Makes for a more error proof code.
[13 Nov 2013 17:49] Inaam Rana
I should clarify that I believe recv_needed_recovery check is safe because in trx_sys_create_rsegs() we create rsegs iff recv_needed_recovery is not set.
[14 Nov 2013 18:10] Sveta Smirnova
Thank you for the report.

Verified as described using code analysis.
[7 May 2014 19:24] Daniel Price
Fixed as of 5.6.20, 5.7.5 and here's the changelog entry:

The fix for Bug#16418661 added superfluous "buf_flush_list()" logic to
"InnoDB" startup code. 

Thank you for the bug report.
[6 Aug 2014 17:14] Laurynas Biveinis
$ bzr log -n0 -r 5922
revno: 5922
committer: Satya Bodapati <satya.bodapati@oracle.com>
branch nick: mysql-5.6
timestamp: Wed 2014-05-07 19:36:11 +0530
                 DURING RECOVERY
  Why and when the extra buf_flush_list() was added?
  The extra buf_flush_list() was added as part of rb#2293.
  The testcase for rb#2293 failed randomly reporting inconsistent number
  of undo tablespaces. MTR didn't provide a way to use custom
  innodb_data_file_path, innodb_undo_tablespaces. See Bug#17059436 So
  the workaround is to use "--exec $MYSQLD --args my.cnf" on a empty data
  directory. This will create  necessary files and aborts after that.
  The extra flush made sure that correct the number of undo tablespaces
  reported. This is not correct because the changes are already covered by redo.
  So the the ideal solution would be have been to run "--exec $MYSQLD on
  the datadir" again to apply redo or allow the bootstrapping to finish
  properly and exit.
  Remove uncessary flush. Fix the inconsistent number of undo tablespaces
  issue by passing '--skip-grant-tables --innodb-unkown-parameter' to $MYSQLD.
  This would allow the bootstrapping to finish and properly exit.
  Approved by Marko, Kevin. rb#5234, rb#5320