Bug #80788 Reduce the time of looking for MLOG_CHECKPOINT during crash recovery
Submitted: 18 Mar 2016 5:30 Modified: 5 Jul 2017 6:38
Reporter: zhai weixiang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[18 Mar 2016 5:30] zhai weixiang
Description:
While testing MySQL 5.7 with heavy workload, I crashed the server by killing command , and found that innodb need to scan log file twice which is different from the behavior of 5.6.

Bellow is the log quoted from error-log file:

2016-03-18T02:36:42.462234Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 117669177068
2016-03-18T02:36:42.549599Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 117674419712
...
...
2016-03-18T02:36:56.333895Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118602409472
2016-03-18T02:36:56.398020Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118607652352
2016-03-18T02:36:56.460787Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118612895232
2016-03-18T02:36:57.245597Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 117669373440
2016-03-18T02:36:57.377508Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 117674616320
...
...
2016-03-18T02:37:21.504109Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118628820480
2016-03-18T02:37:21.642318Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118634063360
2016-03-18T02:37:21.782366Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118639306240
2016-03-18T02:37:21.883294Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 118643087383
2016-03-18T02:37:21.884623Z 0 [Note] InnoDB: Database was not shutdown normally!
2016-03-18T02:37:21.884654Z 0 [Note] InnoDB: Starting crash recovery.
...
...
2016-03-18T02:37:32.305484Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2016-03-18T02:37:40.517255Z 0 [Note] InnoDB: Apply batch completed

First log scan: 14s
Sec log scan: 24s
Recv data:  19s

The purpose of first log scanning is to find out MLOG_CHECKPOINT and parse the name of all data files that have been changed since latest checkpoint. 

How to repeat:
Run server with very heavy workload, kill it, and observe the error log.

Suggested fix:
If we store the start position of MLOG_CHECKPOINT log group into  file header of ib_logfile0, we can quickly find  MLOG_CHECKPOINT without scanning too many logs.
[18 Mar 2016 5:35] zhai weixiang
A simple patch based on 5.7.11 to test the conjecture,  not be fully tested

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: skip_scan.diff (application/octet-stream, text), 5.10 KiB.

[18 Mar 2016 5:57] zhai weixiang
edit synopsis
[18 Mar 2016 10:09] MySQL Verification Team
Hello Zhai,

Thank you for the report and contribution.

Thanks,
Umesh
[18 Mar 2016 10:10] MySQL Verification Team
-- 5.7.11

[umshastr@hod03]/export/umesh/server/binaries/mysql-5.7.11: cat my.cnf
[mysqld]
basedir=/export/umesh/server/binaries/mysql-5.7.11
datadir=/export/umesh/server/binaries/mysql-5.7.11/data
query_cache_type=0
table_open_cache_instances=32
innodb_buffer_pool_instances=64
innodb_checksum_algorithm=none
max_connections=2048
innodb_file_per_table
innodb_log_file_size=2048M
innodb_log_files_in_group=2
innodb_file_format=barracuda
innodb_buffer_pool_size=16G
innodb_open_files=4000
#innodb_use_native_io=1
innodb_flush_method=O_DIRECT
innodb_log_buffer_size=64M
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=25
innodb_adaptive_flushing=1
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_purge_threads=1
innodb_io_capacity=1024

rm -rf data
bin/mysqld --defaults-file=./my.cnf --initialize-insecure
bin/mysqld --defaults-file=./my.cnf --core-file --socket=/tmp/mysql_ushastry.sock  --port=15000 --log-error=/export/umesh/server/binaries/mysql-5.7.11/data/log.err 2>&1 &

2016-03-18T09:58:18.300058Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 14239111020
2016-03-18T09:58:18.388329Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14244353536
.
.
2016-03-18T09:58:18.483534Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14249596416
2016-03-18T09:58:18.572863Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14254839296
2016-03-18T09:58:18.667363Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14260082176
2016-03-18T09:58:18.766348Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14265325056
2016-03-18T09:58:18.861063Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14270567936
2016-03-18T09:58:18.954745Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14275810816
2016-03-18T09:58:19.051198Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14281053696
2016-03-18T09:58:19.143245Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14286296576
2016-03-18T09:58:19.249499Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14291539456
2016-03-18T09:58:19.349118Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14296782336
.
.
2016-03-18T09:58:31.775270Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14957385216

2016-03-18T09:58:32.089298Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14242256384
2016-03-18T09:58:32.209687Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14247499264
2016-03-18T09:58:32.325659Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14252742144
2016-03-18T09:58:32.440903Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14257985024
2016-03-18T09:58:32.567340Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14263227904
.
.
2016-03-18T09:58:49.203197Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14986745344
2016-03-18T09:58:49.292611Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 14990714667
2016-03-18T09:58:49.293508Z 0 [Note] InnoDB: Database was not shutdown normally!

2016-03-18T09:58:49.293519Z 0 [Note] InnoDB: Starting crash recovery.
2016-03-18T09:58:50.773146Z 0 [Note] InnoDB: 6 transaction(s) which must be rolled back or cleaned up in total 6 row operations to undo
2016-03-18T09:58:50.773189Z 0 [Note] InnoDB: Trx id counter is 1713152
2016-03-18T09:58:50.773240Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2016-03-18T09:58:54.250289Z 0 [Note] InnoDB: Apply batch completed
[25 Apr 2016 5:32] Marko Mäkelä
Posted by developer:
 
Thank you for the suggestion to write the LSN of the MLOG_CHECKPOINT record in the checkpoint info page.

This would indeed seem to eliminate the need for the extra redo log scan pass.
As far as I can tell, the fix would require us to pass the flush_lsn from log_checkpoint() to log_write_checkpoint_info().
[25 Apr 2016 5:53] Marko Mäkelä
Posted by developer:
 
The primary purpose of the MLOG_CHECKPOINT record is to act as a marker for recovery.

The first portion of the redo log (from the latest checkpoint LSN ot the MLOG_CHECKPOINT marker) may contain "forward references" to space_id for which MLOG_FILE_NAME records were originally written before the checkpoint. Any such MLOG_FILE_NAME records will be duplicated immediately before the MLOG_CHECKPOINT marker, by fil_names_clear().

After recovery has read the matching MLOG_CHECKPOINT for the checkpoint LSN, it knows that any subsequent page redo log records that refer to a given space_id must be preceded by a MLOG_FILE_NAME record for that space_id.

So, the suggested fix would have to be adjusted a little. In log_checkpoint(), we would have to do like this:

	lsn_t	checkpoint_end_lsn = log_sys->lsn;
	if (fil_names_clear(flush_lsn, do_write)) {
	... [snip lots of code]
	log_sys->next_checkpoint_lsn = oldest_lsn;
	log_write_checkpoint_info(checkpoint_end_lsn, sync);

Then, recovery would start the first scan from the saved checkpoint_end_lsn instead of from the checkpoint LSN (oldest_lsn). The checkpoint_end_lsn should point to a number of MLOG_FILE_NAME records terminated by MLOG_CHECKPOINT. No other records should occur there. From this first scan, recovery would have resolved any "forward references" to space_id, and it can start the second scan from the checkpoint LSN.
[2 Mar 2017 13:35] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.18, 8.0.1 release, and here's the changelog entry:

Changes to the InnoDB recovery process in MySQL 5.7 could result in up to
three scans of the redo log during recovery. To reduce the number of
scans, the first and second scans are now merged. With this change, there
is only one scan unless the redo log record hash table that is populated
by the scan reaches its memory threshold, in which case, a second scan is
initiated that performs a simultaneous scan and apply.
[20 Apr 2017 13:35] Hugo Dubois
I doesn't see any mention of this bug fix in the 5.7.18 release Notes. Is this bug fixed in the 5.7.18 ?
[18 May 2017 9:02] zhai weixiang
Is this bug really fixed ?  No changelog entry in release note...
[5 Jul 2017 6:22] MySQL Verification Team
Imho Bug #80788 is fixed in 5.7.19(and 8.0.1) which is not yet released and hence no change log entry details as of now here https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-19.html

I see 8.0.1 change log has this details https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html
[5 Jul 2017 6:34] zhai weixiang
Reopen the bug as I can't find the commit log on github and no changelog entry appears on release Note ...
[5 Jul 2017 6:38] zhai weixiang
Thank you, Umesh !

I found the commit log on github. And currently this bug is only fixed in 8.0.1

commit d15e4c262ab5ec3f9e679992d0feceacabe510e3
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date:   Thu Mar 2 11:56:15 2017 +0530

    Bug #22963951       REDUCE THE TIME OF LOOKING FOR
                    MLOG_CHECKPOINT DURING CRASH RECOVERY
    Analysis:
    =========
    In mysql-5.7, there can be maximum three scans are possible.
    Simultaneous scan and apply is possible only in the third scan.

    First scan: Scan all the redo logs from checkpoint lsn and process
    only MLOG_FILE_* records during first scan. It scans till the
    last MLOG_CHECKPOINT.

    Second scan: Scan all redo logs from checkpoint lsn and add log
    records to hash table. It verifies whether space id is having
    corresponding MLOG_FILE_NAME record. If the hash table heap memory
    is reached the threshold then stop adding records to hash but it
    continues to scan till end of the redo log file.

    Third scan: Scan all redo logs from checkpoint lsn and add log records
    to hash table only if the tablespace exists. If the heap memory reached
    the threshold then simultaneous scan and apply will happen.

    Note: Third scan only happens if hash table memory reaches the
    threshold during second scan.

    Fix:
    ====

    Merge first scan and second scan and make it a single scan. Single scan
    if the hash table doesn't reach the threshold limit.
    In worst case scenario, it does two scan.

    Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
    RB: 14725

commit 25695fc6c27b0b9740ba8ad2ca54bf73996d80a7
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date:   Thu Mar 2 15:31:34 2017 +0530

    Bug #22963951       REDUCE THE TIME OF LOOKING FOR
        MLOG_CHECKPOINT DURING CRASH RECOVERY
    - Post-push fix