MySQL Bugs: #71822: Please make InnoDB recovery process more verbose when scanning InnoDB files

Bug #71822	Please make InnoDB recovery process more verbose when scanning InnoDB files
Submitted:	24 Feb 2014 20:56	Modified:	25 Feb 2014 12:33
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S4 (Feature request)
Version:	5.6.15	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	file_per_table, innodb, recovery

Description:
Logging on startup after a failover took a long time without saying anything.

140224 18:05:02 mysqld_safe Starting mysqld daemon with databases from /path/to/datadir
2014-02-24 18:05:03 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2014-02-24 18:05:03 27734 [Note] Plugin 'FEDERATED' is disabled.
2014-02-24 18:05:03 27734 [Note] InnoDB: The InnoDB memory heap is disabled
2014-02-24 18:05:03 27734 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2014-02-24 18:05:03 27734 [Note] InnoDB: Compressed tables use zlib 1.2.3
2014-02-24 18:05:03 27734 [Note] InnoDB: Using Linux native AIO
2014-02-24 18:05:03 27734 [Note] InnoDB: Using CPU crc32 instructions
2014-02-24 18:05:03 27734 [Note] InnoDB: Initializing buffer pool, size = 120.0G
2014-02-24 18:05:10 27734 [Note] InnoDB: Completed initialization of buffer pool
2014-02-24 18:05:11 27734 [Note] InnoDB: Highest supported file format is Barracuda.
2014-02-24 18:05:12 27734 [Note] InnoDB: The log sequence numbers 42094814939579 and 42094814939579 in ibdata files do not match the log sequence number 52468941157960 in the ib_logfiles!
2014-02-24 18:05:12 27734 [Note] InnoDB: Database was not shutdown normally!
2014-02-24 18:05:12 27734 [Note] InnoDB: Starting crash recovery.
2014-02-24 18:05:12 27734 [Note] InnoDB: Reading tablespace information from the .ibd files...   <<=======
2014-02-24 19:10:07 27734 [Note] InnoDB: Restoring possible half-written data pages   <<=======
2014-02-24 19:10:07 27734 [Note] InnoDB: from the doublewrite buffer...
2014-02-24 19:10:57 27734 [Note] InnoDB: 128 rollback segment(s) are active.
2014-02-24 19:10:57 27734 [Note] InnoDB: Waiting for purge to start
2014-02-24 19:10:57 27734 [Note] InnoDB: 5.6.15 started; log sequence number 52468941157960
2014-02-24 19:10:57 7f6da358e700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2014-02-24 19:10:57 27734 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
2014-02-24 19:10:57 27734 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
2014-02-24 19:10:57 27734 [Note] Server socket created on IP: '0.0.0.0'.
2014-02-24 19:10:58 27734 [Note] Event Scheduler: Loaded 0 events
2014-02-24 19:10:58 27734 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.15-log'  socket: 'mysql.sock'  port: 3306  MySQL Community Server (GPL)
2014-02-24 19:10:58 27734 [Note] Event Scheduler: scheduler thread started with id 1
2014-02-24 19:29:07 7f6da358e700 InnoDB: Buffer pool(s) load completed at 140224 19:29:07

Notice the 1 hour 5 minute delay while the recovery is taking place that's completely silent.

When I first saw this I thought mysqld was "stuck" and killed it only to try again. It was only afterwards I realised it was actually doing something. Note: CPU and I/O was not 100% so it's not clear to me exactly what mysqld was doing. The server has several cores yet this process seems to be single threaded.

How to repeat:
Note: this server had a large number of databases and tables:

[root@myserver ~]# cd $datadir
[root@myserver data]# find . -type d | wc -l
10779
[root@ data]# find . -name "*.ibd" | wc -l
167647
[root@ data]# 

The total instance size is 33 TB.

The issue was caused by a storage problem which required mysqld to be killed and the filesystem remounted.  While this is not MySQL's fault under other recovery scenarios recovery is quite verbose, so it's clear that something is happening. In this case there was no visibility to the process.

Suggested fix:

(1) Make the logging more verbose. I suggest that if nothing has been logged in 10 seconds then indicate the current state of progress, and if known how much more work needs to be done.

(2) If this process can be made to run faster that would also be good as 1 hour and 5 minutes to scan the InnoDB tables seems like a long period. Had the server been busy with large transactions that would have invoked an even longer recovery time. In this case that did not happen.

As databases and storage use grows it's likely that the number of files or databases may increase too.

Changed to feature request. In theory just a logging change but please consider doing this as it's most reassuring to see that things are actually working.

For the record these are the relevant configuration settings I was using:

innodb_adaptive_flushing = 1
innodb_adaptive_hash_index = 0
innodb_autoinc_lock_mode = 0
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_instances = 60
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_size = 120G
innodb_change_buffering = INSERTS
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_io_capacity = 4000
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 32M
innodb_log_files_in_group = 2
innodb_log_file_size = 16G
innodb_max_dirty_pages_pct = 50
innodb_open_files = 9000
innodb_purge_batch_size = 20
innodb_purge_threads = 1
innodb_stats_on_metadata = 0
innodb_support_xa = 0
innodb_thread_concurrency = 999

We have some good news in 5.7:

http://dev.mysql.com/doc/refman/5.7/en/innodb-recovery-tablespace-discovery.html