MySQL Bugs: #40099: Provide facilities for preparing the server for faster shutdown.

Bug #40099	Provide facilities for preparing the server for faster shutdown.
Submitted:	17 Oct 2008 7:41	Modified:	18 Oct 2008 4:42
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: General	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When using MySQL in production environments it is helpful to minimise server downtimes. Currently MySQL may take a long time to shutdown depending on different configured options, which are usually related to dirty buffers which haven't yet been flushed to disk.

In order to prepare a server for fast shutdown and startup it would be useful to be able to force a flush of unwritten buffers to disk but in a way which is not as obtrusive as the FLUSH TABLES command (which can temporarily lock the server) so that once the process has completed you can run a normal mysqladmin shutdown command and expect it to run quickly. Alternatively this could be added as an option to the shutdown command itself.

This feature would be similar also other RDBMS which allow for various shutdown options between "shutdown when everything is idle", to "shutdown right now".

In a professional environment this flexibility is very useful.

I've experienced boxes taking 20-50 minutes to shutdown (and the server being unavailable) on boxes with 8GB of RAM and mainly MyISAM tables. This is far too long. Adding RAM to servers just makes this worse. I would be happy for the box to be "quietly" flushing tables for 1 hour, after which a shutdown will take 1 minute because the "1 minute" is the time that the server is really unavailable.

How to repeat:
1. Start up a server with 8GB of RAM, say using some MyISAM tables.
2. Set delay_key_write = ALL and then fill the key_buffer_cache but loading data into the server until all the RAM (or key_buffer_cache) is full.
3. shutdown the server and annotate the time taken.

To see that FLUSH TABLES is harmful (in this situation):
a. Prior to step 3 run FLUSH TABLES
b. Try to access (read/write) the server before the FLUSH TABLES completes

Suggested fix:

Several ideas:

Provide a FLUSH TABLES GENTLY (or SHUTDOWN PREPARE) command which will write out dirty blocks but not as aggressively as a normal FLUSH TABLES command, perhaps NOT locking the tables while this process is going on.

Fix innodb so that dirty buffers in the Innodb_buffer_pool are also written to disk in a way which is consistent with InnoDB's storage strategy, so that a later shutdown will not have to do this. (Effectively do a "gentle checkpoint".)

Log what is being done.

See also http://bugs.mysql.com/bug.php?id=34137 which appears to be forgotten.

This reminds me of http://forge.mysql.com/worklog/task.php?id=709
but which is different. However, when working on that task (which ended in an unfinished patch which isn't of much value today - server has changed and patch wasn't perfect), I had a discussion in 2004 with a customer who raised points similar to the ones in this bug report. I'm pasting the most relevant suggestions he and his DBA made, below. They used MyISAM with delay_key_write, like in this bug report, and saw long shutdowns the same way.
<quotes>
* in shutdown it's important to do as much as possible while connections are being terminated and new connections are still accepted. Flush buffers during this phase, and then repeat the flush when connections are not accepted anymore: this second flush will normally be instant because they'd be nothing left to flush.
* Also, tables that are Open but not In Use by any current statement should be flushed first and early so that if the server does get "stuck" and has to be killed the table corruption is limited. This would make a big difference to us - it has been one of the main causes of long database recovery times.
* Give more feedback in 'show processlist' about what's happening
to a thread that's been "killed". Specifically some way to indicate
that a thread has *noticed* that it has been asked to terminate and
is now cleaning up. Currently the state is set to "Killed". I suggest
setting the state to "Killing" initially and then the thread being killed
should change the status to "Killed" once it's noticed. Simple but useful.
* It would be nice to be able to have a connection and watch the progress of the shutdown as it's happening. There have been times when I've waited 10 mins or so before killing the mysqld process. If it knew it was progressing I may have waited it out.

Yes, the comments from your other user apply in our case too. Often our only reason to shutdown a server is to upgrade it to a newer version which fixes bugs we have been troubled by.

Many of our servers are behind load balancers but not all of them and those that aren't mean that shutting down affects our users who have to perform special application shutdown actions ahead of this. Hence we want to reduce this time to a minimum.

Extra logging of the shutdown process would also prevents us forcefully killing the server due to simply not knowing what it is doing. This might also help us/MySQL determine exactly what part of the shutdown process is taking longest and at least provide a pointer to which part of the code needs help if we want to reduce this shutdown time.

The logging is easy and harmless and could almost certainly be added to 5.0 and later versions.

The other changes are probably more obtrusive but it would be nice if they weren't only added to 6.0 or later as for production users this functionality is VERY useful and these versions are so far forward that they are beyond our planning horizon...

freshly started mysql server takes 20s to shutdown. Same server running for month takes 20 minutes to shutdown (this means downtime) - even after flush tables!.
Shutdown prepare is much needed!!!