Bug #25341 /etc/init.d/mysql stop may timeout too quickly
Submitted: 31 Dec 2006 0:08 Modified: 15 Feb 2007 6:10
Reporter: Mark Callaghan
Status: Closed
Category: Client Severity: S2 (Serious)
Version: 5.0.34-BK, 5.0.30 OS: Linux (Linux)
Assigned to: Bugs System Target Version:
Tags: shutdown, timeout, innodb

[31 Dec 2006 0:08] Mark Callaghan
Description:
support-files/mysql.server.sh is the source of /etc/init.d/mysql. It contains a function,
wait_for_pid, that waits for the server's PID file to appear or be removed. The wait is
limited to 35 seconds and an error message is displayed when the timeout is reached before
the PID file has been removed/created. This is bad for a couple of reasons.

First, the timeout is arbitrary and more likely to be reached on modern servers with large
InnoDB buffer caches. InnoDB tries to flush dirty pages to disk on shutdown and can easily
take more than 35 seconds when the buffer cache is 10+ GB and the IO system does not
support thousands of IOs per second.
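
For a rough sense of scale, the arithmetic can be sketched as below. The 16 KB page size, the one-IO-per-dirty-page model, and the 1000 IOs per second figure are illustrative assumptions, not measurements from this report. A real shutdown also depends on write merging and on how many pages are actually dirty.

```shell
# Rough estimate of the time needed to flush a fully dirty buffer cache.
# Assumes one write IO per page (hypothetical worst-case model).
estimate_flush_seconds () {
  pool_gb=$1      # buffer cache size in GB
  page_kb=$2      # page size in KB (16 is assumed here)
  iops=$3         # sustained write IOs per second
  pages=$(( $pool_gb * 1024 * 1024 / $page_kb ))
  echo $(( $pages / $iops ))
}

estimate_flush_seconds 10 16 1000   # prints 655 (about 11 minutes)
```

Under these assumptions a 10 GB cache needs well over ten times the 35 second limit.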

Second, this can cause confusing errors when /etc/init.d/mysql restart is run. When
restart is used and the timeout is reached on stop, the script will then try to start the
server even though one is currently running and in the process of shutting down.

I have seen error messages in the log where a server gets confused by the contents of the
InnoDB log when running crash recovery at startup, and I attribute those errors to this
problem.

How to repeat:
Run a server with a small number of disks and a large InnoDB buffer cache, make most of
the pages in the buffer cache dirty, run '/etc/init.d/mysql restart'

Suggested fix:
Remove the timeout from wait_for_pid when a server is to be stopped.

Make restart fail and not try to start a server when the timeout is reached on stop.
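
A minimal sketch of the second suggestion follows. The function names stop_server, start_server, and restart_server, and the STOP_STATUS variable, are hypothetical stand-ins and do not come from mysql.server.sh. The point is only that restart checks the result of stop before starting anything.

```shell
# Sketch only. stop_server and start_server are hypothetical stand-ins for
# the script's real stop and start logic. STOP_STATUS simulates the result.
stop_server () {
  # A real version would return non-zero when the PID file is still
  # present after the wait has timed out.
  return "${STOP_STATUS:-0}"
}

start_server () {
  echo "starting server"
}

restart_server () {
  if stop_server ; then
    start_server
  else
    # The old server may still be shutting down, so do not start another.
    echo "stop failed; not starting a second server" >&2
    return 1
  fi
}
```

With STOP_STATUS=1 set, restart_server prints the error, returns 1, and never calls start_server.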
[1 Jan 2007 19:28] Valeriy Kravchuk
Thank you for the bug report. Verified just as described with 5.0.34-BK on Linux. One just
has to review the code:

wait_for_pid () {
  i=0
  while test $i -lt 35 ; do
    sleep 1
    case "$1" in
      'created')
        test -s $pid_file && i='' && break
        ;;
      'removed')
        test ! -s $pid_file && i='' && break
        ;;
      *)
        echo "wait_for_pid () usage: wait_for_pid created|removed"
        exit 1
        ;;
    esac
    echo $echo_n ".$echo_c"
    i=`expr $i + 1`
  done
...
[2 Jan 2007 16:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17526

ChangeSet@1.2361, 2007-01-02 10:06:12-05:00, cmiller@zippy.cornsilk.net +2 -0
  Bug#25341:  "init.d/mysql stop" may timeout too quickly
  
  Thirty five seconds is entirely too short of a period to wait for a server 
  to exit.  Instead, make a valiant effort to make sure it exits, and only
  give up after a very long period (arbitrarily chosen as 15 minutes).
  
  In addition, if we're being asked to restart the server, then don't try
  to start again if trying to stop the server failed.
  ---
  Return zero by default, when the script exits.
  ---
  Set return-/exit-value based on whether we successfully dealt with the 
  PID-file.
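
The behavior described in this changeset can be sketched roughly as follows. This is not the committed patch. The two-argument signature and the service_startup_timeout variable are assumptions made for illustration. The function waits up to 15 minutes by default and reports the outcome through its return value, so the caller can decide whether to proceed.

```shell
# Sketch, not the committed patch. The argument order and the
# service_startup_timeout name are assumptions for illustration.
service_startup_timeout=${service_startup_timeout:-900}   # 15 minutes

wait_for_pid () {
  verb=$1        # "created" or "removed"
  pid_file=$2    # path to the server's PID file
  i=0
  while test "$i" -lt "$service_startup_timeout" ; do
    case "$verb" in
      created) test -s "$pid_file" && return 0 ;;
      removed) test ! -s "$pid_file" && return 0 ;;
      *)
        echo "wait_for_pid () usage: wait_for_pid created|removed pid_file" >&2
        return 2
        ;;
    esac
    sleep 1
    i=$(( $i + 1 ))
  done
  return 1       # timed out; the caller decides what to do next
}
```

A caller could then write `wait_for_pid removed "$pid_file" || exit 1` so that a timed-out stop is reported as a failure.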
[31 Jan 2007 21:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19124

ChangeSet@1.2361, 2007-01-31 15:39:41-05:00, cmiller@zippy.cornsilk.net +2 -0
  Bug#25341:  "init.d/mysql stop" may timeout too quickly
  
  Thirty five seconds is entirely too short of a period to wait for a server 
  to exit.  Instead, make a valiant effort to make sure it exits, and only
  give up after a very long period (arbitrarily chosen as 15 minutes).
  
  In addition, if we're being asked to restart the server, then don't try
  to start again if trying to stop the server failed.
  ---
  Return zero by default, when the script exits.
  ---
  Set return-/exit-value based on whether we successfully dealt with the 
  PID-file.
  ---
  Don't wait that long if the program we're waiting on exits.  It 
  should only exit if the server is not going to be started.
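
The last item can be illustrated with a sketch like the one below. It is an assumption about how such a check might look, not the code from the changeset. wait_for_created, launcher_pid, and the three-argument signature are hypothetical names. The loop stops waiting for the PID file as soon as the process expected to create it is gone.

```shell
# Sketch: wait for a PID file to appear, but give up early if the
# launcher process (hypothetical launcher_pid) is no longer running.
wait_for_created () {
  pid_file=$1        # PID file the server is expected to write
  launcher_pid=$2    # process that is starting the server
  timeout=$3         # maximum number of seconds to wait
  i=0
  while test "$i" -lt "$timeout" ; do
    test -s "$pid_file" && return 0
    # kill -0 sends no signal. It only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$launcher_pid" 2>/dev/null ; then
      return 1       # launcher is gone, so no PID file will appear
    fi
    sleep 1
    i=$(( $i + 1 ))
  done
  return 1           # timed out
}
```

Note that kill -0 also fails when the process belongs to another user, so this sketch assumes the script runs with enough privileges to signal the launcher.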
[14 Feb 2007 16:10] Chad MILLER
Available in 5.0.36 and 5.1.16-beta.
[15 Feb 2007 6:10] Paul DuBois
Noted in 5.0.36, 5.1.16 changelogs.