Bug #25341 /etc/init.d/mysql stop may timeout too quickly
Submitted: 30 Dec 2006 23:08 Modified: 15 Feb 2007 5:10
Reporter: Mark Callaghan
Status: Closed
Category:MySQL Server: Command-line Clients Severity:S2 (Serious)
Version:5.0.34-BK, 5.0.30 OS:Linux (Linux)
Assigned to: Chad MILLER CPU Architecture:Any
Tags: innodb, shutdown, timeout

[30 Dec 2006 23:08] Mark Callaghan
Description:
support-files/mysql.server.sh is the source of /etc/init.d/mysql. It contains a function, wait_for_pid, that waits for the server's PID file to appear or be removed. The wait is limited to 35 seconds, and an error message is displayed when the timeout is reached before the PID file has been created or removed. This is a problem for two reasons.

First, the timeout is arbitrary and more likely to be reached on modern servers with large InnoDB buffer pools. InnoDB flushes dirty pages to disk on shutdown, which can easily take more than 35 seconds when the buffer pool is 10+ GB and the I/O system cannot sustain thousands of I/Os per second.
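As a rough illustration of the first point, the sketch below estimates shutdown flush time from the amount of dirty data and the disk's write rate. It assumes the default 16 KB InnoDB page size, one random write per dirty page (no write merging, doublewrite ignored), and an entirely dirty buffer pool. The function name estimate_flush_seconds and the IOPS figures are hypothetical, not measurements from this report.

```shell
#!/bin/sh
# Back-of-the-envelope estimate of InnoDB shutdown flush time.
# Assumptions (not from the bug report):
#   - default 16 KB InnoDB page size
#   - one random write per dirty page
#   - every page in the buffer pool is dirty

# estimate_flush_seconds DIRTY_GB IOPS  (hypothetical helper)
estimate_flush_seconds () {
  dirty_gb=$1
  iops=$2
  # 1 GB = 1048576 KB; divide by 16 KB per page to get the page count.
  pages=`expr $dirty_gb \* 1048576 / 16`
  # Integer seconds needed to write that many pages at the given rate.
  expr $pages / $iops
}
```

With 10 GB of dirty pages, estimate_flush_seconds 10 200 prints 3276 (about 55 minutes), and even at 1000 IOPS it prints 655 seconds; both are far beyond the 35-second limit.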

Second, this can cause confusing errors when /etc/init.d/mysql restart is run. If the timeout is reached during the stop phase, the script then tries to start a new server even though the old one is still running and in the process of shutting down.

I have seen error messages in the log where a server gets confused by the contents of the InnoDB log when running crash recovery at startup, and I attribute those errors to this problem.

How to repeat:
Run a server with a small number of disks and a large InnoDB buffer pool, make most of the pages in the buffer pool dirty, then run '/etc/init.d/mysql restart'.

Suggested fix:
Remove the timeout from wait_for_pid when a server is to be stopped.

Make restart fail and not try to start a server when the timeout is reached on stop.
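A minimal sketch of the second suggested fix, assuming the stop and start steps are wrapped in shell functions. The names stop_server, start_server, restart_server and the STOP_STATUS variable are hypothetical, used only to make the control flow testable; they are not the names in mysql.server.

```shell
#!/bin/sh
# Sketch: "restart" only starts the server if "stop" actually succeeded.
# stop_server and start_server are hypothetical stand-ins for the
# script's real stop/start branches.

stop_server () {
  # Stand-in: a real version would signal mysqld and wait for the PID
  # file to be removed, returning non-zero if the wait timed out.
  return "${STOP_STATUS:-0}"
}

start_server () {
  echo "starting server"
}

restart_server () {
  if stop_server; then
    start_server
  else
    # The old server may still be flushing pages; starting a second one
    # now is what leads to confusing crash-recovery errors.
    echo "stop failed; not starting a new server" >&2
    return 1
  fi
}
```

The key design choice is that the exit status of the stop step, not the mere completion of the wait loop, decides whether start runs at all.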
[1 Jan 2007 18:28] Valeriy Kravchuk
Thank you for the bug report. Verified as described with 5.0.34-BK on Linux. One just has to review the code:

wait_for_pid () {
  i=0
  while test $i -lt 35 ; do
    sleep 1
    case "$1" in
      'created')
        test -s $pid_file && i='' && break
        ;;
      'removed')
        test ! -s $pid_file && i='' && break
        ;;
      *)
        echo "wait_for_pid () usage: wait_for_pid created|removed"
        exit 1
        ;;
    esac
    echo $echo_n ".$echo_c"
    i=`expr $i + 1`
  done
...
[2 Jan 2007 15:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17526

ChangeSet@1.2361, 2007-01-02 10:06:12-05:00, cmiller@zippy.cornsilk.net +2 -0
  Bug#25341:  "init.d/mysql stop" may timeout too quickly
  
  Thirty-five seconds is entirely too short of a period to wait for a server
  to exit.  Instead, make a valiant effort to make sure it exits, and only
  give up after a very long period (arbitrarily chosen as 15 minutes).
  
  In addition, if we're being asked to restart the server, then don't try
  to start again if trying to stop the server failed.
  ---
  Return zero by default when the script exits.
  ---
  Set return-/exit-value based on whether we successfully dealt with the 
  PID-file.
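The approach in this commit can be sketched as a wait_for_pid variant that polls for up to 900 seconds (15 minutes) and returns a status the caller can act on. This is an illustrative sketch, not the committed code: passing the PID file as a second argument and the WAIT_LIMIT override are hypothetical additions, and unlike the excerpt quoted earlier it checks the PID file before the first sleep.

```shell
#!/bin/sh
# Sketch of a wait_for_pid with a 15-minute limit and a meaningful
# return status. WAIT_LIMIT and the pid_file argument are hypothetical.

wait_for_pid () {
  verb=$1                    # "created" or "removed"
  pid_file=$2                # path to the server's PID file
  limit=${WAIT_LIMIT:-900}   # 900 one-second polls = 15 minutes
  i=0
  while test $i -lt $limit ; do
    case "$verb" in
      'created')
        test -s "$pid_file" && return 0
        ;;
      'removed')
        test ! -s "$pid_file" && return 0
        ;;
      *)
        echo "wait_for_pid () usage: wait_for_pid created|removed pid_file" >&2
        return 2
        ;;
    esac
    printf '.'
    sleep 1
    i=`expr $i + 1`
  done
  # Limit reached without the PID file changing state.
  return 1
}
```

A restart path can then refuse to start a new server whenever wait_for_pid removed "$pid_file" returns non-zero.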
[31 Jan 2007 20:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19124

ChangeSet@1.2361, 2007-01-31 15:39:41-05:00, cmiller@zippy.cornsilk.net +2 -0
  Bug#25341:  "init.d/mysql stop" may timeout too quickly
  
  Thirty-five seconds is entirely too short of a period to wait for a server
  to exit.  Instead, make a valiant effort to make sure it exits, and only
  give up after a very long period (arbitrarily chosen as 15 minutes).
  
  In addition, if we're being asked to restart the server, then don't try
  to start again if trying to stop the server failed.
  ---
  Return zero by default when the script exits.
  ---
  Set return-/exit-value based on whether we successfully dealt with the 
  PID-file.
  ---
  Don't wait that long if the program we're waiting on exits.  It 
  should only exit if the server is not going to be started.
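The last change in this commit can be sketched by also watching the process being waited on while the PID file is expected to appear: if that process has already exited, the server will not come up, so there is no point waiting out the full limit. kill -0 sends no signal; it only checks whether the process still exists. The function name wait_for_created, its arguments, and WAIT_LIMIT are hypothetical; the commit message does not say exactly how the real script detects the exit.

```shell
#!/bin/sh
# Sketch: wait for the PID file to be created, but stop early if the
# watched launcher process has exited. All names are hypothetical.

wait_for_created () {
  pid_file=$1                # PID file the server is expected to write
  watch_pid=$2               # PID of the process being waited on
  limit=${WAIT_LIMIT:-900}
  i=0
  while test $i -lt $limit ; do
    test -s "$pid_file" && return 0
    # kill -0 only tests for existence; no signal is delivered.
    if ! kill -0 "$watch_pid" 2>/dev/null; then
      echo "process $watch_pid exited; not waiting any longer" >&2
      return 1
    fi
    sleep 1
    i=`expr $i + 1`
  done
  return 1
}
```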
[14 Feb 2007 15:10] Chad MILLER
Available in 5.0.36 and 5.1.16-beta.
[15 Feb 2007 5:10] Paul DuBois
Noted in 5.0.36, 5.1.16 changelogs.