Bug #31785 mysql.server stop does not spot server gone away
Submitted: 23 Oct 2007 13:22 Modified: 1 Feb 19:41
Reporter: Axel Schwenke
Status: Patch approved
Category:Server Severity:S3 (Non-critical)
Version:5.0.52, 5.1.23 OS:Any
Assigned to: Daniel Fischer Target Version:5.0+
Triage: D3 (Medium) / E2 (Low)

[23 Oct 2007 13:22] Axel Schwenke
Description:
If the MySQL server is killed and its pid file is left behind, 'mysql.server stop' waits
900 seconds before giving up.

This regularly bites users running a DRBD/Heartbeat cluster. They usually test the cluster
by manually killing mysqld_safe and mysqld with SIGKILL. When Heartbeat then tries to shut
down the 'mysql' resource, it hangs for a long time and eventually kills the mysql resource
script after a timeout.

How to repeat:
Start MySQL using the mysql.server script:

~ $/usr/local/mysql/current/share/mysql/mysql.server start
Starting MySQL                                                       done

Now manually kill mysqld_safe and mysqld with kill -9 and then try to stop the MySQL
server:

~ $/usr/local/mysql/current/share/mysql/mysql.server status
MySQL running (12709)                                                done
~ $kill -9 12680
~ $kill -9 12709
~ $/usr/local/mysql/current/share/mysql/mysql.server stop
Shutting down MySQL/usr/local/mysql/current/share/mysql/mysql.server: line 336: kill:
(12709) - No such process
..........
(interrupted with CTRL+C)

BTW, the 'status' method of mysql.server diagnoses this situation correctly:

~ $/usr/local/mysql/current/share/mysql/mysql.server status
MySQL is not running, but PID file exists                            failed
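
A check of this kind can be sketched as a standalone shell function. This is only an
illustration: the name pid_file_state and its three output words are made up for this
example and do not appear in mysql.server. It assumes that 'kill -0' is an acceptable
liveness test, which is also what the suggested fix below relies on.

```shell
#!/bin/sh
# Hypothetical helper (not part of mysql.server): classify what a pid file
# tells us about the process it names.
pid_file_state() {
  pid_file=$1

  # A missing or empty pid file gives us nothing to check.
  if test ! -s "$pid_file"
  then
    echo "absent"
    return 0
  fi

  pid=`cat "$pid_file"`

  # kill -0 delivers no signal; it only reports whether the process could be
  # signalled. Caveat: it also fails when the process exists but belongs to
  # another user, so run this as the user that owns the server (or as root).
  if kill -0 "$pid" 2>/dev/null
  then
    echo "running"
  else
    echo "stale"
  fi
}
```

A stop routine could call this first and only enter its wait loop when the result is
"running"; for "stale" it can remove the pid file and return at once.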

Suggested fix:
The mysql.server script should first check if a process with the pid from the pid file
exists before it tries to stop it:

--- mysql.orig  2007-10-23 12:30:25.000000000 +0200
+++ mysql       2007-10-23 12:43:58.000000000 +0200
@@ -332,10 +332,17 @@
     if test -s "$pid_file"
     then
       mysqlmanager_pid=`cat $pid_file`
-      echo $echo_n "Shutting down MySQL"
-      kill $mysqlmanager_pid
-      # mysqlmanager should remove the pid_file when it exits, so wait for it.
-      wait_for_pid removed; return_value=$?
+
+      if (kill -0 $mysqlmanager_pid 2>/dev/null)
+      then
+        echo $echo_n "Shutting down MySQL"
+        kill $mysqlmanager_pid
+        # mysqlmanager should remove the pid_file when it exits, so wait for it.
+        wait_for_pid removed; return_value=$?
+      else
+        log_failure_msg "MySQL manager or server process #$mysqlmanager_pid is not running!"
+        rm $pid_file
+      fi
 
       # delete lock for RedHat / SuSE
       if test -f $lock_dir

After applying this patch:

~ $/usr/local/mysql/current/share/mysql/mysql.server stop
MySQL manager or server process #12709 is not running!               failed

[1 Feb 18:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41576

ChangeSet@1.2574, 2008-02-01 18:39:50+01:00, df@pippilotta.erinye.com +1 -0
  BUG#31785 mysql.server stop does not spot server gone away