Bug #11122 Server won't always start when cold booting after a crash
Submitted: 7 Jun 2005 2:22 Modified: 4 Jul 6:47
Reporter: David Zafman
Status: Verified
Category:Server Severity:S2 (Serious)
Version:4.1.9, 5.0.60 OS:Linux (Linux FC3)
Assigned to: Target Version:

[7 Jun 2005 2:22] David Zafman
Description:

Normal system services put their pid files in /var/run or a directory under /var/run.  The
/etc/rc.d/rc.sysinit script removes all pid files there on a cold boot of a system.  But mysql
doesn't put its pid file there, nor does it provide a mechanism to remove it on a cold
boot, as far as I can tell.  So a stale pid file can survive a crash, and the database may
fail to start during a machine cold boot.  The following message was seen in my
/var/log/mysql/mysqld.err immediately following a system crash:

A mysqld process already exists at  Sun Jun 5 22:10:32 PDT 2005

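The problem is that if the pid of the "grep mysqld" process happens to match the pid of the
mysqld that was running before the crash, the following code will believe that mysql is
already running: "kill -0" succeeds because the pid is live again, and the "ps" line for
that pid contains the string "mysqld" from grep's own arguments.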

#
# If there exists an old pid file, check if the daemon is already running
# Note: The switches to 'ps' may depend on your operating system
if test -f $pid_file
then
  PID=`cat $pid_file`
  if /bin/kill -0 $PID > /dev/null 2> /dev/null
  then
    if /bin/ps p $PID | grep mysqld > /dev/null
    then    # The pid contains a mysqld process
      echo "A mysqld process already exists"
      echo "A mysqld process already exists at " `date` >> $err_log
      exit 1
    fi
  fi

How to repeat:

Crash your machine repeatedly until the database fails to come up.  This assumes that the
mysql service is enabled for the default run-level.

I was doing "ssh root@machine reboot -f -n" to simulate a crash.  After each boot, check
whether the database has started.

Suggested fix:

A kludgy fix would be to make sure that "grep" is not part of the command string, so that
the grep doesn't find itself.  I really think that all mysql pid files should be in
/var/run, so that the normal system mechanism can properly clean them up on cold boot.

--- /usr/bin/mysqld_safe        2005-01-12 19:35:05.000000000 -0800
+++ /usr/bin/mysqld_safe.new    2005-06-06 17:17:50.092664000 -0700
@@ -261,7 +261,7 @@
   PID=`cat $pid_file`
   if /bin/kill -0 $PID > /dev/null 2> /dev/null
   then
-    if /bin/ps p $PID | grep mysqld > /dev/null
+    if /bin/ps p $PID | grep mysqld | grep -v grep > /dev/null
     then    # The pid contains a mysqld process
       echo "A mysqld process already exists"
       echo "A mysqld process already exists at " `date` >> $err_log
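
A less fragile variant of the check, as a minimal sketch only: instead of grepping the
full "ps" line, ask "ps" for just the command name and compare it exactly.  The helper
name pid_is_mysqld is hypothetical and not part of mysqld_safe, and this assumes a "ps"
that accepts the POSIX -o and -p options (the comment in the script already notes that
the switches to 'ps' may depend on the operating system).

```shell
# Hypothetical helper, not part of mysqld_safe.
# Succeeds only if the process with pid $1 has the command name "mysqld".
pid_is_mysqld() {
  # "-o comm=" prints only the command name, with no header line.
  # If ps fails or the pid does not exist, comm is empty and nothing matches.
  comm=`ps -o comm= -p "$1" 2> /dev/null`
  case "$comm" in
    # Accept a bare name or a full path ending in /mysqld.
    mysqld|*/mysqld) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this, the inner test would read "if pid_is_mysqld $PID".  A grep process or
mysqld_safe itself that happens to reuse the stale pid would no longer match, because
only the command name is compared, not its arguments.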
[8 Jun 2005 7:29] Jorge del Conde
Checked the 4.1bk code.  Patch makes sense.
[9 Jun 2005 3:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/25792
[10 Jun 2005 6:57] Jim Winstead
Fixed in 4.1.13 and 5.0.8.
[28 Jun 2005 16:29] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to our
source repository for that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented in 4.1.13 and 5.0.8 change history; closed.
[4 Jul 6:46] Sean Pringle
Hi

It seems the overall problem is not quite fixed:

If mysqld_safe itself (instead of the grep) gets the same PID as the content of a stale
mysqld PID file, it aborts in the same place with the same "A mysqld process already
exists" error.

Current code:

#
# If there exists an old pid file, check if the daemon is already running
# Note: The switches to 'ps' may depend on your operating system
if test -f $pid_file
then
  PID=`cat $pid_file`
  if /bin/kill -0 $PID > /dev/null 2> /dev/null
  then
    if /bin/ps p $PID | grep -v grep | grep $MYSQLD > /dev/null
    then    # The pid contains a mysqld process
      echo "A mysqld process already exists"
      echo "A mysqld process already exists at " `date` >> $err_log
      exit 1
    fi
  fi
  rm -f $pid_file
...

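Above, $MYSQLD can default to just 'mysqld', so the pattern also matches 'mysqld_safe' in
the ps output.  An additional 'grep -v mysqld_safe' in the pipeline fixes the problem, in
the same way that 'grep -v grep' fixed the original case.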
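
To illustrate the extra filter, here is a small sketch that runs the same pipeline over
sample "ps" lines instead of live output.  The function name matches_running_mysqld and
the sample lines are made up for illustration, and MYSQLD is set to the assumed default
of 'mysqld'.

```shell
MYSQLD=mysqld   # assumed default value, as noted above

# Hypothetical wrapper around the filter pipeline; reads "ps" output on stdin.
# Succeeds only if a line mentions $MYSQLD and is neither grep nor mysqld_safe.
matches_running_mysqld() {
  grep -v grep | grep -v mysqld_safe | grep "$MYSQLD" > /dev/null
}

# Made-up sample: mysqld_safe has reused the stale pid.  This must not match.
if echo " 1234 ?  S  0:00 /bin/sh /usr/bin/mysqld_safe" | matches_running_mysqld
then
  echo "false positive"
fi
```

Without the 'grep -v mysqld_safe' stage, the sample line above would match and the
script would exit with "A mysqld process already exists".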

This is hard to reproduce, but can be simulated by adding the following line to
mysqld_safe just before the above code block:

echo -n $$ >$pid_file  # simulate stale pid file

Presumably mysqld_safe getting the same PID as the previous crashed mysqld is unlikely,
but if the system reboots and MySQL is configured to start at boot time, the general PID
range will be similar each time, increasing the chances.