Bug #11122 Server won't always start when cold booting after a crash
Submitted: 7 Jun 2005 0:22 Modified: 17 Oct 2008 17:04
Reporter: David Zafman
Status: Closed
Category: MySQL Server Severity: S2 (Serious)
Version: 4.1.9, 5.0.60 OS: Linux (Linux FC3)
Assigned to: Chad MILLER
Triage: Triaged: D3 (Medium)

[7 Jun 2005 0:22] David Zafman
Description:

Normal system services put their pid files in /var/run or a directory under /var/run.  The /etc/rc.d/rc.sysinit script removes all pid files on a cold boot of a system.  But mysql doesn't put its pid file there, nor does it provide a mechanism to remove it on a cold boot as far as I can tell.  So, it is possible for the database to not start up during a machine cold boot.  The following message was seen in my /var/log/mysql/mysqld.err immediately following a system crash:

A mysqld process already exists at  Sun Jun 5 22:10:32 PDT 2005

The problem is that if the pid of the "grep mysqld" happens to match the pid of the mysql that was running before the crash, the following code will believe that mysql is already running.

#
# If there exists an old pid file, check if the daemon is already running
# Note: The switches to 'ps' may depend on your operating system
if test -f $pid_file
then
  PID=`cat $pid_file`
  if /bin/kill -0 $PID > /dev/null 2> /dev/null
  then
    if /bin/ps p $PID | grep mysqld > /dev/null
    then    # The pid contains a mysqld process
      echo "A mysqld process already exists"
      echo "A mysqld process already exists at " `date` >> $err_log
      exit 1
    fi
  fi

How to repeat:

Crash your machine in a loop until the database doesn't come up.  I'm assuming that mysql service is enabled for the default run-level.

I was doing "ssh root@machine reboot -f -n" to simulate a crash.  Between reboots, check whether the database has started.

Suggested fix:

A kludgy fix would be to make sure that "grep" is not part of the command string, so that the grep doesn't find itself.  I really think that all mysql pid files should be in /var/run, so that the normal system mechanisms can properly clean them on cold boot.

--- /usr/bin/mysqld_safe        2005-01-12 19:35:05.000000000 -0800
+++ /usr/bin/mysqld_safe.new    2005-06-06 17:17:50.092664000 -0700
@@ -261,7 +261,7 @@
   PID=`cat $pid_file`
   if /bin/kill -0 $PID > /dev/null 2> /dev/null
   then
-    if /bin/ps p $PID | grep mysqld > /dev/null
+    if /bin/ps p $PID | grep mysqld | grep -v grep > /dev/null
     then    # The pid contains a mysqld process
       echo "A mysqld process already exists"
       echo "A mysqld process already exists at " `date` >> $err_log
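
The false positive and the effect of the one-line patch can be reproduced without crashing a machine by feeding a canned "ps" line to both filters. This is only a sketch: the function names and the sample "ps" line are assumptions made for illustration and do not come from mysqld_safe.

```shell
#!/bin/sh
# Hypothetical helpers: each reads "ps"-style output on stdin and succeeds
# when it believes a mysqld process is listed.
original_check() { grep mysqld > /dev/null; }
patched_check()  { grep mysqld | grep -v grep > /dev/null; }

# Assumed sample: what "ps p $PID" might print when the stale PID has been
# reused by the "grep mysqld" process itself.
grep_line='12345 ?        S      0:00 grep mysqld'

if printf '%s\n' "$grep_line" | original_check; then
  echo "original check: false positive on the grep process"
fi
if ! printf '%s\n' "$grep_line" | patched_check; then
  echo "patched check: grep process ignored"
fi
```

With this sample line the original filter reports a match and the patched filter does not, which is the behavior the diff is aiming for.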
[8 Jun 2005 5:29] Jorge del Conde
Checked the 4.1bk code.  Patch makes sense
[9 Jun 2005 1:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/25792
[10 Jun 2005 4:57] Jim Winstead
Fixed in 4.1.13 and 5.0.8.
[28 Jun 2005 14:29] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented in 4.1.13 and 5.0.8 change history; closed.
[4 Jul 2008 4:46] Sean Pringle
Hi

This overall problem is not quite fixed it seems:

If mysqld_safe itself (instead of the grep) gets the same PID as the content of a stale mysqld PID file, it aborts in the same place with the same "A mysqld process already exists" error.

Current code:

#
# If there exists an old pid file, check if the daemon is already running
# Note: The switches to 'ps' may depend on your operating system
if test -f $pid_file
then
  PID=`cat $pid_file`
  if /bin/kill -0 $PID > /dev/null 2> /dev/null
  then
    if /bin/ps p $PID | grep -v grep | grep $MYSQLD > /dev/null
    then    # The pid contains a mysqld process
      echo "A mysqld process already exists"
      echo "A mysqld process already exists at " `date` >> $err_log
      exit 1
    fi
  fi
  rm -f $pid_file
...

Above, $MYSQLD can default to just 'mysqld'.  An additional 'grep -v mysqld_safe' fixes the problem as before.

This is hard to reproduce, but can be simulated by adding the following line to mysqld_safe just before the above code block:

echo -n $$ >$pid_file  # simulate stale pid file

Presumably mysqld_safe getting the same PID as the previous crashed mysqld is unlikely, but if the system reboots and MySQL is configured to start at boot time, the general PID range will be similar each time, increasing the chances.
[5 Aug 2008 15:45] Chad MILLER
The real bug is mysql doesn't put its pid files in a location that is automatically cleaned by the system at boot.  Correcting that may be hard, but we should seriously consider it.
[5 Aug 2008 16:12] Chad MILLER
Gosh, so much will break if the path or any mysqld arguments contain the contiguous letters 'g', 'r', 'e', and 'p'.  :\
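
A small sketch of that failure mode, using the same filter order as the code quoted earlier in this report. The install path /opt/grepdb and the sample "ps" line are hypothetical, chosen only because the path contains the letters "grep".

```shell
#!/bin/sh
# Same filter chain as the quoted check, with $MYSQLD assumed to be "mysqld".
# Reads "ps"-style output on stdin.
current_check() { grep -v grep | grep mysqld > /dev/null; }

# Hypothetical: a genuinely running server installed under a path that
# happens to contain "grep".
real_line=' 4321 ?        Sl     1:02 /opt/grepdb/bin/mysqld --user=mysql'

if printf '%s\n' "$real_line" | current_check; then
  echo "running mysqld detected"
else
  echo "running mysqld missed: its line was discarded by 'grep -v grep'"
fi
```

Here the "grep -v grep" stage drops the line before the mysqld match is attempted, so the check would conclude that no server is running.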
[5 Aug 2008 18:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50949

2653 Chad MILLER	2008-08-05
      Bug#11122: Server won't always start when cold-booting after a crash
      
      The grep expression that finds a running "mysqld" program fails if the
      "mysqld_safe" is running with the same PID.
      
      Now, match "mysqld" at the end of a line or before a space only.  This
      also has the effect of the matcher expression never matching itself, as 
      the metacharacters don't describe themselves.
      
      Additionally, some text to search could be truncated if very long.
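
The idea in this commit message can be illustrated as follows. The regular expression is an assumption written to match the description ("mysqld" followed by a space or end of line); it is not copied from the committed patch, and the sample "ps" lines are made up.

```shell
#!/bin/sh
# Assumed pattern: "mysqld" only when followed by a space or end of line.
# Reads "ps"-style output on stdin.
anchored_check() { grep -E 'mysqld( |$)' > /dev/null; }

try() {
  if printf '%s\n' "$1" | anchored_check; then
    echo "match:    $1"
  else
    echo "no match: $1"
  fi
}

try '/usr/sbin/mysqld --user=mysql'              # server with arguments
try '/usr/sbin/mysqld'                           # server at end of line
try '/bin/sh /usr/bin/mysqld_safe --user=mysql'  # wrapper script
try 'grep -E mysqld( |$)'                        # the matcher's own command line
```

The first two lines match and the last two do not. In the matcher's own command line "mysqld" is followed by "(", so the expression does not describe itself.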
[2 Oct 2008 16:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55119

2653 Chad MILLER	2008-10-02
      Bug#11122: Server won't always start when cold-booting after a crash
      
      The grep expression that finds a running "mysqld" program fails if the
      "mysqld_safe" is running with the same PID.
      
      Now, excise "ps" output that has the word " grep" or "mysqld_safe" in 
      it, to be a little more certain that the matched process is not a false 
      positive hit.  This will fail when the path to mysqld contains either
      of those two names, which should be acceptable.
      
      Additionally, some text to search could be truncated if very long.  
      Expand the number of lines "ps" emits.
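
A sketch of the excising approach described in this commit message, including the limitation it accepts. The exact filter chain and the sample "ps" lines are assumptions for illustration, with $MYSQLD taken to be "mysqld"; the committed code may differ.

```shell
#!/bin/sh
# Drop lines mentioning " grep" or "mysqld_safe", then look for mysqld.
# Reads "ps"-style output on stdin.
excising_check() {
  grep -v ' grep' | grep -v mysqld_safe | grep mysqld > /dev/null
}

report() {
  if printf '%s\n' "$1" | excising_check; then
    echo "mysqld found:     $1"
  else
    echo "mysqld not found: $1"
  fi
}

report ' 101 ?  S  0:00 /bin/sh /usr/bin/mysqld_safe --user=mysql'
report ' 102 ?  S  0:00 grep mysqld'
report ' 103 ?  Sl 1:02 /usr/sbin/mysqld --user=mysql'
# Hypothetical path containing "mysqld_safe": the accepted failure case.
report ' 104 ?  Sl 1:02 /opt/mysqld_safe_build/bin/mysqld'
```

Only the third line is reported as a running mysqld. The fourth shows the trade-off named in the message: a server whose path contains one of the excised names goes undetected.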
[9 Oct 2008 17:26] Bugs System
Pushed into 5.0.72  (revid:chad@mysql.com-20081002162552-cw77j2cpzw23qycy) (version source revid:chad@mysql.com-20081006135227-u2s7w953ysaqjhda) (pib:4)
[9 Oct 2008 17:35] Bugs System
Pushed into 5.1.30  (revid:chad@mysql.com-20081002162552-cw77j2cpzw23qycy) (version source revid:mats@sun.com-20081008113713-2vxny72m5w1tywoi) (pib:4)
[14 Oct 2008 18:20] Paul Dubois
Noted in 5.0.72, 5.1.30 changelogs.

Resetting report to NDI pending push into 6.0.x.
[15 Oct 2008 14:54] Paul Dubois
Correction: Noted in 5.1.29 changelog, not 5.1.30.
[17 Oct 2008 16:46] Bugs System
Pushed into 6.0.8-alpha  (revid:chad@mysql.com-20081002162552-cw77j2cpzw23qycy) (version source revid:chad@mysql.com-20081006135653-hlwefkm6dvvqm3z6) (pib:5)
[17 Oct 2008 17:04] Paul Dubois
Noted in 6.0.8 changelog.
[28 Oct 2008 21:05] Bugs System
Pushed into 5.1.29-ndb-6.2.17  (revid:chad@mysql.com-20081002162552-cw77j2cpzw23qycy) (version source revid:tomas.ulin@sun.com-20081028140209-u4emkk1xphi5tkfb) (pib:5)
[28 Oct 2008 22:24] Bugs System
Pushed into 5.1.29-ndb-6.3.19  (revid:chad@mysql.com-20081002162552-cw77j2cpzw23qycy) (version source revid:tomas.ulin@sun.com-20081028194045-0353yg8cvd2c7dd1) (pib:5)
[1 Nov 2008 9:48] Bugs System
Pushed into 5.1.29-ndb-6.4.0  (revid:chad@mysql.com-20081002162552-cw77j2cpzw23qycy) (version source revid:jonas@mysql.com-20081101082305-qx5a1bj0z7i8ueys) (pib:5)