Bug #11590 mysql.server doesn't react immediately to mysqld_safe failed start
Submitted: 27 Jun 2005 12:49 Modified: 22 Sep 2005 22:41
Reporter: Victoria Reznichenko
Status: Won't fix
Category: MySQL Server Severity: S3 (Non-critical)
Version: 4.1 OS: Linux (linux)
Assigned to: Timothy Smith CPU Architecture:Any

[27 Jun 2005 12:49] Victoria Reznichenko
Description:
If you start the MySQL server with the mysql.server script and mysqld_safe fails to start, mysqld_safe immediately writes an error message to the error log, but the mysql.server script keeps printing dots and only fails later:

Starting MySQL................................... failed

How to repeat:
1. Put any dummy option in the my.cnf file
2. Start the MySQL server with the mysql.server script
[6 Jul 2005 23:51] Jim Winstead
That's because the only way mysql.server knows that startup has failed is by timing out while waiting for the pid file to be created.
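
A minimal sketch of the kind of wait loop described here, to show why a failed start looks like a long row of dots followed by "failed". The function name wait_for_pid_file, its arguments, and the one-second polling interval are illustrative assumptions, not the actual mysql.server code:

```shell
#!/bin/sh
# Hypothetical helper: poll for a pid file, printing one dot per second,
# and give up once the timeout (in seconds) is reached.
wait_for_pid_file() {
    pid_file=$1
    timeout=$2
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # A non-empty pid file is taken as the sign of a successful start.
        if [ -s "$pid_file" ]; then
            return 0
        fi
        printf '.'
        sleep 1
        i=$((i + 1))
    done
    # Nothing here can tell "still starting" apart from "already dead",
    # so a failure is only reported after the full timeout.
    return 1
}
```

With a loop like this the caller has no information besides the pid file, so a start that fails in the first second still costs the whole timeout.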
[8 Jul 2005 10:00] MySQL Verification Team
Jim, let me explain.

The problem isn't only in the mysql.server script. There is no way to know whether mysqld started successfully or failed, because mysqld_safe always exits with status 0. So this should be fixed in mysqld_safe too; then the init script can see the error and exit.
[13 Jul 2005 16:56] Jim Winstead
I don't understand -- mysql.server says it failed. What else should it do? It can't notice immediately when mysqld_safe exits. It doesn't matter what mysqld_safe returns, because mysql.server can't check its return value.
[22 Sep 2005 22:41] Timothy Smith
Hi.  I'm setting this to "Won't fix": I agree that it's less-than-desirable behavior, but I can't find a way around it in a shell script.

The crux of the problem is that mysqld_safe must be run as a background job, like:

mysqld_safe --options &

How does the calling script (mysql.server) find out if mysqld_safe returned an error or not?  It can use wait:

mysqld_safe --options &
the_pid=$!

...

wait $the_pid
mysqld_safe_error=$?

test $mysqld_safe_error -eq 0 || echo "error..."

But that blocks until the whole job completes, which is no good: after a successful start, mysqld_safe keeps running for as long as the server does.

I tried several ways of getting around this, but was unable to find a combination that works properly.  Here is my current test program, in case it sparks any ideas.  It seems that there should be some combination of wait, sleep, trap and kill which can get this job done, but I am unable to find it.

#! /bin/sh

mode=$1; test $# -gt 0 && shift

pid_file=PID
test -f $pid_file && rm $pid_file

case $mode in
run-break-short )
    echo "This program is broken!"
    exit 1
    ;;

run-break-long )
    echo "This program will be broken!"
    sleep 10
    exit 1
    ;;

run-ok-short )
    echo "This program is good!"
    ps uwwp $$ > $pid_file
    exit 0
    ;;

run-ok-long )
    echo "This program will be good!"
    sleep 10
    ps uwwp $$ > $pid_file
    exit 0
    ;;

'' )
    my_pid=$$
    # printf is used because echo does not handle "\n" portably
    printf 'Top PID is %s\n\n' "$$"

    wait_timed_out=0

    (
        #
        # Here is where the main program is run
        # Think of this as running mysqld...
        #
        sh $0 run-ok-long &
        main_job_pid=$!

        echo "Shell is for $main_job_pid ..."
        wait $main_job_pid
        main_error_code=$?

        echo "Main job is done, with error code $main_error_code"

        exit $main_error_code
    ) &
    wait_job_pid=$!

    # Idea for "sleep pipeline" taken from:
    # http://www.cit.gu.edu.au/~anthony/info/shell/script.hints
#   sleep 2 | (
#       kill -ALRM $wait_job_pid
#       echo "sent signal to pid $wait_job_pid"
#   ) &
#   kill_job_pid=$!

    echo "Waiting for wait job $wait_job_pid"
    wait $wait_job_pid > /dev/null 2>&1
    sub_error_code=$?

    # main_job_pid is set only inside the subshell above, so it is not
    # available here; report the wait job's PID instead.
    if [ $wait_timed_out -eq 0 ]; then
        echo "I waited for the main job, via $wait_job_pid," \
                "and it returned $sub_error_code"
    else
        echo "I stopped waiting for the main job, via $wait_job_pid" \
                "(I guess $sub_error_code is useless)"
    fi
    ;;

* )
    echo "Usage: $0 <see source>"
    exit 1
    ;;
esac

Regards,

Timothy
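
One combination that might fit the constraints described above is to poll instead of blocking: check once per second whether the pid file has appeared, and use kill -0 to see whether the background job is still alive. wait is then called only once the job is already gone, so it returns at once with the exit status. This is a sketch under assumptions, not the mysql.server implementation: the function name start_and_watch, its argument order, and the return value 124 for a timeout are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: start_and_watch PID_FILE TIMEOUT COMMAND [ARGS...]
# Returns 0 when the pid file appears, the job's exit status (or 1) when
# the job dies first, and 124 (an arbitrary choice) on timeout.
start_and_watch() {
    pid_file=$1
    timeout=$2
    shift 2

    "$@" &
    job_pid=$!

    i=0
    while [ "$i" -lt "$timeout" ]; do
        sleep 1
        # Success: the job has written its pid file.
        if [ -s "$pid_file" ]; then
            return 0
        fi
        # kill -0 sends no signal; it only tests whether the process exists.
        if ! kill -0 "$job_pid" 2>/dev/null; then
            # The job has already exited, so wait does not block here.
            status=0
            wait "$job_pid" || status=$?
            # A job that exits 0 without a pid file still counts as failed.
            [ "$status" -ne 0 ] || status=1
            return "$status"
        fi
        i=$((i + 1))
    done
    return 124
}
```

For example, `start_and_watch PID 30 sh "$0" run-break-short` with the test program above should report failure after about one second instead of after the full timeout. Treating an exit status of 0 without a pid file as a failure is a deliberate choice, since mysqld_safe is reported earlier in this thread to always exit with status 0.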