| Bug #5001 | mysqld_safe kills all mysqlds when only one dies (under multiple installations) | ||
|---|---|---|---|
| Submitted: | 11 Aug 2004 20:20 | Modified: | 2 Sep 2004 0:19 |
| Reporter: | Jason Sydes | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) |
| Version: | 4.0.20 | OS: | Linux (linux) |
| Assigned to: | Bugs System | CPU Architecture: | Any |
[12 Aug 2004 3:39]
Matthew Lord
This is still the case in 4.1.3. Maybe the safest way would be to store the socket file and use fuser?
[12 Aug 2004 9:25]
Brian Aker
fuser is not portable.
[13 Aug 2004 18:53]
Jason Sydes
But doesn't the pid file always appear on the command line? Isn't that the most portable way to ensure that you're killing the right processes? I suppose you could accidentally kill another process, say `less /tmp/mysql6.pid', but that seems a relatively harmless consequence....
[14 Aug 2004 22:47]
Sergei Golubchik
fuser is portable enough, the killing code is protected by @IS_LINUX@ anyway. at least all our linux hosts (and many non-linux ones, *bsd excluded) have fuser.
[17 Aug 2004 23:44]
Patrick Galbraith
I cannot get the failure to occur. I have started 4 servers via mysqld_multi, killed one of the servers yet all the rest of the remaining servers continue to run.
[18 Aug 2004 0:07]
Jason Sydes
Really? What is the "ps" line look like inside your mysqld_safe? Actually, when you kill one of the mysqld processes, does that server restart? Or does it just keep running? When I killed mine, it would restart that entire server, but, also, it would kill all the mysql processes, including those of the other servers. Are you seeing the following line in the error log of the "killed" server: "Number of processes now running: blah" with "blah" replaced by some number (probably around 10 or 20 in your test case, though for me it was generally several hundred). You might see that line multiple times too. That line is spit out by mysqld_safe. If you're not seeing that line, you're not going to see the same behavior I saw. If you are seeing that line, are you seeing that line in the other servers' error logs as well? If so, then that's the bug that I'm reporting. If you are seeing that line, and you're only seeing it the error log of the "killed" server, and you're **not** seeing it in the error logs of the other servers, then I am truly perplexed (I'll explain myself after hearing what you're seeing). When you do a `SHOW STATUS' or `mysqladmin stat' on the "killed" server, does that show you that it had been restarted (ie does the uptime show a low number)? How about for the other servers? Thanks for your attention to this Patrick! Jason
[18 Aug 2004 0:59]
Patrick Galbraith
Jason: when I kill one process of a running server ( 1 out of 3 servers), I see this in its log: 040817 15:47:50 mysqld restarted 040817 15:47:50 Warning: Asked for 196608 thread stack, but got 126976 040817 15:47:50 Can't start server: Bind on TCP/IP port: Address already in use 040817 15:47:50 Do you already have another mysqld server running on port: 3308 ? 040817 15:47:50 Aborting 040817 15:47:50 /usr/local/mysql/bin/mysqld: Shutdown Complete 040817 15:47:50 mysqld ended As far as processes, the other servers don't get touched. The shutdown server remains unstarted, and the others stay running, same exact PIDs before and after. I would suggest starting your servers with mysqld_multi and using mysqld_multi settings in my.cnf. Hope this helps;)
[22 Aug 2004 22:16]
Sergei Golubchik
Patrick, the reported bug is in *mysqld_safe* Thus to repeat it you need to use mysqld_safe as in mysqld_multi --example or with --mysqld=mysqld_safe. And then it's perfectly repeatable
[25 Aug 2004 22:30]
Patrick Galbraith
It works as intended, even with mysqld=/path/to/safe_mysqld (both 4.0.20 and 4.1.3): rsna:/usr/local # /usr/local/mysql/bin/mysqld_multi --user=mysql --mysqld=/usr/local/mysql/bin/mysqld_safe start krsna:/usr/local # ps aux|grep msyql root 6053 0.0 0.1 1764 568 pts/6 R 13:15 0:00 grep msyql krsna:/usr/local # ps aux|grep mysql root 25261 0.0 0.0 2092 4 pts/5 S Aug23 0:00 mysql mysql root 5955 0.2 0.2 2676 1200 pts/6 S 13:15 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --socket=/tmp/mysql.sock2 --port=3307 --pid-file=/usr/local/mysql/data2/hostname.pid2 --datadir=/usr/local/mysql/data2 root 5958 0.3 0.2 2676 1200 pts/6 S 13:15 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --socket=/tmp/mysql.sock3 --port=3308 --pid-file=/usr/local/mysql/data3/hostname.pid3 --datadir=/usr/local/mysql/data3 root 5961 0.3 0.2 2676 1200 pts/6 S 13:15 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --socket=/tmp/mysql.sock4 --port=3309 --pid-file=/usr/local/mysql/data4/hostname.pid4 --datadir=/usr/local/mysql/data4 mysql 6032 1.4 2.0 30304 10548 pts/6 D 13:15 0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data4 --user=mysql --log=/usr/local/mysql/data4/krsna.log --pid-file=/usr/local/mysql/data4/hostname.pid4 --skip-locking --port=3309 --socket=/tmp/mysql.sock4 mysql 6034 1.4 2.0 30304 10548 pts/6 D 13:15 0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data2 --user=mysql --log=/usr/local/mysql/data2/krsna.log --pid-file=/usr/local/mysql/data2/hostname.pid2 --skip-locking --port=3307 --socket=/tmp/mysql.sock2 mysql 6036 1.5 2.0 30304 10548 pts/6 D 13:15 0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data3 --user=mysql --log=/usr/local/mysql/data3/krsna.log --pid-file=/usr/local/mysql/data3/hostname.pid3 --skip-locking --port=3308 --socket=/tmp/mysql.sock3 krsna:/usr/local # kill -9 6036 krsna:/usr/local # mysql -S /tmp/mysql.sock4 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 to server version: 4.0.20-standard-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> \q Bye krsna:/usr/local # mysql -S /tmp/mysql.sock2 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 to server version: 4.0.20-standard-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> \q Bye krsna:/usr/local # mysql -S /tmp/mysql.sock3 ERROR 2002: Can't connect to local MySQL server through socket '/tmp/mysql.sock3' (2) krsna:/usr/local # ps aux|grep mysql root 25261 0.0 0.0 2092 4 pts/5 S Aug23 0:00 mysql mysql root 5955 0.0 0.2 2676 1200 pts/6 S 13:15 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --socket=/tmp/mysql.sock2 --port=3307 --pid-file=/usr/local/mysql/data2/hostname.pid2 --datadir=/usr/local/mysql/data2 root 5961 0.0 0.2 2676 1200 pts/6 S 13:15 0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --socket=/tmp/mysql.sock4 --port=3309 --pid-file=/usr/local/mysql/data4/hostname.pid4 --datadir=/usr/local/mysql/data4 mysql 6032 0.1 2.0 30352 10644 pts/6 S 13:15 0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data4 --user=mysql --log=/usr/local/mysql/data4/krsna.log --pid-file=/usr/local/mysql/data4/hostname.pid4 --skip-locking --port=3309 --socket=/tmp/mysql.sock4 mysql 6034 0.1 2.0 30352 10644 pts/6 S 13:15 0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data2 --user=mysql --log=/usr/local/mysql/data2/krsna.log --pid-file=/usr/local/mysql/data2/hostname.pid2 --skip-locking --port=3307 --socket=/tmp/mysql.sock2 krsna:/usr/local # tail -7 /usr/local/mysql/data2/krsna.err 040817 17:29:09 mysqld ended 040825 13:15:23 mysqld started 040825 13:15:23 Warning: Asked for 196608 thread stack, but got 126976 040825 13:15:30 InnoDB: Started /usr/local/mysql/bin/mysqld: ready for connections. Version: '4.0.20-standard-log' socket: '/tmp/mysql.sock2' port: 3307 rsna:/usr/local # tail -10 /usr/local/mysql/data3/krsna.err 040825 13:16:33 mysqld restarted 040825 13:16:33 Warning: Asked for 196608 thread stack, but got 126976 040825 13:16:33 Can't start server: Bind on TCP/IP port: Address already in use 040825 13:16:33 Do you already have another mysqld server running on port: 3308 ? 040825 13:16:33 Aborting 040825 13:16:33 /usr/local/mysql/bin/mysqld: Shutdown Complete 040825 13:16:33 mysqld ended krsna:/usr/local # tail -7 /usr/local/mysql/data4/krsna.err 040817 17:29:08 mysqld ended 040825 13:15:23 mysqld started 040825 13:15:23 Warning: Asked for 196608 thread stack, but got 126976 040825 13:15:30 InnoDB: Started /usr/local/mysql/bin/mysqld: ready for connections. Version: '4.0.20-standard-log' socket: '/tmp/mysql.sock4' port: 3309 So, you kill -9 a threat of server #3 (we'll call it #3 in this case), and mysqld_safe tries to restart, but I assume the port is still in use because it's in the process of dying, so it gives up, and also doesn't try to kill any other mysqld process on any other port. My my.conf file is this: [mysqld_multi] mysqld = /usr/local/mysql/bin/mysqld_safe mysqladmin = /usr/local/mysql/bin/mysqladmin [mysqld2] socket = /tmp/mysql.sock2 port = 3307 pid-file = /usr/local/mysql/data2/hostname.pid2 datadir = /usr/local/mysql/data2 [mysqld3] socket = /tmp/mysql.sock3 port = 3308 pid-file = /usr/local/mysql/data3/hostname.pid3 datadir = /usr/local/mysql/data3 [mysqld4] socket = /tmp/mysql.sock4 port = 3309 pid-file = /usr/local/mysql/data4/hostname.pid4 datadir = /usr/local/mysql/data4
[2 Sep 2004 0:19]
Patrick Galbraith
With help from Serg, we found that there was an error in the regular expression that caused safe_mysqld to kill itself and not each mysqld thread:
PROC=`ps xa | grep $ledir/$MYSQLD | grep -v "grep" | sed -n '$p'`
changed to
PROC=`ps xa | grep "$ledir/$MYSQLD\>" | grep -v "grep" | sed -n '$p'`
This allowed it to manifest the original error that this bug has been logged for, which I've fixed by having logic test to see if $mysql_tcp_port set, and grepping to see if the port is in the ps list:
if test -n "$mysql_tcp_port"
then
numofproces=`ps xa | grep -v "grep" | grep $ledir/$MYSQLD| grep -c "port=$mysql_tcp_port"`
else
numofproces=`ps xa | grep -v "grep" | grep -c $ledir/$MYSQLD`
fi
echo -e "\nNumber of processes running now: $numofproces" | tee -a $err_log
I=1
while test "$I" -le "$numofproces"
do
if test -n "$mysql_tcp_port"
then
PROC=`ps xa | grep "$ledir/$MYSQLD\>" | grep -v "grep" | grep "port=$mysql_tcp_port" | sed -n '$p'`
else
PROC=`ps xa | grep "$ledir/$MYSQLD\>" | grep -v "grep" | sed -n '$p'`
fi
NOTE: mysqld_multi always starts mysqld_safe with a port set, so this is a good means of checking this

Description: When running multiple mysql installations (ie mysqld_multi), and one installation dies, mysqld_safe will kill ~all~ mysql processes on that machine. This should be relatively easy to fix. Here's the relevant line from mysqld_safe: PROC=`ps xa | grep $ledir/$MYSQLD | grep -v "grep" | sed -n '$p'` which later leads to for T in $PROC .... if kill -9 $T (We don't actually use mysqld_multi (we hacked up /etc/init.d/mysqld years ago before mysqld_multi was available), but I'm 99% sure that it doesn't matter.) On our installation, when you run the script, variable substitution results in the following line: ps xa | grep /usr/local/dh/mysql/base/libexec/mysqld | grep -v "grep" | sed -n '$p' Since both mysqld_multi and our /etc/init.d/mysqld actually uses mysqld_safe, and we have not modified mysqld_safe (beyond compilation flags to set the basedir as shown above), I'm almost positive you'll have the same problem if you are running multiple installations of mysql with the standard mysqld_multi setup. Our /etc/init.d/mysqld is just a mildly modified version of your mysql.server.sh script. In any case, the "PROC" line above doesn't result in any kind of selectivitity, and the "kill" then kills indiscriminately. So if I'm running 5 separate installations of mysql on my machine, and one of them segfaults, the mysqld_safe for that single dead mysql installation will then kill all the processes of the four other (otherwise happily running) mysql installations. Hope I'm making sense here. We've been seeing this for years, and I just now found it. How to repeat: Install several installations of mysql on a single machine. run `ps aux' to find all the processes of a single mysql installation. maybe something like: ps aux | grep /tmp/mysql.sock3 in a second window, do a `tail -f ' on the error logs of a few of the ~other~ mysql installations back to the first window: don't kill the mysqld_safe process. instead, kill -9 one of the mysqld processes. watch the error logs of the other mysql installations (in the other window). you'll start seeing stuff like this: Number of processes running now: 300 Number of processes running now: 96 Number of processes running now: 73 Number of processes running now: 53 mysqld process hanging, pid 5230 - killed mysqld process hanging, pid 5201 - killed mysqld process hanging, pid 5487 - killed mysqld process hanging, pid 3809 - killed 040811 10:08:51 mysqld restarted Suggested fix: Quick fix for me (and others?): Change: KILL_MYSQLD=1; to KILL_MYSQLD=0; in mysqld_safe. Permanent fix: Looks like you can probably just do: PROC=`ps xa | grep $ledir/$MYSQLD | grep $pid_file | grep -v "grep" | sed -n '$p'` and it will be fixed. You'll also want to change numofproces=`ps xa | grep -v "grep" | grep -c $ledir/$MYSQLD` to numofproces=`ps xa | grep -v "grep" | grep $pid_file | grep -c $ledir/$MYSQLD` to correctly report the number of processes still running.