Description:
create_pid_file() in sql/mysqld.cc does not check the return value of
my_write() when writing the process ID to the .pid file, so it ignores
any error that occurs after a successful my_create().
In particular, if the pid file is written to a file system with no free
space left, my_create() may succeed (free inodes and directory entry
slots can still be available in the target directory because they were
allocated before the file system filled up) while my_write() fails
because no data blocks can be allocated for the new file.
The result is an empty pid file, which confuses mysqld_safe and the
init script. For example, it is possible to start the server a second
time using mysqld_safe. The second server will eventually fail and
bail out when it tries to bind to the same socket, but InnoDB recovery
runs before that. The customer ended up with two servers performing
tablespace recovery at the same time, which led to complete disaster
(4.1 InnoDB protects itself against this with file locks, but 4.0
does not yet).
How to repeat:
Point --pid-file to a file on a full file system and start the server using mysqld_safe.
The pid file is created empty, and it is possible to start the server a second time using mysqld_safe.
Suggested fix:
Check the number of bytes written returned by my_write() in create_pid_file() (in sql/mysqld.cc) and at least print a meaningful error message. Alternatively, pass the right flags to my_write() so that it reports the error itself.
Or should the server even bail out if either my_create() or my_write() fails,
since mysqld_safe and the init script rely heavily on this file being written correctly?
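A minimal sketch of the checked write suggested above, in C. It uses plain POSIX open()/write()/close() rather than the mysys my_create()/my_write() wrappers, so it is an analogue of create_pid_file(), not a patch to it. The name write_pid_file(), its 0/-1 return convention, and the "decimal pid plus newline" file format are assumptions for illustration. It retries short writes, reports ENOSPC and any other error, and removes the partially written file so the scripts never find an empty pid file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for create_pid_file(): writes `pid` as decimal text
   plus a newline (assumed format) to `path`. Returns 0 on success, or -1
   after printing a diagnostic and removing the incomplete file. */
int write_pid_file(const char *path, pid_t pid)
{
    assert(path != NULL);

    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long) pid);
    if (len < 0 || (size_t) len >= sizeof buf)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0664);
    if (fd < 0) {
        fprintf(stderr, "Can't create pid file '%s': %s\n", path, strerror(errno));
        return -1;
    }

    /* open() can succeed on a full file system while write() fails with
       ENOSPC, so every write() result is checked. Short writes are retried. */
    size_t done = 0;
    while (done < (size_t) len) {
        ssize_t n = write(fd, buf + done, (size_t) len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            fprintf(stderr, "Can't write pid file '%s': %s\n", path, strerror(errno));
            close(fd);
            unlink(path);                 /* don't leave an empty pid file */
            return -1;
        }
        done += (size_t) n;
    }

    /* Some file systems (e.g. NFS) only report write errors at close(). */
    if (close(fd) != 0) {
        fprintf(stderr, "Can't close pid file '%s': %s\n", path, strerror(errno));
        unlink(path);
        return -1;
    }
    return 0;
}
```

A caller in the startup path would treat -1 as fatal (log and exit), which corresponds to the "bail out" option; printing the error and continuing corresponds to the weaker option.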
As a workaround, the scripts can be extended to recognize empty pid files and, in that case,
check for and kill all running mysqld processes (I have patches for that on my other
laptop and will add them later), but this will not work well if multiple independent
servers are running on the same machine.
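The check the scripts would need can be sketched in C (the scripts themselves are shell, so this only illustrates the logic). The helper check_pid_file() and its three-way result are hypothetical. It distinguishes "no pid file", "pid file names a live process", and "pid file exists but is empty or unreadable", which is the state this bug leaves behind. Liveness is tested with kill(pid, 0), which sends no signal and only checks whether the process exists.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns:
     1  the pid file holds a pid whose process exists (*out_pid is set),
     0  the pid file is missing, or its process no longer exists (stale),
    -1  the pid file is unreadable, empty, or does not hold a positive pid. */
int check_pid_file(const char *path, pid_t *out_pid)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return errno == ENOENT ? 0 : -1;

    long v = 0;
    int ok = fscanf(f, "%ld", &v) == 1 && v > 0;  /* rejects empty files */
    fclose(f);
    if (!ok)
        return -1;

    if (out_pid != NULL)
        *out_pid = (pid_t) v;

    /* Signal 0 performs only the existence/permission check. EPERM means
       the process exists but is owned by another user. */
    if (kill((pid_t) v, 0) == 0 || errno == EPERM)
        return 1;
    return 0;
}
```

Because it keys on one server's pid file, a -1 result can be handled for that server alone instead of killing every mysqld on the machine. It still cannot prove the live pid is a mysqld (pids are reused), so a script would need an extra check before killing anything.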