Bug #100035 Socket lock file checking error
Submitted: 29 Jun 2020 14:24
Modified: 1 Jul 2020 15:26
Reporter: yuhui wang
Status: Not a Bug
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 5.7.28, 8.0.20
OS: CentOS
Assigned to:
CPU Architecture: Any
Tags: mysql.sock.lock, socket lock file

[29 Jun 2020 14:24] yuhui wang
Description:
As of commit 1bba7015030a1cdd9eba21d48e7fdd1bad16ebef, a lock file is used to prevent multiple mysqld instances from running on the same Unix socket file. The lock file contains the pid of the running mysqld instance it belongs to. When another instance comes up with the same socket filename, it reads the pid from the lock file and checks whether that process is running by calling kill with signal 0. If it is, the new instance is not allowed to start.
Note the following:
a. On Linux, a pid and a tid are almost the same thing (see ps aux -L). The lock file therefore just stores the tid of the main (mysqld_main) thread.
b. The system call kill(id, 0) is used to check whether another process exists. This is wrong: the call succeeds for any live thread id, not only for a process id.
c. mysqld starts a bunch of new threads before it checks the lock file's content.
d. Thread ids may be reused.
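
Points a and b can be checked directly on Linux. The following is a minimal Python sketch, not MySQL code: `os.kill(id, 0)` stands in for the server's kill(id, 0) call, and the helper names `probe_alive` and `start_worker` are made up for illustration. It assumes Linux and Python 3.8 or later (for `threading.get_native_id`).

```python
import os
import threading


def probe_alive(some_id):
    """Mimic the kill(id, 0) liveness check: no signal is sent, only the lookup is done."""
    try:
        os.kill(some_id, 0)
    except ProcessLookupError:
        return False   # ESRCH: nothing with this id exists
    except PermissionError:
        return True    # EPERM: the id exists but belongs to another user
    return True


def start_worker():
    """Start a background thread and return (kernel thread id, stop event)."""
    ready, stop, tid = threading.Event(), threading.Event(), []

    def body():
        tid.append(threading.get_native_id())  # kernel tid on Linux
        ready.set()
        stop.wait()                             # stay alive until told to stop

    threading.Thread(target=body, daemon=True).start()
    ready.wait()
    return tid[0], stop


if __name__ == "__main__":
    worker_tid, stop = start_worker()
    print("pid:", os.getpid(), "worker tid:", worker_tid)
    print("probe(pid):", probe_alive(os.getpid()))
    # On Linux this probe is expected to succeed too, although worker_tid
    # is only a thread of this process, not a process of its own.
    print("probe(worker tid):", probe_alive(worker_tid))
    stop.set()
```

If the second probe reports the worker tid as alive, the check cannot tell a background thread of the starting server apart from a separate running instance.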

Thus, in the following case, the MySQL server cannot start up:
1. Start mysqld for the first time. Say it starts 3 threads with ids 1000, 1001 and 1002; the pid is 1000 (the main thread's tid), and mysql.sock.lock contains 1000.
2. Kill mysqld. The mysql.sock.lock file is left behind and still contains 1000.
3. Start mysqld again. It again starts 3 threads, this time with ids 998, 999 and 1000. The pid is 998. The threads with ids 999 and 1000 are normal background threads, for example InnoDB page cleaner threads.
4. kill(1000, 0) is used to check whether another process is running. It succeeds, so mysqld fails to start up.
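
The four steps can be replayed as a toy model. This is only an illustration in Python, not the server's logic: the function name `stale_lock_blocks_startup` is hypothetical, and the set `live_ids` stands for every id that kill(id, 0) would report as alive, process ids and thread ids alike.

```python
def stale_lock_blocks_startup(lock_file_id, live_ids):
    """Toy version of the startup check: refuse to start if the recorded id is alive.

    live_ids models every id for which kill(id, 0) would succeed, i.e.
    process ids and thread ids alike.
    """
    return lock_file_id in live_ids


# Step 1: the first run has threads 1000, 1001, 1002 and records its pid.
lock_file_id = 1000

# Step 2: mysqld is killed; the lock file stays behind, nothing is alive.
assert not stale_lock_blocks_startup(lock_file_id, set())

# Step 3: the second run gets threads 998, 999, 1000, so its pid is 998.
second_run_ids = {998, 999, 1000}

# Step 4: the check finds id 1000 alive (one of its own threads) and aborts.
print(stale_lock_blocks_startup(lock_file_id, second_run_ids))  # prints True
```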

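On a normal physical server, the OS hands out pids/tids in a round-robin way, so this problem is hard to trigger. But if mysqld is deployed in a container, for example Docker, the risk of hitting it is high, because the container's PID namespace starts numbering from 1 again on every start and the same small ids recur.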

How to repeat:
It seems we cannot choose a new thread's tid, so we have to modify mysql.sock.lock to simulate the problem.

1. Start mysqld under gdb, set a breakpoint in Unix_socket::create_lockfile, and run mysqld.
2. gdb stops on entry to Unix_socket::create_lockfile. Use ps aux -L | grep port to list all threads that mysqld has started and note one thread id, for example 12466.
3. Change the content of mysql.sock.lock to 12466.
4. Continue running mysqld in gdb. It exits, and the error log contains the following errors:
```
2020-06-29T21:30:49.581391+08:00 0 [ERROR] [MY-010259] [Server] Another process with pid 12466 is using unix socket file.
2020-06-29T21:30:49.581445+08:00 0 [ERROR] [MY-010268] [Server] Unable to setup unix socket lock file.
2020-06-29T21:30:49.581478+08:00 0 [ERROR] [MY-010119] [Server] Aborting
```

Suggested fix:
Use an OS file lock to protect the socket file?
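
One way such a lock could look is sketched below in Python with `fcntl.flock`. This is an assumption about what the suggestion means, not how mysqld behaves; the function name `try_lock` and the temporary path are hypothetical. The kernel drops the lock when the holder closes the file or exits, so a leftover lock file does not depend on any pid still being meaningful.

```python
import fcntl
import os


def try_lock(lock_path):
    """Try to take an exclusive, non-blocking flock on lock_path.

    Returns an open file descriptor on success (keep it open to hold the
    lock) or None if another open file description already holds it.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # The pid is written for humans only; the kernel lock is the real guard.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "mysql.sock.lock")  # hypothetical location
    first = try_lock(path)
    second = try_lock(path)            # a second holder is refused
    print(first is not None, second is None)
    os.close(first)                    # closing releases the lock, as does process exit
    third = try_lock(path)             # a leftover file no longer blocks startup
    print(third is not None)
    os.close(third)
```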
[30 Jun 2020 12:11] MySQL Verification Team
Hi Mr. wang,

Thank you for your bug report.

However, this is not a bug.

Our Reference Manual contains very detailed instructions on how to start several MySQL servers on the same machine, even while deploying containers. Among other preconditions, each MySQL server should have a separate socket file.

Not a bug.
[1 Jul 2020 15:26] yuhui wang
I mean it is not right to use the kill(pid, 0) function (in Unix_socket::create_lockfile()) to judge whether there is an existing mysqld, because pids will be reused.
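
For comparison, a stricter probe on Linux could at least confirm that the recorded id is a process rather than a thread, by reading the `Tgid` field of `/proc/<id>/status`. The Python sketch below is illustrative only; `is_process_id` is a hypothetical name, and it assumes a Linux `/proc` filesystem.

```python
import os


def is_process_id(some_id):
    """Return True if some_id is a thread-group leader (a real process id),
    False if it is only a thread id or does not exist, and None if that
    cannot be determined (no /proc, or access denied)."""
    if not os.path.isdir("/proc/self"):
        return None
    try:
        with open(f"/proc/{some_id}/status") as status:
            for line in status:
                if line.startswith("Tgid:"):
                    # For a thread, Tgid is the id of the owning process.
                    return int(line.split()[1]) == some_id
    except (FileNotFoundError, ProcessLookupError):
        return False
    except PermissionError:
        return None
    return None


if __name__ == "__main__":
    print(is_process_id(os.getpid()))
```

Such a check only rules out thread ids. A pid recycled by an unrelated process would still pass it, which is why a kernel-held file lock is the more robust option.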