MySQL Bugs: #64929: 'mysqladmin shutdown' doesn't stop API node if there are no running data nodes

Bug #64929	'mysqladmin shutdown' doesn't stop API node if there are no running data nodes
Submitted:	10 Apr 2012 14:17	Modified:	15 Apr 2012 14:06
Reporter:	Timur Bakeyev
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	5.5.20-ndb-7.2.5-gpl	OS:	Linux (debian6.0 on x86_64)
Assigned to:		CPU Architecture:	Any
Tags:	regression

Description:
# /usr/local/mysql/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf --socket=/var/lib/mysql/mysql.sock shutdown
Warning;  Aborted waiting on pid file: '/var/run/mysqld/mysqld.pid' after 3600 seconds

This doesn't terminate mysqld, and the daemon has to be killed with 'kill -9'. The more appropriate 'kill -15' has no effect.

How to repeat:
Start the management node, but don't start the data nodes. Start the MySQL API node. It takes a bit longer than usual, as it tries to establish a connection to the data nodes, but eventually it starts.

# ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 10.0.0.140)
id=3 (not connected, accepting connect from 10.0.0.141)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.120  (mysql-5.5.20 ndb-7.2.5)

[mysqld(API)]   18 node(s)
id=4 (not connected, accepting connect from 10.0.0.120)
id=5 (not connected, accepting connect from 10.0.0.120)
id=6 (not connected, accepting connect from 10.0.0.120)
id=7 (not connected, accepting connect from 10.0.0.120)
id=8 (not connected, accepting connect from any host)

mysql> show processlist;
|    1 | system user |           | NULL | Daemon  |  964 | Waiting for ndbcluster to start | NULL             |

Issue 'mysqladmin shutdown' to shut down the MySQL node. It hangs until the timeout is reached (3600 sec). mysqld stays in memory but won't accept any connections, and the Unix domain socket is removed right after the command is issued. The pid file remains, though.

The only way to kill the mysqld in memory is with 'kill -9'.
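As a stopgap until the shutdown path is fixed, an init script could escalate signals itself. The following is only a sketch of that idea, not a fix for the bug: the function name, the grace periods (30 s for SIGTERM, 5 s for SIGKILL) and the injectable `send_signal` / `is_alive` callables are all assumptions made for illustration.

```python
import signal
import time
from typing import Callable, Optional, Sequence, Tuple


def stop_with_escalation(
    pid: int,
    send_signal: Callable[[int, int], None],
    is_alive: Callable[[int], bool],
    # Assumed grace periods; tune them for the real deployment.
    steps: Sequence[Tuple[int, float]] = (
        (signal.SIGTERM, 30.0),
        (signal.SIGKILL, 5.0),
    ),
    poll_interval: float = 0.5,
) -> Optional[int]:
    """Send each signal in turn and wait up to its grace period.

    Returns the signal after which the process was seen to be gone,
    or None if it survived every step.
    """
    for signum, grace in steps:
        send_signal(pid, signum)
        deadline = time.monotonic() + grace
        while True:
            if not is_alive(pid):
                return signum
            if time.monotonic() >= deadline:
                break  # grace period over, move on to the next signal
            time.sleep(poll_interval)
    return None
```

In real use `send_signal` would typically be `os.kill` and `is_alive` a liveness probe of your choice. In the scenario from this report the first step would be expected to time out, since 'kill -15' has no effect.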

It's interesting to note, though, that if the MySQL node was started while the data nodes were already online and they were brought down later, 'mysqladmin shutdown' works as expected.

It also works when the MySQL node is started first and the data nodes afterwards, giving a working cluster configuration.

So the problem is exposed only when the MySQL node(s) are started and then shut down while the data nodes are not operational (or still starting). It's not a common scenario, but it happens in real life.
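To check whether a cluster is in the state that triggers the problem, the text printed by 'show' in ndb_mgm can be parsed. This is a rough sketch with made-up function names. It assumes that a connected data node is printed in the same "id=N @address" form as the management node line above, and it does not tell a started data node from one that is connected but still starting.

```python
import re
from typing import Dict

# Matches section headers such as "[ndbd(NDB)]     2 node(s)".
SECTION_RE = re.compile(r"^\[(\w+)\((\w+)\)\]")
# Matches node lines such as "id=2 (not connected, ...)" or "id=1    @10.0.0.120 ...".
NODE_RE = re.compile(r"^id=(\d+)\s+(.*)$")


def count_connected(show_output: str) -> Dict[str, Dict[str, int]]:
    """Count connected and not connected nodes per section (NDB, MGM, API)."""
    counts: Dict[str, Dict[str, int]] = {}
    section = None
    for raw in show_output.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(2)
            counts[section] = {"connected": 0, "not_connected": 0}
            continue
        node = NODE_RE.match(line)
        if node and section is not None:
            rest = node.group(2)
            key = "not_connected" if rest.startswith("(not connected") else "connected"
            counts[section][key] += 1
    return counts


def matches_bug_precondition(show_output: str) -> bool:
    """True if an NDB section is listed and none of its nodes is connected."""
    ndb = count_connected(show_output).get("NDB")
    return ndb is not None and ndb["connected"] == 0
```

Fed with the 'show' output from this report, `matches_bug_precondition` returns True, because both data nodes are listed as not connected.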

Suggested fix:
Unknown.

Verified just as described on Bug #64929.

Verified it in the environment below:

root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# uname -an
Linux ushastry 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux

## Starting MGM

root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# bin/ndb_mgmd --initial --configdir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5 --ndb-nodeid=3 -f config.ini
MySQL Cluster Management Server mysql-5.5.20 ndb-7.2.5
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# bin/ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1 (not connected, accepting connect from localhost)
id=2 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)]	1 node(s)
id=3	@127.0.0.1  (mysql-5.5.20 ndb-7.2.5)

[mysqld(API)]	2 node(s)
id=4 (not connected, accepting connect from localhost)
id=5 (not connected, accepting connect from localhost)

ndb_mgm> 

## Starting SQL Node/API node 4

root@ushastry:/home/ushastry# 
root@ushastry:/home/ushastry# cd Downloads/mysql-cluster-gpl-7.2.5
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# bin/mysqld_safe --defaults-file=my.cnf --user=mysql &
[1] 1710
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 120412 11:34:22 mysqld_safe Logging to '/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.err'.
120412 11:34:22 mysqld_safe Starting mysqld daemon with databases from /home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data

#### Verified that "mysqld" process is up and running

root@ushastry:/home/ushastry# ps aux|grep "mysqld"
root      1710  0.0  0.0   1896   624 pts/1    S    11:34   0:00 /bin/sh bin/mysqld_safe --defaults-file=my.cnf --user=mysql
mysql     1906  0.1  1.9 300088 31524 pts/1    Sl   11:34   0:00 /home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/bin/mysqld --defaults-file=my.cnf --basedir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5 --datadir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data --plugin-dir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/lib/plugin --user=mysql --log-error=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.err --pid-file=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.pid
root      1933  0.0  0.0   4008   764 pts/2    S+   11:37   0:00 grep --color=auto mysqld
root@ushastry:/home/ushastry# 

root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# bin/mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.20-ndb-7.2.5-log Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show processlist;
+----+-------------+-----------+------+---------+------+---------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                           | Info             |
+----+-------------+-----------+------+---------+------+---------------------------------+------------------+
|  1 | system user |           | NULL | Daemon  |   48 | Waiting for ndbcluster to start | NULL             |
|  2 | root        | localhost | NULL | Query   |    0 | NULL                            | show processlist |
+----+-------------+-----------+------+---------+------+---------------------------------+------------------+
2 rows in set (0.00 sec)

mysql> 
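The "Waiting for ndbcluster to start" state is the visible sign that the SQL node has not yet connected to the data nodes. A small sketch for spotting it in the tabular output of the mysql client follows. The helper names are hypothetical, and the parser is deliberately naive: it would break if the Info column itself contained a '|' character.

```python
from typing import Dict, List

WAITING_STATE = "Waiting for ndbcluster to start"


def parse_processlist(table_text: str) -> List[Dict[str, str]]:
    """Turn the ASCII table printed by the mysql client into a list of dicts."""
    rows: List[Dict[str, str]] = []
    header = None
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" borders, prompts and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first "|" line holds the column names
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def waiting_for_ndbcluster(table_text: str) -> bool:
    """True if any thread reports the 'Waiting for ndbcluster to start' state."""
    return any(row.get("State") == WAITING_STATE for row in parse_processlist(table_text))
```

Applied to the processlist above, `waiting_for_ndbcluster` returns True because of the 'system user' daemon thread.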

Important note: ## SQL node 4 is up, but no SQL/API node will connect to the cluster until all ndbd nodes reach the "started" state. The "ndb_mgm -e show" output below shows they are all still "not connected".

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=1 (not connected, accepting connect from localhost)
id=2 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)]	1 node(s)
id=3	@127.0.0.1  (mysql-5.5.20 ndb-7.2.5)

[mysqld(API)]	2 node(s)
id=4 (not connected, accepting connect from localhost)
id=5 (not connected, accepting connect from localhost)

ndb_mgm> 

## Stopping SQL/API node 4
## When mysqladmin is invoked to shut down the server, it just hangs, and if interrupted it returns with a warning

root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# bin/mysqladmin -uroot -p shutdown
Enter password: 
^CWarning;  Aborted waiting on pid file: '/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.pid' after 52 seconds
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 

## The mysql session ends, but the mysqld process still exists

mysql> show processlist;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
ERROR: 
Can't connect to the server

mysql> 

mysql> quit
Bye
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# bin/mysql -u root -p
Enter password: 
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 

root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# ps aux|grep "mysqld"
root      1710  0.0  0.0   1896   624 pts/1    S    11:34   0:00 /bin/sh bin/mysqld_safe --defaults-file=my.cnf --user=mysql
mysql     1906  0.1  1.9 300524 31948 pts/1    Sl   11:34   0:00 /home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/bin/mysqld --defaults-file=my.cnf --basedir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5 --datadir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data --plugin-dir=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/lib/plugin --user=mysql --log-error=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.err --pid-file=/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.pid
root      1939  0.0  0.0   4008   764 pts/2    S+   11:39   0:00 grep --color=auto mysqld
root@ushastry:/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5# 
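The same check can be scripted instead of grepping 'ps' by hand: read the pid file that mysqladmin was waiting on and probe the process. This is a POSIX-only sketch with hypothetical helper names; signal 0 only performs the existence and permission check and does not actually signal the process.

```python
import os
from typing import Optional


def read_pid(pid_file: str) -> Optional[int]:
    """Return the pid stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def process_exists(pid: int) -> bool:
    """Probe a pid with signal 0 (POSIX only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def shutdown_incomplete(pid_file: str) -> bool:
    """True if the pid file is still present and its process is still running."""
    pid = read_pid(pid_file)
    return pid is not None and process_exists(pid)
```

In the session above, `shutdown_incomplete` would be called with the path '/home/ushastry/Downloads/mysql-cluster-gpl-7.2.5/data/cluster.pid' and would be expected to return True after the aborted shutdown.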

This doesn't terminate mysqld, and the daemon has to be killed with 'kill -9'.

See Bug #40961.

I don't see any correlation with the mentioned bug; maybe you could re-read the description?

In both cases mysqld hangs waiting for a connection to the cluster, so it's a duplicate with the same root cause. Since the later bug report is always the duplicate of the earlier one, this is a duplicate. To be correct, though, we should set the original to Verified. As both bugs are linked to each other, the information on how to reproduce will not get lost.

Maybe my definition of 'hang' is different, but I don't see mysqld "hanging".

In the first place, mysqld starts fine with a connection only to the mgmd node; it just takes longer, until 'ndb-wait-setup' expires.

After that mysqld is quite responsive: you can start a client and query the list of databases and tables but, of course, as soon as it comes to the data it can't fetch it and gives:

mysql> select * from users;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

I wouldn't call this 'hanging'. That's actually what is expected.

If you look at the ticket you referred to, the problem there is caused by the lack of available slots on the management server.

In my case I have plenty of free slots.

Honestly, I don't see why 'mysqladmin shutdown' would need a connection to the data nodes in any case. It may want to terminate running queries, but in my case there aren't any.

And, to conclude: I can't reproduce this bug on Ver 14.14 Distrib 5.1.39-ndb-7.0.9; there 'mysqladmin shutdown' works fine under the same conditions.

With 5.1.39 ndb-7.0.9:

Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 10.0.0.105)
id=3 (not connected, accepting connect from 10.0.0.107)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.104  (mysql-5.1.39 ndb-7.0.9)

[mysqld(API)]   6 node(s)
id=4 (not connected, accepting connect from 10.0.0.106)
id=5 (not connected, accepting connect from 10.0.0.108)
id=6 (not connected, accepting connect from 10.0.0.120)
id=7 (not connected, accepting connect from 10.0.0.121)
id=8 (not connected, accepting connect from any host)

|  1 | system user |           | NULL | Daemon  |  132 | Waiting for ndbcluster to start | NULL             |

# time mysqladmin shutdown

real    0m2.019s

And there is no mysqld left in memory.
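To compare versions without waiting out the full 3600 seconds, the shutdown command can be run under a time limit. A minimal sketch follows; `timed_run` is a hypothetical helper and the timeout value is up to the caller.

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple


def timed_run(cmd: Sequence[str], timeout: float) -> Tuple[Optional[int], float]:
    """Run cmd and return (exit code, elapsed seconds).

    The exit code is None if the command had to be killed after `timeout` seconds.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        code: Optional[int] = completed.returncode
    except subprocess.TimeoutExpired:
        code = None  # subprocess.run has already killed and reaped the child
    return code, time.monotonic() - start
```

For example, `timed_run(["mysqladmin", "shutdown"], timeout=60)` should come back within a couple of seconds on the 7.0.9 setup above, while on 7.2.5 it would be expected to return `None` after the timeout. Note that only the mysqladmin client is killed on timeout; mysqld itself is left as it was.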