Bug #31934 mysqld lost connect with cluster after the second killing all ndbds
Submitted: 30 Oct 2007 6:55
Modified: 3 Mar 2008 14:45
Reporter: li zhou
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1
OS: Any
Assigned to:
CPU Architecture: Any
Tags: 5.1.22

[30 Oct 2007 6:55] li zhou
Description:
While testing node restarts, I found that after I killed all the ndbd processes a second time, mysqld lost its connection to the cluster.

How to repeat:
1:) Start a cluster with 4 data nodes, 2 mysqlds, and 1 ndb_mgmd.
2:) Create an NDB table t1 and insert data into it.
3:) Kill all the data nodes.
4:) Restart all the data nodes.
5:) In the mysql client, run "select * from t1" and "show create table". Both work.
6:) Kill all the data nodes again.
7:) Restart all the data nodes again.
8:) All the mysqlds have lost their connection to the cluster.
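
The kill/restart cycle in steps 3 to 8 could be scripted roughly as follows. This is only a sketch, not the test that was actually run: the binary paths and socket are taken from the manual test commands given later in this report, while the variables RUN, NODES, CYCLES and WAIT, the function names, and the fixed sleep used to wait for the data nodes are assumptions made for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the kill/restart cycle. RUN, NODES, CYCLES, WAIT and the function
# names are illustrative assumptions, not part of the original report.
# RUN defaults to "echo" (dry run); invoke with RUN= to really execute.
RUN=${RUN-echo}
NODES=${NODES:-4}     # number of data nodes to restart
CYCLES=${CYCLES:-2}   # the problem was reported after the second cycle
WAIT=${WAIT:-60}      # assumed: crude fixed wait for the data nodes to start

restart_cycle() {
    # Kill every data node, then start the same number again (no --initial).
    $RUN killall ndbd
    i=1
    while [ "$i" -le "$NODES" ]; do
        $RUN ./libexec/ndbd
        i=$((i + 1))
    done
    $RUN sleep "$WAIT"
}

check_table() {
    # Query t1 through the first mysqld; an error here is the reported symptom.
    $RUN ./bin/mysql -u root test -S /tmp/mysql.sock -e "select * from t1"
}

run_repro() {
    c=1
    while [ "$c" -le "$CYCLES" ]; do
        restart_cycle
        check_table
        c=$((c + 1))
    done
}

run_repro
```

Run it from the install directory once the cluster is up and t1 exists. With the defaults it prints two kill/restart cycles for four data nodes; setting RUN to an empty value executes the commands instead.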

I also tested it using mysql-test-run. One of the mysqlds loses its connection to the cluster.

Suggested fix:
mysqld should keep its connection to the cluster.
[17 Feb 2008 16:18] Valeriy Kravchuk
Please try to repeat this with a newer version, 5.1.23-rc. If the problem persists, please upload the .test file you used.
[20 Feb 2008 6:53] li zhou
Tested using MySQL 5.1.24-rc (bk from the ndb tree). It still occurs.

I ran two kinds of tests, and both failed.
1: manual test with 4 data nodes
2: MTR test with 2 data nodes

The steps of the manual test are:
1: build and run make install
2: install the database in the install directory
   ./bin/mysql_install_db --user=mysql
   ./bin/mysql_install_db --user=mysql --datadir=var1
3: start cluster
   ./libexec/ndb_mgmd -f config.ini
   ./libexec/ndbd --initial
   ./libexec/ndbd --initial
   ./libexec/ndbd --initial
   ./libexec/ndbd --initial
4: start mysqld
   ./libexec/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/var --user=mysql --pid=/usr/local/mysql/var/dev3-63.pid --log-error=/usr/local/mysql/var/dev3-63.err &
   ./libexec/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/var1 --user=mysql --pid=/usr/local/mysql/var1/dev3-63-2.pid --log-error=/usr/local/mysql/var/dev3-63-2.err --socket=/tmp/mysql1.sock --port=3307 &
5: create table t1 and insert data
   ./bin/mysql -u root test -S /tmp/mysql.sock
   sql> create table t1(a int, b int) engine ndb;
   sql> insert into t1 values(1,1);
   sql> insert into t1 values(1,2);

6: kill all the data nodes.
   killall ndbd
7: restart all the data nodes.
   ./libexec/ndbd
   ./libexec/ndbd
   ./libexec/ndbd
   ./libexec/ndbd
8: in the mysql client, run "select * from t1" and "show create table". Both work.
   sql> select * from t1;
9: kill all the data nodes again.
   killall ndbd
10: restart all the data nodes again.
   ./libexec/ndbd
   ./libexec/ndbd
   ./libexec/ndbd
   ./libexec/ndbd
11: all the mysqlds have lost their connection to the cluster.
   ./bin/ndb_mgm -e show
   sql> select * from t1;
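
The check in step 11 could be made mechanical with a small helper like the one below. It is a sketch under one assumption: that the output of ndb_mgm -e show marks a node that is not attached to the cluster with the text "not connected". The function name count_disconnected is made up for illustration. It counts every such node, so data nodes that are still starting are included along with the mysqlds.

```shell
#!/bin/sh
# count_disconnected (hypothetical helper name) reads "ndb_mgm -e show" output
# on stdin and prints how many lines contain "not connected".
# Assumption: a disconnected node is reported with that exact phrase.
count_disconnected() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the exit status clean.
    grep -c 'not connected' || true
}

# Usage from the install directory, after the data nodes are back up:
#   ./bin/ndb_mgm -e show | count_disconnected
```

A result of 0 after the second restart would mean every node, including both mysqlds, is attached again. Any other value after the data nodes have finished starting points at the problem described here.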
[20 Feb 2008 6:58] li zhou
Test file for the second restart of the data nodes.

Attachment: ndb_sys_restart.test (application/octet-stream, text), 1.62 KiB.

[20 Feb 2008 6:58] li zhou
Config file for the manual test.

Attachment: config.ini (application/octet-stream, text), 894 bytes.