Bug #33626 MySQL clients don't reconnect to Cluster after it was restarted
Submitted: 2 Jan 2008 14:39 Modified: 12 Jan 2009 21:41
Reporter: Geert Vanderkelen
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 5.1.23 OS: Any
Assigned to: Don Kehn CPU Architecture: Any

[2 Jan 2008 14:39] Geert Vanderkelen
Description:
Shutting down the cluster, or killing the ndbd processes with SIGKILL, causes errors when MySQL clients that are still connected try to run a statement.

Exiting the MySQL Client and reconnecting to the MySQL server helps.

How to repeat:
--Simple cluster setup, nothing special.

mysql> CREATE TABLE t1 (id INT KEY AUTO_INCREMENT, c1 VARCHAR(20)) ENGINE=ndb;
Query OK, 0 rows affected (1.12 sec)

mysql> INSERT INTO t1 (c1) VALUES ('Geert');
Query OK, 1 row affected (0.06 sec)

--Kill all ndbd processes with SIGKILL
--Restart the ndbd processes

mysql> INSERT INTO t1 (c1) VALUES ('Geert');
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> show warnings;
+-------+------+----------------------------------------------------+
| Level | Code | Message                                            |
+-------+------+----------------------------------------------------+
| Error | 1296 | Got error 4009 'Cluster Failure' from NDB          | 
| Error | 1296 | Got error 157 'Unknown error code' from NDBCLUSTER | 
| Error | 1033 | Incorrect information in file: './test/t1.frm'     | 
+-------+------+----------------------------------------------------+

--Exit the mysql client normally and reconnect

mysql> INSERT INTO t1 (c1) VALUES ('Jan');
Query OK, 1 row affected (0.03 sec)
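
The SHOW WARNINGS output above carries the underlying NDB error (4009 'Cluster Failure') behind the generic error 1296. As a rough sketch of how a client script might pull that out, assuming the warnings arrive as (level, code, message) rows and that the message keeps the wording shown in this report, something like the following could work. The function names and the row format are assumptions for illustration, not part of any MySQL client library.

```python
import re

# Pattern for messages such as: Got error 4009 'Cluster Failure' from NDB
# (wording copied from the SHOW WARNINGS output in this report).
NDB_MESSAGE = re.compile(r"Got error (\d+) '([^']*)' from (NDB\w*)")

MYSQL_NDB_ERROR = 1296      # MySQL error code shown in the report
NDB_CLUSTER_FAILURE = 4009  # NDB error code shown in the report


def ndb_errors(warnings):
    """Return (ndb_code, text, origin) for each error-1296 warning row.

    `warnings` is assumed to be an iterable of (level, code, message)
    tuples, mirroring the columns of SHOW WARNINGS.
    """
    found = []
    for _level, code, message in warnings:
        if code != MYSQL_NDB_ERROR:
            continue
        match = NDB_MESSAGE.search(message)
        if match:
            found.append((int(match.group(1)), match.group(2), match.group(3)))
    return found


def is_cluster_failure(warnings):
    """True if any warning row reports NDB error 4009."""
    return any(code == NDB_CLUSTER_FAILURE for code, _text, _origin in ndb_errors(warnings))
```

A script could call is_cluster_failure() on the rows it reads back after a failed statement and use the result to decide whether reconnecting is worth trying.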

Suggested fix:
This is a known problem where SQL nodes don't fully disconnect.
Bug #27644 is related and is another manifestation of the same problem.

Workaround: reconnect all MySQL clients. There is no need to restart the SQL nodes, although restarting them would make sure everything is cleaned up.
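
For applications that cannot be restarted by hand, the reconnect workaround might be automated on the client side. The following is only a minimal sketch under stated assumptions: ClusterError, ReconnectingSession and the connect callable are hypothetical stand-ins, not the API of any real MySQL driver, and the connection object is assumed to offer execute() and close().

```python
NDB_ERROR = 1296  # "Got error ... from NDBCLUSTER", as seen in this report


class ClusterError(Exception):
    """Hypothetical driver exception that carries a MySQL error number."""

    def __init__(self, errno, message):
        super().__init__(f"ERROR {errno}: {message}")
        self.errno = errno


class ReconnectingSession:
    """Runs statements and replaces the connection once after error 1296."""

    def __init__(self, connect):
        self._connect = connect  # callable that returns a new connection
        self._conn = connect()
        self.reconnects = 0

    def execute(self, statement):
        try:
            return self._conn.execute(statement)
        except ClusterError as exc:
            if exc.errno != NDB_ERROR:
                raise  # unrelated errors are not handled here
            # Workaround from the report: drop the session and reconnect.
            self._conn.close()
            self._conn = self._connect()
            self.reconnects += 1
            # Retry exactly once; a second failure propagates to the caller.
            return self._conn.execute(statement)
```

Retrying blindly is only reasonable when the failed statement is known to have had no effect, so a real application would want to decide per statement whether a retry is safe.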
[3 Nov 2008 21:12] Don Kehn
We are working up a fix for this.
[14 Nov 2008 19:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58831

2750 Don Kehn	2008-11-14
      BUG#33626 - When the binlog thread encounters the cluster failure event, free the resources used by the binlog thread and start over.
[14 Nov 2008 19:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58832

2750 Don Kehn	2008-11-14
      BUG#33626 - When the binlog thread encounters the cluster failure event, free the resources used by the binlog thread and start over.
[14 Nov 2008 19:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58833

2750 Don Kehn	2008-11-14
      BUG#33626 - When the binlog thread encounters the cluster failure event, free the resources used by the binlog thread and start over.
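
The commit message above describes a "free the resources and start over" pattern. The sketch below illustrates that general pattern only; it is not the code from the patch, and every name in it (binlog_loop, open_resources, close_resources, apply) is hypothetical. The event name TE_CLUSTER_FAILURE is borrowed from a later commit message in this report.

```python
CLUSTER_FAILURE = "TE_CLUSTER_FAILURE"


def binlog_loop(events, open_resources, close_resources, apply):
    """Process events, rebuilding resources after each cluster failure event.

    Returns the number of times the loop had to start over.
    """
    events = iter(events)  # one shared iterator, so a restart resumes after the failure
    restarts = 0
    while True:
        resources = open_resources()
        failed = False
        try:
            for event in events:
                if event == CLUSTER_FAILURE:
                    failed = True
                    break
                apply(resources, event)
        finally:
            # Resources are always released, whether or not a failure was seen.
            close_resources(resources)
        if not failed:
            return restarts
        restarts += 1
```

The point of the structure is that nothing allocated before the failure event survives into the next round, so no stale state can leak across a cluster restart.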
[14 Nov 2008 19:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58837

2749 Don Kehn	2008-11-14
      BUG#33626 Check that the ndb object in thd_ndb has the same connect count
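
The commit message only says that the connect count of the ndb object is compared. One plausible reading is a generation check: the shared cluster connection bumps a counter on every reconnect, and a per-session handle that was created under an older count is treated as stale and rebuilt. The sketch below illustrates that idea with hypothetical names (ClusterConnection, SessionHandle, get_valid_handle); it is not taken from the actual patch.

```python
class ClusterConnection:
    """Hypothetical stand-in for the SQL node's shared cluster connection."""

    def __init__(self):
        self.connect_count = 0

    def reconnect(self):
        # Assumed to be called each time the link to the data nodes is re-established.
        self.connect_count += 1


class SessionHandle:
    """Per-session handle that remembers the connect count it was created under."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.connect_count = cluster.connect_count


def get_valid_handle(handle):
    """Return `handle` if it is still current, otherwise a freshly created one."""
    if handle.connect_count != handle.cluster.connect_count:
        # The cluster was reconnected since this handle was created: recycle it.
        return SessionHandle(handle.cluster)
    return handle
```

With a check like this on each use, a client session would pick up a fresh handle after a cluster restart without the client itself having to disconnect.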
[14 Nov 2008 19:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58840

2752 Don Kehn	2008-11-14
      BUG #33626 - Adds the check_ndb_connection in ndb_util_thread_func and removes assert from binlog TE_CLUSTER_FAILURE
[14 Nov 2008 22:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58855

2753 Don Kehn	2008-11-14
      BUG #33626 - ndbcluster_print_error crash
[17 Nov 2008 19:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58992

3102 Don Kehn	2008-11-17 [merge]
      [BUG #33626] added all the changes for the ndb recycle code in merge from 6.3
[17 Nov 2008 20:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58998

3103 Don Kehn	2008-11-17
      [BUG #33626] modification to ndb_reconnect.test due to windows changes.
[17 Nov 2008 20:47] Bugs System
Pushed into 5.1.29-ndb-6.4.0  (revid:don.kehn@sun.com-20081117203023-oinhdyg8rfcsy5i0) (version source revid:don.kehn@sun.com-20081117203023-oinhdyg8rfcsy5i0) (pib:5)
[17 Nov 2008 21:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/59004

2755 Don Kehn	2008-11-17 [merge]
      [BUG #33626] merge.
[17 Nov 2008 21:04] Bugs System
Pushed into 5.1.29-ndb-6.3.19  (revid:don.kehn@sun.com-20081117210047-f5zoaivrp71zv0q5) (version source revid:don.kehn@sun.com-20081117210047-f5zoaivrp71zv0q5) (pib:5)
[18 Nov 2008 8:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/59034

2756 Tomas Ulin	2008-11-18
      partial revert of Bug #33626
[18 Nov 2008 8:12] Bugs System
Pushed into 5.1.29-ndb-6.3.19  (revid:tomas.ulin@sun.com-20081118081119-4vibxm4sgqvxmv8c) (version source revid:tomas.ulin@sun.com-20081118081119-4vibxm4sgqvxmv8c) (pib:5)
[18 Nov 2008 8:36] Bugs System
Pushed into 5.1.29-ndb-6.4.0  (revid:tomas.ulin@sun.com-20081118081119-4vibxm4sgqvxmv8c) (version source revid:tomas.ulin@sun.com-20081118083442-0towxfxe5hh7z56f) (pib:5)
[29 Dec 2008 6:39] Geert Vanderkelen
Need an update on this bug report. Changes were pushed to the source trees, but the original problem remains:

[root@MASTER:/test]
> SELECT VERSION();
+-----------------------------+
| VERSION()                   |
+-----------------------------+
| 5.1.30-ndb-6.3.21-debug-log | 
+-----------------------------+

[root@MASTER:/test]
> create table t1 (id int not null auto_increment key, name varchar(20)) engine=ndb;
Query OK, 0 rows affected (1.25 sec)

[root@MASTER:/test]
> insert into t1 (name) values ('Geert');
Query OK, 1 row affected (0.03 sec)

-- killall -SIGKILL ndbd  # on both data nodes

[root@MASTER:/test]
> insert into t1 (name) values ('Jan');
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

-- try a second time, it works:

[root@MASTER:/test]
> insert into t1 (name) values ('Jan');
Query OK, 1 row affected (0.06 sec)
[12 Jan 2009 13:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62983

2799 Tomas Ulin	2009-01-12
      Bug #33626  MySQL clients does not reconnect to Cluster after it was restarted
[12 Jan 2009 13:12] Bugs System
Pushed into 5.1.30-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090112131143-h9c8g6eeydjjh7gq) (version source revid:tomas.ulin@sun.com-20090112131143-h9c8g6eeydjjh7gq) (merge vers: 5.1.30-ndb-6.3.21) (pib:6)
[12 Jan 2009 13:15] Tomas Ulin
fixed in 6.3.21 and 6.4.1
[12 Jan 2009 13:19] Bugs System
Pushed into 5.1.30-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090112131731-owqb92hl0hyz1hje) (version source revid:tomas.ulin@sun.com-20090112131731-owqb92hl0hyz1hje) (merge vers: 5.1.30-ndb-6.4.1) (pib:6)
[12 Jan 2009 21:41] Jon Stephens
Documented fix in the NDB-6.3.21 and NDB-6.4.1 changelogs as follows:

        If all data nodes were shut down, MySQL clients were unable to 
        access NDBCLUSTER tables and data even after the data nodes were 
        restarted, without the MySQL clients themselves being restarted.