Bug #70435 All data node shutdown by any trigger
Submitted: 26 Sep 2013 12:15 Modified: 21 Nov 2013 12:15
Reporter: 李 源錫 Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:mysql cluster 7.3, mysql cluster 7.1, 7.3.3 OS:Linux (CentOS 6.4)
Assigned to: CPU Architecture:Any

[26 Sep 2013 12:15] 李 源錫
Description:
At mysql cluster 7.3(mysql5.6.11-ndb7.3.2) version,
All data node shutdown by any trigger.

and Same shutdown problem occurred at old mysql cluster 7.1(mysql5.1.56-ndb7.1.15) version.

Maybe this bug correspond to All mysql cluster version.

How to repeat:
1. confirm mysql cluster version
mysql> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.6.11-ndb-7.3.2-cluster-gpl |
+------------------------------+
1 row in set (0.00 sec)

2. create table a
mysql> create table a (
    -> a1 varchar(100) NOT NULL DEFAULT '', 
    -> a2 int(6) DEFAULT '0', UNIQUE KEY idx_b1 (a1)
    -> ); 
Query OK, 0 rows affected (0.06 sec)

3. create table b
mysql> CREATE TABLE b ( 
    -> b1 char(20) NOT NULL DEFAULT '', 
    -> b2 enum('N','Y') NOT NULL DEFAULT 'N' ,  
    -> UNIQUE KEY idx_b1 (b1)
    -> ); 
Query OK, 0 rows affected (0.05 sec)

4. create trigger a_update
mysql> delimiter //
mysql> CREATE TRIGGER a_update BEFORE UPDATE ON a
    -> FOR EACH ROW
    -> UPDATE b SET b2='Y' WHERE b1=SUBSTRING(old.a1, 1 + INSTR(old.a1, '.')) 
    -> //
Query OK, 0 rows affected (0.01 sec)

5. create procedure a_ins
mysql> delimiter //
mysql> CREATE PROCEDURE a_ins()
    -> BEGIN
    -> DECLARE i INT DEFAULT 1;
    -> WHILE (i<=20000) DO INSERT INTO a VALUES(concat('a',i,'.aa'), 1);
    -> SET i=i+1;
    -> END WHILE;
    -> END
    -> //
Query OK, 0 rows affected (0.00 sec)

6. insert record to table a
mysql> delimiter ;
mysql> call a_ins;
Query OK, 1 row affected (10.63 sec)

7. insert record to table b
mysql> insert into b values('aa','N');
Query OK, 1 row affected (0.00 sec)

8. run below query.
mysql> UPDATE a SET a2=2;

then, in this timing all data node shutdowned.

ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER

in manage node show below message after a few minutes.

ndb_mgm> Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 4: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
[27 Sep 2013 7:09] MySQL Verification Team
Hello,

Thank you for the report.
I can not repeat described behavior on my local box.

Could you please attach the cluster logs? Preferably using the ndb_error_reporter utility:

  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-error-reporter.html

Thanks,
Umesh
[27 Sep 2013 8:44] 李 源錫
To. Umesh
thanks for your fast confirm.
but you must test with over 2 data node.
please one more test at enviroment with 2 data node.
[27 Sep 2013 8:49] MySQL Verification Team
Hello,

Thank you for the feedback.
I tried it on my local box with 2,4 storage nodes and cannot reproduce.

Could you please attach the cluster logs? Preferably using the ndb_error_reporter utility:

  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-error-reporter.html

Thanks,
Umesh
[27 Sep 2013 9:14] 李 源錫
To. Umesh
ok. I will Send error trace file after a few hours.

if your local box is not physical 2 data node, can do you test with physical 2 data node?
[27 Sep 2013 12:55] 李 源錫
file by ndb_err_report

Attachment: ndb_error_report_20130927123949.tar.bz2 (application/octet-stream, text), 142.45 KiB.

[27 Sep 2013 12:57] 李 源錫
To. umesh

I attached file produced by ndb_error_report.
Please confirm.
[27 Sep 2013 13:16] 李 源錫
To. umesh

> I can not repeat described behavior on my local box.
Do you have any error message when you run "UPDATE a SET a2=2;",
although data node don't downed?
[27 Sep 2013 13:53] 李 源錫
To. umesh

> I can not repeat described behavior on my local box.

I more tested at one physical machine, also.
At one physical machine, i occured problem also.
Follow next procedure.

1. see my.cnf
cat /usr/local/mysql5.6.11-ndb7.3.2/my.cnf

[MYSQL_CLUSTER]
ndb-connectstring="host=localhost"

[MYSQLD]                        
basedir=/usr/local/mysql5.6.11-ndb7.3.2
datadir=/data/mysql5.6.11-ndb7.3.2

ndbcluster                            
default_storage_engine=ndb

2. confer config.ini
cat /usr/local/mysql5.6.11-ndb7.3.2/config.ini

[NDB_MGMD DEFAULT]
datadir=/data/mysql-cluster5.6.11-ndb7.3.2

[NDBD DEFAULT]
datadir=/data/mysql-cluster5.6.11-ndb7.3.2

NoOfReplicas=2
DataMemory=2G
IndexMemory=1G

[NDB_MGMD]
hostname=localhost
NodeID=1

[NDBD]
hostname=localhost
NodeID=2
[NDBD]
hostname=localhost
NodeID=3

[MYSQLD]
hostname=localhost
NodeID=4

3. run mgmd
/usr/bin/sudo -u mysql /usr/local/mysql5.6.11-ndb7.3.2/bin/ndb_mgmd -f /usr/local/mysql5.6.11-ndb7.3.2/config.ini --configdir=/usr/local/mysql5.6.11-ndb7.3.2

4. run ndbmtd
/usr/bin/sudo -u mysql /usr/local/mysql5.6.11-ndb7.3.2/bin/ndbmtd --defaults-file=/usr/local/mysql5.6.11-ndb7.3.2/my.cnf
/usr/bin/sudo -u mysql /usr/local/mysql5.6.11-ndb7.3.2/bin/ndbmtd --defaults-file=/usr/local/mysql5.6.11-ndb7.3.2/my.cnf

5. run mysqld
 /usr/local/mysql5.6.11-ndb7.3.2/bin/mysqld_safe --defaults-file=/usr/local/mysql5.6.11-ndb7.3.2/my.cnf --datadir=/data/mysql5.6.11-ndb7.3.2 &

6. login mysql
/usr/local/mysql5.6.11-ndb7.3.2/bin/mysql -uroot -p test

7. run next command

create table a (
a1 varchar(100) NOT NULL DEFAULT '', 
a2 int(6) DEFAULT '0', UNIQUE KEY idx_b1 (a1)
);

CREATE TABLE b (
b1 char(20) NOT NULL DEFAULT '',
b2 enum('N','Y') NOT NULL DEFAULT 'N' , 
UNIQUE KEY idx_b1 (b1)
);

delimiter //
CREATE TRIGGER a_update BEFORE UPDATE ON a
FOR EACH ROW
UPDATE b SET b2='Y' WHERE b1=SUBSTRING(old.a1, 1 + INSTR(old.a1, '.'))
//
delimiter ;

delimiter //
drop procedure a_ins//
CREATE PROCEDURE a_ins()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i<=20000) DO INSERT INTO a VALUES(concat('a',i,'.aa'), 1);
SET i=i+1;
END WHILE;
END
//
delimiter ;

call a_ins;

insert into b values('aa','N');

8. confirm record count table a,b
mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
|    20000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from b;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

9. confirm trigger
mysql> show triggers;
+----------+--------+-------+------------------------------------------------------------------------+--------+---------+------------------------+----------------+----------------------+----------------------+--------------------+
| Trigger  | Event  | Table | Statement                                                              | Timing | Created | sql_mode               | Definer        | character_set_client | collation_connection | Database Collation |
+----------+--------+-------+------------------------------------------------------------------------+--------+---------+------------------------+----------------+----------------------+----------------------+--------------------+
| a_update | UPDATE | a     | UPDATE b SET b2='Y' WHERE b1=SUBSTRING(old.a1, 1 + INSTR(old.a1, '.')) | BEFORE | NULL    | NO_ENGINE_SUBSTITUTION | root@localhost | utf8                 | utf8_general_ci      | latin1_swedish_ci  |
+----------+--------+-------+------------------------------------------------------------------------+--------+---------+------------------------+----------------+----------------------+----------------------+--------------------+
1 row in set (0.00 sec)

10. at last, run problem query.
mysql> UPDATE a SET a2=2;
ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER

--you will see error like upper.

-- at once, count record
mysql> select count(*) from a where a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

--you will see error like upper, 

-- rerun , count record
mysql> select count(*) from a where a2=2;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

-- you will see count 0. a2 column is not updated normally.

--rerun problem then query
mysql> UPDATE a SET a2=2;  
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

-- you will see error again.

--rerun problem then query
mysql> UPDATE a SET a2=2;  
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

-- you will see error again.

--rerun problem then query
mysql> UPDATE a SET a2=2;  
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

-- you will see error again.
--rerun problem then query
mysql> UPDATE a SET a2=2;  
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

-- you will see error again.

-- repeat this query about 20 times, or more times.
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
mysql> UPDATE a SET a2=2;
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

-- see table status
+----------------+----------------------------------------------------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment                                            |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------+----------+----------------+----------------------------------------------------+
| a    | NULL   |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | NULL        | NULL        | NULL       | NULL      |     NULL | NULL           | Got error 157 'Unknown error code' from NDBCLUSTER |
| b    | NULL   |    NULL | NULL       | NULL |           NULL |        NULL |            NULL |         NULL |      NULL |           NULL | NULL        | NULL        | NULL       | NULL      |     NULL | NULL           | Got error 157 'Unknown error code' from NDBCLUSTER |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------+----------

-- you will confirm data node down.
[1 Oct 2013 15:46] 李 源錫
To. umesh

Did you retest in your local box according to my new procedure in one physical machine ?

Please check A.S.A.P.
This problem is serious in my production service using mysql cluster.
[10 Oct 2013 2:56] 李 源錫
To. support staff

Why don't reply my question?
[10 Oct 2013 9:25] MySQL Verification Team
Hi,

I tried couple of times with the provided test case and but it never crashed even with 2 separate physical boxes..

// mysql-5.6.11 ndb-7.3.2

---------------------
[ndbd(NDB)]     2 node(s)
id=2    @x.x.x.1  (mysql-5.6.11 ndb-7.3.2, Nodegroup: 0, Master)
id=3    @x.x.x.2  (mysql-5.6.11 ndb-7.3.2, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @x.x.x.100  (mysql-5.6.11 ndb-7.3.2)

[mysqld(API)]   12 node(s)
id=4    @x.x.x.100  (mysql-5.6.11 ndb-7.3.2)
id=5 (not connected, accepting connect from any host)

------------

mysql> drop table if exists a;
Query OK, 0 rows affected (0.03 sec)

mysql> drop table if exists b;
Query OK, 0 rows affected (0.01 sec)

mysql>
mysql>
mysql> create table a (
    -> a1 varchar(250) NOT NULL DEFAULT '',
    -> a2 int(6) DEFAULT '0', UNIQUE KEY idx_b1 (a1)
    -> );

CREATE TABLE b (
b1 char(20) NOT NULL DEFAULT '',
b2 enum('N','Y') NOT NULL DEFAULT 'N' ,
UNIQUE KEY idx_b1 (b1)
Query OK, 0 rows affected (0.04 sec)

mysql>
mysql> CREATE TABLE b (
    -> b1 char(20) NOT NULL DEFAULT '',
    -> b2 enum('N','Y') NOT NULL DEFAULT 'N' ,
    -> UNIQUE KEY idx_b1 (b1)
    -> );

delimiter //
CREATE TRIGGER a_update BEFORE UPDATE ON a
FOR EACH ROW
UPDATE b SET b2='Y' WHERE b1=SUBSTRING(old.a1, 1 + INSTR(old.a1, '.'))
Query OK, 0 rows affected (0.03 sec)

mysql>
mysql> delimiter //
mysql> CREATE TRIGGER a_update BEFORE UPDATE ON a
    -> FOR EACH ROW
    -> UPDATE b SET b2='Y' WHERE b1=SUBSTRING(old.a1, 1 + INSTR(old.a1, '.'))
    -> //
delimiter ;

delimiter //
drop procedure a_ins//
CREATE PROCEDURE a_ins()
BEGIN
Query OK, 0 rows affected (0.03 sec)

mysql> delimiter ;
mysql>
mysql> delimiter //
mysql> drop procedure a_ins//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE PROCEDURE a_ins()
    -> BEGIN
    -> DECLARE i INT DEFAULT 1;
    -> WHILE (i<=200000) DO INSERT INTO a VALUES(concat('a',i,'.aa'), 1);
    -> SET i=i+1;
    -> END WHILE;
    -> END
    -> //
Query OK, 0 rows affected (0.00 sec)

// i<=200000 (increased to check if it fails for > 20000)

mysql> delimiter ;
mysql>
mysql> call a_ins;
Query OK, 1 row affected (10 min 33.99 sec)

mysql> insert into b values('aa','N');
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE a SET a2=2;
Query OK, 200000 rows affected (9.83 sec)
Rows matched: 200000  Changed: 200000  Warnings: 0

Btw, looking at your config file - are you running cluster on default settings?
See http://dev.mysql.com/doc/refman/5.6/en/mysql-cluster-config-starting.html
[10 Oct 2013 12:02] 李 源錫
thanks mesh your reply.

i unbelievable data node is normal in your test.
it maybe  in your test trigger not work.

1.
please could you upload config.ini and my.cnf?

2. 
please could you upliad next result file?
mysqldump -uroot -p -R -d targetdb > targetdb.dmp

3.
ans please teach me your os version.
uname -a
[10 Oct 2013 17:44] 李 源錫
To. umesh
thanks mesh your reply.

>i unbelievable data node is normal in your test.
>it maybe  in your test trigger not work.

>1.
>please could you upload config.ini and my.cnf?

>2. 
>please could you upliad next result file?
>mysqldump -uroot -p -R -d targetdb > targetdb.dmp

>3.
>ans please teach me your os version.
>uname -a

I want to reply about upper request in today.
I immediately will retest considering your config.
I have tested with three other hardware(supermicro, ibm, hp).
At all lserver, all data node downed in this test.
[11 Oct 2013 3:47] 李 源錫
To. umesh

I am very sorry that I promotes your confirm.
If you can permit time and possible, I would like the following confirmation.

1.
please could you upload your config.ini and my.cnf

2. 
please could you upload next result file.
mysqldump -uroot -p -R -d targetdb > targetdb.dmp

3.
please teach os version.
uname -a

thanks.

p.s) 
>Btw, looking at your config file - are you running cluster on default settings?
>See http://dev.mysql.com/doc/refman/5.6/en/mysql-cluster-config-starting.html

I agin tested after same config set like http://dev.mysql.com/doc/refman/5.6/en/mysql-cluster-config-starting.html.

but, all data node shutdowned.
error message is below.

ndb_mgm> Node 3: Forced node shutdown completed. Caused by error 6050: 'WatchDog terminate, internal error or massive overload on the machine running this node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
[21 Nov 2013 12:13] MySQL Verification Team
Thank you for the feedback.
Verified as reported.

Marking this as duplicate of Bug #70955

Thanks,
Umesh
[21 Nov 2013 12:35] MySQL Verification Team
Duplicate of Bug #56929