Bug #51816 Deleting lots of rows causes a data node to crash
Submitted: 8 Mar 2010 4:55 Modified: 9 Mar 2010 8:40
Reporter: Mikiya Okuno
Status: Verified
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-7.0 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Triage: Triaged: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment)

[8 Mar 2010 4:55] Mikiya Okuno
Description:
Error -1 is not useful when we want to drill down into what is happening inside MySQL Cluster. The server should report the exact error number instead of the unknown -1.

How to repeat:
mysql> create table t2 (a bigint unsigned not null primary key auto_increment) engine ndb;
Query OK, 0 rows affected (2.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> delimiter //
mysql> create procedure p1 (nrows int)
    -> begin
    -> select @x:=max(a) from t1;
    -> set @y:=@x + nrows;
    -> repeat set @x:=@x+1; insert into t1 values(@x); until @x > @y end repeat;
    -> end;//
Query OK, 0 rows affected (0.01 sec)

mysql> delimiter ;
mysql> call p1 (1000000);
+------------+
| @x:=max(a) |
+------------+
|    1084546 | 
+------------+
1 row in set (0.00 sec)

Query OK, 1 row affected (18 min 46.23 sec)

mysql> delete from t1 where a < 500000;
ERROR 1296 (HY000): Got error -1 'Unknown error code' from NDBCLUSTER

Suggested fix:
Please report an accurate error code.
[8 Mar 2010 5:07] Mikiya Okuno
The repeat procedure above was copied and pasted. The table t1 already had 1084546 rows beforehand, but it was populated in another way. The error actually occurs when a table has about 1M rows.
[8 Mar 2010 7:46] Magnus Blåudd
The direct error message you get from MySQL Server is quite often not correct. Please try SHOW WARNINGS directly after the error occurs to (hopefully) get more error codes.
[8 Mar 2010 14:05] Jørgen Austvik
Please give the output of SHOW WARNINGS.
[9 Mar 2010 8:40] Mikiya Okuno
Alas... I've got the actual error; deleting lots of rows causes a segmentation fault:

<<<SHOW WARNINGS>>>
mysql> delete from t1 where a < 500000;
ERROR 1296 (HY000): Got error -1 'Unknown error code' from NDBCLUSTER
mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                        |
+-------+------+----------------------------------------------------------------------------------------------------------------+
| Error | 1297 | Got temporary error 4010 'Node failure caused abort of transaction' from NDB                                   | 
| Error | 1296 | Got error -1 'Unknown error code' from NDBCLUSTER                                                              | 
| Error | 1622 | Storage engine NDB does not support rollback for this statement. Transaction rolled back and must be restarted | 
| Error | 1296 | Got error 4350 'Transaction already aborted' from NDB                                                          | 
| Error | 1296 | Got error 4350 'Transaction already aborted' from NDBCLUSTER                                                   | 
| Error | 1181 | Got error 4350 during ROLLBACK                                                                                 | 
+-------+------+----------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
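
The warning list above holds the specific NDB error (4010) next to the generic -1. A minimal sketch of how a client script might pick the first specific code out of such messages follows. It assumes the messages have already been fetched as plain strings; the names `NDB_ERROR_RE`, `ndb_errors` and `first_specific_error` are hypothetical and not part of MySQL or any connector.

```python
import re

# Matches messages of the form seen in the SHOW WARNINGS output, e.g.
#   Got temporary error 4010 'Node failure caused abort of transaction' from NDB
#   Got error -1 'Unknown error code' from NDBCLUSTER
NDB_ERROR_RE = re.compile(
    r"Got (?P<temporary>temporary )?error (?P<code>-?\d+) "
    r"'(?P<text>[^']*)' from (?P<source>NDB(?:CLUSTER)?)"
)


def ndb_errors(messages):
    """Return (code, text, is_temporary) for each message naming an NDB error."""
    found = []
    for message in messages:
        match = NDB_ERROR_RE.search(message)
        if match:
            found.append((
                int(match.group("code")),
                match.group("text"),
                match.group("temporary") is not None,
            ))
    return found


def first_specific_error(messages):
    """Return the first NDB error whose code is not the generic -1, or None."""
    for code, text, is_temporary in ndb_errors(messages):
        if code != -1:
            return code, text, is_temporary
    return None


if __name__ == "__main__":
    # Two of the messages from the SHOW WARNINGS output in this report.
    sample = [
        "Got temporary error 4010 'Node failure caused abort of transaction' from NDB",
        "Got error -1 'Unknown error code' from NDBCLUSTER",
    ]
    print(first_specific_error(sample))
```

Messages without a quoted error text, such as "Got error 4350 during ROLLBACK", are skipped by this pattern on purpose.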

<<<CLUSTER LOG>>>
2010-03-09 17:30:10 [MgmtSrvr] ALERT    -- Node 11: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2010-03-09 17:30:10 [MgmtSrvr] ALERT    -- Node 1: Node 11 Disconnected
2010-03-09 17:30:10 [MgmtSrvr] ALERT    -- Node 10: Node 11 Disconnected

<<<ERROR LOG>>>
Time: Friday 5 March 2010 - 17:53:47
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: ndbd.cpp
Program: /usr/local/telco-7.0/bin/ndbmtd
Pid: 31184 thr: 3
Version: mysql-5.1.39 ndb-7.0.9
Trace: /home/mikiya/mysql-data/simple-7.0/data0/ndb_10_trace.log.11 /home/mikiya/mysql-data/simple-7.0/data0/ndb_10_trace.log.11_t

<<<TRACELOG SNIPPET>>>
CMVMI   000249 002063 002097 
CMVMI   000249 002063 002097 
CMVMI   000249 
CMVMI   000249 002063 002097 

--------------- Signal ----------------
r.bn: 254 "CMVMI", r.proc: 10, r.sigId: 1866 gsn: 164 "CONTINUEB" prio: 0
s.bn: 254 "CMVMI", s.proc: 10, s.sigId: 1863 length: 3 trace: 1 #sec: 0 fragInf: 0
 H'000003e8 H'0000026f H'00000000
--------------- Signal ----------------
r.bn: 254 "CMVMI", r.proc: 10, r.sigId: 1865 gsn: 247 "EVENT_REP" prio: 1
s.bn: 247/1 "DBLQH", s.proc: 10, s.sigId: 1043052 length: 2 trace: 1 #sec: 0 fragInf: 0
 H'00000024 H'00000012
[31 Mar 2010 19:41] Joshua Gordon
I have also confirmed this bug. If you use replication as a backup, bug 36763 (Truncate does not replicate using mixed mode) also applies. Together, these create a situation where you cannot delete a table. Any fix or workaround would be appreciated.
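
A possible mitigation, not confirmed anywhere in this report to avoid the crash, is to keep each transaction small by deleting in batches with `DELETE ... LIMIT`. The sketch below assumes a DB-API 2.0 connection to the MySQL server whose driver uses the `%s` parameter style; the function name `delete_in_batches` and its parameters are hypothetical.

```python
def delete_in_batches(conn, table, column, upper_bound, batch_size=10000):
    """Delete rows with column < upper_bound in small committed batches.

    conn is assumed to be a DB-API 2.0 connection to a MySQL server.
    table and column must be trusted identifiers; they are not escaped.
    Returns the total number of rows deleted.
    """
    sql = "DELETE FROM {} WHERE {} < %s LIMIT %s".format(table, column)
    total = 0
    cur = conn.cursor()
    while True:
        cur.execute(sql, (upper_bound, batch_size))
        deleted = cur.rowcount
        conn.commit()  # end the transaction so each batch stays small
        total += deleted
        if deleted < batch_size:
            # A short batch means no matching rows are left.
            break
    return total
```

For the statement in this report, a call might look like `delete_in_batches(conn, "t1", "a", 500000)`. The batch size of 10000 is an arbitrary starting point, not a tested value.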
[1 Apr 2010 7:30] Mikiya Okuno
Hi,

I've tested the patch and verified that the error message has changed to a specific one instead of 'Unknown error code', as shown below:

mysql> delete from t1 where a < 2000000;
ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER

Kind regards,
--
Mikiya