MySQL Bugs: #61496: All data node was shutdown during delete record.

Bug #61496	All data node was shutdown during delete record.
Submitted:	13 Jun 2011 1:23
Modified:	11 Oct 2016 23:33
Reporter:	ws lee
Status:	Not a Bug
Category:	MySQL Cluster: Cluster (NDB) storage engine
Severity:	S1 (Critical)
Version:	7.1.10
OS:	Linux (CentOS 5.5)
CPU Architecture:	Any

Description:
All data nodes shut down while deleting records.

The query is:
delete from sbtest limit 1000000;

How to repeat:
My config.ini is below.

# cat config.ini
---------------------------------
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=12000M
IndexMemory=1000M

MaxNoOfConcurrentOperations=1000000
NoOfFragmentLogFiles=16
FragmentLogFileSize=256M

[NDBD]
hostname=10.10.10.1
NodeID=1
[NDBD]
hostname=10.10.10.2
NodeID=2

[NDB_MGMD]
hostname=10.10.10.3
NodeID=3
[NDB_MGMD]
hostname=10.10.10.4
NodeID=4

[MYSQLD]
hostname=10.10.10.5
NodeID=5
[MYSQLD]
hostname=10.10.10.6
NodeID=6
---------------------------------------------------

1.
First, insert 5 million records using sysbench (http://sysbench.sourceforge.net/).

# sysbench --test=oltp --db-driver=mysql --mysql-host=10.10.10.5 --mysql-table-engine=ndbcluster --oltp-table-size=5000000 --mysql-db=test --mysql-user=sysbench prepare

2. 
Create a file with this SQL.
# cat delete.sql
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;

3.
Run delete.sql on an SQL node.

# ./bin/mysql -uroot -p test

mysql> source delete.sql
Query OK, 1000000 rows affected (20.86 sec)

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Query OK, 1000000 rows affected (20.56 sec)

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

At this point, all data nodes shut down.
(If you don't encounter this error, repeat steps 2 and 3.)

ndb_mgm> Node 2: Forced node shutdown completed. Caused by error 6052: 'Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load(Resource configuration error). Permanent error, external action needed'.
Node 1: Forced node shutdown completed. Caused by error 2300: 'Generic error(Restart error). Temporary error, restart node'.

The ndb_1_error log is below.

Time: Monday 13 June 2011 - 10:16:50
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote node id 2.
Error object: TransporterCallback.cpp
Program: /usr/local/mysq/bin/ndbmtd

I am using ndbmtd.
(not ndbd)

To MySQL staff:

Why has this bug not been looked at?

The error message states that the shutdown on node 2 is not a bug, but rather a resource configuration error.

Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or
lower the load (Resource configuration error)

See: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-tcp-definition.html#ndbparam-tcp-send...
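As a rough illustration of "increase SendBufferMemory", the sketch below prints a config.ini fragment that raises the per-transporter send buffer (SendBufferMemory under [TCP DEFAULT]) and the shared pool (TotalSendBufferMemory under [NDBD DEFAULT]). The helper name print_send_buffer_config and the sizes 8M and 256M are assumptions chosen for illustration, not values taken from this report or tested against it. The [NDBD DEFAULT] line would need to be merged by hand into the existing section of the config.ini shown above.

```shell
#!/bin/sh
# Hypothetical helper: print a config.ini fragment that raises the send buffers.
# The default sizes below are illustrative assumptions, not tuned values.
print_send_buffer_config() {
    per_transporter="${1:-8M}"   # assumed value for SendBufferMemory
    total="${2:-256M}"           # assumed value for TotalSendBufferMemory
    cat <<EOF
[NDBD DEFAULT]
TotalSendBufferMemory=${total}

[TCP DEFAULT]
SendBufferMemory=${per_transporter}
EOF
}

# Print the fragment for review before merging it into config.ini by hand.
print_send_buffer_config 8M 256M
```

Changes to config.ini generally need the management server to reload its configuration and the data nodes to be restarted before they take effect; check the documentation for the version in use.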

Large operations can overload SendBufferMemory. The shutdown on node 1 is suspect. Please attach ndb_error_reporter files.
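The other half of the error message's advice, "lower the load", could be approached by splitting each large delete into smaller statements. The sketch below generates such a script; the helper name gen_batched_delete and the batch size of 10000 are assumptions for illustration only. With autocommit on, each statement is its own transaction, so each one involves far fewer rows than the 1000000-row statements in delete.sql. Whether this avoids error 6052 on a given cluster is not established anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: emit COUNT statements of the form
#   delete from TABLE limit BATCH;
# so that a large delete is spread over many smaller transactions.
gen_batched_delete() {
    table="$1"   # table name, e.g. sbtest
    batch="$2"   # rows per statement (assumed value, adjust to taste)
    count="$3"   # number of statements to emit
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'delete from %s limit %s;\n' "$table" "$batch"
        i=$((i + 1))
    done
}

# Example: print three batched statements for the sbtest table.
gen_batched_delete sbtest 10000 3
```

Calling it with a count of 100 and redirecting the output to a file gives a script that covers the same 1000000 rows as one line of delete.sql and can be run with `source` in the mysql client in the same way.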

To Matthew Montgomer:
Thanks for your reply.

Of course, I see that this problem is caused by SendBufferMemory.
But why does the data node go down?

In any service, SendBufferMemory (e.g. 1G) can be exceeded frequently.

I want the heavy query to fail, instead of the data node going down.

Hi!

I also find this behavior frustrating. Can we please fix it somehow? Currently, "Number of replicas" queries with unexpectedly large results kill the cluster instantly. And with a startup time of more than 2 hours, this renders the cluster absolutely useless :(

Time: Wednesday 28 March 2012 - 12:39:01
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote node id 2.
Error object: /export/home/pb2/build/sb_0-4838533-1327945758.79/rpm/BUILD/mysql-cluster-gpl-7.2.4/mysql-cluster-gpl-7.2.4/storage/ndb/src/kernel/vm/TransporterCallback.cpp
Program: ndbmtd
Pid: 24881 thr: 3
Version: mysql-5.5.19 ndb-7.2.4.

This is not a bug; it is a misconfiguration. MySQL Cluster is designed to be a real-time RDBMS, and this makes a "crash" a more viable option than a "failure".

Version 7.4 is looser on the real-time requirements, so it is much more relaxed about this type of error (the transaction will run longer and the system will slow down rather than crash).

All the best,
Bogdan Kecman