Bug #61496 All data node was shutdown during delete record.
Submitted: 13 Jun 2011 1:23
Modified: 11 Oct 2016 23:33
Reporter: ws lee
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 7.1.10
OS: Linux (CentOS 5.5)
Assigned to:
CPU Architecture: Any

[13 Jun 2011 1:23] ws lee
Description:
All data nodes shut down while deleting records.

The query is:
delete from sbtest limit 1000000;

How to repeat:
My config.ini is below.

# cat config.ini
---------------------------------
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=12000M
IndexMemory=1000M

MaxNoOfConcurrentOperations=1000000
NoOfFragmentLogFiles=16
FragmentLogFileSize=256M

[NDBD]
hostname=10.10.10.1
NodeID=1
[NDBD]
hostname=10.10.10.2
NodeID=2

[NDB_MGMD]
hostname=10.10.10.3
NodeID=3
[NDB_MGMD]
hostname=10.10.10.4
NodeID=4

[MYSQLD]
hostname=10.10.10.5
NodeID=5
[MYSQLD]
hostname=10.10.10.6
NodeID=6
---------------------------------------------------

1.
First, insert 5 million records using sysbench (http://sysbench.sourceforge.net/).

# sysbench --test=oltp --db-driver=mysql --mysql-host=10.10.10.5 --mysql-table-engine=ndbcluster --oltp-table-size=5000000 --mysql-db=test --mysql-user=sysbench prepare

2. 
Create a file with this SQL.
# cat delete.sql
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;
delete from sbtest limit 1000000;

3.
Run delete.sql on an SQL node.

# ./bin/mysql -uroot -p test

mysql> source delete.sql
Query OK, 1000000 rows affected (20.86 sec)

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Query OK, 1000000 rows affected (20.56 sec)

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

At this point, all data nodes shut down.
(If you don't encounter this error, repeat steps 2 and 3.)

ndb_mgm> Node 2: Forced node shutdown completed. Caused by error 6052: 'Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load(Resource configuration error). Permanent error, external action needed'.
Node 1: Forced node shutdown completed. Caused by error 2300: 'Generic error(Restart error). Temporary error, restart node'.

The ndb_1_error log is below.

Time: Monday 13 June 2011 - 10:16:50
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote node id 2.
Error object: TransporterCallback.cpp
Program: /usr/local/mysq/bin/ndbmtd
[13 Jun 2011 1:26] ws lee
I am using ndbmtd.
(not ndbd)
[30 Jun 2011 5:49] ws lee
To MySQL staff:

Why has this bug not been checked?
[30 Jun 2011 16:07] MySQL Verification Team
The error message states that the shutdown on node 2 is not a bug, but rather a resource configuration error.

Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or
lower the load (Resource configuration error)

See: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-tcp-definition.html#ndbparam-tcp-send...

Large operations can overload SendBufferMemory. The shutdown on node 1 is suspect. Please attach ndb_error_reporter files.
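
As an illustration of the advice to increase SendBufferMemory, the following is a minimal sketch of the additions to the reporter's config.ini. The values are assumptions chosen only to show where the parameters go; they are not tuned or tested for this workload.

```ini
# Sketch only: the sizes below are illustrative assumptions, not recommendations.

[TCP DEFAULT]
# Per-connection send buffer between nodes.
SendBufferMemory=8M

[NDBD DEFAULT]
# Existing settings (NoOfReplicas, DataMemory, ...) stay as they are.
# Total send buffer memory shared across all of a data node's transporters.
TotalSendBufferMemory=64M
```

Whether larger buffers are enough to survive a 1000000-row delete depends on the load, so this may need to be combined with smaller transactions.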
[30 Jun 2011 16:22] ws lee
To. Matthew Montgomer
Thanks for your reply.

Of course, I see that this problem is caused by SendBufferMemory.
But why do the data nodes go down?

In any service, SendBufferMemory (e.g. 1G) will frequently be exceeded.

I want the heavy query to fail, instead of the data nodes going down.
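
The error message's other suggestion, "lower the load", can be approximated by deleting in smaller batches so that each transaction carries fewer operations. Below is a minimal sketch that generates such a script. The batch size of 10000 and the file name delete_small.sql are assumptions for illustration, not tested values.

```shell
#!/bin/sh
# Sketch: write many small deletes instead of five 1000000-row deletes.
# ROWS matches --oltp-table-size from step 1; BATCH is an assumed, untuned value.
ROWS=5000000
BATCH=10000
OUT=delete_small.sql   # hypothetical file name

: > "$OUT"             # create or truncate the output file
i=0
while [ "$i" -lt $((ROWS / BATCH)) ]; do
    # With autocommit on (the mysql client default), each statement
    # runs as its own transaction.
    echo "delete from sbtest limit $BATCH;" >> "$OUT"
    i=$((i + 1))
done
```

The generated file would then be run as in step 3, with `source delete_small.sql`.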
[30 Mar 2012 9:33] Timur Bakeyev
Hi!

I also find this behavior frustrating. Can we please fix it somehow? Now "Number of replicas" queries with unexpectedly large results kill the cluster instantly. And with a startup time of more than 2 hours, that renders the cluster absolutely useless :(

Time: Wednesday 28 March 2012 - 12:39:01
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote node id 2.
Error object: /export/home/pb2/build/sb_0-4838533-1327945758.79/rpm/BUILD/mysql-cluster-gpl-7.2.4/mysql-cluster-gpl-7.2.4/storage/ndb/src/kernel/vm/TransporterCallback.cpp
Program: ndbmtd
Pid: 24881 thr: 3
Version: mysql-5.5.19 ndb-7.2.4.
[11 Oct 2016 23:33] MySQL Verification Team
This is not a bug, it is a misconfiguration. MySQL Cluster is designed to be a real-time RDBMS, and this makes a "crash" a more viable option than a "failure".

Version 7.4 is looser on the real-time requirements, so it is much more relaxed about this type of error (the transaction will run longer and the system will slow down rather than crash).

all best
Bogdan Kecman