MySQL Bugs: #77559: [Err] 1296 - Got error 4008 'Receive from NDB failed' from NDBCLUSTER

Bug #77559	[Err] 1296 - Got error 4008 'Receive from NDB failed' from NDBCLUSTER
Submitted:	30 Jun 2015 9:21
Modified:	3 Nov 2015 18:16
Reporter:	Wenfang Zhang Zhang
Status:	Not a Bug
Category:	MySQL Cluster: NDB API
Severity:	S2 (Serious)
Version:	mysql-cluster-advanced-7.4.6-linux-glibc
OS:	Red Hat (2.6.32-504.el6.x86_64)
Assigned to:	MySQL Verification Team
CPU Architecture:	Any
Tags:	[Err] 1296 - Got error 4008 'Receive from NDB failed' from NDBCLUSTER

Description:
I am using mysql-cluster-advanced-7.4.6-linux-glibc2.5-x86_64.tar.gz on Linux (Red Hat 6). There are 2 management nodes, 4 data nodes and 4 SQL nodes. By the way, each SQL node runs on the same machine as a data node. Every machine has the same hardware: 32 GB of memory and 16 CPUs. I ran into a strange problem when running this join query:
SELECT 
I.PARTY_ID, 
I.ID_TP_CD, 
I.REF_NUM, 
I.MATCH_NUM 
FROM 
PARTY PA 
LEFT JOIN PERSON PE ON PE.PARTY_ID = PA.PARTY_ID 
LEFT JOIN IDENTIFIER I ON I.PARTY_ID = PA.PARTY_ID 
LEFT JOIN PERSON_NAME PN ON PN.PARTY_ID = PA.PARTY_ID 
WHERE 
PA.INACTIVATED_DT IS NULL 
AND PE.GENDER_TP_CODE = 2 
AND PE.BIRTH_DT = '2011-12-10' 
AND PN.LAST_NAME = 'name-测试111' 

It fails with: [Err] 1296 - Got error 4008 'Receive from NDB failed' from NDBCLUSTER

Every table has party_id as its primary key, and the total record count is 15,000,000. A simple query, e.g. select * from party where party_id=123434; returns successfully.
mgm configuration: 

[ndbd default] 

NoOfReplicas=2 # Number of replicas 
DataMemory=7000M # How much memory to allocate for data storage 
IndexMemory=2000M # How much memory to allocate for index storage 
MaxNoOfConcurrentTransactions = 8096 
MaxNoOfConcurrentOperations = 20M 
NoOfFragmentLogFiles =500 
TransactionDeadLockDetectionTimeOut=10000 
TotalSendBufferMemory = 256M 

[tcp default] 
SendBufferMemory=16M 
ReceiveBufferMemory=16M 

[ndb_mgmd] 
NodeId=1 
hostname=22.8.129.203 # Hostname or IP address of MGM node 
datadir=/hadoop/mysql/mysql_cluster/data # Directory for MGM node log files 

[ndb_mgmd] 
NodeId=2 
hostname=22.8.129.204 # Hostname or IP address of MGM node 
datadir=/hadoop/mysql/mysql_cluster/data # Directory for MGM node log files 

[ndbd] 
NodeId=3 
hostname=22.8.129.205 # Hostname or IP address 
datadir=/hadoop/mysql/mysql_cluster/data # Directory for this data node's data files 

[ndbd] 
NodeId=4 
hostname=22.8.129.206 # Hostname or IP address 
datadir=/hadoop/mysql/mysql_cluster/data # Directory for this data node's data files 

[ndbd] 
NodeId=7 
hostname=22.8.129.207 # Hostname or IP address 
datadir=/hadoop/mysql/mysql_cluster/data # Directory for this data node's data files 

[ndbd] 
NodeId=9 
hostname=22.8.129.209 # Hostname or IP address 
datadir=/hadoop/mysql/mysql_cluster/data # Directory for this data node's data files 

[mysqld] 
NodeId=5 
hostname=22.8.129.205 # Hostname or IP address 

[mysqld] 
NodeId=6 
hostname=22.8.129.206 # Hostname or IP address 

[mysqld] 
NodeId=19 
hostname=22.8.129.207 # Hostname or IP address 

[mysqld] 
NodeId=20 
hostname=22.8.129.209 # Hostname or IP address 

sql node and data node configuration: 

[mysqld] 
max_connections = 8096 
max_connect_errors = 99999999 
max_user_connections = 8000 
wait_timeout = 28800 
interactive_timeout = 28800 
thread_concurrency = 16 
basedir = /hadoop/mysql/mysql_cluster 
datadir = /hadoop/mysql/mysql_cluster/data 
log-error=/hadoop/mysql/log/mysqld.log 
character_set_server = utf8 
sql_mode=NO_ENGINE_SUBSTITUTION 
ndbcluster 
ndb-connectstring=22.8.129.203,22.8.129.204 
[mysql_cluster] 
ndb-connectstring=22.8.129.203,22.8.129.204 

How to repeat:
I tried several timeout-related parameters in config.ini, but none of them helped. Maybe this version has some issues?

Error code 4008 means that the NDB API tried to send to the data node but did not succeed. The most common cause of this kind of problem is overload.

So most likely the query you are running manages to overload the connections to the cluster.
I am not sure how or why, though.

You can try increasing the SendBufferMemory sizes to see if that helps.

Yes, I have already increased SendBufferMemory to 1000M, but the problem still occurs. When I EXPLAIN this SQL, it seems to do full table scans. Also, according to the documentation, DataMemory and IndexMemory in config.ini apply to each data node, but when I set DataMemory=9000M and IndexMemory=2500M,
I found that the ndb process uses 26G of memory, which is strange.
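
A rough sketch of why the process can be much larger than DataMemory plus IndexMemory: the posted config.ini also sets MaxNoOfConcurrentOperations = 20M, and every operation record is preallocated in addition to the data and index pools. The snippet below just adds these up. The per-record size of about 1 KiB is an assumption (it is not stated anywhere in this report), the helper names are made up for illustration, and MaxNoOfConcurrentOperations is assumed to be unchanged from the config above.

```python
# Rough per-data-node memory estimate from the config values quoted in this report.
# ASSUMPTION: one operation record costs about 1 KiB. The real size depends on
# the NDB version and is not given in the thread.
ASSUMED_BYTES_PER_OPERATION_RECORD = 1024

# config.ini suffixes are treated as multiples of 1024.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_ndb_number(text: str) -> int:
    """Parse a config.ini number such as '9000M' or '8096' into an integer."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[suffix]
    return int(text)


def estimate_node_memory(settings: dict) -> dict:
    """Return estimated bytes per component for a single data node."""
    operations = parse_ndb_number(settings["MaxNoOfConcurrentOperations"])
    return {
        "DataMemory": parse_ndb_number(settings["DataMemory"]),
        "IndexMemory": parse_ndb_number(settings["IndexMemory"]),
        "operation records": operations * ASSUMED_BYTES_PER_OPERATION_RECORD,
    }


# Values as reported: DataMemory/IndexMemory from the comment above,
# MaxNoOfConcurrentOperations from the posted config.ini.
settings = {
    "DataMemory": "9000M",
    "IndexMemory": "2500M",
    "MaxNoOfConcurrentOperations": "20M",
}

estimate = estimate_node_memory(settings)
for name, size in estimate.items():
    print(f"{name:>18}: {size / 1024 ** 3:6.2f} GiB")
print(f"{'total':>18}: {sum(estimate.values()) / 1024 ** 3:6.2f} GiB")
```

Under that assumption the operation records alone come out larger than DataMemory and IndexMemory combined, so a process well above the configured 11500M is not surprising by itself. The total is only indicative and should not be expected to match the observed 26G.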

Hi,

Looking through the data you provided, this looks like an overloaded cluster, not a bug. To configure your cluster properly (both config and hardware), I suggest you get a support subscription from us so that we can solve your overloading problems.

While you are deciding whether to proceed with a support contract, I advise you to try MySQL Enterprise Monitor and monitor your cluster for a while.

Kind regards
Bogdan Kecman