Bug #81886 MySQL cluster API Node restarts with " mysqld got signal 6" error
Submitted: 16 Jun 2016 13:33 Modified: 18 Aug 2016 19:32
Reporter: Ciprian Dumitrel Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Cluster: NDB API Severity:S1 (Critical)
Version:: 5.6.24-ndb-7.4.6-cluster-gpl / MySQL C OS:Red Hat (Red Hat Enterprise Linux Server release 6.6 (Santiago))
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: cluster, Communications link failure, connection pool, hibernate, java, MySQL, ndb api, Red Hat

[16 Jun 2016 13:33] Ciprian Dumitrel
Description:
Hello,

We are developing a Java Web Spring and Hibernate-based application, connecting to a MySQL cluster having one management node, 2 NDB nodes and 2 SQL API Nodes deployed as below:

1) The management node is deployed on a virtual machine running Red Hat Enterprise Linux Server release 6.6 (Santiago) OS with 4GBs of RAM and 1 CPU;
2) We have 2 other virtual machines, with the same OS and the same resources as the one where the management node is deployed. On each of the 2 machines 1 NDB node and 1 SQL API Node are deployed (We will attach the configuration files afterwards).
3) From our application we query the database and after multiple reloading of certain pages we loose the connection to the database, because one of the 2 SQL API Nodes gets restarted. In our application server logs we get the following exception: 

Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 33 milliseconds ago.  The last packet sent successfully to the server was 33 milliseconds ago.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
	at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1036)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3427)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3327)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
	at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2030)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
	at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
	at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:82)
	at org.hibernate.loader.Loader.getResultSet(Loader.java:2066)
	at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1863)
	at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1839)
	at org.hibernate.loader.Loader.doQuery(Loader.java:910)
	at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:355)
	at org.hibernate.loader.Loader.doList(Loader.java:2554)
	at org.hibernate.loader.Loader.doList(Loader.java:2540)
	at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2370)
	at org.hibernate.loader.Loader.list(Loader.java:2365)
	at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:497)
	at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:387)
	at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:236)
	at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1264)
	at org.hibernate.internal.QueryImpl.list(QueryImpl.java:103)
	at wso2.api.wrapper.dao.impl.HibernateSubscriptionDaoImpl.getAPISubscriptionRequest(HibernateSubscriptionDaoImpl.java:183)
	at wso2.api.wrapper.bo.subscription.impl.SubscriptionBoImpl.getAPISubscriptionRequestHistory(SubscriptionBoImpl.java:307)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:98)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:262)
	... 80 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2914)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3337)
	... 115 more

We have associated this exception with the moment when the SQL API Node restarts . In the node's error logs we can see the following errors:

/pb2/build/sb_0-14878975-1427908940.3/rpm/BUILD/mysql-cluster-gpl-7.4.6/mysql-cluster-gpl-7.4.6/storage/ndb/src/common/util/NdbSqlUtil.cpp:480: require((lb + m1 <= n1 && lb + m2 <= n2)) failed
11:50:12 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=90
max_threads=151
thread_count=52
connection_count=51
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68104 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x3529a40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
...............................
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160616 13:50:12 mysqld_safe Number of processes running now: 0
160616 13:50:12 mysqld_safe mysqld restarted
2016-06-16 13:50:14 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2016-06-16 13:50:14 0 [Note] /usr/sbin/mysqld (mysqld 5.6.24-ndb-7.4.6-cluster-gpl) starting as process 28071 ...
2016-06-16 13:50:14 28071 [Note] Plugin 'FEDERATED' is disabled.

How to repeat:
We have a page that generates multiple database queries. After multiple reloads the page does not load anymore. In our application server logs (Tomcat8) we can see the "Communications link failure" exception at that moment in time. Navigating to both of the node's error logs, we can see that the SQL API corresponding to one node has restarted at a time stamp before the time stamp of the exception received in the application server logs.
[20 Jun 2016 10:52] MySQL Verification Team
Hi,
With the info you shared I can only tell you that your cluster fails but not why it fails. I need you to upload the full ndb_error_reporter output. It will collect all the log files from the data nodes and management node so that I can look why it crashed.

Also, since you have this one page that when you refresh few times you crash your cluster, if you can extract all the sql queries from that page it would speed up the process.

kind regards
Bogdan Kecman
[27 Jun 2016 15:15] Ciprian Dumitrel
Hello Bogdan,

Thanks a lot for helping with this. Today I could not reproduce the "Communications link failure" exception. The last time we got this exception was on the 17th of June, at 15:14:08. There are errors in the MySQL cluster logs at the same moment of time. I will immediately attach the ndb_error_reporter logs, as well as the queries that run when that certain page is loaded. 

BR,
Ciprian
[28 Jun 2016 8:32] Ciprian Dumitrel
Hello Bogdan,

Today we have managed to reproduce the incident. I will attach a fresh ndb_error_report. 

Thanks,
Ciprian
[18 Jul 2016 19:32] MySQL Verification Team
Hi,

for start, you don't have enough API slots in your cluster config. You only have 2 mysqld slots and mysqld cannot properly connect to a cluster. Open more mysqld or api slots (just add at the end of your config few [mysqld] and few [api] lines).

second, I don't have your latest mysqld config as mysqld is what's crashing, nor I have your mysqld error log. 

Anyhow, trying to reproduce your issue I managed to reproduce it with

1. binlog turned on
2. mysqld with default configuration of ndb-cluster-connection-pool=3 and 2 mysqld servers and only 2 mysqld slots as you currently have in your config

The solution

1. upgrade to 7.4.12 
2. fix your config, you need at least noofmysqldnodes * ndb-cluster-connection-pool nodes, but I suggest you create at least (1+noofmysqldnodes) * ndb-cluster-connection-pool

Let me know if this do solve your problem and if not please come back with
 - mysqld config too
 - mysqld error log too
 - ndb_error_report (as you did so far)

kind regards
Bogdan Kecman
[19 Aug 2016 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".