MySQL Bugs: #84135: when create table, it thow an exception "Could not connect to storage engine"

Bug #84135	when create table, it thow an exception "Could not connect to storage engine"
Submitted:	9 Dec 2016 2:01	Modified:	21 Dec 2016 1:08
Reporter:	ll lucky	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.7.16 ndb-7.5.4	OS:	CentOS (CentOS Linux release 7.2.1511 (Core))
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	create table error, MySQL Cluster, ndb, storage engine

Description:
Linux version 3.10.0-327.el7.x86_64 (builder@kbuilder.dev.centos.org) 
(gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) )

I use mysql-connector-c++-1.1.7 to connect to mysql cluster, when I repeat executing a "create table if not exist" command ,it sometimes throw a exception,
"Could not connect to storage engine" with MySQL error code: 157.
    It may be OK at the beginning,but after ten more or dozens of tries,this error will occor;after thousands or more ,and the cluster break down.

How to repeat:
Execute a create table command in C++ program, create a table with "engine=ndbcluster", it may occur.
like the program below:
testMysql()
{
	try {
		sql_execute("CREATE TABLE IF NOT EXISTS thisisatest(ID BIGINT(11) auto_increment,RawTableName varchar(30))ENGINE=ndbcluster AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;");
		AfxMessageBox(L"Success",0,0);
	}
	catch (sql::SQLException &e)
	{
		CString str(e.what());
		LPCTSTR lpstr = str;
		AfxMessageBox(lpstr,0, 0);
	}
	return 0;
}

Suggested fix:
I have no idea, if there is any idea about how to avoid or fix it,please let me know, my email is gyd.001@163.com.

In addition, I install mysql-cluster with the version under "Linux-Generic", and I choosed a 64-bit version, my CentOS is also x64.

Hi,
we need a complete log from the cluster in order to see what is wrong.
Please use ndb_error_reporter tool to collect logs and upload them to this bug report

thanks
Bogdan Kecman

error_report

Attachment: ndb_error_report_20161216000024.tar.bz2 (application/octet-stream, text), 194.47 KiB.

I upload a file that error_report generated, and I saw the log file, but there seems nothing useful info.

hi,

node2 and node3 logs are missing from this archive.
I see number of restarts, don't see if they are manual or from crashes (missing other log files). In any way I see a lot of heartbeat errors. Can you please check your network between nodes and cpu usage on the nodes.

Looking further looks like you are using MCM to maintain your cluster. Can you grab all the logs from the MCM ?
https://dev.mysql.com/doc/mysql-cluster-manager/1.3/en/mcm-collect-logs.html

thanks
Bogdan

a complete bug-report

Attachment: ndb_error_report_20161218005612.tar.bz2 (application/octet-stream, text), 117.75 KiB.

I'm sorry, there was something wrong with the ssh, and I cost some time to fix it. I upload a bug-report file that should be complete.
I cleared some log in ndb node before times for better analysis,and then tried to execute the code to create table dozen of times and the error occured again.Seventeen was an interesting number, after about seventeen times for each sql node,the exception began to throw.
Below was the cpp code that I use, I think it is easy to reproduce:
int testMysql{	
        sql::Statement *m_stmt = NULL;
	try {
		sql::Driver *mdriver = get_driver_instance();;
		sql::Connection *m_con = mdriver->connect("192.168.126.130", "mysqlcluster", "Grid2016");
		m_con->setSchema("test");
		m_stmt = m_con->createStatement();

		m_stmt->execute("CREATE TABLE IF NOT EXISTS thisisatest(id int(10), name varchar(32), score int(10)) \
ENGINE=ndbcluster AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;");
		delete m_stmt;
		AfxMessageBox(L"创建成功",0,0);
	}
	catch (sql::SQLException &e)
	{
		delete m_stmt;
		CString str(e.what());
		LPCTSTR lpstr = str;
		AfxMessageBox(lpstr,0, 0);
	}
return 0;
}
In the end, thank you for your attention.

hi,
are there files ndb_2_error.log and ndb_3_error.log?

I see number of force shutdowns and restarts in the log - were you doing this manually or?

Can you give me a timestamp when you for e.g. have this exception?

Did you look at the cpu usage and io usage on your data nodes when this exception occurs?

Looking at your log file, there is no bug here, you are just overloading your data nodes with more then the hardware can handle, if you look at out.log files you will see for example this:

2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1358
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 21  System time: 690
2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1462
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 21  System time: 695
2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1571
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 22  System time: 706
2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1674
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 22  System time: 712
2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1778
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 22  System time: 721
2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1882
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 22  System time: 726
2016-12-18 00:26:47 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1986
2016-12-18 00:26:47 [ndbd] INFO     -- Watchdog: User time: 22  System time: 731
2016-12-18 00:26:48 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2089
2016-12-18 00:26:48 [ndbd] INFO     -- Watchdog: User time: 22  System time: 736
2016-12-18 00:26:48 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2192
2016-12-18 00:26:48 [ndbd] INFO     -- Watchdog: User time: 22  System time: 740
2016-12-18 00:26:48 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2293
2016-12-18 00:26:48 [ndbd] INFO     -- Watchdog: User time: 22  System time: 740
2016-12-18 00:26:48 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2424
2016-12-18 00:26:48 [ndbd] INFO     -- Watchdog: User time: 22  System time: 758
2016-12-18 00:26:48 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2524
2016-12-18 00:26:48 [ndbd] INFO     -- Watchdog: User time: 22  System time: 775
2016-12-18 00:26:48 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2624
2016-12-18 00:26:48 [ndbd] INFO     -- Watchdog: User time: 22  System time: 787

that means your data node is stuck overloaded for many milliseconds more then it should be

I do not see this is a bug, this is just a normal behavior of missconfigured / misscaled cluster setup. Properly configuring the MySQL Cluster setup is something our support team can help you out with.

take care
Bogdan Kecman

This cluster was built on my vmware virtual machine, so frequently shutdown and restart is normal.I can't run the virtual machine all the time on my notebook.
I did the test just after 2016-12-18 00:00, and also, for better analysis,I cleared the log on ndb node and initialized the ndb node before the test. So there can't be heavy burden on it. If it was overloaded, how could it was not a bug?  I can't believe that. It will not happen in version 7.4.13. By the way, there was no any error log or slow log.

Hi,

> This cluster was built on my vmware virtual machine, 
> so frequently shutdown and restart is normal.
> I can't run the virtual machine all the time on my notebook.

A VM is not really a best env for a mysql cluster and a notebook hosting them is 100% not a proper place to test mysql cluster :(. It's enough that your host was opening a file and you had high io load and cluster will report problems. It's a system that depend on consistent io capacity and consisten cpu capacity delivering real time performance. If you don't give it as much IO as it needs it will not work properly. The 7.2 and older would crash in this situations, and 7.4 in this situations just don't allow you to connect not crashing.

> I did the test just after 2016-12-18 00:00, 
> and also, for better analysis,I cleared the 
> log on ndb node and initialized the ndb node before the test. 
> So there can't be heavy burden on it.

Not sure what you are talking about since 
I see logs here since 2016-11-18. What did you clear?
Did you send me correct logs?

This is all I have since 2016-12-18:

== starting the cluster ==

2016-12-18 00:17:46 [MgmtSrvr] INFO     -- Got initial configuration from '/var/lib/mysql-cluster/config.ini', will try to set it when all ndb_mgmd(s) started
2016-12-18 00:17:46 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2016-12-18 00:17:46 [MgmtSrvr] INFO     -- MySQL Cluster Management Server mysql-5.7.16 ndb-7.5.4 started
2016-12-18 00:17:46 [MgmtSrvr] INFO     -- Node 1: Node 1 Connected
2016-12-18 00:17:46 [MgmtSrvr] INFO     -- Node 1 connected
2016-12-18 00:18:04 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
2016-12-18 00:18:04 [MgmtSrvr] INFO     -- Node 2 connected
2016-12-18 00:18:04 [MgmtSrvr] INFO     -- Starting initial configuration change
2016-12-18 00:18:04 [MgmtSrvr] INFO     -- Configuration 1 commited
2016-12-18 00:18:04 [MgmtSrvr] INFO     -- Config change completed! New generation: 1
2016-12-18 00:18:49 [MgmtSrvr] INFO     -- Nodeid 3 allocated for NDB at 192.168.126.128
2016-12-18 00:18:49 [MgmtSrvr] INFO     -- Node 1: Node 3 Connected
2016-12-18 00:18:50 [MgmtSrvr] INFO     -- Node 3: Node 2 Connected
2016-12-18 00:19:04 [MgmtSrvr] INFO     -- Node 3: Buffering maximum epochs 100
2016-12-18 00:19:05 [MgmtSrvr] INFO     -- Node 3: Start phase 0 completed 
2016-12-18 00:19:05 [MgmtSrvr] INFO     -- Node 3: Communication to Node 4 opened

== data node 3 starting --initial ==

2016-12-18 00:19:05 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
...
2016-12-18 00:19:32 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]

== problems with management node 2, disconnected then connected again                             == 
== if this is a network problem or something else I'm not sure I don't see that node 2 crashes so ==
2016-12-18 00:19:32 [MgmtSrvr] ALERT    -- Node 3: Node 2 Disconnected
2016-12-18 00:19:32 [MgmtSrvr] INFO     -- Node 3: Communication to Node 2 closed
2016-12-18 00:19:32 [MgmtSrvr] ALERT    -- Node 1: Node 2 Disconnected
2016-12-18 00:19:33 [MgmtSrvr] INFO     -- Node 3: Communication to Node 2 opened

== node 3 still starting ==

2016-12-18 00:19:35 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
...
2016-12-18 00:22:02 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]

== node 4 failed to connect due to "no free node id" ==
2016-12-18 00:22:05 [MgmtSrvr] WARNING  -- Failed to allocate nodeid for NDB at 192.168.126.128. Returned error: 'No free node id found for ndbd(NDB).'

== still waiting for node 4 ==

2016-12-18 00:22:05 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]

== node 4 and node 3 issues wrt nodeid -> this is a problem with your config ==

2016-12-18 00:22:51 [MgmtSrvr] INFO     -- Alloc node id 4 failed, no new president yet
2016-12-18 00:22:51 [MgmtSrvr] INFO     -- Nodeid 4 allocated for NDB at 192.168.126.130
2016-12-18 00:22:52 [MgmtSrvr] INFO     -- Node 1: Node 4 Connected
2016-12-18 00:22:53 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]

== cluster still trying to start ==

2016-12-18 00:22:56 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
...
2016-12-18 00:23:24 [MgmtSrvr] ALERT    -- Node 1: Node 4 Disconnected
2016-12-18 00:23:26 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
2016-12-18 00:23:27 [MgmtSrvr] INFO     -- Node 1: Node 4 Connected
2016-12-18 00:23:27 [MgmtSrvr] INFO     -- Node 3: Node 4 Connected
2016-12-18 00:23:27 [MgmtSrvr] INFO     -- Node 3: Initial start with nodes 3 and 4 [ missing:  no-wait:  ]
2016-12-18 00:23:27 [MgmtSrvr] INFO     -- Node 3: CM_REGCONF president = 3, own Node = 3, our dynamic id = 0/1
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 4: CM_REGCONF president = 3, own Node = 4, our dynamic id = 0/2
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 3: Node 4: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 4: Node 3: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 4: Start phase 1 completed 
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 3: Start phase 1 completed

== cluster finally starting ==

2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 3: System Restart: master node: 2, num starting: 3, gci: 0
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 3: CNTR_START_CONF: started: 0000000000000000
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 3: CNTR_START_CONF: starting: 0000000000000018
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 3: Local redo log file initialization status:
#Total files: 64, Completed: 0
#Total MBytes: 1024, Completed: 0
2016-12-18 00:23:29 [MgmtSrvr] INFO     -- Node 4: Local redo log file initialization status:
#Total files: 64, Completed: 0
#Total MBytes: 1024, Completed: 0

== node 3 crash -> there has to be error log for this but you did not upload it in the archive ==
== there's a problem with your filesystem ==

2016-12-18 00:23:30 [MgmtSrvr] ALERT    -- Node 1: Node 3 Disconnected
2016-12-18 00:23:30 [MgmtSrvr] ALERT    -- Node 3: Forced node shutdown completed. Occured during startphase 2. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.

== node 4 shutting down 'cause cluster didn't start ==

2016-12-18 00:23:31 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 2. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2016-12-18 00:23:31 [MgmtSrvr] ALERT    -- Node 1: Node 4 Disconnected
2016-12-18 00:24:19 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
2016-12-18 00:24:19 [MgmtSrvr] WARNING  -- Refusing other node, it's in different state: 1, expected: 2

== shutting down ==

2016-12-18 00:24:19 [MgmtSrvr] INFO     -- Node 2 connected
2016-12-18 00:24:40 [MgmtSrvr] INFO     -- Nodeid 3 allocated for NDB at 192.168.126.128
2016-12-18 00:24:40 [MgmtSrvr] INFO     -- Node 1: Node 3 Connected
2016-12-18 00:24:41 [MgmtSrvr] INFO     -- Node 3: Node 2 Connected
2016-12-18 00:24:51 [MgmtSrvr] INFO     -- Node 3: Buffering maximum epochs 100
2016-12-18 00:24:51 [MgmtSrvr] INFO     -- Node 3: Start phase 0 completed 
2016-12-18 00:24:51 [MgmtSrvr] INFO     -- Node 3: Communication to Node 4 opened
2016-12-18 00:24:51 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
...
2016-12-18 00:25:23 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
2016-12-18 00:25:26 [MgmtSrvr] INFO     -- Node 3: Node shutdown completed.
2016-12-18 00:25:26 [MgmtSrvr] ALERT    -- Node 1: Node 3 Disconnected
2016-12-18 00:25:29 [MgmtSrvr] INFO     -- Shutting down server...
2016-12-18 00:25:31 [MgmtSrvr] ALERT    -- Node 1: Node 2 Disconnected
2016-12-18 00:25:48 [MgmtSrvr] INFO     -- Shutdown complete

== starting cluster again == 

2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Got initial configuration from '/var/lib/mysql-cluster/config.ini', will try to set it when all ndb_mgmd(s) started
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- MySQL Cluster Management Server mysql-5.7.16 ndb-7.5.4 started
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Node 1: Node 1 Connected
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Node 2 connected
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Node 1 connected
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Starting initial configuration change
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Configuration 1 commited
2016-12-18 00:26:26 [MgmtSrvr] INFO     -- Config change completed! New generation: 1
2016-12-18 00:26:33 [MgmtSrvr] INFO     -- Nodeid 4 allocated for NDB at 192.168.126.130
2016-12-18 00:26:34 [MgmtSrvr] INFO     -- Node 1: Node 4 Connected
2016-12-18 00:26:41 [MgmtSrvr] INFO     -- Alloc node id 3 failed, no new president yet
2016-12-18 00:26:41 [MgmtSrvr] INFO     -- Nodeid 3 allocated for NDB at 192.168.126.128
2016-12-18 00:26:41 [MgmtSrvr] INFO     -- Node 1: Node 3 Connected
2016-12-18 00:26:41 [MgmtSrvr] INFO     -- Node 3: Node 2 Connected
2016-12-18 00:26:49 [MgmtSrvr] ALERT    -- Node 1: Node 4 Disconnected
2016-12-18 00:26:51 [MgmtSrvr] INFO     -- Node 3: Buffering maximum epochs 100
2016-12-18 00:26:51 [MgmtSrvr] INFO     -- Node 3: Start phase 0 completed 
2016-12-18 00:26:51 [MgmtSrvr] INFO     -- Node 3: Communication to Node 4 opened
2016-12-18 00:26:51 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
2016-12-18 00:26:53 [MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 4 to connect,  nodes [ all: 3 and 4 connected: 3 no-wait:  ]
2016-12-18 00:26:53 [MgmtSrvr] INFO     -- Node 3: Node 4 Connected
2016-12-18 00:26:53 [MgmtSrvr] INFO     -- Node 3: Initial start with nodes 3 and 4 [ missing:  no-wait:  ]

== this time starting ok ==

2016-12-18 00:26:53 [MgmtSrvr] INFO     -- Node 3: CM_REGCONF president = 3, own Node = 3, our dynamic id = 0/1
2016-12-18 00:26:54 [MgmtSrvr] INFO     -- Node 1: Node 4 Connected
2016-12-18 00:26:55 [MgmtSrvr] INFO     -- Node 4: CM_REGCONF president = 3, own Node = 4, our dynamic id = 0/2
2016-12-18 00:26:55 [MgmtSrvr] INFO     -- Node 3: Node 4: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:55 [MgmtSrvr] INFO     -- Node 4: Node 3: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:55 [MgmtSrvr] INFO     -- Node 4: Start phase 1 completed 
2016-12-18 00:26:55 [MgmtSrvr] INFO     -- Node 3: Start phase 1 completed 
2016-12-18 00:26:55 [MgmtSrvr] INFO     -- Node 3: System Restart: master node: 2, num starting: 3, gci: 0
2016-12-18 00:26:56 [MgmtSrvr] INFO     -- Node 3: CNTR_START_CONF: started: 0000000000000000
2016-12-18 00:26:56 [MgmtSrvr] INFO     -- Node 3: CNTR_START_CONF: starting: 0000000000000018
2016-12-18 00:26:56 [MgmtSrvr] INFO     -- Node 3: Local redo log file initialization status:
#Total files: 64, Completed: 0
#Total MBytes: 1024, Completed: 0
2016-12-18 00:26:56 [MgmtSrvr] INFO     -- Node 4: Local redo log file initialization status:
#Total files: 64, Completed: 0
#Total MBytes: 1024, Completed: 0
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Local redo log file initialization completed:
#Total files: 64, Completed: 64
#Total MBytes: 1024, Completed: 1024
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 2 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Local redo log file initialization completed:
#Total files: 64, Completed: 64
#Total MBytes: 1024, Completed: 1024
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 2 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 3 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 3 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 4 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 4 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Make On-line Database recoverable by waiting for LCP Starting, LCP id = 1
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 1 started. Keep GCI = 1 oldest restorable GCI = 1
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: LDM(0): Completed LCP, #frags = 4 #records = 2048, #bytes = 65616
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: LDM(0): Completed LCP, #frags = 4 #records = 2048, #bytes = 65616
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 1 completed
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Make On-line Database recoverable by waiting for LCP Completed, LCP id = 1
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 5 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 5 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 6 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 6 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: President restarts arbitration thread [state=1]
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 7 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 7 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 8 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 8 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 9 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 9 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Start phase 101 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Start phase 101 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Started (mysql-5.7.16 ndb-7.5.4)
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Started (mysql-5.7.16 ndb-7.5.4)

== finally cluster started ==

2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Node 2: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Node 2: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Node 1: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 3: Node 1: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Communication to Node 5 opened
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Communication to Node 6 opened
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Communication to Node 7 opened
2016-12-18 00:26:57 [MgmtSrvr] INFO     -- Node 4: Communication to Node 8 opened

== 2 connections from mysql node == 
== and you see them now connecting and disconnecting ==
== no errors, no problems ==

2016-12-18 00:26:58 [MgmtSrvr] INFO     -- Node 4: Prepare arbitrator node 1 [ticket=1a080001000e87e2]
2016-12-18 00:26:58 [MgmtSrvr] INFO     -- Node 3: Started arbitrator node 1 [ticket=1a080001000e87e2]
2016-12-18 00:26:58 [MgmtSrvr] INFO     -- Node 3: Communication to Node 5 opened
2016-12-18 00:26:58 [MgmtSrvr] INFO     -- Node 3: Communication to Node 6 opened
2016-12-18 00:26:58 [MgmtSrvr] INFO     -- Node 3: Communication to Node 7 opened
2016-12-18 00:26:58 [MgmtSrvr] INFO     -- Node 3: Communication to Node 8 opened
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Alloc node id 6 succeeded
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Nodeid 6 allocated for API at 192.168.126.130
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Node 6: mysqld --server-id=0
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Node 3: Node 6 Connected
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Node 4: Node 6 Connected
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Node 3: Node 6: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:27:50 [MgmtSrvr] INFO     -- Node 4: Node 6: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:28:04 [MgmtSrvr] INFO     -- Alloc node id 5 succeeded
2016-12-18 00:28:04 [MgmtSrvr] INFO     -- Nodeid 5 allocated for API at 192.168.126.128
2016-12-18 00:28:04 [MgmtSrvr] INFO     -- Node 5: mysqld --server-id=0
2016-12-18 00:28:05 [MgmtSrvr] INFO     -- Node 3: Node 5 Connected
2016-12-18 00:28:05 [MgmtSrvr] INFO     -- Node 4: Node 5 Connected
2016-12-18 00:28:05 [MgmtSrvr] INFO     -- Node 3: Node 5: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:28:05 [MgmtSrvr] INFO     -- Node 4: Node 5: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:52:53 [MgmtSrvr] ALERT    -- Node 4: Node 6 Disconnected
2016-12-18 00:52:53 [MgmtSrvr] INFO     -- Node 4: Communication to Node 6 closed
2016-12-18 00:52:53 [MgmtSrvr] INFO     -- Node 3: Communication to Node 6 closed
2016-12-18 00:52:53 [MgmtSrvr] ALERT    -- Node 3: Node 6 Disconnected
2016-12-18 00:52:55 [MgmtSrvr] INFO     -- Alloc node id 7 succeeded
2016-12-18 00:52:55 [MgmtSrvr] INFO     -- Nodeid 7 allocated for API at 192.168.126.130
2016-12-18 00:52:55 [MgmtSrvr] INFO     -- Node 7: mysqld --server-id=0
2016-12-18 00:52:56 [MgmtSrvr] INFO     -- Node 3: Node 7 Connected
2016-12-18 00:52:56 [MgmtSrvr] INFO     -- Node 4: Node 7 Connected
2016-12-18 00:52:56 [MgmtSrvr] INFO     -- Node 3: Node 7: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:52:56 [MgmtSrvr] INFO     -- Node 4: Node 7: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:52:56 [MgmtSrvr] INFO     -- Node 4: Communication to Node 6 opened
2016-12-18 00:52:56 [MgmtSrvr] INFO     -- Node 3: Communication to Node 6 opened
2016-12-18 00:53:17 [MgmtSrvr] ALERT    -- Node 4: Node 5 Disconnected
2016-12-18 00:53:17 [MgmtSrvr] INFO     -- Node 4: Communication to Node 5 closed
2016-12-18 00:53:17 [MgmtSrvr] INFO     -- Node 3: Communication to Node 5 closed
2016-12-18 00:53:17 [MgmtSrvr] ALERT    -- Node 3: Node 5 Disconnected
2016-12-18 00:53:19 [MgmtSrvr] INFO     -- Alloc node id 8 succeeded
2016-12-18 00:53:19 [MgmtSrvr] INFO     -- Nodeid 8 allocated for API at 192.168.126.128
2016-12-18 00:53:19 [MgmtSrvr] INFO     -- Node 8: mysqld --server-id=0
2016-12-18 00:53:20 [MgmtSrvr] INFO     -- Node 4: Node 8 Connected
2016-12-18 00:53:20 [MgmtSrvr] INFO     -- Node 3: Node 8 Connected
2016-12-18 00:53:20 [MgmtSrvr] INFO     -- Node 3: Node 8: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:53:20 [MgmtSrvr] INFO     -- Node 4: Node 8: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:53:20 [MgmtSrvr] INFO     -- Node 4: Communication to Node 5 opened
2016-12-18 00:53:20 [MgmtSrvr] INFO     -- Node 3: Communication to Node 5 opened

So as you can see, there's nothing in the log showing you have a problem. there is a problem with starting cluster but that's related to your config that's missing nodeid for each defined node, other then that, there's zero issues reported in the management log so no errors happened.

The data node log show only a lot of:
2016-12-18 00:27:00 [ndbd] INFO     -- Watchdog: User time: 78  System time: 2041
2016-12-18 00:27:00 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 308 (ms)
2016-12-18 00:27:00 [ndbd] INFO     -- Watchdog: User time: 78  System time: 2042

and that's "overload" cpu or io (usually IO) and that's normal if you are running cluster on vm inside a laptop, reconfiguring it differently could help a bit but not much.

So definitively not a bug

take care
Bogdan Kecman

If you were definitely, I will not say too much. It's OK on ndb-7.4.13.
I'm not sure will the log files record everything, why not do a small test using the c++ code I have provided and tell me you run it OK? I think it's more persuasive.

Hi,

> I'm not sure will the log files record everything

They do

> why not do a small test using the c++ code I have
> provided and tell me you run it OK? I think it's more persuasive.

I did that test before I wrote anything to this issue, it works normally, of course cluster is not running in a vm inside my laptop (it's running on 4 nodes running in vm's on a dedicated host running proxmox).

As I said, fix your config for start (add nodeid parameter for each node and manually assign node id's and add few more mysqld slots and at least 2 api slots) and test it on a hw where you can dedicate some io to it.

take care
Bogdan

hi,
Actually, it also happened on a cluster of six nodes, part of the config is like this:
[ndbd default]
NoOfReplicas=2   
DataMemory=435G  # How much memory to allocate for data storage
IndexMemory=11G  # How much memory to allocate for index
TimeBetweenLocalCheckpoints=5
FragmentLogFileSize=64M  #default is 16M
NoOfFragmentLogFiles=256 #default is 16
MaxNoOfConcurrentTransactions=1000000 #default is 4096
MaxNoOfConcurrentOperations=1000000   #default is 32K
MaxNoOfLocalOperations=1100000
MaxNoOfAttributes=400000
MaxNoOfTables=40000
MaxNoOfOrderedIndexes=4096
MaxNoOfExecutionThreads=16
NoOfFragmentLogParts=8
TransactionDeadLockDetectionTimeOut=5000 #default is 1200 ms
# LockPagesInMainMemory=1
[tcp default]
SendBufferMemory=128M
ReceiveBufferMemory=128M
# portnumber=2202
[ndb_mgmd]
NodeId=1
...
But I can't provide the log from this cluster.It may not because of the missing of nodeid parameter or lack of performance.
Whatever, thank you very much.