Bug #84135 | when creating a table, it throws an exception "Could not connect to storage engine" | ||
---|---|---|---|
Submitted: | 9 Dec 2016 2:01 | Modified: | 21 Dec 2016 1:08 |
Reporter: | ll lucky | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.7.16 ndb-7.5.4 | OS: | CentOS (CentOS Linux release 7.2.1511 (Core)) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | create table error, MySQL Cluster, ndb, storage engine |
[9 Dec 2016 2:01]
ll lucky
[9 Dec 2016 2:33]
ll lucky
In addition, I installed mysql-cluster using the version under "Linux-Generic", and I chose a 64-bit build; my CentOS is also x64.
[12 Dec 2016 12:41]
MySQL Verification Team
Hi, we need a complete log from the cluster in order to see what is wrong. Please use the ndb_error_reporter tool to collect the logs and upload them to this bug report. Thanks, Bogdan Kecman
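P.S. For reference, a typical invocation on the management host looks like the line below; the config.ini path is an assumption (it is the path that later shows up in your cluster log) and may differ on your system:

ndb_error_reporter /var/lib/mysql-cluster/config.ini

This packs the logs from all nodes into an ndb_error_report_YYYYMMDDhhmmss.tar.bz2 archive that you can attach here.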
[16 Dec 2016 10:10]
ll lucky
error_report
Attachment: ndb_error_report_20161216000024.tar.bz2 (application/octet-stream, text), 194.47 KiB.
[16 Dec 2016 10:12]
ll lucky
I uploaded the file that ndb_error_reporter generated, and I looked at the log files, but there seems to be no useful info in them.
[16 Dec 2016 12:12]
MySQL Verification Team
hi, the node2 and node3 logs are missing from this archive. I see a number of restarts, but I can't tell whether they were manual or from crashes (the other log files are missing). In any case, I see a lot of heartbeat errors. Can you please check the network between the nodes and the CPU usage on the nodes? Looking further, it looks like you are using MCM to maintain your cluster. Can you grab all the logs from MCM? https://dev.mysql.com/doc/mysql-cluster-manager/1.3/en/mcm-collect-logs.html Thanks, Bogdan
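P.S. If I recall the client syntax correctly, this is a single command in the mcm client; a minimal sketch, where "mycluster" is a placeholder for your cluster's name:

mcm> collect logs mycluster;

The linked manual page has the exact syntax and the location of the collected files.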
[17 Dec 2016 17:01]
ll lucky
a complete bug-report
Attachment: ndb_error_report_20161218005612.tar.bz2 (application/octet-stream, text), 117.75 KiB.
[17 Dec 2016 17:15]
ll lucky
I'm sorry, there was something wrong with ssh, and it took me some time to fix it. I have uploaded a bug-report file that should be complete. For better analysis I cleared some logs on the ndb nodes beforehand, then executed the table-creation code dozens of times, and the error occurred again. Seventeen was an interesting number: after about seventeen runs against each SQL node, the exception began to be thrown. Below is the C++ code I use; I think it is easy to reproduce:

int testMysql() {
    sql::Statement *m_stmt = NULL;
    sql::Connection *m_con = NULL;
    try {
        // Connector/C++ driver and connection to the SQL node
        sql::Driver *mdriver = get_driver_instance();
        m_con = mdriver->connect("192.168.126.130", "mysqlcluster", "Grid2016");
        m_con->setSchema("test");
        m_stmt = m_con->createStatement();
        m_stmt->execute("CREATE TABLE IF NOT EXISTS thisisatest(id int(10), name varchar(32), score int(10)) "
                        "ENGINE=ndbcluster AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;");
        delete m_stmt;
        delete m_con;
        AfxMessageBox(L"created successfully", 0, 0);  // success message (translated from the original Chinese)
    } catch (sql::SQLException &e) {
        delete m_stmt;
        delete m_con;
        CString str(e.what());
        LPCTSTR lpstr = str;
        AfxMessageBox(lpstr, 0, 0);
    }
    return 0;
}

In the end, thank you for your attention.
[18 Dec 2016 20:29]
MySQL Verification Team
hi, are there ndb_2_error.log and ndb_3_error.log files? I see a number of forced shutdowns and restarts in the log - were you doing this manually, or? Can you give me a timestamp for when you, for example, got this exception? Did you look at the CPU usage and I/O usage on your data nodes when the exception occurred?

Looking at your log file, there is no bug here; you are just overloading your data nodes with more than the hardware can handle. If you look at the out.log files you will see, for example, this:

2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1358
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 21 System time: 690
2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1462
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 21 System time: 695
2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1571
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 22 System time: 706
2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1674
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 22 System time: 712
2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1778
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 22 System time: 721
2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1882
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 22 System time: 726
2016-12-18 00:26:47 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=1986
2016-12-18 00:26:47 [ndbd] INFO -- Watchdog: User time: 22 System time: 731
2016-12-18 00:26:48 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2089
2016-12-18 00:26:48 [ndbd] INFO -- Watchdog: User time: 22 System time: 736
2016-12-18 00:26:48 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2192
2016-12-18 00:26:48 [ndbd] INFO -- Watchdog: User time: 22 System time: 740
2016-12-18 00:26:48 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2293
2016-12-18 00:26:48 [ndbd] INFO -- Watchdog: User time: 22 System time: 740
2016-12-18 00:26:48 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2424
2016-12-18 00:26:48 [ndbd] INFO -- Watchdog: User time: 22 System time: 758
2016-12-18 00:26:48 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2524
2016-12-18 00:26:48 [ndbd] INFO -- Watchdog: User time: 22 System time: 775
2016-12-18 00:26:48 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=2624
2016-12-18 00:26:48 [ndbd] INFO -- Watchdog: User time: 22 System time: 787

That means your data node is stuck overloaded for many milliseconds longer than it should be. I do not see a bug here; this is just the normal behavior of a misconfigured / mis-scaled cluster setup. Properly configuring the MySQL Cluster setup is something our support team can help you out with.

take care
Bogdan Kecman
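P.S. One way to confirm this while reproducing - standard Linux tooling, nothing Cluster-specific - is to watch each data node host with, for example:

top
iostat -x 1

while the CREATE TABLE loop runs; that shows whether the machine is CPU-bound or I/O-bound at the moments the watchdog warnings appear.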
[19 Dec 2016 13:32]
ll lucky
This cluster was built on my VMware virtual machines, so frequent shutdowns and restarts are normal; I can't run the virtual machines all the time on my notebook. I did the test just after 2016-12-18 00:00, and, for better analysis, I cleared the logs on the ndb nodes and initialized the ndb nodes before the test, so there can't have been a heavy burden on them. If it was overloaded, how could that not be a bug? I can't believe that. It does not happen in version 7.4.13. By the way, there was no error log or slow log at all.
[19 Dec 2016 14:15]
MySQL Verification Team
Hi,

> This cluster was built on my VMware virtual machines,
> so frequent shutdowns and restarts are normal;
> I can't run the virtual machines all the time on my notebook.

A VM is really not the best environment for a MySQL Cluster, and a notebook hosting the VMs is 100% not a proper place to test MySQL Cluster :(. It's enough for your host to open a file while you have high I/O load and the cluster will report problems. It is a system that depends on consistent I/O capacity and consistent CPU capacity to deliver real-time performance. If you don't give it as much I/O as it needs, it will not work properly. 7.2 and older would crash in these situations; 7.4 in these situations just refuses connections instead of crashing.

> I did the test just after 2016-12-18 00:00, and, for better
> analysis, I cleared the logs on the ndb nodes and initialized
> the ndb nodes before the test, so there can't have been a
> heavy burden on them.

I'm not sure what you are talking about, since I see logs here going back to 2016-11-18. What did you clear? Did you send me the correct logs?
[19 Dec 2016 14:16]
MySQL Verification Team
This is all I have since 2016-12-18:

== starting the cluster ==
2016-12-18 00:17:46 [MgmtSrvr] INFO -- Got initial configuration from '/var/lib/mysql-cluster/config.ini', will try to set it when all ndb_mgmd(s) started
2016-12-18 00:17:46 [MgmtSrvr] INFO -- Id: 1, Command port: *:1186
2016-12-18 00:17:46 [MgmtSrvr] INFO -- MySQL Cluster Management Server mysql-5.7.16 ndb-7.5.4 started
2016-12-18 00:17:46 [MgmtSrvr] INFO -- Node 1: Node 1 Connected
2016-12-18 00:17:46 [MgmtSrvr] INFO -- Node 1 connected
2016-12-18 00:18:04 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2016-12-18 00:18:04 [MgmtSrvr] INFO -- Node 2 connected
2016-12-18 00:18:04 [MgmtSrvr] INFO -- Starting initial configuration change
2016-12-18 00:18:04 [MgmtSrvr] INFO -- Configuration 1 commited
2016-12-18 00:18:04 [MgmtSrvr] INFO -- Config change completed! New generation: 1
2016-12-18 00:18:49 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 192.168.126.128
2016-12-18 00:18:49 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2016-12-18 00:18:50 [MgmtSrvr] INFO -- Node 3: Node 2 Connected
2016-12-18 00:19:04 [MgmtSrvr] INFO -- Node 3: Buffering maximum epochs 100
2016-12-18 00:19:05 [MgmtSrvr] INFO -- Node 3: Start phase 0 completed
2016-12-18 00:19:05 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened

== data node 3 starting --initial ==
2016-12-18 00:19:05 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
...
2016-12-18 00:19:32 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]

== problems with management node 2: disconnected, then connected again ==
== whether this is a network problem or something else I'm not sure; I don't see node 2 crashing ==
2016-12-18 00:19:32 [MgmtSrvr] ALERT -- Node 3: Node 2 Disconnected
2016-12-18 00:19:32 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 closed
2016-12-18 00:19:32 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2016-12-18 00:19:33 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 opened

== node 3 still starting ==
2016-12-18 00:19:35 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
...
2016-12-18 00:22:02 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]

== node 4 failed to connect due to "no free node id" ==
2016-12-18 00:22:05 [MgmtSrvr] WARNING -- Failed to allocate nodeid for NDB at 192.168.126.128. Returned error: 'No free node id found for ndbd(NDB).'

== still waiting for node 4 ==
2016-12-18 00:22:05 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]

== node 4 and node 3 issues wrt nodeid -> this is a problem with your config ==
2016-12-18 00:22:51 [MgmtSrvr] INFO -- Alloc node id 4 failed, no new president yet
2016-12-18 00:22:51 [MgmtSrvr] INFO -- Nodeid 4 allocated for NDB at 192.168.126.130
2016-12-18 00:22:52 [MgmtSrvr] INFO -- Node 1: Node 4 Connected
2016-12-18 00:22:53 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]

== cluster still trying to start ==
2016-12-18 00:22:56 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
...
2016-12-18 00:23:24 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2016-12-18 00:23:26 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
2016-12-18 00:23:27 [MgmtSrvr] INFO -- Node 1: Node 4 Connected
2016-12-18 00:23:27 [MgmtSrvr] INFO -- Node 3: Node 4 Connected
2016-12-18 00:23:27 [MgmtSrvr] INFO -- Node 3: Initial start with nodes 3 and 4 [ missing: no-wait: ]
2016-12-18 00:23:27 [MgmtSrvr] INFO -- Node 3: CM_REGCONF president = 3, own Node = 3, our dynamic id = 0/1
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 4: CM_REGCONF president = 3, own Node = 4, our dynamic id = 0/2
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 3: Node 4: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 4: Node 3: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 4: Start phase 1 completed
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 3: Start phase 1 completed
[19 Dec 2016 14:17]
MySQL Verification Team
== cluster finally starting ==
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 3: System Restart: master node: 2, num starting: 3, gci: 0
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 3: CNTR_START_CONF: started: 0000000000000000
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 3: CNTR_START_CONF: starting: 0000000000000018
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 3: Local redo log file initialization status: #Total files: 64, Completed: 0 #Total MBytes: 1024, Completed: 0
2016-12-18 00:23:29 [MgmtSrvr] INFO -- Node 4: Local redo log file initialization status: #Total files: 64, Completed: 0 #Total MBytes: 1024, Completed: 0

== node 3 crash -> there has to be an error log for this, but you did not upload it in the archive ==
== there's a problem with your filesystem ==
2016-12-18 00:23:30 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected
2016-12-18 00:23:30 [MgmtSrvr] ALERT -- Node 3: Forced node shutdown completed. Occured during startphase 2. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.

== node 4 shutting down because the cluster didn't start ==
2016-12-18 00:23:31 [MgmtSrvr] ALERT -- Node 4: Forced node shutdown completed. Occured during startphase 2. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2016-12-18 00:23:31 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2016-12-18 00:24:19 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2016-12-18 00:24:19 [MgmtSrvr] WARNING -- Refusing other node, it's in different state: 1, expected: 2

== shutting down ==
2016-12-18 00:24:19 [MgmtSrvr] INFO -- Node 2 connected
2016-12-18 00:24:40 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 192.168.126.128
2016-12-18 00:24:40 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2016-12-18 00:24:41 [MgmtSrvr] INFO -- Node 3: Node 2 Connected
2016-12-18 00:24:51 [MgmtSrvr] INFO -- Node 3: Buffering maximum epochs 100
2016-12-18 00:24:51 [MgmtSrvr] INFO -- Node 3: Start phase 0 completed
2016-12-18 00:24:51 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened
2016-12-18 00:24:51 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
...
2016-12-18 00:25:23 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
2016-12-18 00:25:26 [MgmtSrvr] INFO -- Node 3: Node shutdown completed.
2016-12-18 00:25:26 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected
2016-12-18 00:25:29 [MgmtSrvr] INFO -- Shutting down server...
2016-12-18 00:25:31 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2016-12-18 00:25:48 [MgmtSrvr] INFO -- Shutdown complete

== starting cluster again ==
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Got initial configuration from '/var/lib/mysql-cluster/config.ini', will try to set it when all ndb_mgmd(s) started
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Id: 1, Command port: *:1186
2016-12-18 00:26:26 [MgmtSrvr] INFO -- MySQL Cluster Management Server mysql-5.7.16 ndb-7.5.4 started
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Node 1: Node 1 Connected
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Node 2 connected
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Node 1 connected
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Starting initial configuration change
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Configuration 1 commited
2016-12-18 00:26:26 [MgmtSrvr] INFO -- Config change completed! New generation: 1
2016-12-18 00:26:33 [MgmtSrvr] INFO -- Nodeid 4 allocated for NDB at 192.168.126.130
2016-12-18 00:26:34 [MgmtSrvr] INFO -- Node 1: Node 4 Connected
2016-12-18 00:26:41 [MgmtSrvr] INFO -- Alloc node id 3 failed, no new president yet
2016-12-18 00:26:41 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 192.168.126.128
2016-12-18 00:26:41 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2016-12-18 00:26:41 [MgmtSrvr] INFO -- Node 3: Node 2 Connected
2016-12-18 00:26:49 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2016-12-18 00:26:51 [MgmtSrvr] INFO -- Node 3: Buffering maximum epochs 100
2016-12-18 00:26:51 [MgmtSrvr] INFO -- Node 3: Start phase 0 completed
2016-12-18 00:26:51 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened
2016-12-18 00:26:51 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
2016-12-18 00:26:53 [MgmtSrvr] INFO -- Node 3: Initial start, waiting for 4 to connect, nodes [ all: 3 and 4 connected: 3 no-wait: ]
2016-12-18 00:26:53 [MgmtSrvr] INFO -- Node 3: Node 4 Connected
2016-12-18 00:26:53 [MgmtSrvr] INFO -- Node 3: Initial start with nodes 3 and 4 [ missing: no-wait: ]
[19 Dec 2016 14:18]
MySQL Verification Team
== this time starting ok ==
2016-12-18 00:26:53 [MgmtSrvr] INFO -- Node 3: CM_REGCONF president = 3, own Node = 3, our dynamic id = 0/1
2016-12-18 00:26:54 [MgmtSrvr] INFO -- Node 1: Node 4 Connected
2016-12-18 00:26:55 [MgmtSrvr] INFO -- Node 4: CM_REGCONF president = 3, own Node = 4, our dynamic id = 0/2
2016-12-18 00:26:55 [MgmtSrvr] INFO -- Node 3: Node 4: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:55 [MgmtSrvr] INFO -- Node 4: Node 3: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:55 [MgmtSrvr] INFO -- Node 4: Start phase 1 completed
2016-12-18 00:26:55 [MgmtSrvr] INFO -- Node 3: Start phase 1 completed
2016-12-18 00:26:55 [MgmtSrvr] INFO -- Node 3: System Restart: master node: 2, num starting: 3, gci: 0
2016-12-18 00:26:56 [MgmtSrvr] INFO -- Node 3: CNTR_START_CONF: started: 0000000000000000
2016-12-18 00:26:56 [MgmtSrvr] INFO -- Node 3: CNTR_START_CONF: starting: 0000000000000018
2016-12-18 00:26:56 [MgmtSrvr] INFO -- Node 3: Local redo log file initialization status: #Total files: 64, Completed: 0 #Total MBytes: 1024, Completed: 0
2016-12-18 00:26:56 [MgmtSrvr] INFO -- Node 4: Local redo log file initialization status: #Total files: 64, Completed: 0 #Total MBytes: 1024, Completed: 0
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Local redo log file initialization completed: #Total files: 64, Completed: 64 #Total MBytes: 1024, Completed: 1024
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 2 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Local redo log file initialization completed: #Total files: 64, Completed: 64 #Total MBytes: 1024, Completed: 1024
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 2 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 3 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 3 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 4 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 4 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Make On-line Database recoverable by waiting for LCP Starting, LCP id = 1
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Local checkpoint 1 started. Keep GCI = 1 oldest restorable GCI = 1
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: LDM(0): Completed LCP, #frags = 4 #records = 2048, #bytes = 65616
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: LDM(0): Completed LCP, #frags = 4 #records = 2048, #bytes = 65616
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Local checkpoint 1 completed
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Make On-line Database recoverable by waiting for LCP Completed, LCP id = 1
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 5 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 5 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 6 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 6 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: President restarts arbitration thread [state=1]
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 7 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 7 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 8 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 8 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 9 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 9 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Start phase 101 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Start phase 101 completed (initial start)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Started (mysql-5.7.16 ndb-7.5.4)
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Started (mysql-5.7.16 ndb-7.5.4)
[19 Dec 2016 14:18]
MySQL Verification Team
== finally cluster started ==
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Node 2: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Node 2: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Node 1: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 3: Node 1: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Communication to Node 5 opened
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Communication to Node 6 opened
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Communication to Node 7 opened
2016-12-18 00:26:57 [MgmtSrvr] INFO -- Node 4: Communication to Node 8 opened

== 2 connections from mysql node ==
== and you see them now connecting and disconnecting ==
== no errors, no problems ==
2016-12-18 00:26:58 [MgmtSrvr] INFO -- Node 4: Prepare arbitrator node 1 [ticket=1a080001000e87e2]
2016-12-18 00:26:58 [MgmtSrvr] INFO -- Node 3: Started arbitrator node 1 [ticket=1a080001000e87e2]
2016-12-18 00:26:58 [MgmtSrvr] INFO -- Node 3: Communication to Node 5 opened
2016-12-18 00:26:58 [MgmtSrvr] INFO -- Node 3: Communication to Node 6 opened
2016-12-18 00:26:58 [MgmtSrvr] INFO -- Node 3: Communication to Node 7 opened
2016-12-18 00:26:58 [MgmtSrvr] INFO -- Node 3: Communication to Node 8 opened
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Alloc node id 6 succeeded
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Nodeid 6 allocated for API at 192.168.126.130
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Node 6: mysqld --server-id=0
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Node 3: Node 6 Connected
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Node 4: Node 6 Connected
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Node 3: Node 6: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:27:50 [MgmtSrvr] INFO -- Node 4: Node 6: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:28:04 [MgmtSrvr] INFO -- Alloc node id 5 succeeded
2016-12-18 00:28:04 [MgmtSrvr] INFO -- Nodeid 5 allocated for API at 192.168.126.128
2016-12-18 00:28:04 [MgmtSrvr] INFO -- Node 5: mysqld --server-id=0
2016-12-18 00:28:05 [MgmtSrvr] INFO -- Node 3: Node 5 Connected
2016-12-18 00:28:05 [MgmtSrvr] INFO -- Node 4: Node 5 Connected
2016-12-18 00:28:05 [MgmtSrvr] INFO -- Node 3: Node 5: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:28:05 [MgmtSrvr] INFO -- Node 4: Node 5: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:52:53 [MgmtSrvr] ALERT -- Node 4: Node 6 Disconnected
2016-12-18 00:52:53 [MgmtSrvr] INFO -- Node 4: Communication to Node 6 closed
2016-12-18 00:52:53 [MgmtSrvr] INFO -- Node 3: Communication to Node 6 closed
2016-12-18 00:52:53 [MgmtSrvr] ALERT -- Node 3: Node 6 Disconnected
2016-12-18 00:52:55 [MgmtSrvr] INFO -- Alloc node id 7 succeeded
2016-12-18 00:52:55 [MgmtSrvr] INFO -- Nodeid 7 allocated for API at 192.168.126.130
2016-12-18 00:52:55 [MgmtSrvr] INFO -- Node 7: mysqld --server-id=0
2016-12-18 00:52:56 [MgmtSrvr] INFO -- Node 3: Node 7 Connected
2016-12-18 00:52:56 [MgmtSrvr] INFO -- Node 4: Node 7 Connected
2016-12-18 00:52:56 [MgmtSrvr] INFO -- Node 3: Node 7: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:52:56 [MgmtSrvr] INFO -- Node 4: Node 7: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:52:56 [MgmtSrvr] INFO -- Node 4: Communication to Node 6 opened
2016-12-18 00:52:56 [MgmtSrvr] INFO -- Node 3: Communication to Node 6 opened
2016-12-18 00:53:17 [MgmtSrvr] ALERT -- Node 4: Node 5 Disconnected
2016-12-18 00:53:17 [MgmtSrvr] INFO -- Node 4: Communication to Node 5 closed
2016-12-18 00:53:17 [MgmtSrvr] INFO -- Node 3: Communication to Node 5 closed
2016-12-18 00:53:17 [MgmtSrvr] ALERT -- Node 3: Node 5 Disconnected
2016-12-18 00:53:19 [MgmtSrvr] INFO -- Alloc node id 8 succeeded
2016-12-18 00:53:19 [MgmtSrvr] INFO -- Nodeid 8 allocated for API at 192.168.126.128
2016-12-18 00:53:19 [MgmtSrvr] INFO -- Node 8: mysqld --server-id=0
2016-12-18 00:53:20 [MgmtSrvr] INFO -- Node 4: Node 8 Connected
2016-12-18 00:53:20 [MgmtSrvr] INFO -- Node 3: Node 8 Connected
2016-12-18 00:53:20 [MgmtSrvr] INFO -- Node 3: Node 8: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:53:20 [MgmtSrvr] INFO -- Node 4: Node 8: API mysql-5.7.16 ndb-7.5.4
2016-12-18 00:53:20 [MgmtSrvr] INFO -- Node 4: Communication to Node 5 opened
2016-12-18 00:53:20 [MgmtSrvr] INFO -- Node 3: Communication to Node 5 opened

So as you can see, there's nothing in the log showing you have a problem. There is a problem with starting the cluster, but that's related to your config, which is missing a nodeid for each defined node; other than that, there are zero issues reported in the management log, so no errors happened.

The data node log shows only a lot of:

2016-12-18 00:27:00 [ndbd] INFO -- Watchdog: User time: 78 System time: 2041
2016-12-18 00:27:00 [ndbd] INFO -- timerHandlingLab, expected 10ms sleep, not scheduled for: 308 (ms)
2016-12-18 00:27:00 [ndbd] INFO -- Watchdog: User time: 78 System time: 2042

and that's "overload" of CPU or I/O (usually I/O), which is normal if you are running the cluster in VMs inside a laptop; reconfiguring it differently could help a bit, but not much.

So, definitively not a bug.

take care
Bogdan Kecman
[20 Dec 2016 7:06]
ll lucky
If you are so certain, I will not say too much. It's OK on ndb-7.4.13. I'm not sure whether the log files record everything; why not do a small test using the C++ code I have provided and tell me whether it runs OK? I think that would be more persuasive.
[20 Dec 2016 11:21]
MySQL Verification Team
Hi,

> I'm not sure whether the log files record everything

They do.

> why not do a small test using the C++ code I have
> provided and tell me whether it runs OK?
> I think that would be more persuasive.

I did that test before I wrote anything to this issue; it works normally. Of course, the cluster is not running in a VM inside my laptop (it's running on 4 nodes running in VMs on a dedicated host running proxmox). As I said, fix your config for a start (add a nodeid parameter for each node, manually assign the node ids, and add a few more mysqld slots and at least 2 api slots), and test it on hardware where you can dedicate some I/O to it.

take care
Bogdan
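P.S. For illustration only, a minimal config.ini sketch of those changes, not a verified configuration: node ids 3-6 and the data node addresses follow what is visible in your logs, while the management host address is a placeholder.

[ndb_mgmd]
NodeId=1
HostName=192.168.126.1      # placeholder management host address

[ndbd]
NodeId=3
HostName=192.168.126.128

[ndbd]
NodeId=4
HostName=192.168.126.130

[mysqld]
NodeId=5
[mysqld]
NodeId=6

# a few extra unbound slots, so reconnecting mysqlds and tools
# don't run into 'No free node id found'
[mysqld]
[mysqld]
[api]
[api]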
[21 Dec 2016 1:08]
ll lucky
hi, Actually, it also happened on a cluster of six nodes; part of that config is like this:

[ndbd default]
NoOfReplicas=2
DataMemory=435G    # How much memory to allocate for data storage
IndexMemory=11G    # How much memory to allocate for indexes
TimeBetweenLocalCheckpoints=5
FragmentLogFileSize=64M    # default is 16M
NoOfFragmentLogFiles=256   # default is 16
MaxNoOfConcurrentTransactions=1000000    # default is 4096
MaxNoOfConcurrentOperations=1000000      # default is 32K
MaxNoOfLocalOperations=1100000
MaxNoOfAttributes=400000
MaxNoOfTables=40000
MaxNoOfOrderedIndexes=4096
MaxNoOfExecutionThreads=16
NoOfFragmentLogParts=8
TransactionDeadLockDetectionTimeOut=5000    # default is 1200 ms
# LockPagesInMainMemory=1

[tcp default]
SendBufferMemory=128M
ReceiveBufferMemory=128M
# portnumber=2202

[ndb_mgmd]
NodeId=1
...

But I can't provide the logs from that cluster. It may not be because of the missing nodeid parameter or a lack of performance. Whatever the case, thank you very much.