Description:
Both Data Nodes crash at the same time causing a complete cluster failure!
=cluster manager both nodes
2014-03-29 17:04:18 [MgmtSrvr] WARNING -- Node 4: GCP Monitor: GCP_COMMIT lag 9 seconds (no max lag)
2014-03-29 17:04:22 [MgmtSrvr] WARNING -- Node 4: Node 1 missed heartbeat 2
2014-03-29 17:04:22 [MgmtSrvr] WARNING -- Node 4: Node 11 missed heartbeat 2
2014-03-29 17:04:22 [MgmtSrvr] WARNING -- Node 4: Node 13 missed heartbeat 2
2014-03-29 17:04:22 [MgmtSrvr] WARNING -- Node 4: Node 17 missed heartbeat 2
2014-03-29 17:04:25 [MgmtSrvr] WARNING -- Node 4: Node 1 missed heartbeat 3
2014-03-29 17:04:25 [MgmtSrvr] WARNING -- Node 4: Node 11 missed heartbeat 3
2014-03-29 17:04:25 [MgmtSrvr] WARNING -- Node 4: Node 13 missed heartbeat 3
2014-03-29 17:04:25 [MgmtSrvr] WARNING -- Node 4: Node 17 missed heartbeat 3
2014-03-29 17:04:25 [MgmtSrvr] WARNING -- Node 4: Node 5 missed heartbeat 2
2014-03-29 17:04:33 [MgmtSrvr] ALERT -- Node 4: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2014-03-29 17:04:33 [MgmtSrvr] ALERT -- Node 5: Node 4 Disconnected
2014-03-29 17:05:15 [MgmtSrvr] ALERT -- Node 5: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2014-03-29 17:05:15 [MgmtSrvr] ALERT -- Node 1: Node 5 Disconnected
= mysql.err
140329 17:04:33 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
140329 17:04:33 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
140329 17:04:33 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
140329 17:05:15 [Warning] NDB: Could not acquire global schema lock (4009)Cluster Failure
140329 17:05:15 [ERROR] /usr/sbin/mysqld: Sort aborted: Got error 157 'Unknown error code' from NDBCLUSTER
140329 17:04:27 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 4006 'Connect failure - out of connection objects (increase MaxNoOfConcurrentTransactions)' from NDBCLUSTER
140329 17:04:33 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 270 'Transaction aborted due to node shutdown' from NDBCLUSTER
140329 17:05:15 [Note] NDB Binlog: Node: 4, down, Subscriber bitmask 00
140329 17:05:15 [Note] NDB Binlog: Node: 5, down, Subscriber bitmask 00
140329 17:05:15 [ERROR] /usr/sbin/mysqld: Sort aborted: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [Warning] NDB: Could not acquire global schema lock (4009)Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
140329 17:05:15 [ERROR] Error 4009 in ::get_auto_increment(): Cluster Failure
= ndb_X_error.log Both data nodes
Time: Friday 28 March 2014 - 17:05:11
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /pb2/build/sb_0-8660699-1363118778.75/rpm/BUILD/mysql-cluster-gpl-7.2.12/mysql-cluster-gpl-7.2.12/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 5373 thr: 8
Version: mysql-5.5.30 ndb-7.2.12
Trace: /var/lib/mysql/data/ndb_4_trace.log.6
Time: Saturday 29 March 2014 - 17:04:27
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /pb2/build/sb_0-8660699-1363118778.75/rpm/BUILD/mysql-cluster-gpl-7.2.12/mysql-cluster-gpl-7.2.12/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 22749 thr: 8
Version: mysql-5.5.30 ndb-7.2.12
Trace: /var/lib/mysql/data/ndb_4_trace.log
Time: Friday 28 March 2014 - 17:04:23
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /pb2/build/sb_0-8660699-1363118778.75/rpm/BUILD/mysql-cluster-gpl-7.2.12/mysql-cluster-gpl-7.2.12/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 23257 thr: 4
Version: mysql-5.5.30 ndb-7.2.12
Trace: /var/lib/mysql/data/ndb_5_trace.log.6
Time: Saturday 29 March 2014 - 17:05:11
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: /pb2/build/sb_0-8660699-1363118778.75/rpm/BUILD/mysql-cluster-gpl-7.2.12/mysql-cluster-gpl-7.2.12/storage/ndb/src/kernel/ndbd.cpp
Program: ndbmtd
Pid: 31957 thr: 6
Version: mysql-5.5.30 ndb-7.2.12
Trace: /var/lib/mysql/data/ndb_5_trace.log
=ndb_out.log similar on both data nodes
Finished with handling node-failureFinished with handling node-failure
Finished with handling node-failureexecGCP_NOMORETRANS(22784765/1) c_ongoing_take_over_cnt -> seize
NOT returning gcpTcfinished due to nfhandling 22784765/1
execGCP_NOMORETRANS(22784765/1) c_ongoing_take_over_cnt -> seize
REMOVING lcp: 25772 from table: 3 frag: 0 node: 4
REMOVING lcp: 25772 from table: 3 frag: 1 node: 4
REMOVING lcp: 25772 from table: 4 frag: 0 node: 4
REMOVING lcp: 25772 from table: 4 frag: 1 node: 4
.. truncated
4 - sleeploop 10!!
6 - sleeploop 10!!
4 - sleeploop 10!!
6 - sleeploop 10!!
4 - sleeploop 10!!
6 - sleeploop 10!!
6 - sleeploop 10!!
4 - sleeploop 10!!
4 - sleeploop 10!!
6 - sleeploop 10!!
4 - sleeploop 10!!
6 - sleeploop 10!!
.. truncated
job buffer full
2014-03-29 17:05:11 [ndbd] INFO -- Received signal 6. Running error handler.
4 - sleeploop 10!!
2014-03-29 17:05:12 [ndbd] INFO -- Signal 6 received; Aborted
2014-03-29 17:05:12 [ndbd] INFO -- /pb2/build/sb_0-8660699-1363118778.75/rpm/BUILD/mysql-cluster-gpl-7.2.12/mysql-cluster-gpl-7.2.12/storage/ndb/src/kernel/ndbd.cpp
2014-03-29 17:05:12 [ndbd] INFO -- Error handler signal restarting system
2014-03-29 17:05:12 [ndbd] INFO -- Error handler shutdown completed - exiting
2014-03-29 17:05:15 [ndbd] ALERT -- Node 5: Forced node shutdown completed, restarting. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
How to repeat:
Unknown, but it's the second time we have had a both data nodes crash with the same error.