Description:
I want some extra redundancy, so I'm testing with the number of replicas set to 3 (NoOfReplicas = 3), using 3 NDB nodes, 3 API nodes and 1 MGM node. When the number of replicas is 3 (and possibly any odd number), the cluster is unstable: it is not completely down, but it is intermittently unavailable because the mysqld nodes keep restarting automatically.
How to repeat:
To simulate a failure, I shut down one of the NDB nodes and got errors on the ndb_mgm console. After that, the mysqld nodes are intermittently unable to return data. While ndb_mgm shows a mysqld as connected, queries work. A second later, ndb_mgm shows that mysqld as disconnected and the query fails. I checked the mysqld error log, and mysqld appears to be restarting over and over.
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 3 node(s)
id=2 @10.0.0.144 (Version: 5.1.11, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 10.0.0.145)
id=4 @10.0.0.146 (Version: 5.1.11, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @10.0.0.140 (Version: 5.1.11)
[mysqld(API)] 3 node(s)
id=5 @10.0.0.141 (Version: 5.1.11)
id=6 @10.0.0.142 (Version: 5.1.11)
id=7 @10.0.0.143 (Version: 5.1.11)
ndb_mgm> Node 3: Forced node shutdown completed. Occured during startphase. Initiated by signal 8. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
ndb_mgm> Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
mysql> select count(*) from account;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from account;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
ERROR:
Can't connect to the server
The mysqld error log:
Number of processes running now: 0
061008 02:32:46 mysqld restarted
061008 2:32:46 InnoDB: Started; log sequence number 0 46403
061008 2:32:53 [Note] Starting MySQL Cluster Binlog Thread
/usr/local/mysql/bin/mysqld: Table 'general_log' is marked as crashed and should be repaired
/usr/local/mysql/bin/mysqld: Table 'slow_log' is marked as crashed and should be repaired
061008 2:32:54 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.1.11-beta' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
061008 2:32:54 [Note] SCHEDULER: Manager thread booting
061008 2:32:54 [Note] SCHEDULER: Loaded 0 events
061008 2:32:54 [Note] SCHEDULER: Suspending operations
INVALID SUB_GCP_COMPLETE_REP
gci: 1630
sender: 1010004
count: 5
bucket count: 4294967295
nodes: 3
mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=0
max_connections=100
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Number of processes running now: 0
061008 02:32:55 mysqld restarted
061008 2:32:55 InnoDB: Started; log sequence number 0 46403
061008 2:33:02 [Note] Starting MySQL Cluster Binlog Thread
/usr/local/mysql/bin/mysqld: Table 'general_log' is marked as crashed and should be repaired
/usr/local/mysql/bin/mysqld: Table 'slow_log' is marked as crashed and should be repaired
061008 2:33:04 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.1.11-beta' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
061008 2:33:04 [Note] SCHEDULER: Manager thread booting
INVALID SUB_GCP_COMPLETE_REP
gci: 1633
sender: 1010004
count: 5
bucket count: 4294967295
nodes: 3
mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=0
max_connections=100
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
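The restart loop in the error log above can be confirmed mechanically rather than by eye. The following is a minimal Python sketch, not part of the original report, that counts "mysqld restarted" lines and "mysqld got signal N" lines and collects the gci values printed with each INVALID SUB_GCP_COMPLETE_REP. The function name summarize_error_log and the SAMPLE string are hypothetical; the patterns assume the log line formats shown in this report and may need adjusting for other versions.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log excerpt in this report.
RESTART = re.compile(r"^\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+mysqld restarted\s*$")
SIGNAL = re.compile(r"^mysqld got signal (\d+);")
GCI = re.compile(r"^gci:\s*(\d+)")


def summarize_error_log(lines):
    """Count restarts and crash signals, and collect the gci values seen."""
    restarts = 0
    signals = Counter()
    gcis = []
    for raw in lines:
        line = raw.strip()
        if RESTART.match(line):
            restarts += 1
            continue
        m = SIGNAL.match(line)
        if m:
            signals[int(m.group(1))] += 1
            continue
        m = GCI.match(line)
        if m:
            gcis.append(int(m.group(1)))
    return {"restarts": restarts, "signals": dict(signals), "gcis": gcis}


# A few lines copied from the error log in this report (hypothetical sample).
SAMPLE = """\
061008 02:32:46 mysqld restarted
INVALID SUB_GCP_COMPLETE_REP
gci: 1630
mysqld got signal 6;
061008 02:32:55 mysqld restarted
INVALID SUB_GCP_COMPLETE_REP
gci: 1633
mysqld got signal 6;
"""

if __name__ == "__main__":
    print(summarize_error_log(SAMPLE.splitlines()))
```

On the sample lines this reports two restarts, two occurrences of signal 6, and the gci values 1630 and 1633. To run it against a real error log, pass the lines of an opened file instead of SAMPLE.splitlines().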
Suggested fix:
None, other than using only an even number of replicas?