Bug #51845 NDB Tables left read-only after partial network interruption between NDB/MYSQLD
Submitted: 8 Mar 2010 21:20
Reporter: Matthew Montgomery Email Updates:
Status: Verified Impact on me:
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:mysql-5.1-telco-7.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: 7.0.9

[8 Mar 2010 21:20] Matthew Montgomery
When a pooled API node connection of mysqld reconnects to the cluster after a heartbeat timeout, tables are left read-only for that mysqld instance. The node remains in "Waiting for ndbcluster to start" status even after ndb_mgm has confirmed the reconnect.

How to repeat:

Define a cluster with the ndbd and mysqld nodes on two different physical machines.
Set ndb_cluster_connection_pool > 1.
Identify which port is used to connect each ndbd and API node.

Example with 2 ndbd nodes and one mysqld with ndb_cluster_connection_pool=4, run from the mysqld host:

$ my_ip=$(ifconfig eth0 | grep inet\ addr | cut -d ':' -f 2 | cut -d ' ' -f 1); sudo netstat -np | grep ndbd | grep $my_ip | sort -k 7 | cut -d ':' -f 2  | grep -v $my_ip
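The pipeline above depends on the exact net-tools netstat column layout. As a sketch, the port extraction can be isolated into a small helper; the `ndbd_ports` name and the sample line format are assumptions for illustration, not part of the original report:

```shell
# Hypothetical helper: extract the local ports of established ndbd
# connections from classic net-tools `netstat -np` output. Column 4 is
# the local address (ip:port); the program name is in the last column.
ndbd_ports() {
  grep ndbd | awk '{ split($4, a, ":"); print a[2] }' | sort -un
}
```

Feed it `sudo netstat -np` output and you get one candidate port per line to block in the next step.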

For each port in column 1:

sudo iptables -F INPUT ; sudo iptables -A INPUT -p tcp --dport <col1> -j REJECT ; watch -n 0 ndb_mgm -e "show"

The node connected via that port will "blink".

Once you have mapped each API slot to its port, restart mysqld to get a clean state. (As long as you don't restart the ndbd nodes, the port mapping will remain the same.)

Block the port(s) of the first node allocated to the mysqld (binlog subscriber?)

sudo iptables -F INPUT ; 
sudo iptables -A INPUT -p tcp --dport 51872
sudo iptables -A INPUT -p tcp --dport 60954

Wait for node to be disconnected (check ndb_mgm> SHOW or cluster log)

sudo iptables -F INPUT ;

Wait for node to be shown as connected (check ndb_mgm> SHOW or cluster log)

On the mysqld host: $ mysql -e "show status like 'ndb%'; show processlist;"

Repeat until you see all of the pool's node ids listed as Ndb_cluster_node_id.
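The "repeat until all ids appear" step can be sketched as a counting helper plus a polling loop. This is a sketch under assumptions: the pool size, the mysql invocation, the `count_node_ids` name, and the idea that each pooled connection surfaces its own Ndb_cluster_node_id row are taken from the report's wording, not verified against every version:

```shell
# Count distinct numeric node ids in `mysql -N` status output
# ("Variable_name<TAB>Value" lines): non-numeric values are ignored,
# duplicates are counted once.
count_node_ids() {
  awk '$2 ~ /^[0-9]+$/ && !seen[$2]++ { n++ } END { print n + 0 }'
}

# Hypothetical polling loop for ndb_cluster_connection_pool=4:
# while [ "$(mysql -N -e "show status like 'Ndb_cluster_node_id%'" \
#     | count_node_ids)" -lt 4 ]; do
#   sleep 2
# done
```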

Note that the node remains in "Waiting for ndbcluster to start" and all NDB tables are read-only, while all pooled APIs list Ndb_number_of_ready_data_nodes=2 and latest_epoch keeps advancing in SHOW ENGINE NDB STATUS;
[8 Mar 2010 21:22] MySQL Verification Team
W3: Restart mysqld to restore a good state.
[8 Mar 2010 21:50] MySQL Verification Team
I meant:

Block the port(s) of the first node allocated to the mysqld (binlog subscriber?)

sudo iptables -F INPUT
sudo iptables -A INPUT -p tcp --dport 51872 -j REJECT
sudo iptables -A INPUT -p tcp --dport 60954 -j REJECT
[31 May 2010 20:24] Matthew Boehm
Affected on 7.1.3 as well.
[23 Aug 2013 15:50] Joffrey MICHAIE

Still happening in 7.2.13. Using bonding, I unplugged both interfaces of one API node, waited 5 minutes, and replugged them.

I didn't find a status/SHOW command on the API node (other than the log) to check the read-only (or not) state. Is this possible?
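One way to probe the state from the outside, in the absence of a dedicated status variable, is to attempt a write on a throwaway NDB table and classify the error text. This is an assumption, not an official check: the probe table name is hypothetical, and the exact error number and wording (e.g. "ERROR 1036 ... is read only") may differ by version:

```shell
# Returns success if the given mysqld error output looks like the
# read-only condition ("read only" or "read-only" in the message).
is_readonly_error() {
  grep -qi 'read.only'
}

# Hypothetical probe (assumes a scratch NDB table test.ndb_probe exists):
# mysql -e "replace into test.ndb_probe values (1)" 2>&1 \
#   | is_readonly_error && echo "still read-only" || echo "writable"
```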

[15 Jun 2015 14:25] Hartmut Holzgraefe
Seen again on 7.2.4 recently. The affected mysqld node was declared dead due to heartbeat problems, then reconnected. Cluster log showed 

  Event buffer status: used=4232B(100%) alloc=4232B(0%)

for it for a few seconds while mysql error log showed

  NDB: Could not get apply status share

around the same time.

The latter is an error message I couldn't find much info about;
the only mention of this message seems to be in 
[2 Mar 2016 9:00] Hartmut Holzgraefe
Any news here?
[24 Jun 2016 18:03] MySQL Verification Team
Several read-only-after-reconnect issues were fixed in 7.2.23,
6.3.12, 7.4.9, and 7.5.0:

BUG #17674771
BUG #19537961 
BUG #22204186
[27 Jun 2016 7:49] Hartmut Holzgraefe

(It would be great if this bug #51845 were mentioned along with internal bug reports #17674771, #19537961, #22204186, and #22361695 in the changelog entries in the manual, though.)