MySQL Bugs: #51845: NDB Tables left read-only after partial network interruption between NDB/MYSQLD

Bug #51845	NDB Tables left read-only after partial network interruption between NDB/MYSQLD
Submitted:	8 Mar 2010 21:20
Reporter:	Matthew Montgomery
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-7.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	7.0.9

Description:
When a pooled API node connection of mysqld reconnects to the cluster after a heartbeat timeout, NDB tables are left read-only for that mysqld instance. The node remains in the "Waiting for ndbcluster to start" state even after ndb_mgm has confirmed the reconnect.

How to repeat:

Define a cluster with ndbd and mysqld nodes on two different physical machines.
Set ndb_cluster_connection_pool > 1 for the mysqld.
Identify which port is being used to connect each ndbd and API node.

Example with 2 ndbd and one mysqld with ndb_cluster_connection_pool=4 from host 192.168.23.10:

$ my_ip=$(ifconfig eth0 | grep inet\ addr | cut -d ':' -f 2 | cut -d ' ' -f 1); sudo netstat -np | grep ndbd | grep $my_ip | sort -k 7 | cut -d ':' -f 2  | grep -v $my_ip
33000     192.168.23.10
48505     192.168.23.10
51872     192.168.23.10
52075     192.168.23.10
39062     192.168.23.10
46502     192.168.23.10
53851     192.168.23.10
60954     192.168.23.10

For each port in column 1:

sudo iptables -F INPUT ; sudo iptables -A INPUT -p tcp --dport <col1> -j REJECT ; watch -n 0 ndb_mgm -e "show"
The node connected via that port will "blink" in the ndb_mgm output.
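
The per-port step above can be sketched as a small shell script. This is only an illustration: it prints the iptables commands for each port instead of running them, the block_cmds helper is hypothetical, and the PORTS list is copied from the example output above, so substitute the ports from your own netstat output.

```shell
#!/bin/sh
# Sketch only: prints, for each port, the rule set from the step above.
# PORTS is copied from the example output; replace it with your own list.
PORTS="33000 48505 51872 52075 39062 46502 53851 60954"

# Print the flush + REJECT commands for a single destination port.
# (block_cmds is a hypothetical helper, not part of any MySQL tool.)
block_cmds() {
    echo "sudo iptables -F INPUT"
    echo "sudo iptables -A INPUT -p tcp --dport $1 -j REJECT"
}

for port in $PORTS; do
    echo "# port $port: run these, then watch ndb_mgm -e show for the node that drops"
    block_cmds "$port"
done
```

Printing the commands rather than executing them keeps the sketch safe to run on a production host; paste one pair at a time and note which node id disconnects.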

Once you have mapped each API slot to its port, restart mysqld to get a clean state. (As long as you don't restart the NDBD nodes, the port mapping will remain the same.)

Block the port(s) of the first node allocated to the mysqld (binlog subscriber?)

sudo iptables -F INPUT ; 
sudo iptables -A INPUT -p tcp --dport 51872
sudo iptables -A INPUT -p tcp --dport 60954

Wait for node to be disconnected (check ndb_mgm> SHOW or cluster log)

sudo iptables -F INPUT ;

Wait for node to be shown as connected (check ndb_mgm> SHOW or cluster log)

On mysqld$ mysql -e "show status like 'ndb%'; show processlist;"

Repeat until you see all ids in pool listed as Ndb_cluster_node_id

Note that the node remains in "Waiting for ndbcluster to start" and all NDB tables are read-only, even though all pooled APIs list Ndb_number_of_ready_data_nodes=2 and latest_epoch keeps progressing in SHOW ENGINE NDB STATUS.
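
A quick way to spot the stuck state from the shell might look like the sketch below. It only searches SHOW PROCESSLIST output for the state string quoted in this report; the is_stuck helper is hypothetical, and the usage line assumes the mysql client can connect with default credentials.

```shell
#!/bin/sh
# Sketch: report whether a mysqld still shows the state described above.
# Reads "SHOW PROCESSLIST" output on stdin and succeeds (exit 0) if any
# thread is in "Waiting for ndbcluster to start".
# (is_stuck is a hypothetical helper name.)
is_stuck() {
    grep -q "Waiting for ndbcluster to start"
}

# Usage (assumes default client credentials):
#   mysql -e "show processlist" | is_stuck && echo "still waiting for ndbcluster"
```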

W3 : Restart mysqld to reset good state

I meant:

Block the port(s) of the first node allocated to the mysqld (binlog subscriber?)

sudo iptables -F INPUT
sudo iptables -A INPUT -p tcp --dport 51872 -j REJECT
sudo iptables -A INPUT -p tcp --dport 60954 -j REJECT
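
One block/unblock cycle with the corrected rules could be scripted roughly as follows. This is a sketch under assumptions: the run, pause, and cycle helpers, the DRY_RUN switch, and the 60-second WAIT are illustrative and not part of the original report, and the fixed sleep is only a stand-in for checking ndb_mgm SHOW or the cluster log.

```shell
#!/bin/sh
# Sketch of one block/unblock cycle. With DRY_RUN=1 (the default) the
# commands are only printed; with DRY_RUN=0 they are executed via sudo.
# PORTS defaults to the two ports from the example; WAIT is an assumed value.
PORTS="${PORTS:-51872 60954}"
WAIT="${WAIT:-60}"

# Print the command in dry-run mode, otherwise run it with sudo.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# Sleep for real only when not in dry-run mode.
pause() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sleep $1"
    else
        sleep "$1"
    fi
}

cycle() {
    run iptables -F INPUT
    for port in $PORTS; do
        run iptables -A INPUT -p tcp --dport "$port" -j REJECT
    done
    pause "$WAIT"            # give the heartbeat timeout time to disconnect the node
    run iptables -F INPUT    # lift the block so the node can reconnect
}

cycle
```

Run it once in dry-run mode to review the rules, then with DRY_RUN=0 and repeat until every id in the pool has been listed as Ndb_cluster_node_id.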

7.1.3 is affected as well.

Hi,

Still happening in 7.2.13. Using bonding, unplugged both interfaces of one API node, then waited 5 minutes, and replugged them.

I couldn't find a status/show command on the API node (other than the log) to check whether it is in the read-only state. Is this possible?

Regards,
Joffrey

Seen again on 7.2.4 recently. The affected mysqld node was declared dead due to heartbeat problems, then reconnected. Cluster log showed 

  Event buffer status: used=4232B(100%) alloc=4232B(0%)

for it for a few seconds while mysql error log showed

  NDB: Could not get apply status share

around the same time.

The latter is an error message I couldn't find much information about;
the only mention of it seems to be in

https://bugs.mysql.com/bug.php?id=18561

Any news here?

Several read-only-after-reconnect issues were fixed in 7.2.23,
7.3.12, 7.4.9, and 7.5.0:

BUG #17674771
BUG #19537961 
BUG #22204186

\0/

(It would be great if this bug, #51845, were mentioned along with internal bug reports #17674771, #19537961, #22204186, #22361695 in the changelog entries in the manual, though.)