MySQL Bugs: #26483: Slave MySQLD does not reconnect w/ slave cluster after network outage restored

Bug #26483	Slave MySQLD does not reconnect w/ slave cluster after network outage restored
Submitted:	19 Feb 2007 22:26	Modified:	21 Feb 2007 17:35
Reporter:	Jonathan Miller
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-5.1-telco-6.2	OS:	Linux (Linux 32 Bit)
Assigned to:		CPU Architecture:	Any
Tags:	5.1.16-ndb-6.2.0

Description:
While trying to reproduce bug#24694, I found that short network outages on the slave MySQLD host leave it in a state where mysqld has to be restarted before it can reconnect to the slave cluster.

Last_Errno: 1015
Last_Error: Error 'Can't lock file (errno: 4009)' in Write_rows event: when locking tables

070219 23:22:30 [Note] Slave SQL thread initialized, starting replication in log 'ndb09-bin.000001' at position 7868162, relay log './n12-relay-bin.000006' position: 1338470
mysql> 070219 23:22:30 [ERROR] Slave: Error 'Can't lock file (errno: 4009)' in Write_rows event: when locking tables, Error_code: 1015
070219 23:22:30 [Warning] Slave: Got error 4009 'Cluster Failure' from NDB Error_code: 1296
070219 23:22:30 [Warning] Slave: Can't lock file (errno: 4009) Error_code: 1015
070219 23:22:30 [Warning] Slave: Unknown error Error_code: 1105
070219 23:22:30 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'n09-bin.000001' position 7868162

Every subsequent "START SLAVE" fails with the same error. Only restarting mysqld clears it.
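
For anyone trying to spot this state in a slave error log, here is a rough Python sketch that picks out the signature shown above (error 4009 from NDB together with Error_code 1015). It is not part of MySQL; the names parse_line and is_cluster_failure are made up for illustration, and the pattern assumes log lines shaped like the excerpt in this report.

```python
import re

# Assumed line shape, taken from the excerpt in this report, e.g.:
# 070219 23:22:30 [Warning] Slave: Got error 4009 'Cluster Failure' from NDB Error_code: 1296
LOG_LINE = re.compile(
    r"(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<message>.*?)(?: Error_code: (?P<code>\d+))?$"
)


def parse_line(line):
    """Return a dict with date, time, level, message and code, or None."""
    # search() rather than match() so a leading "mysql> " prompt is tolerated.
    m = LOG_LINE.search(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    if entry["code"] is not None:
        entry["code"] = int(entry["code"])
    return entry


def is_cluster_failure(lines):
    """True if the lines mention NDB error 4009 and carry Error_code 1015."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    saw_4009 = any("4009" in e["message"] for e in entries)
    saw_1015 = any(e["code"] == 1015 for e in entries)
    return saw_4009 and saw_1015


if __name__ == "__main__":
    sample = [
        "070219 23:22:30 [Warning] Slave: Got error 4009 'Cluster Failure' from NDB Error_code: 1296",
        "070219 23:22:30 [Warning] Slave: Can't lock file (errno: 4009) Error_code: 1015",
    ]
    print(is_cluster_failure(sample))
```

This only detects the condition from the log. It does not work around the bug, since START SLAVE keeps failing until mysqld is restarted.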

How to repeat:
See bug#24694.

Suggested fix:
MySQLD should automatically reconnect once the outage has been corrected.