Bug #26483 Slave MySQLD does not reconnect w/ slave cluster after network outage restored
Submitted: 19 Feb 2007 22:26
Modified: 21 Feb 2007 17:35
Reporter: Jonathan Miller
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.2
OS: Linux (Linux 32 Bit)
Assigned to:
CPU Architecture: Any
Tags: 5.1.16-ndb-6.2.0

[19 Feb 2007 22:26] Jonathan Miller
Description:
While trying to reproduce bug#24694, I found that short network outages on the slave MySQLD host put it into a state where mysqld has to be restarted before it can reconnect to the slave cluster.

Last_Errno: 1015
Last_Error: Error 'Can't lock file (errno: 4009)' in Write_rows event: when locking tables

070219 23:22:30 [Note] Slave SQL thread initialized, starting replication in log 'ndb09-bin.000001' at position 7868162, relay log './n12-relay-bin.000006' position: 1338470
mysql> 070219 23:22:30 [ERROR] Slave: Error 'Can't lock file (errno: 4009)' in Write_rows event: when locking tables, Error_code: 1015
070219 23:22:30 [Warning] Slave: Got error 4009 'Cluster Failure' from NDB Error_code: 1296
070219 23:22:30 [Warning] Slave: Can't lock file (errno: 4009) Error_code: 1015
070219 23:22:30 [Warning] Slave: Unknown error Error_code: 1105
070219 23:22:30 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'n09-bin.000001' position 7868162

Every subsequent "START SLAVE" fails with the same error, even after the network is restored. Only restarting mysqld clears it.

How to repeat:
See bug#24694.

Suggested fix:
MySQLD should automatically reconnect once the outage has been corrected.
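
To illustrate the behaviour being asked for, here is a rough Python sketch of "reconnect and retry on cluster failure". It is not mysqld or NDB API code: ClusterError, apply_with_reconnect, apply_event and reconnect are hypothetical names made up for this example, and the retry count and delays are arbitrary. The only value taken from this report is error code 4009 ('Cluster Failure').

```python
import time

NDB_CLUSTER_FAILURE = 4009  # error code from the log above ('Cluster Failure')


class ClusterError(Exception):
    """Hypothetical exception carrying an NDB-style error code."""

    def __init__(self, code):
        super().__init__(f"cluster error {code}")
        self.code = code


def apply_with_reconnect(apply_event, reconnect, max_attempts=5,
                         base_delay=0.5, sleep=time.sleep):
    """Call apply_event(); on error 4009, wait, reconnect, and try again.

    Any other error, or a 4009 on the last allowed attempt, is re-raised
    so the caller can stop the slave thread as it does today.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return apply_event()
        except ClusterError as err:
            if err.code != NDB_CLUSTER_FAILURE or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
            reconnect()
```

The point of the sketch is only that a 4009 would be treated as transient: the cluster connection is re-established before the event is retried, instead of the slave SQL thread staying in a state that only a mysqld restart clears.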