Bug #71375 Slave I/O thread won't attempt to auto reconnect to the master - error code 1593
Submitted: 14 Jan 2014 14:13
Modified: 16 Jan 2014 20:11
Reporter: Muhammad Irfan
Status: Verified
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.5
OS: Linux
Assigned to:
CPU Architecture: Any
Triage: Needs Triage: D3 (Medium)

[14 Jan 2014 14:13] Muhammad Irfan
[ERROR] Slave I/O: The slave I/O thread stops because SET @master_heartbeat_period on master failed. Error: , Error_code: 1593
[Note] Slave I/O thread exiting, read up to log 'mysql-bin.015318', position 887067847

The error in question is:

$ perror 1593
MySQL error code 1593 (ER_SLAVE_FATAL_ERROR): Fatal error: %s

Percona Server 5.5.29-29.4 is running on the affected slave. Looking at the source code for that release, the problematic code path appears to be:

  if (mysql_real_query(mysql, query, strlen(query))
      && !check_io_slave_killed(mi->io_thd, mi, NULL))
  {
    errmsg= "The slave I/O thread stops because SET @master_heartbeat_period "
      "on master failed.";
    err_code= ER_SLAVE_FATAL_ERROR;
    sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
    mysql_free_result(mysql_store_result(mysql));
    goto err;
  }
  mysql_free_result(mysql_store_result(mysql));

I believe this is a bug. Instead of assuming that every failure here is fatal, the code should call "is_network_error(mysql_errno(mysql))" to determine whether the slave I/O thread should be restarted rather than stopped (this is done in Percona Server 5.6.13, for instance).
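
To illustrate the requested behaviour, here is a minimal standalone sketch in C. It is not the actual 5.6 patch. The error numbers are copied by hand from the MySQL client and server error headers so that the sketch compiles without them, and the set of codes treated as transient is an assumption modelled on the server helper named is_network_error. The io_action type and the classify_heartbeat_failure function are hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Error numbers copied by hand from MySQL's errmsg.h / mysqld_error.h
   so that this sketch builds without the MySQL headers. */
enum {
    ER_CON_COUNT_ERROR   = 1040,
    ER_SERVER_SHUTDOWN   = 1053,
    ER_SLAVE_FATAL_ERROR = 1593,
    CR_CONNECTION_ERROR  = 2002,
    CR_CONN_HOST_ERROR   = 2003,
    CR_SERVER_GONE_ERROR = 2006,
    CR_SERVER_LOST       = 2013
};

/* Hypothetical result type: what the I/O thread should do next. */
typedef enum { IO_ACTION_RECONNECT, IO_ACTION_STOP } io_action;

/* Assumed list of transient, connection-level errors. */
static bool is_network_error(unsigned int errorno)
{
    switch (errorno) {
    case CR_CONNECTION_ERROR:
    case CR_CONN_HOST_ERROR:
    case CR_SERVER_GONE_ERROR:
    case CR_SERVER_LOST:
    case ER_CON_COUNT_ERROR:
    case ER_SERVER_SHUTDOWN:
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: decide how to react when
   SET @master_heartbeat_period fails on the master.
   client_errno / client_msg stand in for mysql_errno() / mysql_error(). */
static io_action classify_heartbeat_failure(unsigned int client_errno,
                                            const char *client_msg,
                                            char *err_buff, size_t len)
{
    if (is_network_error(client_errno)) {
        /* Transient: keep the original client error and retry. */
        snprintf(err_buff, len,
                 "SET @master_heartbeat_period failed (errno %u): %s",
                 client_errno, client_msg);
        return IO_ACTION_RECONNECT;
    }
    /* Anything else is still reported as ER_SLAVE_FATAL_ERROR,
       but the original client errno is preserved in the message. */
    snprintf(err_buff, len,
             "Fatal error %d, client errno %u: %s",
             ER_SLAVE_FATAL_ERROR, client_errno, client_msg);
    return IO_ACTION_STOP;
}
```

With this split, a lost connection (client errno 2013) would lead to a reconnect attempt, while an unrelated failure would still stop the I/O thread, with the original client error number kept in the message instead of being hidden behind 1593.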

Additionally, since mysql_real_query has already produced an error code, should that code later be overwritten with ER_SLAVE_FATAL_ERROR?

How to repeat:
This has been fixed in 5.6, and this bug report is a backport request for 5.5.

Suggested fix:
This has been fixed in 5.6, and this bug report is a backport request.

[16 Jan 2014 20:11] Sveta Smirnova
Thank you for the report.

The same thing happens in Oracle MySQL servers: version 5.5 is affected, and version 5.6 doesn't have this bug. Since I could not find the bug number that fixed this, I will verify this report as: "please backport is_network_error check for master_heartbeat_period".