Description:
When replication uses SSL and the connection fails for any reason, MySQL does not retry the connection. Instead it immediately reports an error and aborts replication.
This is caused by an issue in the code that is present in all current versions.
The error is reported as follows:
"131009 4:08:55 [Warning] "SELECT UNIX_TIMESTAMP()" failed on master, do not trust column Seconds_Behind_Master of SHOW SLAVE STATUS. Error: (1159)
131009 4:09:44 [ERROR] Slave I/O: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: SSL connection error: error:00000005:lib(0):func(0):DH lib, Error_code: 2026
131009 4:09:44 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.xxxxx', position xxxxx" <---------------
How to repeat:
In the log above, note the error number 2026.
This error is defined in the errmsg.h file as follows:
#define CR_SSL_CONNECTION_ERROR 2026
In versions 5.5 and 5.6, the function that decides which errors are treated as network errors, and therefore require a connection retry, is:
slave::is_network_error in 5.5
rpl_slave::is_network_error in 5.6
In both cases the function is:
bool is_network_error(uint errorno)
{
  if (errorno == CR_CONNECTION_ERROR ||
      errorno == CR_CONN_HOST_ERROR ||
      errorno == CR_SERVER_GONE_ERROR ||
      errorno == CR_SERVER_LOST ||
      errorno == ER_CON_COUNT_ERROR ||
      errorno == ER_SERVER_SHUTDOWN)
    return TRUE;
  return FALSE;
}
So in both versions CR_SSL_CONNECTION_ERROR is not recognized as a network connection error.
As a result, the server does not redirect execution to the "network_err" label and instead sends it to the standard "err" label.
As already mentioned, this prevents MySQL from reconnecting and recovering replication automatically.
Suggested fix:
Update is_network_error by adding CR_SSL_CONNECTION_ERROR to the list of errors it treats as network errors.
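
A minimal sketch of what the suggested change could look like, written as a standalone function so it compiles on its own. This is not the actual patch. The numeric constants are local stand-ins that I assume match the values in errmsg.h (CR_*) and mysqld_error.h (ER_*); only 2026 is confirmed by the report above. In the server source the existing headers would be used instead, along with the original uint/TRUE/FALSE types.

```cpp
// Standalone sketch of the suggested fix; not the actual MySQL patch.
// The constants below are local stand-ins (assumed values) so the example
// builds without the server headers.
static const unsigned int ER_CON_COUNT_ERROR      = 1040;
static const unsigned int ER_SERVER_SHUTDOWN      = 1053;
static const unsigned int CR_CONNECTION_ERROR     = 2002;
static const unsigned int CR_CONN_HOST_ERROR      = 2003;
static const unsigned int CR_SERVER_GONE_ERROR    = 2006;
static const unsigned int CR_SERVER_LOST          = 2013;
static const unsigned int CR_SSL_CONNECTION_ERROR = 2026;  // value from errmsg.h, as quoted in the report

// Returns true when the error should be treated as a transient network
// failure, i.e. one for which the slave I/O thread should retry.
bool is_network_error(unsigned int errorno)
{
  if (errorno == CR_CONNECTION_ERROR ||
      errorno == CR_CONN_HOST_ERROR ||
      errorno == CR_SERVER_GONE_ERROR ||
      errorno == CR_SERVER_LOST ||
      errorno == CR_SSL_CONNECTION_ERROR ||  // proposed addition
      errorno == ER_CON_COUNT_ERROR ||
      errorno == ER_SERVER_SHUTDOWN)
    return true;
  return false;
}
```

With this change, error 2026 from the log above would be classified as a network error, while unrelated codes such as 1159 would still not be.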