Bug #70612 SSL connection error not correctly managed in Master - Slave replication scenario
Submitted: 13 Oct 2013 21:51 Modified: 19 Nov 2013 20:05
Reporter: Marco Tusa
Status: Verified
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: 5.5 and 5.6, 5.1.83, 5.5.35, 5.6.16 OS: Any
Assigned to: CPU Architecture: Any
Tags: SSL replication
Triage: Needs Triage: D3 (Medium)

[13 Oct 2013 21:51] Marco Tusa
Description:
When replication uses SSL and the connection fails for any reason, MySQL does not retry the connection. Instead, it immediately reports an error and aborts replication.

This is caused by an issue in the code that is present in all versions.
The error is shown below:
"131009 4:08:55 [Warning] "SELECT UNIX_TIMESTAMP()" failed on master, do not trust column Seconds_Behind_Master of SHOW SLAVE STATUS. Error: (1159)

131009 4:09:44 [ERROR] Slave I/O: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: SSL connection error: error:00000005:lib(0):func(0):DH lib, Error_code: 2026
131009 4:09:44 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.xxxxx', position xxxxx"    <---------------

How to repeat:
In the log above, note the error number 2026.
This error is defined in the errmsg.h file as follows:
 #define CR_SSL_CONNECTION_ERROR 2026

In versions 5.5 and 5.6, the function that decides which errors are treated as network errors, and therefore trigger a connection retry, is:
 slave::is_network_error for 5.5
 rpl_slave::is_network_error for 5.6

In both cases the function is:
bool is_network_error(uint errorno)
{ 
  if (errorno == CR_CONNECTION_ERROR || 
      errorno == CR_CONN_HOST_ERROR ||
      errorno == CR_SERVER_GONE_ERROR ||
      errorno == CR_SERVER_LOST ||
      errorno == ER_CON_COUNT_ERROR ||
      errorno == ER_SERVER_SHUTDOWN)
    return TRUE;

  return FALSE;   
}

So in both cases CR_SSL_CONNECTION_ERROR is not recognized as a network connection error.

Because of that, the server does not redirect execution to the "network_err;" label and instead sends it to the standard "err;" label.

As already mentioned, this prevents MySQL from reconnecting and recovering replication automatically.
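
To make the consequence concrete, here is a minimal stand-alone sketch of the decision described above. It is not the server code: the enum IoThreadAction and the helper on_master_query_error are hypothetical names that only model the "network_err;" versus "err;" branching. Only the value 2026 is quoted in this report; the other numeric codes are reproduced from errmsg.h and mysqld_error.h as an assumption, so that the example compiles on its own.

```cpp
#include <cassert>

// Error codes. Only 2026 appears in this report; the other values are
// copied here (assumption) so the sketch builds without MySQL headers.
constexpr unsigned CR_CONNECTION_ERROR     = 2002;
constexpr unsigned CR_CONN_HOST_ERROR      = 2003;
constexpr unsigned CR_SERVER_GONE_ERROR    = 2006;
constexpr unsigned CR_SERVER_LOST          = 2013;
constexpr unsigned CR_SSL_CONNECTION_ERROR = 2026;
constexpr unsigned ER_CON_COUNT_ERROR      = 1040;
constexpr unsigned ER_SERVER_SHUTDOWN      = 1053;

// Same logic as the is_network_error() quoted in the report.
constexpr bool is_network_error(unsigned errorno)
{
  return errorno == CR_CONNECTION_ERROR ||
         errorno == CR_CONN_HOST_ERROR ||
         errorno == CR_SERVER_GONE_ERROR ||
         errorno == CR_SERVER_LOST ||
         errorno == ER_CON_COUNT_ERROR ||
         errorno == ER_SERVER_SHUTDOWN;
}

// Hypothetical model of the two outcomes: the "network_err;" path
// (retry the connection) and the "err;" path (I/O thread exits).
enum class IoThreadAction { Reconnect, Abort };

constexpr IoThreadAction on_master_query_error(unsigned errorno)
{
  return is_network_error(errorno) ? IoThreadAction::Reconnect
                                   : IoThreadAction::Abort;
}

// A lost connection is retried, but an SSL connection error is not.
inline void self_check()
{
  assert(on_master_query_error(CR_SERVER_LOST) == IoThreadAction::Reconnect);
  assert(on_master_query_error(CR_SSL_CONNECTION_ERROR) == IoThreadAction::Abort);
}
```

With the current list, error 2026 falls through to the abort path, which matches the "Slave I/O thread exiting" line in the log above.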

Suggested fix:
Update is_network_error by adding CR_SSL_CONNECTION_ERROR to the list of handled errors.
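
A minimal sketch of what the suggested change could look like, written as a stand-alone function rather than as a patch against slave.cc or rpl_slave.cc. The numeric values other than 2026 are reproduced from errmsg.h and mysqld_error.h as an assumption so the example builds without the MySQL headers, and ER_ACCESS_DENIED_ERROR is included only as an example of an error that should still not be retried.

```cpp
#include <cassert>

// Only 2026 is quoted in this report; the remaining values are
// assumptions copied in so the sketch is self-contained.
constexpr unsigned CR_CONNECTION_ERROR     = 2002;
constexpr unsigned CR_CONN_HOST_ERROR      = 2003;
constexpr unsigned CR_SERVER_GONE_ERROR    = 2006;
constexpr unsigned CR_SERVER_LOST          = 2013;
constexpr unsigned CR_SSL_CONNECTION_ERROR = 2026;
constexpr unsigned ER_CON_COUNT_ERROR      = 1040;
constexpr unsigned ER_SERVER_SHUTDOWN      = 1053;
constexpr unsigned ER_ACCESS_DENIED_ERROR  = 1045;  // non-network example

// is_network_error() with the proposed extra condition.
bool is_network_error(unsigned errorno)
{
  if (errorno == CR_CONNECTION_ERROR ||
      errorno == CR_CONN_HOST_ERROR ||
      errorno == CR_SERVER_GONE_ERROR ||
      errorno == CR_SERVER_LOST ||
      errorno == ER_CON_COUNT_ERROR ||
      errorno == ER_SERVER_SHUTDOWN ||
      errorno == CR_SSL_CONNECTION_ERROR)  // proposed addition
    return true;

  return false;
}

// The SSL error is now classified as a network error, while an
// unrelated error such as access denied is still not.
inline void self_check()
{
  assert(is_network_error(CR_SSL_CONNECTION_ERROR));
  assert(!is_network_error(ER_ACCESS_DENIED_ERROR));
}
```
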
[19 Nov 2013 20:05] Sveta Smirnova
Thank you for the report.

Verified as described using code analysis.