Bug #21132 Slave fails to reconnect on update_slave_list
Submitted: 19 Jul 2006 5:03
Modified: 28 Nov 2007 16:09
Reporter: Kolbe Kegel
Status: Closed
Category: MySQL Server: Replication
Severity: S1 (Critical)
Version: 5.0.21
OS: Linux
Assigned to: Rafal Somla
CPU Architecture: Any
Tags: bfsm_2007_07_19

[19 Jul 2006 5:03] Kolbe Kegel
Description:
1) A slave will try to reconnect to the master after a period of inactivity to ensure that its connection is not stale. This shouldn't be necessary, as it should be possible to simply ping the master or perform some other lightweight check.

2) When the slave reconnects, it tries to get a list of other slaves, which should not be necessary, as slaves do not communicate with one another.

3) In some circumstances, the reconnection can fail because the slave is unable to run SHOW SLAVE HOSTS.

In such a situation, the slave will fail to reconnect to the server.

Also, the slave gives a useless error message:

060515 14:05:44 [ERROR] While trying to obtain the list of slaves from the master 'master:3306', user 'replicator' got the following error: ''
060515 14:05:44 [Note] Slave I/O thread exiting, read up to log 'replicatelogs.001057', position 5573621

The problem is suspected to be in this block of code:

  if (mysql_real_query(mysql, STRING_WITH_LEN("SHOW SLAVE HOSTS")) ||
      !(res = mysql_store_result(mysql)))
  {
    error= mysql_error(mysql);
    goto err;
  }

The way mysql_error() works means that it can return an error from the wrong statement.
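
The block above folds two failure points (sending the query and storing the result) into one branch, and the log shows the error text coming back as ''. The following is a minimal sketch of how the two cases could be kept apart in the log message. It is not taken from the MySQL sources: describe_failure() and its parameters are hypothetical, and in real code the text and number would presumably come from mysql_error() and mysql_errno().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the MySQL sources).
 * Builds a log message that names the failing stage and never
 * degrades to an empty string, even when the client library
 * supplies no error text.
 */
const char *describe_failure(int query_failed,
                             const char *client_error,
                             unsigned int client_errno,
                             char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    /* Remember which of the two calls actually failed. */
    const char *stage = query_failed ? "sending query" : "storing result";

    if (client_error != NULL && client_error[0] != '\0')
        snprintf(buf, buflen, "%s failed: %s (errno %u)",
                 stage, client_error, client_errno);
    else
        snprintf(buf, buflen, "%s failed with no error text (errno %u)",
                 stage, client_errno);
    return buf;
}
```

With a helper along these lines, the caller would test the query call and the store-result call separately and pass the outcome in, so the message always identifies the stage even if the error string is empty.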

How to repeat:
n/a

Suggested fix:
1) Slave should not need to reconnect to the master in the first place
2) Slave should not execute SHOW SLAVE HOSTS when reconnecting to the master
3) Slave should not fail to execute SHOW SLAVE HOSTS when reconnecting to the master
[22 Oct 2006 8:23] Lars Thalmann
Comments:

1) Remove slave reconnect

   One solution that has been discussed for this is to send heartbeat
   events from the master to the slave.  One benefit of this is that
   these events also can be used to update the lag time of the slave
   (slave behind master time).

2-3) Slave not to ask for list of slaves

   I think it should be the master's responsibility to check that
   multiple slaves with the same id can't connect.  It seems strange that
   SHOW SLAVE HOSTS data is transferred to the slave.  I think
   this is old code that was intended for automatic fail-over, but
   this code has never worked.
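
The heartbeat idea in comment 1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design from WL#2860 or any MySQL code: the hb_state struct, the function names, the "two missed periods" rule, and the assumption that master and slave clocks are synchronized are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection state kept by the slave. */
struct hb_state {
    time_t last_event_seen;   /* slave clock when the last event arrived */
    time_t master_timestamp;  /* master clock value carried by that event */
    int    period_sec;        /* assumed heartbeat interval in seconds */
};

/* Record any event from the master, heartbeat or real. */
void hb_on_event(struct hb_state *s, time_t master_ts, time_t now)
{
    assert(s != NULL);
    s->last_event_seen = now;
    s->master_timestamp = master_ts;
}

/* The connection is suspect once more than two periods pass in silence. */
bool hb_connection_suspect(const struct hb_state *s, time_t now)
{
    assert(s != NULL);
    return difftime(now, s->last_event_seen) > 2.0 * s->period_sec;
}

/* Lag estimate; only meaningful if both clocks are synchronized. */
double hb_lag_seconds(const struct hb_state *s, time_t now)
{
    assert(s != NULL);
    double lag = difftime(now, s->master_timestamp);
    return lag < 0 ? 0 : lag;
}
```

Because heartbeats keep arriving while the master is idle, the slave would only need to reconnect when hb_connection_suspect() becomes true, and the lag value stays current without any real events.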

Suggested solution:

1) Change the reconnect code to check the socket before trying to
   reconnect.

2) Remove SHOW SLAVE HOSTS query.  If there really is a test that the same
   id is not used, then move this test to the master and provide a proper
   error code to the slave if it uses the wrong id.

See also BUG#21869, WL#2860.
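
Suggested solution 1) (check the socket before trying to reconnect) could look something like the sketch below on a POSIX system. It is not the code that was committed for this bug: socket_looks_alive() is a hypothetical name, and MSG_DONTWAIT is assumed to be available (Linux and the BSDs provide it).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns true if the socket shows no sign of
 * having been closed or reset by the peer. Does not block.
 */
bool socket_looks_alive(int fd)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, 0);       /* timeout 0: just sample state */

    if (ready < 0)
        return false;                   /* poll itself failed */
    if (ready == 0)
        return true;                    /* idle, nothing pending, no error */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return false;                   /* socket error or invalid fd */

    /* Readable or hung up: peek to tell pending data from end-of-file. */
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return true;                    /* data is waiting to be read */
    if (n == 0)
        return false;                   /* peer closed the connection */
    return errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR;
}
```

A passive check like this only notices closures the local TCP stack already knows about. If the master disappeared without sending a FIN or RST, the socket still looks alive, which is one reason the heartbeat approach was also discussed.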
[15 Dec 2006 10:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17033

ChangeSet@1.2349, 2006-12-15 11:32:41+01:00, rafal@quant.(none) +1 -0
  BUG#21132 (Slave fails to reconnect on update_slave_list):
  
  The update_slave_list() call is a remnant from attempts to implement failsafe
  replication. This code is now obsolete and not maintained (see comments in
  rpl_failsafe.cc).
  
  Inspecting the code, one can see that this function does not interfere with normal
  slave operation and thus can be safely removed. This will solve the issue
  reported in the bug (errors on slave reconnection).
  
  A related issue is to remove unnecessary reconnections done by the slave. This is
  handled in the patch for BUG#20435.
[8 Jan 2007 15:11] Rafal Somla
Need to look at the reconnection code and consolidate with BUG#20435
[12 Mar 2007 18:53] Rafal Somla
Decision on the issue described in HLS of WL#2860 is needed to decide how to proceed with this bug.
[21 May 2007 10:27] Lars Thalmann
** Change the semantics in 5.1 so that the SHOW SLAVE HOSTS 
** statement only shows directly connected slaves 
** (those that have reported themselves to the master).
[21 May 2007 10:30] Lars Thalmann
See also BUG#13963.
[28 May 2007 19:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27506

ChangeSet@1.2516, 2007-05-28 21:14:57+02:00, rafal@quant.(none) +1 -0
  BUG#21132 (Slave fails to reconnect on update_slave_list)
  
  This is a one-liner which will fix the semantics of SHOW SLAVE HOSTS to
  display the list of slaves currently registered on the host on which
  it was issued.
[29 May 2007 16:00] Guilhem Bichot
approved with minor comments
[8 Jun 2007 14:54] Rafal Somla
Pushed into 5.1-new-rpl tree.
[21 Jun 2007 20:15] Bugs System
Pushed into 5.1.20-beta
[29 Aug 2007 9:50] Lars Thalmann
DEV:
This has already been fixed in 5.1.  After it has also been 
pushed into 5.0, the report can be given to docs.

DOCS: 
Note that careful documentation of this is needed since the
semantics change in 5.0.
[29 Aug 2007 21:43] Rafal Somla
Pushed into 5.0-rpl tree.
[27 Nov 2007 10:49] Bugs System
Pushed into 5.0.54
[27 Nov 2007 10:50] Bugs System
Pushed into 5.1.23-rc
[27 Nov 2007 10:53] Bugs System
Pushed into 6.0.4-alpha
[28 Nov 2007 16:09] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to our source repository for that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix as follows in the 5.0.54, 5.1.23, and 6.0.4 changelogs:

        A replication slave sometimes failed to reconnect because it was
        unable to run SHOW SLAVE HOSTS. It was not necessary to run this 
        statement on slaves (since the master should track connection
        IDs), and the execution of this statement by slaves was removed.