Bug #7965 Slave_IO_State Stuck at 'Checking Master Version'
Submitted: 17 Jan 2005 17:52 Modified: 18 Jan 2005 13:13
Reporter: Scott Nebor
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 4.1.8 OS: Linux (Red Hat 9)
Assigned to: Guilhem Bichot CPU Architecture: Any

[17 Jan 2005 17:52] Scott Nebor
Description:
I have replication set up from a MySQL master running version 3.23.49a-log to a slave running 4.1.8a-log.  After I start replication by issuing "slave start", "show slave status" shows that Slave_IO_State remains stuck at "Checking Master Version".  It stays in this state for exactly two hours, after which replication continues unobstructed.

Here is an excerpt of the error log on the 4.1.8a-log slave.  Note that I made sure the first statement replicated from the master to the slave would fail, so that it would appear in the error log:

050114 13:57:56 [Note] Slave SQL thread initialized, starting replication in log 'db2-bin.008' at position 173969304, relay log './db1a-relay-bin.000001' position: 4
050114 13:57:56 [Note] Slave I/O thread: connected to master 'replicate@db2-dbnet:3306',  replication started in log 'db2-bin.008' at position 173969304
050114 15:57:56 [ERROR] Slave: Error 'Table 'ApacheAuth.sessions' doesn't exist' on query. Default database: 'ApacheAuth'. Query: 'INSERT INTO sessions VALUES ('0b7199850a3e96d0734751de4b0f655d', 1105743431, '')', Error_code: 1146
050114 15:57:56 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db2-bin.008' position 173969304

During the two hours that replication is stuck, a "show processlist" command on the master reveals that the slave has connected, and the thread is sleeping.

I should also note that I tested this behavior a few times, and each time replication was delayed for exactly two hours.  It is also worth noting that the master is currently replicating correctly to another 3.23.49a-log client, so I do not believe there is a problem with the master setup.
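
The two-hour stall can be read directly off the error log excerpt above: the I/O thread reports connecting at 13:57:56, and the first replicated statement fails at 15:57:56. A minimal sketch of that arithmetic in Python, assuming the leading `YYMMDD HH:MM:SS` field of each log line is its timestamp. The helper name `parse_log_time` is made up for illustration, and the log lines are shortened copies of the ones quoted above:

```python
from datetime import datetime

def parse_log_time(line: str) -> datetime:
    """Parse the leading 'YYMMDD HH:MM:SS' stamp of an error-log line."""
    # The first 15 characters hold the date and time, e.g. '050114 13:57:56'.
    return datetime.strptime(line[:15], "%y%m%d %H:%M:%S")

# Shortened copies of two lines from the excerpt above.
connected = parse_log_time("050114 13:57:56 [Note] Slave I/O thread: connected to master")
first_event = parse_log_time("050114 15:57:56 [ERROR] Slave: Error")

stall = first_event - connected
print(stall.total_seconds())  # 7200.0
```

The difference is 7200 seconds, which matches the "exactly two hours" observation.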

How to repeat:
Set up a replication environment with a master running version 3.23.49 and a slave running version 4.1.8.  Execute "slave start" on the 4.1.8 machine and check Slave_IO_State by executing "show slave status".

Suggested fix:
All documentation on the Slave_IO_State "Checking Master Version" states that it is a brief state.  The fix should prevent MySQL from being stuck in this state under the conditions shown above.
[17 Jan 2005 20:11] Guilhem Bichot
Hi,
This is specific to 3.23.x->4.1 replication where x is not a recent release number (for example, 47 or 49). The reason is that 4.1 issues a statement (SELECT @@GLOBAL.COLLATION_SERVER) which *hangs* on 3.23.x. It's a bug in 3.23.x. 3.23.58 does not have this bug.
To work around, I will change 4.1.10 to not send this statement if it sees the master is 3.23.
The reason things change after two hours is probably that some network timeout is 2 hours, causing the SELECT to finally fail, after which the slave I/O thread proceeds.
[17 Jan 2005 20:20] Guilhem Bichot
To be exact, the hang was fixed in 3.23.50; from the changelog:
Fixed that @code{@@@@unknown_variable} doesn't hang server.
Changing 4.1.10 to work around this problem.
[17 Jan 2005 21:21] Scott Nebor
I noticed that the slave also issues a "SELECT @@GLOBAL.TIME_ZONE" in addition to the "SELECT @@GLOBAL.COLLATION_SERVER".  I guess that 4.1.10 should not issue this statement either if the master version is below 3.23.50.

FYI.  I recompiled 4.1.8 so that it doesn't issue these two commands if the version is below 3.23.50 and this seemed to resolve the problem.
[17 Jan 2005 21:51] Guilhem Bichot
Hi!
Thanks for the verification, now we know it's really the cause.
Yes, I also disabled the SELECT @@GLOBAL.TIME_ZONE the same way.
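
The workaround discussed in this thread amounts to checking the master's version before sending the two statements. A rough sketch of that gating logic in Python, for illustration only: the actual patch is not shown in this report and may test a different condition (for example, any 3.23 master). The 3.23.50 cutoff is an assumption taken from the changelog entry quoted above, and the names `parse_version`, `queries_to_send`, and `PROBE_QUERIES` are hypothetical:

```python
import re
from typing import Tuple

# Statements named in this report that hang a master older than 3.23.50.
PROBE_QUERIES = (
    "SELECT @@GLOBAL.COLLATION_SERVER",
    "SELECT @@GLOBAL.TIME_ZONE",
)

def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn a server version such as '3.23.49a-log' into (3, 23, 49)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)

def queries_to_send(master_version: str) -> Tuple[str, ...]:
    """Return the probe statements that will not hang this master."""
    # Assumption: 3.23.50 is the cutoff, per the changelog entry quoted above.
    if parse_version(master_version) < (3, 23, 50):
        return ()
    return PROBE_QUERIES

print(queries_to_send("3.23.49a-log"))  # ()
```

With the reporter's master version (3.23.49a-log), no probe statement is sent; for 3.23.50 or later, both are.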
[18 Jan 2005 13:13] Guilhem Bichot
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Fixed in 4.1.10 and 5.0.3 in
ChangeSet@1.2130.1.2, 2005-01-17 21:26:14+01:00, guilhem@mysql.com
  Fix for BUG#7965 "Slave_IO_State Stuck at 'Checking Master Version'":
  Working around hang of master < 3.23.50 on SELECT @@unknown_var
  (to enable 3.23.49->4.1.10 replication)