Bug #1256 Replication slave fails to connect to master in 64-bit version
Submitted: 11 Sep 2003 15:00 Modified: 8 Nov 2004 20:19
Reporter: Chad Attermann
Status: Closed
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: MySQL Standard 4.0.14 OS: Sun Solaris 8 SPARC 64-bit
Assigned to: Guilhem Bichot

[11 Sep 2003 15:00] Chad Attermann
Description:
The replication slave fails to connect to the replication master in the 64-bit version only.  The following error is logged:

030911 16:53:41  Slave I/O thread: error connecting to master 'repl@x.x.x.x:3306': Error: 'Access denied for user: 'repl@localhost' Using password: YES)'  errno: 1045  retry-time: 60  retries: 86400

Note that the master's IP address has been replaced with "x.x.x.x" in the log above for privacy; the real log showed the correct IP address of the master.  It seems that the slave is attempting to authenticate as if it were on host "localhost" when in fact it is on a different host from the master.

Note also that replacing the 64-bit version with the 32-bit version (without changing any of the config) resulted in a successful connection by the replication slave to the master.

How to repeat:
Install mysql-standard-4.0.14-sun-solaris2.8-sparc-64bit on two different 64-bit Solaris 8 SPARC servers.  Configure one as master and the other as slave.  Attempt to connect the slave to the master for replication using a user (e.g. "repl") with File and Replication_slave privileges.

Suggested fix:
???
[12 Sep 2003 9:45] Guilhem Bichot
Hi,
When you changed to the 32-bit version and it worked: did you change to the 32-bit version on the master, the slave, or both?
[12 Sep 2003 12:06] Chad Attermann
Hi Guilhem,

In this example I changed both master and slave to the 32-bit version.  It is worth noting however that previously I had tried using the 4.0.14 64-bit version as a slave against a 3.23.57 32-bit version master and had the exact same result.  Switching to the 4.0.14 32-bit version solved the problem in that case as well.

Regards.
[12 Sep 2003 15:59] Guilhem Bichot
Unfortunately I can't repeat the problem.
I have tested this with 4.0.15 versions
slave=linux    master=solaris8,"64bit binary" : works
master=linux   slave=solaris8,"64bit binary" : works
and this:
master=solaris9,3.23.58,"32 bit binary", slave=solaris8,4.0.15,"64 bit binary"
which resembles one of your failing setups (except that my master is Solaris 9, not 8; sorry, I have only one Solaris 8 machine and two Solaris 9 machines here, so I can't test replication between two different Solaris 8 machines), and it also works.
Could you do the following test, to know if only replication is affected:
have the 64-bit failing 4.0.14 binary installed on your slave, open a Unix login on your slave, do 'mysql -urepl -hthe_IP_of_your_master'
and tell me if it manages to connect. Thanks.
[12 Sep 2003 16:03] Guilhem Bichot
Another test you could do is, on your slave with the 64-bit binary, call
'resolveip the_IP_of_your_master'; tell me what it says
(resolveip is part of the MySQL distribution).
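
For readers who want to see what this kind of check involves, here is a rough sketch in C of a reverse lookup for an IPv4 address. It is not how resolveip is implemented; the function name reverse_lookup and its return convention are hypothetical and chosen only for illustration. It uses the standard POSIX calls inet_pton() and getnameinfo().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * Hypothetical helper (not part of MySQL): look up the name for a
 * dotted-quad IPv4 address.
 * Returns 1 on success (result written to host), 0 on any failure,
 * including an unparsable address.
 * flags is passed to getnameinfo(); NI_NUMERICHOST skips the name
 * lookup and writes back the numeric form of the address.
 */
int reverse_lookup(const char *ip, char *host, size_t hostlen, int flags)
{
    struct sockaddr_in sa;

    assert(ip != NULL && host != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;

    /* inet_pton() returns 1 only when ip is a valid IPv4 address. */
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return 0;

    /* getnameinfo() returns 0 on success, a nonzero EAI_* code otherwise. */
    return getnameinfo((struct sockaddr *)&sa, sizeof sa,
                       host, (socklen_t)hostlen, NULL, 0, flags) == 0;
}
```

Calling reverse_lookup() with flags set to 0 on the slave, passing the master's IP, shows the name the slave's resolver returns for that address. On Solaris the program may need to be linked with -lsocket -lnsl.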
[26 Sep 2003 14:50] Guilhem Bichot
Hi,
I have found a suspicious line in the replication network code (confusion between ulong and uint32, which differ on 64-bit machines) and changed it for MySQL 4.0.16.
So I would suggest you try again, if you have time, with MySQL 4.0.16 when it is
released.
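
To illustrate the class of problem mentioned above, here is a minimal sketch in C. It is not the actual MySQL source or the actual fix: the function names read_u32, read_u32_as_ulong and print_comparison are hypothetical. It assumes an LP64 platform such as 64-bit Solaris SPARC, where unsigned long is 8 bytes while uint32_t is always 4. Code that handles a 4-byte field through an unsigned long can therefore behave correctly in a 32-bit build and pick up neighbouring bytes in a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Correct: copies exactly 4 bytes, whatever the platform. */
uint32_t read_u32(const unsigned char *buf)
{
    uint32_t v;
    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}

/*
 * Buggy on LP64: copies sizeof(unsigned long) bytes, which is 4 on a
 * 32-bit build but 8 on a typical 64-bit Unix build, so the 4 bytes
 * that follow the field leak into the result.
 * The caller must supply at least sizeof(unsigned long) readable bytes.
 */
unsigned long read_u32_as_ulong(const unsigned char *buf)
{
    unsigned long v = 0;
    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Print both type sizes and both readings for the same buffer. */
void print_comparison(const unsigned char *buf)
{
    printf("sizeof(uint32_t)=%zu sizeof(unsigned long)=%zu\n",
           sizeof(uint32_t), sizeof(unsigned long));
    printf("read_u32=0x%lx read_u32_as_ulong=0x%lx\n",
           (unsigned long)read_u32(buf), read_u32_as_ulong(buf));
}
```

Calling print_comparison() on an 8-byte buffer whose last four bytes are nonzero prints matching values where unsigned long is 4 bytes wide and differing values where it is 8 bytes wide.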
Thanks for your kind help troubleshooting this!!
[26 Sep 2003 15:31] Chad Attermann
I'd be happy to test build 16 when it's available.  I'll keep checking back, or you can notify me.

Thanks.
[1 Oct 2003 9:53] Guilhem Bichot
Hi!
A very similar bug report (#1391) was filed for HP-UX 64-bit binaries. Correcting the suspicious line fixed it on those machines, so it is likely that it also fixes the case on Solaris. Please, when 4.0.16 is out (this will be announced to announce@lists.mysql.com), can you try again, and whether it works or not, mention it here?
Thank you!
[8 Nov 2004 10:31] Guilhem Bichot
It's been more than one year without user feedback now, so I'm closing it.
Chad, if you have done more tests with recent 4.0 and even 4.1 versions (which I'd recommend as we cleaned the connection code in 4.1), feel free to write them here and reopen the bug. Thanks.
[8 Nov 2004 16:42] Chad Attermann
My sincerest apologies.  I completely lost track of this bug report.  I have been running 4.0.17 Solaris 64-bit version on a dual CPU SPARC server for several months now without a single problem.  Thanks a lot for the quick fix!

Best Regards,

Chad Attermann.
[8 Nov 2004 20:19] Guilhem Bichot
I am SO happy that it got fixed!