Bug #1381 Bug in replication on HP-UX 64 bit binaries?
Submitted: 23 Sep 2003 9:10 Modified: 1 Oct 2003 9:45
Reporter: Lars-Goran Forsberg
Status: Closed
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: 4.0.15 OS: HP/UX (HP-UX)
Assigned to: Guilhem Bichot CPU Architecture: Any

[23 Sep 2003 9:10] Lars-Goran Forsberg
Description:
We have two MySQL 4.0.15 servers set up with replication on HP-UX, on PA-RISC and Itanium. If the 64-bit binaries (for either PA-RISC or Itanium) are used, the slave always seems to connect to localhost, even when it is configured to connect to the other server. When the "HP-UX 11.11 (PA-RISC 1.1 and 2.0)" binary is used on PA-RISC, everything works as expected.

I have inserted some brief information from the slave host below. Please contact me if you need more information.

Extracted values from the slave host (Ver 4.0.15-debug-debug for hp-hpux11.22 on ia64, started with safe_mysqld --debug):

bash-2.04# netstat -n -i
Name      Mtu  Network         Address
lan0      1500 192.168.1.0     192.168.1.163

bash-2.04# cat data/my.cnf
[mysqld]
server-id=1
master-port=3306
master-host=192.168.1.164
master-user=sleedefault
master-password=sleedefault
master-connect-retry=30

Parts of /tmp/mysqld.trace:

T@3    : | >connect_to_master
T@3    : | | >mc_mysql_connect
T@3    : | | | enter: host: 192.168.1.164  db: (Null)  user: sleedefault  connect_time_out: 30  read_timeout: 3600
T@3    : | | | info: Server name: '192.168.1.164'.  TCP sock: 3306

.... And later:

T@3    : | | | >vio_blocking
T@3    : | | | | enter: set_blocking_mode: 1  old_mode: 1
T@3    : | | | | exit: 0
T@3    : | | | <vio_blocking
T@3    : | | | error: Got error: 1045 (Access denied for user: 'sleedefault@localhost' (Using password: YES))
T@3    : | | | error: message: 1045 (Access denied for user: 'sleedefault@localhost' (Using password: YES))

.... And even later:

T@3    : | | <mc_mysql_connect
T@3    : | | >sql_print_error
T@3    : | | | error: Slave I/O thread: error connecting to master 'sleedefault@192.168.1.164:3306': Error: 'Access denied for user: 'sleedefault@localhost' (Using password: YES)'  errno: 1045  retry-time: 30  retries: 86400
T@3    : | | <sql_print_error
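
The telling detail in the trace above is that the slave was told to reach a remote master, yet the server that answered saw the client as local. A minimal sketch of checking an error line for that pattern follows. The helper name, the regular expression, and the set of "local" names are assumptions made for illustration and are not part of MySQL. A positive result is only a hint: it cannot distinguish a slave connecting to itself from a master that misidentifies the client host.

```python
import re

# Hypothetical helper, not part of MySQL. The pattern is written against the
# error text shown in this report and may need adjusting for other versions.
_CONNECT_ERROR = re.compile(
    r"error connecting to master '(?P<user>[^@']+)@(?P<host>[^:']+):(?P<port>\d+)'"
    r".*?Access denied for user: '(?P<denied_user>[^@']+)@(?P<denied_host>[^']+)'"
)

# Assumption: names that mean "the client is on the same machine as the server".
LOCAL_NAMES = {"localhost", "127.0.0.1"}


def looks_like_self_connection(log_line):
    """Return True if the configured master is remote but the
    'Access denied' message names a local client host."""
    match = _CONNECT_ERROR.search(log_line)
    if match is None:
        return False
    configured_host = match.group("host")
    denied_host = match.group("denied_host")
    return configured_host not in LOCAL_NAMES and denied_host in LOCAL_NAMES


line = ("Slave I/O thread: error connecting to master "
        "'sleedefault@192.168.1.164:3306': Error: 'Access denied for user: "
        "'sleedefault@localhost' (Using password: YES)'  errno: 1045")
print(looks_like_self_connection(line))
```

An ordinary permissions problem on a healthy setup would normally name the slave's own network address or host name in the message rather than localhost, which is why the check looks at the denied host and not at a simple mismatch.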

How to repeat:
Install 2 servers using HP-UX 64-bit binary on either Itanium or PA-RISC, configured with replication. The slave will try to connect to localhost even if configured to connect to the other server.

Suggested fix:
?
[23 Sep 2003 12:59] Guilhem Bichot
This is probably the same problem as the one reported on Solaris 64-bit
(Bug #1256 "Replication slave fails to connect to master in 64-bit version").
Thanks for your bug report, which confirms that the bug is real and even contains a debug trace :) which we will inspect.
It could also be the same as (I will check)
Bug #1282: Replication test failure on FreeBSD5.0-Sparc64

From this:
T@3    : | | <mc_mysql_connect
T@3    : | | >sql_print_error
T@3    : | | | error: Slave I/O thread: error connecting to master 'sleedefault@192.168.1.164:3306': Error: 'Access denied for user: 'sleedefault@localhost' (Using password: YES)'  errno: 1045  retry-time: 30  retries: 86400
T@3    : | | <sql_print_error
I cannot tell whether the slave connects to the master and the master refuses the connection (believing that the user is on host localhost), or whether the slave connects to itself and refuses its own connection. Could you please help me find out by running something like:
GRANT REPLICATION SLAVE on *.* to sleedefault@localhost;
on your slave?
If replication then starts, it means that the slave is (wrongly) connecting to itself. You should then see a Binlog Dump thread in SHOW PROCESSLIST on the slave (meaning that the slave is also acting as a master, its own master).
If replication still fails to start with the same error message, it probably means that the slave is connecting to the master, but the master is doing something wrong. Thanks in advance.
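
The two outcomes of the test described above can be written down as a small decision function. This is only a sketch of the reasoning in this comment: the function name and its return strings are made up for illustration, and the Command values are assumed to have been collected from SHOW PROCESSLIST on the slave by whatever client you prefer.

```python
def interpret_grant_test(replication_started, processlist_commands):
    """Map the outcome of the localhost-grant test to a likely diagnosis.

    replication_started: True if the slave I/O thread connected after the
        GRANT was issued on the slave.
    processlist_commands: values of the Command column from
        SHOW PROCESSLIST on the slave (gathered by hand or by a client).
    """
    if not replication_started:
        # Same error as before: the slave is probably reaching the master.
        return "slave probably reaches the master; the master side looks wrong"
    has_dump_thread = any(
        cmd.strip().lower() == "binlog dump" for cmd in processlist_commands
    )
    if has_dump_thread:
        # The slave is serving its own binary log: it is its own master.
        return "slave is connecting to itself"
    return "connected, but no dump thread seen on the slave; inconclusive"


print(interpret_grant_test(True, ["Query", "Binlog Dump", "Connect"]))
```

If the slave has no binary log enabled, the dump thread may never appear even though the slave reached itself, so the third branch should be read together with the slave's error log.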
[24 Sep 2003 0:10] Lars-Goran Forsberg
Actually, the master server was not running when this debug trace was collected (sorry, I forgot to mention that). Anyway, when I granted privileges for localhost on the slave, I got this in the error log (the master mysqld is still not running):

030924  9:05:44  Slave I/O thread: connected to master 'sleedefault@192.168.1.164:3306',  replication started in log 'FIRST' at position 4
030924  9:05:44  Error reading packet from server: Binary log is not open (server_errno=1236)
030924  9:05:44  Got fatal error 1236: 'Binary log is not open' from master when reading data from binary log
[26 Sep 2003 14:49] Guilhem Bichot
Hi,
Thanks for the debug trace. Your slave is (wrongly) connecting to itself.
I have found a suspicious line in the replication code (a confusion between ulong and uint32, which differ in size on 64-bit machines) and changed it for MySQL 4.0.16.
So I would suggest you try again, if you have time, with MySQL 4.0.16 when it is released.
Thanks for your kind help troubleshooting this!!
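
The thread does not show the line that was changed, so the following is not the actual fix. It is only a sketch of one way a mix-up between a 32-bit and a 64-bit integer could produce this symptom, assuming a big-endian memory layout (as on PA-RISC) and assuming that an IPv4 address held in the wider variable is copied byte-for-byte into a 4-byte address field. The function name and format codes are illustrative choices.

```python
import socket
import struct


def sin_addr_bytes(ipv4_text, holder_format):
    """Simulate copying an address out of an integer variable on a
    big-endian machine, keeping only the 4 bytes that fit in an IPv4
    address field.

    holder_format ">I" models a 32-bit holder, ">Q" a 64-bit one.
    """
    value = struct.unpack(">I", socket.inet_aton(ipv4_text))[0]
    raw = struct.pack(holder_format, value)  # memory image of the holder
    return socket.inet_ntoa(raw[:4])         # first 4 bytes land in the field


print(sin_addr_bytes("192.168.1.164", ">I"))  # 32-bit holder keeps the address
print(sin_addr_bytes("192.168.1.164", ">Q"))  # 64-bit holder yields 0.0.0.0
```

On many systems a TCP connect to 0.0.0.0 ends up at the local machine, which would match a slave that reaches itself regardless of the configured master host. Whether this is what happened in 4.0.15 is not stated in this report.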
[1 Oct 2003 9:45] Guilhem Bichot
Thank you for your bug report. A fix for this issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Thanks for doing the test. This confirms that the bug is now solved (in 4.0.16).
ChangeSet@1.1579.3.1, 2003-09-26 23:43:22+02:00, guilhem@mysql.com