Description:
I am running MySQL 4.1.14 on HP Alpha machines under OSF1 V5.1 version 2650.
I installed MySQL from the OSF1 binary version downloaded from one of the
official mirror sites, mirror.ac.uk.
I have configured a replication master on a machine named 'mars' and a slave
on a machine named 'venus'.
The master server runs fine, but whenever I try to start the slave, I get a
report like this in the error log file:
050828 13:33:30 mysqld started
050828 13:33:30 [Warning] Can't open and lock time zone table: Table 'mysql.time_zone_leap_second' doesn't exist trying to live without them
/nfs/pathsoft/external/mysql-standard-4.1.14/bin/mysqld: ready for connections.
Version: '4.1.14-standard' socket: '/nfs/arcturus1/mysql/etc/mysql.mars-dev.sock' port: 14644 MySQL Community Edition - Standard (GPL)
050828 13:33:30 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '/nfs/arcturus1/mysql/data/mars-dev/relay-bin.000001' position: 4
050828 13:33:30 [ERROR] Slave I/O thread: error connecting to master 'slave@mars:14642': Error: 'Unknown MySQL server host 'mars' (1)' errno: 2005 retry-time: 30 retries: 86400
This suggests that the machine running the replication slave cannot resolve
the master's hostname 'mars' to an IP address.
However, there is nothing wrong with DNS or NIS at our site. I can run the mysql
client program on venus with a command line such as "mysql -h mars -P 14642 ..."
and connect successfully.
I can also run a replication slave on a Linux machine using exactly the same
configuration file, so I know that I have configured both the master and
slave servers correctly.
How to repeat:
1. Set up a replication master on any machine.
2. Set up a replication slave on any HP Alpha machine running OSF1.
3. Try to start the replication slave.
This problem is easily repeatable.
Suggested fix:
Inspection of the source code shows that the problem lies in the wrapper code
for gethostbyname_r in the file mysys/my_gethostbyname.c, specifically in the
section within the
#elif defined(HAVE_GETHOSTBYNAME_R_RETURN_INT)
...
#elif ...
conditional compilation block.
Under OSF1, the manual page for gethostbyname_r is as follows (irrelevant text
is replaced by ellipsis "..."):
-------------------------------------------------------------------------------
NAME
gethostbyname, gethostbyname_r - Get a network host entry by name
SYNOPSIS
#include <netdb.h>
struct hostent *gethostbyname(
const char *name );
[Tru64 UNIX] The following function is supported in order to maintain
backward compatibility with previous versions of the operating system. You
should not use it in new designs.
int gethostbyname_r(
const char *name,
struct hostent *hptr,
struct hostent_data *hdptr );
...
PARAMETERS
name
Specifies the official network name or alias.
hptr
[Tru64 UNIX] For gethostbyname_r() only, this points to the hostent
structure. The netdb.h header file defines hostent structure.
hdptr
[Tru64 UNIX] For gethostbyname_r() only, this is data for hosts data-
base. The netdb.h header file defines hostent_data structure.
...
NOTES
The gethostbyname() function returns a pointer to thread-specific data.
Subsequent calls to this or a related function from the same thread
overwrite this data.
[Tru64 UNIX] The gethostbyname_r() function is an obsolete reentrant ver-
sion of the gethostbyname() function. It is supported in order to maintain
backward compatibility with previous versions of the operating system and
should not be used in new designs. Note that you must zero-fill the hdptr
structure before its first access by the gethostbyname_r() function.
RETURN VALUES
Upon successful completion, the gethostbyname() function returns a pointer
to a hostent structure. If it reaches the end of the network host name
database, it returns a null pointer.
[Tru64 UNIX] Upon successful completion, the gethostbyname_r() function
stores the hostent structure in the location pointed to by hptr, and
returns a value of 0 (zero). Upon failure, it returns a value of -1.
ERRORS
If the gethostbyname() or gethostbyname_r() function call fails, h_errno is
set to one of the following the values:
...
[Tru64 UNIX] If any of the following conditions occurs, the
gethostbyaddr_r() function sets errno to the corresponding value:
[EINVAL]
The name, hptr, or hdptr is invalid.
-------------------------------------------------------------------------------
OSF1 requires the "struct hostent_data" structure to be zero-filled before
gethostbyname_r first accesses it, but the MySQL source code does not do this.
As a result, the call to gethostbyname_r returns -1 and sets errno to EINVAL.
Unfortunately, the MySQL source code interprets this to mean that the hostname
lookup failed, so the slave reports "Unknown MySQL server host" (errno 2005).
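A minimal C sketch of the fix this diagnosis implies: zero-fill the struct
hostent_data buffer before calling the three-argument gethostbyname_r. This is
not the actual MySQL wrapper. The name sketch_gethostbyname_r, its parameter
list, and the assumption that the caller supplies a suitably aligned buffer of
at least sizeof(struct hostent_data) bytes are all hypothetical. The __osf__
check selects the Tru64 branch; on other platforms the sketch does no lookup
and returns NULL with ENOSYS, so that it still compiles.

```c
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch only (hypothetical name and parameters, not MySQL's code):
 * a gethostbyname_r() wrapper that zero-fills struct hostent_data
 * before its first use, as the Tru64 man page requires.
 * Assumes `buffer` is suitably aligned for struct hostent_data.
 */
struct hostent *sketch_gethostbyname_r(const char *name,
                                       struct hostent *result,
                                       void *buffer, size_t buflen,
                                       int *err)
{
    assert(name != NULL && result != NULL && buffer != NULL && err != NULL);
#if defined(__osf__)
    assert(buflen >= sizeof(struct hostent_data));
    /* The step the report says is missing: clear hdptr before first use. */
    memset(buffer, 0, sizeof(struct hostent_data));
    if (gethostbyname_r(name, result, (struct hostent_data *) buffer) != 0) {
        /* The man page documents 0 on success, -1 on failure, and lists
           EINVAL in errno for an invalid name, hptr or hdptr. */
        *err = errno;
        return NULL;
    }
    return result;
#else
    /* The three-argument gethostbyname_r() is Tru64-specific, so this
       sketch does not attempt a lookup on other platforms. */
    (void) name;
    (void) result;
    (void) buffer;
    (void) buflen;
    *err = ENOSYS;
    return NULL;
#endif
}
```

On OSF1 the caller would pass a buffer such as a local struct hostent_data (or
a char array at least that large) and check the return value for NULL.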
This problem is closely related to a bug which I reported in April 2002:
http://lists.mysql.com/bugs/11975
On that occasion, Monty investigated and correctly analysed the problem:
http://lists.mysql.com/bugs/11977
He noted that gethostbyname is thread-safe under OSF1, so it is not necessary
to use the re-entrant version gethostbyname_r, which in any case is flagged as
obsolete.
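Assuming Monty's workaround simply amounts to calling gethostbyname() directly
on OSF1 (the exact patch is not quoted here), a minimal C sketch might look like
this. The name sketch_lookup_host is hypothetical. The thread-safety argument
relies on the OSF1 man page statement that gethostbyname() returns a pointer to
thread-specific data; other platforms may not make that guarantee.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>

/*
 * Sketch of the workaround (hypothetical name): use gethostbyname()
 * instead of the obsolete gethostbyname_r(). On OSF1 the result lives
 * in per-thread storage, so other threads do not overwrite it, but a
 * later lookup from the same thread does. Not guaranteed elsewhere.
 */
struct hostent *sketch_lookup_host(const char *name, int *err)
{
    struct hostent *h;

    assert(name != NULL && err != NULL);
    h = gethostbyname(name);
    if (h == NULL)
        *err = h_errno;   /* resolver status, e.g. HOST_NOT_FOUND */
    return h;
}
```

Because the returned structure is overwritten by the next lookup in the same
thread, a caller that needs the address later should copy it out first. Note
that the error code here is an h_errno value, not an errno value.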
Monty's workaround should still work, but it seems to have been dropped from
the official builds at some point.
Unfortunately, I'm unable to build from source code myself because I lack the
HP C++ compiler, so I can't verify my diagnosis of the problem, but I'm
quite confident that it is correct.