MySQL Bugs: #44344: Backport BUG#37267 connect() EINPROGRESS failures mishandled in client library

Bug #44344	Backport BUG#37267 connect() EINPROGRESS failures mishandled in client library
Submitted:	17 Apr 2009 16:47	Modified:	4 May 2010 13:15
Reporter:	Joerg Bruehe
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	cluster-7.0.5	OS:	Any
Assigned to:	Magnus Blåudd	CPU Architecture:	Any

Description:
Test failure in the build of cluster-7.0.5.
It occurs in all runs on Solaris 9, on all CPU types, and on no other platform:

=====
analyze: sync_with_master
mysqltest: At line NNN: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 747, 300)'
 returned -1 indicating timeout after 300 seconds

The result from queries just before the failure was:
< snip >
Master_SSL_Cipher
Master_SSL_Key  MYSQL_TEST_DIR/std_data/client-key.pem
Seconds_Behind_Master   #
Master_SSL_Verify_Server_Cert   No
Last_IO_Errno   #
Last_IO_Error   #
Last_SQL_Errno  0
Last_SQL_Error
Master_Bind
stop slave;
change master to
master_host="localhost",
master_ssl=1 ,
master_ssl_ca ='MYSQL_TEST_DIR/std_data/cacert.pem',
master_ssl_cert='MYSQL_TEST_DIR/std_data/client-cert.pem',
master_ssl_key='MYSQL_TEST_DIR/std_data/client-key.pem',
master_ssl_verify_server_cert=1;
start slave;
create table t1 (t int);
insert into t1 values (1);

More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_ssl1.log
=====

To me, this looks very similar to bug#41055, but I am filing it separately because it is strictly tied to a single operating system.

How to repeat:
Run the test suite on Solaris 9.

Problem persists in cluster-7.0.6:

Same failure,
again hitting all Solaris-9 platforms (all CPU types),
and not showing on any other platform.

Almost exactly the same failure occurs on SLES 11.

Also occurs on Linux with gcc 4.2.4.

Only in 7.0 and upwards. The exact same host and build settings with 6.3 do not show it.

This is a duplicate of bug#37267.

Stealing this for the Cluster team so we can get the patch included. This is necessary since we have the IPv6 backport.

This was seen in 7.0 as rpl_ssl1.test failing
- on a machine where "localhost" resolves to both an IPv6 and an IPv4 address
- where the IPv6 address is blocked and comes first in the list returned by 'getaddrinfo'

The failure shows that 'mysql_real_connect' does not retry with the next address in the list returned by 'getaddrinfo', but instead reports that the connect failed.

Since MySQL Cluster 7.0 has the IPv6 functionality backported from 5.5, this patch should be included in 7.0.
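
The retry behaviour described above (moving on to the next address from 'getaddrinfo' when one fails) can be sketched roughly as follows. This is not the committed patch and not the client library's code: it is a minimal illustration using blocking connect(), whereas the real client uses a non-blocking connect and has to handle EINPROGRESS. All function names here (connect_first_working, attempt_blocking_connect, attempt_with_ipv6_blocked, connect_to_host) are hypothetical, and attempt_with_ipv6_blocked only simulates the blocked IPv6 address from this report.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for getaddrinfo() under strict -std modes */
#endif
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: try one address, return a connected fd or -1. */
typedef int (*attempt_fn)(const struct addrinfo *ai);

/* Walk the whole getaddrinfo() list; a failure on one entry (for example a
   blocked IPv6 address listed first) must not abort the whole attempt. */
int connect_first_working(const struct addrinfo *list, attempt_fn attempt)
{
    const struct addrinfo *ai;

    assert(attempt != NULL);
    for (ai = list; ai != NULL; ai = ai->ai_next) {
        int fd = attempt(ai);
        if (fd >= 0)
            return fd;          /* first address that works wins */
    }
    return -1;                  /* every candidate failed */
}

/* Real attempt: blocking socket() + connect() for a single address. */
int attempt_blocking_connect(const struct addrinfo *ai)
{
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);

    if (fd < 0)
        return -1;
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) != 0) {
        close(fd);              /* do not leak the descriptor before retrying */
        return -1;
    }
    return fd;
}

/* Simulated attempt, for illustration only: IPv6 is "blocked", anything
   else "succeeds" with a made-up descriptor value. */
int attempt_with_ipv6_blocked(const struct addrinfo *ai)
{
    return ai->ai_family == AF_INET6 ? -1 : 100;
}

/* Resolve host/port and return a connected socket, or -1 on failure. */
int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int fd;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* accept both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    fd = connect_first_working(res, attempt_blocking_connect);
    freeaddrinfo(res);
    return fd;
}
```

Under these assumptions, a call such as connect_to_host("localhost", "3306") would only report failure after every address returned for "localhost" has been tried, which is the behaviour the failing test needs when the IPv6 entry comes first and is blocked.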

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105724

Pushed into 5.1.44-ndb-7.0.15 (revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (version source revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (merge vers: 5.1.44-ndb-7.0.15) (pib:16)

Documented fix in the NDB 7.0.15 & 7.1.4 changelogs -- see BUG#37267 for details.

Closed.