Bug #44344 Backport BUG#37267 connect() EINPROGRESS failures mishandled in client library
Submitted: 17 Apr 2009 16:47
Modified: 4 May 2010 13:15
Reporter: Joerg Bruehe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: cluster-7.0.5
OS: Any
Assigned to: Magnus Blåudd
CPU Architecture: Any

[17 Apr 2009 16:47] Joerg Bruehe
Description:
Test failure in the build of cluster-7.0.5,
all runs on Solaris 9, all CPUs (and nowhere else!):

=====
analyze: sync_with_master
mysqltest: At line NNN: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 747, 300)'
 returned -1 indicating timeout after 300 seconds

The result from queries just before the failure was:
< snip >
Master_SSL_Cipher
Master_SSL_Key  MYSQL_TEST_DIR/std_data/client-key.pem
Seconds_Behind_Master   #
Master_SSL_Verify_Server_Cert   No
Last_IO_Errno   #
Last_IO_Error   #
Last_SQL_Errno  0
Last_SQL_Error
Master_Bind
stop slave;
change master to
master_host="localhost",
master_ssl=1 ,
master_ssl_ca ='MYSQL_TEST_DIR/std_data/cacert.pem',
master_ssl_cert='MYSQL_TEST_DIR/std_data/client-cert.pem',
master_ssl_key='MYSQL_TEST_DIR/std_data/client-key.pem',
master_ssl_verify_server_cert=1;
start slave;
create table t1 (t int);
insert into t1 values (1);

More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_ssl1.log
=====

To me, this looks very similar to bug#41055, but I am filing it separately because it is strictly tied to one operating system.

How to repeat:
Run the test suite on Solaris 9.
[2 Jun 2009 21:06] Joerg Bruehe
Problem persists in cluster-7.0.6:

Same failure,
again hitting all Solaris-9 platforms (all CPU types),
and not showing on any other platform.
[6 Sep 2009 12:07] Kent Boortz
Almost exactly the same failure on SLES 11.
[14 Apr 2010 15:59] Magnus Blåudd
Occurs on Linux with gcc 4.2.4.

Only in 7.0 and upwards. The exact same host and build settings with 6.3 do not show it.
[15 Apr 2010 8:48] Magnus Blåudd
This is a duplicate of bug#37267.
[15 Apr 2010 9:01] Magnus Blåudd
Stealing this for the Cluster team so we can get the patch included. This is necessary since we have the IPv6 backport.
[15 Apr 2010 9:14] Magnus Blåudd
This was seen in 7.0 as rpl_ssl1.test failing on a machine where
- "localhost" resolves to both an IPv6 and an IPv4 address
- the IPv6 address is blocked and comes first in the list returned by 'getaddrinfo'

The failure shows that 'mysql_real_connect' does not move on to the next address returned by 'getaddrinfo' when a connect attempt fails, but instead reports that the connection failed.
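The expected behaviour can be illustrated with a small POSIX sketch. It is not the committed patch and not libmysql code; `connect_any` and `wait_for_connect` are hypothetical names, and it assumes the client uses a non-blocking connect(). An EINPROGRESS return only means the attempt is still pending, so the sketch waits with poll(), reads the real outcome through SO_ERROR, and on any failure closes the socket and tries the next getaddrinfo() entry instead of giving up.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: finish a non-blocking connect() that returned
   EINPROGRESS. Returns 0 on success, or -1 with errno set. */
static int wait_for_connect(int fd, int timeout_ms)
{
  struct pollfd pfd;
  int rc, so_error = 0;
  socklen_t len = sizeof(so_error);

  pfd.fd = fd;
  pfd.events = POLLOUT;
  pfd.revents = 0;
  do {
    rc = poll(&pfd, 1, timeout_ms);
  } while (rc < 0 && errno == EINTR);
  if (rc == 0) {                      /* attempt did not finish in time */
    errno = ETIMEDOUT;
    return -1;
  }
  if (rc < 0)
    return -1;
  /* Writable does not mean connected: fetch the real connect result. */
  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
    return -1;
  if (so_error != 0) {
    errno = so_error;
    return -1;
  }
  return 0;
}

/* Hypothetical helper: try every address getaddrinfo() returns for
   host:port, in order, and return the first connected socket, or -1
   if every address failed. */
int connect_any(const char *host, const char *port, int timeout_ms)
{
  struct addrinfo hints, *res, *ai;
  int fd = -1;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;        /* accept both IPv6 and IPv4 */
  hints.ai_socktype = SOCK_STREAM;
  if (getaddrinfo(host, port, &hints, &res) != 0)
    return -1;

  for (ai = res; ai != NULL; ai = ai->ai_next) {
    int flags;
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
      continue;                       /* e.g. family unsupported: next */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
      close(fd);
      fd = -1;
      continue;
    }
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0 ||
        (errno == EINPROGRESS && wait_for_connect(fd, timeout_ms) == 0)) {
      fcntl(fd, F_SETFL, flags);      /* restore the original mode */
      break;
    }
    /* This address failed (refused, unreachable, timed out): close
       the socket and move on to the next one instead of giving up. */
    close(fd);
    fd = -1;
  }
  freeaddrinfo(res);
  return fd;
}
```

In the rpl_ssl1 scenario, a call such as `connect_any("localhost", port, timeout_ms)` would try the blocked IPv6 address first and, whether that attempt is refused or times out, fall through to the IPv4 address.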

Since MySQL Cluster 7.0 has IPv6 functionality backported from 5.5, this patch should be included in 7.0.
[15 Apr 2010 11:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105724
[29 Apr 2010 9:43] Bugs System
Pushed into 5.1.44-ndb-7.0.15 (revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (version source revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (merge vers: 5.1.44-ndb-7.0.15) (pib:16)
[4 May 2010 13:15] Jon Stephens
Documented fix in the NDB 7.0.15 & 7.1.4 changelogs -- see BUG#37267 for details.

Closed.