Bug #42704 "Slave_IO_Running" differs in test result (2)
Submitted: 9 Feb 2009 16:27 Modified: 2 Jun 2009 17:05
Reporter: Joerg Bruehe Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S3 (Non-critical)
Version:cluster 6.4.2 OS:Any
Assigned to: Magnus Blåudd CPU Architecture:Any

[9 Feb 2009 16:27] Joerg Bruehe
Description:
Test fails on several platforms, especially Linux (RPM builds) and OS X, both x86 and x86_64 CPUs:

=====
rpl.rpl_packet 'row'           [ fail ]

--- /PATH/mysql-test/suite/rpl/r/rpl_packet.result
+++ /PATH/mysql-test/suite/rpl/r/rpl_packet.reject
@@ -31,7 +31,7 @@
 START SLAVE;
 CREATE TABLE `t1` (`f1` LONGTEXT) ENGINE=MyISAM;
 INSERT INTO `t1`(`f1`) VALUES ('aaaaa...aaaaa2048');
-Slave_IO_Running = No (expect No)
+Slave_IO_Running = Yes (expect No)
 ==== clean up ====
 DROP TABLE t1;
 DROP TABLE t1;

mysqltest: Result length mismatch
=====

How to repeat:
Run the suite.
[20 May 2009 10:54] Magnus Blåudd
Looking in the slave servers log file, I can see that the loop in  include/wait_for_slave_io_to_stop.inc got "No such row" as result, when it then later compares that against 'No', the loop is terminated prematurely.

'No such row' can be returned from 'query_get_value', will double check that function to see why. If function looks ok, just change to SELECT 'Yes' = '<result from query_get_value on SHOW SLAVE STATUS>'

                    2 Query     SHOW SLAVE STATUS^@
                    2 Query     SELECT 'No' != 'No such row'
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    2 Query     SELECT '' = ''
                    2 Query     SHOW SLAVE STATUS^@
                    2 Query     SELECT NOT('No' = 'No') AND 3000 > 0
                    2 Query     SHOW SLAVE STATUS
                    2 Query     DROP TABLE t1
                    2 Query     SHOW WARNINGS
[20 May 2009 11:01] Magnus Blåudd
Hmm, someone has coded 'No such row' into the .inc file ...
[1 Jun 2009 15:08] Magnus Blåudd
The slave IO thread reads the packet header from master and find it is a bigger packet than it can handle and will set an error code, so far it's fine and same as 6.3. For some reason "it" also call 'vio_was_interrupted', that function read errno which is 11 from an old operation -> yes, it was interrupted and thus a different path is taken where the slave tries to reconnect to the master instead of just erroring out as it should. While reconnecting it reset the Slave_IO_Running to YES for a brief time and thus this race condition occurs.
[1 Jun 2009 15:15] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:magnus.blaudd@sun.com-20090601151346-9scmtzv2lzhueou4) (version source revid:magnus.blaudd@sun.com-20090601151346-9scmtzv2lzhueou4) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[1 Jun 2009 15:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75385
[1 Jun 2009 15:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75389
[1 Jun 2009 15:35] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:magnus.blaudd@sun.com-20090601153434-ai9qfmpwwmorscn0) (version source revid:magnus.blaudd@sun.com-20090601153434-ai9qfmpwwmorscn0) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[2 Jun 2009 17:05] Jon Stephens
Test failure only, no changes to document. Closed.