MySQL Bugs: #42704: "Slave_IO_Running" differs in test result (2)

Bug #42704	"Slave_IO_Running" differs in test result (2)
Submitted:	9 Feb 2009 16:27	Modified:	2 Jun 2009 17:05
Reporter:	Joerg Bruehe	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Tests: Replication	Severity:	S3 (Non-critical)
Version:	cluster 6.4.2	OS:	Any
Assigned to:	Magnus Blåudd	CPU Architecture:	Any

Description:
Test fails on several platforms, especially Linux (RPM builds) and OS X, both x86 and x86_64 CPUs:

=====
rpl.rpl_packet 'row'           [ fail ]

--- /PATH/mysql-test/suite/rpl/r/rpl_packet.result
+++ /PATH/mysql-test/suite/rpl/r/rpl_packet.reject
@@ -31,7 +31,7 @@
 START SLAVE;
 CREATE TABLE `t1` (`f1` LONGTEXT) ENGINE=MyISAM;
 INSERT INTO `t1`(`f1`) VALUES ('aaaaa...aaaaa2048');
-Slave_IO_Running = No (expect No)
+Slave_IO_Running = Yes (expect No)
 ==== clean up ====
 DROP TABLE t1;
 DROP TABLE t1;

mysqltest: Result length mismatch
=====

How to repeat:
Run the suite.

Looking in the slave servers log file, I can see that the loop in  include/wait_for_slave_io_to_stop.inc got "No such row" as result, when it then later compares that against 'No', the loop is terminated prematurely.

'No such row' can be returned from 'query_get_value', will double check that function to see why. If function looks ok, just change to SELECT 'Yes' = '<result from query_get_value on SHOW SLAVE STATUS>'

                    2 Query     SHOW SLAVE STATUS^@
                    2 Query     SELECT 'No' != 'No such row'
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    2 Query     SELECT '' = ''
                    2 Query     SHOW SLAVE STATUS^@
                    2 Query     SELECT NOT('No' = 'No') AND 3000 > 0
                    2 Query     SHOW SLAVE STATUS
                    2 Query     DROP TABLE t1
                    2 Query     SHOW WARNINGS

Hmm, someone has coded 'No such row' into the .inc file ...

The slave IO thread reads the packet header from master and find it is a bigger packet than it can handle and will set an error code, so far it's fine and same as 6.3. For some reason "it" also call 'vio_was_interrupted', that function read errno which is 11 from an old operation -> yes, it was interrupted and thus a different path is taken where the slave tries to reconnect to the master instead of just erroring out as it should. While reconnecting it reset the Slave_IO_Running to YES for a brief time and thus this race condition occurs.

Pushed into 5.1.34-ndb-7.0.7 (revid:magnus.blaudd@sun.com-20090601151346-9scmtzv2lzhueou4) (version source revid:magnus.blaudd@sun.com-20090601151346-9scmtzv2lzhueou4) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75385

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75389

Pushed into 5.1.34-ndb-7.0.7 (revid:magnus.blaudd@sun.com-20090601153434-ai9qfmpwwmorscn0) (version source revid:magnus.blaudd@sun.com-20090601153434-ai9qfmpwwmorscn0) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)

Test failure only, no changes to document. Closed.