Bug #51398 Test "rpl_heartbeat_basic" fails "while waiting for slave_received_heartbeats"
Submitted: 22 Feb 2010 18:32 Modified: 24 Mar 2010 17:14
Reporter: Joerg Bruehe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.5.2-m2 OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any

[22 Feb 2010 18:32] Joerg Bruehe
Description:
In the release build of 5.5.2-m2, test "rpl_heartbeat_basic" fails on several platforms with similar symptoms (just some numbers differ).

The failure is new, has not been observed in any previous build.

Affected platforms:
   aix5.3-powerpc-64bit
   hpux11.23-ia64
   hpux11.31-ia64
   solaris10-sparc-64bit
   solaris10-sparc-64bit-pkgadd
   RPM-ia64

Typical symptom:
=====
rpl.rpl_heartbeat_basic 'row'            [ retry-fail ]
        Test ended at YYYY-MM-DD HH:MM:SS

CURRENT_TEST: rpl.rpl_heartbeat_basic
--- /PATH/mysql-test/suite/rpl/r/rpl_heartbeat_basic.result
+++ /PATH/mysql-test/suite/rpl/r/rpl_heartbeat_basic.reject
@@ -288,13 +288,15 @@
 1      on master
 2      on slave
 Heartbeat event received on master
-Heartbeat event received on slave
-Slave has received more events than master (1 means 'yes'): 1
-
-*** Clean up ***
-include/stop_slave.inc
-DROP TABLE t1;
-include/stop_slave.inc
-SET @@global.slave_net_timeout=@restore_slave_net_timeout;
-
-End of 6.0 test
+**** ERROR: failed while waiting for  slave_received_heartbeats  3 ****
+Note: the following output may have changed since the failure was detected
+**** Showing STATUS, PROCESSLIST ****
+SHOW  STATUS LIKE 'slave_received_heartbeats';
+Variable_name  Value
+Slave_received_heartbeats      3
+SHOW PROCESSLIST;
+Id     User    Host    db      Command Time    State   Info
+1      root    localhost:54588 test    Query   0       NULL    SHOW PROCESSLIST
+22     root    localhost:54612 NULL    Binlog Dump     67      Master has sent all binlog to slave; waiting for binlog to be updated   NULL
+23     system user             NULL    Connect 67      Waiting for master to send event        NULL
+24     system user             NULL    Connect 67      Slave has read all relay log; waiting for the slave I/O thread to update it     NULL

mysqltest: In included file "./include/wait_for_status_var.inc": Result length mismatch

 - saving '/PATH/mysql-test/var/log/rpl.rpl_heartbeat_basic-row/' to '/PATH/mysql-test/var/log/rpl.rpl_heartbeat_basic-row/'
=====

Differences are:
- the log mode: "mix", "row", "stmt";
- the values reported for "Slave_received_heartbeats": 2, 3, 4, 6, 7, 8, 9, 10;
- the number following "localhost:";
- the value reported for "Time": 61, 62, 63, 67, 68, 69, 70.

How to repeat:
Found by running the test suite during build.
[12 Mar 2010 18:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103118

2992 Andrei Elkin	2010-03-12
      Bug #51398 Test "rpl_heartbeat_basic" fails "while waiting for slave_received_heartbeats"
      
      Non-determinism of the test was caused by false expectations for received hb:s
      in the circular setup of the test.
      Even though the HB periods ratio was as choosen to be as high as 10, it's incorrect 
      to compare number of hb-events basing only a relation between their periods.
      
      Fixed with removing the incorrect snippet of the test.
     @ mysql-test/suite/rpl/r/rpl_heartbeat_basic.result
        results are changed.
     @ mysql-test/suite/rpl/t/rpl_heartbeat_basic.test
        Reducing the test env to run in only with MIXED mode;
        Simplifying logics of the circular setup to verify only
        that HB flows both directions.
[16 Mar 2010 16:34] Andrei Elkin
The main issue of the current bug was fixed by BUG#50767.
Earlier commit addresses a possible issue of non-determinism due to
the fact that in a real env hb rate can not bound to any minimum value.
Therefore a configured as more hb-active server can sent hb-events slows at some
rare times.
[16 Mar 2010 18:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103503

2997 Andrei Elkin	2010-03-16
      Bug #51398 Test "rpl_heartbeat_basic" fails "while waiting for slave_received_heartbeats"
      
      Non-determinism of the test was caused by lack of setting a proper value to hb period,
      actually fixed by BUG@50767.
      
      These fixes aims at possible non-determinism in comparison of received
      hb events by master and slave in the circular part of the test.
      Even though the HB periods ratio was as choosen to be as high as 10, it's incorrect 
      to compare number of hb-events basing only a relation between their periods.
      
      Yet another issue is relatively short 60 secs timeout of wait_for_status_var.inc
      makes valgrind runs to fail. 
      Fixed with deploying wait_for_slave_io_to_start afront of calling wait_for_status_var.
      
      The test is made runnable only with MIXED binlog-format as it has close to 1 min
      total exec time and there is nothing format specific in it.
     @ mysql-test/suite/rpl/r/rpl_heartbeat_basic.result
        results are changed.
     @ mysql-test/suite/rpl/t/rpl_heartbeat_basic.test
        Reducing the test env to run in only with MIXED mode;
        Simplifying logics of the circular setup to verify only
        that HB flows both directions.
[16 Mar 2010 19:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103507

2997 Andrei Elkin	2010-03-16
      Bug #51398 Test "rpl_heartbeat_basic" fails "while waiting for slave_received_heartbeats"
      
      Non-determinism of the test was caused by lack of setting a proper value to hb period,
      actually fixed by BUG@50767.
      
      These fixes aim at possible non-determinism in comparison of received
      hb events by master and slave in the circular part of the test.
      Even though the HB periods ratio was choosen to be as high as 10, it's still incorrect 
      to compare number of hb-events basing only a relation between their periods.
      
      Yet another issue is relatively short 60 secs timeout of wait_for_status_var.inc
      makes valgrind runs to fail. 
      Fixed with deploying wait_for_slave_io_to_start afront of calling wait_for_status_var.
      
      The test is made runnable only with MIXED binlog-format as it has close to 1 min
      total exec time and there is nothing format specific in it.
     @ mysql-test/suite/rpl/r/rpl_heartbeat_basic.result
        results are changed.
     @ mysql-test/suite/rpl/t/rpl_heartbeat_basic.test
        Reducing the test env to run in only with MIXED mode;
        Simplifying logics of the circular setup to verify only
        that HB flows both directions.
[21 Mar 2010 18:30] Andrei Elkin
Pushed to mysql- trunk,next-rm,6-0-codebase -bugfixing.
[24 Mar 2010 8:14] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100324081249-yfwol7qtcek6dh7w) (version source revid:alik@sun.com-20100324081113-kc7x1iytnplww91u) (merge vers: 6.0.14-alpha) (pib:16)
[24 Mar 2010 8:16] Bugs System
Pushed into 5.5.4-m3 (revid:alik@sun.com-20100324081056-6bndv6f0nrvbblhp) (version source revid:alik@sun.com-20100324081056-6bndv6f0nrvbblhp) (merge vers: 5.5.4-m3) (pib:16)
[24 Mar 2010 8:17] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100324081159-5b8juv8ldiqwce8v) (version source revid:alik@sun.com-20100324081105-y72rautcea375zxm) (pib:16)
[24 Mar 2010 17:14] Paul DuBois
Changes to test case. No changelog entry needed.