Bug #49557 "semisync" plugin test fails on Windows
Submitted: 9 Dec 2009 11:36 Modified: 15 Mar 2010 12:05
Reporter: Joerg Bruehe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.5.0-m2 OS:Windows
Assigned to: Zhenxing He CPU Architecture:Any
Tags: disabled, pb2, test failure

[9 Dec 2009 11:36] Joerg Bruehe
Description:
This issue is already known, but I need a bug number for the status page.

Specific to Windows, both 32 and 64 bit,
after MTR got fixed to resolve bug#49170:

=====
rpl.rpl_semi_sync 'row'                  [ retry-fail ]
        Test ended at YYYY-MM-DD HH:MM:SS

CURRENT_TEST: rpl.rpl_semi_sync
--- /PATH/mysql-test/suite/rpl/r/rpl_semi_sync.result
+++ /PATH/mysql-test/suite/rpl/r/rpl_semi_sync.reject
@@ -88,10 +88,10 @@
 Rpl_semi_sync_master_status    ON
 show status like 'Rpl_semi_sync_master_no_tx';
 Variable_name  Value
-Rpl_semi_sync_master_no_tx     0
+Rpl_semi_sync_master_no_tx     1
 show status like 'Rpl_semi_sync_master_yes_tx';
 Variable_name  Value
-Rpl_semi_sync_master_yes_tx    1
+Rpl_semi_sync_master_yes_tx    0
 select CONNECTIONS_NORMAL_SLAVE - CONNECTIONS_NORMAL_SLAVE as 'Should be 0';
 Should be 0
 0
@@ -102,10 +102,10 @@
 Rpl_semi_sync_master_status    ON
 show status like 'Rpl_semi_sync_master_no_tx';
 Variable_name  Value
-Rpl_semi_sync_master_no_tx     0
+Rpl_semi_sync_master_no_tx     1
 show status like 'Rpl_semi_sync_master_yes_tx';
 Variable_name  Value
-Rpl_semi_sync_master_yes_tx    301
+Rpl_semi_sync_master_yes_tx    300
 [ on slave ]
 [ slave status after replicated inserts ]
 show status like 'Rpl_semi_sync_slave_status';
@@ -132,10 +132,10 @@
 Rpl_semi_sync_master_status    ON
 show status like 'Rpl_semi_sync_master_no_tx';
 Variable_name  Value
-Rpl_semi_sync_master_no_tx     0
+Rpl_semi_sync_master_no_tx     1
 show status like 'Rpl_semi_sync_master_yes_tx';
 Variable_name  Value
-Rpl_semi_sync_master_yes_tx    301
+Rpl_semi_sync_master_yes_tx    300
 show status like 'Rpl_semi_sync_master_clients';
 Variable_name  Value
 Rpl_semi_sync_master_clients   1
...
... and more such differences
...
=====

How to repeat:
Run the test suite on Windows.
[10 Dec 2009 7:56] Zhenxing He
Hi Joerg,

I cannot reproduce the problem, do you get this mismatch often
or sporadically?

And could you please try the following patch to see if the problem disappear?

=== modified file 'mysql-test/suite/rpl/r/rpl_semi_sync.result'
--- mysql-test/suite/rpl/r/rpl_semi_sync.result 2009-10-23 04:56:30 +0000
+++ mysql-test/suite/rpl/r/rpl_semi_sync.result 2009-12-09 04:43:49 +0000
@@ -398,8 +398,8 @@
 #
 set sql_log_bin=0;
 INSTALL PLUGIN rpl_semi_sync_master SONAME 'SEMISYNC_MASTER_PLUGIN';
-set global rpl_semi_sync_master_timeout= 5000;
-/* 5s */
+set global rpl_semi_sync_master_timeout= 10000;
+/* 10s */
 set sql_log_bin=1;
 set global rpl_semi_sync_master_enabled= 1;
 [ on slave ]

=== modified file 'mysql-test/suite/rpl/t/rpl_semi_sync.test'
--- mysql-test/suite/rpl/t/rpl_semi_sync.test   2009-10-23 04:56:30 +0000
+++ mysql-test/suite/rpl/t/rpl_semi_sync.test   2009-12-09 04:39:48 +0000
@@ -61,7 +61,7 @@
 {
     set sql_log_bin=0;
     eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_PLUGIN';
-    set global rpl_semi_sync_master_timeout= 5000; /* 5s */
+    set global rpl_semi_sync_master_timeout= 10000; /* 10s */
     set sql_log_bin=1;
 }
 enable_query_log;
@@ -537,7 +537,7 @@
 set sql_log_bin=0;
 replace_result $SEMISYNC_MASTER_PLUGIN SEMISYNC_MASTER_PLUGIN;
 eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_PLUGIN';
-set global rpl_semi_sync_master_timeout= 5000; /* 5s */
+set global rpl_semi_sync_master_timeout= 10000; /* 10s */
 set sql_log_bin=1;
 set global rpl_semi_sync_master_enabled= 1;
[10 Dec 2009 13:21] Joerg Bruehe
I got the problem in the release build of 5.5.0-m2.

I got it on both Windows platforms (32 and 64 bit),
in each run of the test suite ("normal" and "ps" mode),
with each binlog mode ("row", "stmt", and "mix),
both on the first run and on the retry.

In other words: Of 24 attempts to run this test, 24 failed;
no run passed (on Windows).

There are minor variations of the symptoms,
but the schema is the same in all these 24 runs.

Elena had a similar symptom with 5.5.0-beta, so it seems to remain fairly constant.

Once I will have finished and delivered 5.5.0-m2, I can try to re-run the test with this patch. Still, I find it remarkable that no Unix/Linux platform showed this symptom.
[23 Feb 2010 9:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101161

3129 He Zhenxing	2010-02-23
      Bug#49557 "semisync" plugin test fails on Windows
      
      The problem was because the gettimeofday function was incorrect
      implemented for Windows, and so the semisync master did not wait
      for slave reply properly on Windows.
      
      Fixed by removing the gettimeofday function for Windows, and using
      set_timespec function to get current time for all platforms.
     @ plugin/semisync/semisync_master.cc
        removing the gettimeofday function for Windows, using set_timespec
        function to get current time for all platforms.
[26 Feb 2010 6:04] Alexander Nozdrin
For some magical reason, the test started to fail regularly
after the push 'vvaintroub@mysql... 2010-02-25 11:06:10'
in next-mr-bugfixing.

The diff was merely a change to SunOS.cmake:

=== modified file 'cmake/os/SunOS.cmake'
--- cmake/os/SunOS.cmake 2010-01-27 00:50:19 +0000
+++ cmake/os/SunOS.cmake 2010-02-25 09:57:23 +0000
@@ -30,7 +30,7 @@ SET(LIBM m)
# CMake defined -lthread as thread flag. This crashes in dlopen # when
trying to load plugins workaround with -lpthread
-SET(CMAKE_THREADS_LIBS_INIT -lpthread CACHE INTERNAL "")
+SET(CMAKE_THREADS_LIBS_INIT -lpthread CACHE INTERNAL "" FORCE)
# Solaris specific large page support
CHECK_SYMBOL_EXISTS(MHA_MAPSIZE_VA sys/mman.h
HAVE_DECL_MHA_MAPSIZE_VA)

Log:
http://pb2.norway.sun.com/web.py?action=archive_download&archive_id=1461197&pretty=please

The test case will be disabled on Windows due to this bug.
[10 Mar 2010 7:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102816

3130 He Zhenxing	2010-03-10
      Post fix for Bug#49557
[10 Mar 2010 7:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102817

3130 He Zhenxing	2010-03-10
      Post fix for Bug#49557
[10 Mar 2010 10:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102853

3130 He Zhenxing	2010-03-10
      Post fix for Bug#49557
[12 Mar 2010 18:10] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100312180926-0emfjrj8e9xnvl8h) (version source revid:alik@sun.com-20100312180447-2r0ak22y13s05134) (merge vers: 6.0.14-alpha) (pib:16)
[12 Mar 2010 18:11] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100312180838-rk60kn38g0qwt78n) (version source revid:alik@sun.com-20100312180435-wk7nvsbfntfus5bu) (pib:16)
[12 Mar 2010 18:22] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100312181131-0b7v8r2htpd9jz2a) (version source revid:alik@sun.com-20100312181131-0b7v8r2htpd9jz2a) (merge vers: 5.5.3-m3) (pib:16)
[15 Mar 2010 12:05] Jon Stephens
Documented bugfix in the 5.5.3 and 6.0.14 changelogs as follows:

        When using the semi-synchronous replication plugin on Windows,
        the wait time calculated when the master was waiting for reply
        from the slave was incorrect. In addition, when the wait time
        was less than the current time, the master did not wait for a
        reply at all.

        This issue was caused by the fact that a different internal
        function was used to get current time by the plugin on Windows
        as opposed to other platforms, and this function was not
        correctly implemented. Now the Windows version of the plugin
        uses the same function as other platforms for this purpose.

Closed.