Bug #59179 binlog_checksum and rpl_checksum_cache time out on Linux in release builds
Submitted: 27 Dec 2010 7:19
Modified: 24 Jan 2011 15:59
Reporter: Alexander Nozdrin
Status: Closed
Category: Tests: Replication
Severity: S7 (Test Cases)
Version: 5.6.1
OS: Linux
Assigned to: Andrei Elkin
CPU Architecture: Any
Tags: pb2, test failure

[27 Dec 2010 7:19] Alexander Nozdrin
Description:
Tree: mysql-5.6.1-m5-release

The tests time out on two Linux boxes.
The timeouts are *not* sporadic.

How to repeat:
XRef: http://pb2.norway.sun.com/?template=mysql_show_test_failure&search=yes&push_id=1856111&tes...

XRef: http://pb2.norway.sun.com/?template=mysql_show_test_failure&search=yes&push_id=1856111&tes...

Log: http://pb2.norway.sun.com/?action=archive_download&archive_id=2713685&pretty=please
[28 Dec 2010 12:33] Joerg Bruehe
For the record:

1) Of all platforms in the 5.6.1-m5 release build, Linux/ia64 (generic, both tar.gz and RPM) is the only one on which this happens.
For these builds, we are using gcc.
Using icc ("specific" RPMs for SLES and RHEL), the test passes.

2) It happens in all test modes; the timeout is not specific to the (slow) "debug" run, and even in "debug" it does not happen on any other platform.

3) On the other platforms, run time ranges from about 20 ms to (at most) 2,650 ms, and of the 270 runs which pass only 7 have a run time above 2,000 ms.
The timeout is at 900,000 ms, so even the slowest passing run leaves a huge margin.
[28 Dec 2010 12:44] Joerg Bruehe
The above timing info is about "binlog_checksum";
for "rpl_checksum_cache" the run times vary from 2,300 ms to 53,030 ms.

As for pass/fail, both tests behave identically.
[29 Dec 2010 10:18] Andrei Elkin
=== modified file 'sql/sys_vars.cc'
--- sql/sys_vars.cc	2010-12-20 13:26:51 +0000
+++ sql/sys_vars.cc	2010-12-28 23:32:47 +0000
@@ -1941,9 +1941,10 @@ bool Sys_var_enum_binlog_checksum::globa
   mysql_mutex_lock(mysql_bin_log.get_log_lock());
   if(mysql_bin_log.is_open())
   {
-    uint flags= RP_FORCE_ROTATE | RP_LOCK_LOG_IS_ALREADY_LOCKED |
-      (binlog_checksum_options != (uint) var->save_result.ulonglong_value?
-       RP_BINLOG_CHECKSUM_ALG_CHANGE : 0);
+    uint8 flags= (RP_FORCE_ROTATE | RP_LOCK_LOG_IS_ALREADY_LOCKED);
+    flags |= (binlog_checksum_options !=
+              (ulong) var->save_result.ulonglong_value?
+              RP_BINLOG_CHECKSUM_ALG_CHANGE : 0);
     if (flags & RP_BINLOG_CHECKSUM_ALG_CHANGE)
       mysql_bin_log.checksum_alg_reset= (uint8) var->save_result.ulonglong_value;
     mysql_bin_log.rotate_and_purge(flags);
[29 Dec 2010 12:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127662

3447 Andrei Elkin	2010-12-29
      bug#59179 binlog_checksum and rpl_checksum_cache time out on Linux in release builds
      
      The issue appeared to be a hang in an attempt to lock a mutex in MYSQL_BIN_LOG::rotate_and_purge(),
      which should not be attempted in an execution branch that calls the method
      from Sys_var_enum_binlog_checksum::global_update().
      The hang is specific to the ia64 compilation environment and is caused by incorrect computation of `flags'
      when its assignment includes a third bitwise OR argument that is a ( ? : ) expression.
      In that case the value of `flags' is always the value of `( ? : )', that is, 4 or zero.
      
      This most probably indicates an ia64 compiler issue.
      
      Fixed by splitting the flags calculation into two parts, which avoids the reported
      issue so that the value becomes correct.
      
      
      
      
      ******
      An experimental commit to fix bug59179.
     @ sql/binlog.cc
        Signature changed.
     @ sql/binlog.h
        An explicit integer size is set for rotate_and_purge()'s argument.
     @ sql/sys_vars.cc
        Splitting the flags calculation into two parts to work around the wrong OR evaluation on ia64.
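
For illustration only (not part of the commit): a minimal C++ sketch of the two ways of computing `flags' that the commit message contrasts. The value 4 for RP_BINLOG_CHECKSUM_ALG_CHANGE comes from the commit message; the values 1 and 2 for the other two flags, the function names and the parameter names are assumptions made for this sketch. On a conforming compiler both forms give the same result, which is why the commit message suspects the ia64 toolchain. If the result collapses to 4 or zero, the RP_LOCK_LOG_IS_ALREADY_LOCKED bit is lost, which would explain why rotate_and_purge() tries to take a lock the caller already holds.

```cpp
#include <cstdint>

// Flag values. Only the value 4 is taken from the commit message;
// the other two are assumed for this sketch.
constexpr unsigned RP_LOCK_LOG_IS_ALREADY_LOCKED = 1;  // assumed value
constexpr unsigned RP_FORCE_ROTATE = 2;                // assumed value
constexpr unsigned RP_BINLOG_CHECKSUM_ALG_CHANGE = 4;  // per the commit message

// Pre-patch form: a single expression whose third OR operand is a
// conditional. (Hypothetical helper name; mirrors the removed diff lines.)
constexpr unsigned flags_single_expression(unsigned current_alg,
                                           unsigned long long requested_alg) {
  return RP_FORCE_ROTATE | RP_LOCK_LOG_IS_ALREADY_LOCKED |
         (current_alg != static_cast<unsigned>(requested_alg)
              ? RP_BINLOG_CHECKSUM_ALG_CHANGE
              : 0u);
}

// Patched form: the constant bits are assigned first, then the conditional
// bit is OR-ed in by a separate statement. (Hypothetical helper name;
// mirrors the added diff lines. Needs C++14 for the constexpr body.)
constexpr std::uint8_t flags_split(unsigned long current_alg,
                                   unsigned long long requested_alg) {
  std::uint8_t flags = RP_FORCE_ROTATE | RP_LOCK_LOG_IS_ALREADY_LOCKED;
  flags |= (current_alg != static_cast<unsigned long>(requested_alg)
                ? RP_BINLOG_CHECKSUM_ALG_CHANGE
                : 0u);
  return flags;
}

// Compile-time checks: both forms must agree on a conforming compiler.
static_assert(flags_single_expression(0, 1) == 7, "algorithm change: all three bits set");
static_assert(flags_single_expression(1, 1) == 3, "no change: only the two constant bits");
static_assert(flags_split(0, 1) == flags_single_expression(0, 1), "forms agree on change");
static_assert(flags_split(1, 1) == flags_single_expression(1, 1), "forms agree on no change");
```

The checks run at compile time, so building this file with a C++14 compiler is enough to confirm that the two forms agree. It says nothing about what the affected ia64 gcc build actually generated.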
[10 Jan 2011 16:27] Joerg Bruehe
This bug fix has been transferred into the 5.6.1-m5 release build.

Please reset the status to "Patch approved" after documenting it.
[12 Jan 2011 17:59] Andrei Elkin
Note that we still need to sort out a possible compiler/ia64 environment issue, as reported today in Bug #59436.
[13 Jan 2011 10:56] Joerg Bruehe
The patch is included in a rebuild, but contrary to expectations, the problem still occurs in the RPM builds for Linux/IA64 ("generic", using gcc).

There are no logs of the Linux/IA64/tar.gz build yet (still running).
[24 Jan 2011 15:59] Jon Stephens
Affected tests only, not reported in release. Closed.
[19 Aug 2011 18:42] MySQL Verification Team
This was still a server bug, regardless of whether it was reported in tests.