Bug #66411 MySQL crashed in is_tranx_end_pos within an assertion for semisync plugin
Submitted: 16 Aug 2012 1:59 Modified: 7 Feb 2014 4:47
Reporter: hui liu (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.5.28, 5.7.0 OS:Any
Assigned to:
Tags: assertion failed, is_tranx_end_pos, semisync
Triage: Needs Triage: D1 (Critical)

[16 Aug 2012 1:59] hui liu
Description:
There was a server crash when the code first checks if semisync is enabled without lock, then if so takes the lock and tests again. If semisync was disabled in-between the first and the second test, an assert was incorrectly made that referenced a NULL pointer.

It's fixed in MariaDB long time ago but still exists in MySQL-5.5.27.

Crashed with debug mode:
 ./mtr --mysqld="--plugin-dir=/u01/download/5527/plugin/semisync" rpl.semisync_assert_failed.test 
 
stack_bottom = 4043b090 thread_stack 0x40000
/u01/download/5527/sql/mysqld(my_print_stacktrace+0x32)[0x89d718]
/u01/download/5527/sql/mysqld(handle_fatal_signal+0x425)[0x7233c1]
/lib64/libpthread.so.0[0x35eba0ebe0]
/u01/download/5527/plugin/semisync/semisync_master.so(Trace::function_enter(char const*)+0x14)[0x7f013b6e8c74]
/u01/download/5527/plugin/semisync/semisync_master.so(ActiveTranx::is_tranx_end_pos(char const*, unsigned long long)+0x2c)[0x7f013b6e77be]
/u01/download/5527/plugin/semisync/semisync_master.so(ReplSemiSyncMaster::commitTrx(char const*, unsigned long long)+0x520)[0x7f013b6e8006]
/u01/download/5527/plugin/semisync/semisync_master.so(repl_semi_report_commit+0x50)[0x7f013b6e998c]
/u01/download/5527/sql/mysqld(Trans_delegate::after_commit(THD*, bool)+0x1be)[0x6a395e]
/u01/download/5527/sql/mysqld(ha_commit_trans(THD*, bool)+0x5b3)[0x72b6d5]
/u01/download/5527/sql/mysqld(trans_commit_stmt(THD*)+0x70)[0x6ad864]
/u01/download/5527/sql/mysqld(mysql_execute_command(THD*)+0x72fc)[0x5cad18]
/u01/download/5527/sql/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x1d8)[0x5cb104]
/u01/download/5527/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xb20)[0x5cc6fa]
/u01/download/5527/sql/mysqld(do_command(THD*)+0x248)[0x5cdbd2]
/u01/download/5527/sql/mysqld(do_handle_one_connection(THD*)+0x16c)[0x69c79e]
/u01/download/5527/sql/mysqld(handle_one_connection+0x2d)[0x69c85b]
/u01/download/5527/sql/mysqld(pfs_spawn_thread+0x92)[0xac6758]
/lib64/libpthread.so.0[0x35eba0677d]
/lib64/libc.so.6(clone+0x6d)[0x35eb2d49ad]

How to repeat:
$cat suite/rpl/t/semisync_assert_failed.test
--source include/have_semisync_plugin.inc
--source include/not_embedded.inc
--source include/have_binlog_format_mixed_or_statement.inc
--source include/master-slave.inc
--source include/have_debug_sync.inc
 
# MDEV-359: There was a server crash when the code first checks if semisync
# is enabled without lock, then if so takes the lock and tests again.
# If semisync was disabled in-between the first and the second test, an
# assert was incorrectly made that referenced a NULL pointer.
#
# This tests uses debug_sync to pause one thread at the critical point in
# the code, disable the semisync, and then continue the paused thread.
 
CREATE TABLE t1 (a INT) ENGINE=MyISAM;
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = ON;
--connection master1
SET DEBUG_SYNC = "rpl_semisync_master_commit_trx_before_lock SIGNAL m1_ready WAIT_FOR m1_cont";
--send
INSERT INTO t1 SELECT * FROM t1;
--connection master
SET DEBUG_SYNC= "now WAIT_FOR m1_ready";
SET GLOBAL rpl_semi_sync_master_enabled = OFF;
SET DEBUG_SYNC= "now SIGNAL m1_cont";
--connection master1
--reap
 
connection master;
DROP TABLE t1;
 
disable_warnings;
UNINSTALL PLUGIN rpl_semi_sync_master;
enable_warnings;
 
--source include/rpl_end.inc

Suggested fix:
--- download.orig/5527/plugin/semisync/semisync_master.cc
+++ download/5527/plugin/semisync/semisync_master.cc
@@ -17,6 +17,7 @@
 
 
 #include "semisync_master.h"
+#include "debug_sync.h"
 
 #define TIME_THOUSAND 1000
 #define TIME_MILLION  1000000
@@ -612,6 +613,8 @@ int ReplSemiSyncMaster::commitTrx(const 
 
     set_timespec(start_ts, 0);
 
+    DEBUG_SYNC(current_thd, "rpl_semisync_master_commit_trx_before_lock");
+
     /* Acquire the mutex. */
     lock();
 
@@ -747,13 +750,13 @@ int ReplSemiSyncMaster::commitTrx(const 
       }
     }
 
-  l_end:
     /*
       At this point, the binlog file and position of this transaction
       must have been removed from ActiveTranx.
     */
     assert(!active_tranxs_->is_tranx_end_pos(trx_wait_binlog_name,
                                              trx_wait_binlog_pos));
+  l_end:
     
     /* Update the status counter. */
     if (is_on())
[17 Aug 2012 17:17] Sveta Smirnova
Thank you for the report.

I can not repeat crash. Do you use built-in semisync plugin? Which MySQL package (file name you downloaded) do you use?
[18 Aug 2012 4:50] hui liu
Hi Sveta, maybe you forgot to add the DBUG_SYNC(xxx) in semisync_master.cc, which makes the low frequent hit race condition happen every time with the provided test case.

Please add the DBUG_SYNC in plugin/semisync/semisync_master.cc mentioned before:
+    DEBUG_SYNC(current_thd, "rpl_semisync_master_commit_trx_before_lock");

And try it again. You can not miss it:)

Here is my build cmd:
$cat make.sh 
CFLAGS="-O0 -g"  CXX=gcc CXXFLAGS="-O0 -g  -felide-constructors -fno-exceptions -fno-rtti" cmake -DWITH_INNOBASE_STORAGE_ENGINE=1 -DCMAKE_INSTALL_PREFIX=/u01/mysqld -DWITH_EXTRA_CHARSETS:STRING=all -DDEFAULT_CHARSET=gbk -DDEFAULT_COLLATION=gbk_chinese_ci -DWITH_DEBUG=1

And the semisync is dynamic installed, as described in test suite:
--source include/have_semisync_plugin.inc
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

This race risk should be existed in all versions, including the latest released version I tested(MySQL-5.5.27.tar.gz).

Any extra info, please let me know.
[18 Aug 2012 13:05] Sveta Smirnova
Thank you for the feedback.

Verified as described.
[18 Dec 2012 12:22] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[18 Dec 2012 12:23] Jon Stephens
Fixed in trunk. Documented as follows in the 5.7.1 changelog:

      When the server starts, it checks whether semisynchronous replication has
      been enabled without a lock, and, if so, it takes the lock, then tests
      again. Disabling semisynchronous replication following the first of the
      these tests, but prior to the second one, could lead to a crash of the
      server.

Closed.
[28 Nov 2013 9:49] Shane Bester
wasn't this fixed in 5.6?
[3 Feb 2014 13:38] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

Fixed in 5.6+. Documented in the 5.6.17 and 5.7.4 changelogs as follows:

        The server checks to determine whether semisynchronous
        replication has been enabled without a lock, and if this is the
        case, it takes the lock and checks again. If semisynchronous
        replication was disabled after the first but prior to the second
        one, this could cause the server to fail.
      
Closed.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[3 Feb 2014 13:41] Jon Stephens
Disregard the reference to 5.7.4 in my previous comment. The fix for this bug is in 5.6.17 and 5.7.1.

Thanks.
[7 Feb 2014 4:47] hui liu
Yes, 5.7.1 has fixed this issue verified by code review, and 5.6.17 has not yet released.
[28 Mar 2014 19:27] Laurynas Biveinis
5.6$ bzr log -r 5788 -n0
------------------------------------------------------------
revno: 5788
committer: Astha Pareek <astha.pareek@oracle.com>
branch nick: mysql-5.6-b17920923
timestamp: Mon 2014-02-03 12:07:06 +0530
message:
  BUG#17920923 - BACKPORT PATCH FOR BUG#14511533 INTO 5.6
        
  Problem:
  Server crash was observed when code first checked if 
  semisync was enabled without lock, if so it takes the
  lock and checks again. If semisync gets disabled
  in-between the first and second check, an assert
  incorrectly referenced a null pointer for active
  transaction which leads to the crash. 
      
  Solution: 
  The assert is relocated onto a position where
  active_tranxs buffer is valid.