Bug #69369 BGC and slave binlog rotate causes slave sql_thread to stop
Submitted: 31 May 2013 21:45
Modified: 9 Jul 2013 16:43
Reporter: Santosh Praneeth Banda
Status: Closed
Category: MySQL Server: Replication
Severity: S1 (Critical)
Version: 5.6.12
OS: Any
Assigned to:
CPU Architecture: Any
Tags: multi threaded slave, replication

[31 May 2013 21:45] Santosh Praneeth Banda
Description:
In mysql-5.6.12-pre, sql_thread crashes with the following error when MTS is turned on and the slave is rotating its binlog:

Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756

How to repeat:
The following mtr test reproduces the error consistently:

diff --git a/mysql-test/suite/rpl/t/rpl_gtid_mts-master.opt b/mysql-test/suite/rpl/t/rpl_gtid_mts-master.opt
new file mode 100644
index 0000000..8e4c7b4
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_gtid_mts-master.opt
@@ -0,0 +1 @@
+--gtid-mode=on --enforce-gtid-consistency --log-slave-updates
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_mts-slave.opt b/mysql-test/suite/rpl/t/rpl_gtid_mts-slave.opt
new file mode 100644
index 0000000..0ac745e
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_gtid_mts-slave.opt
@@ -0,0 +1 @@
+--gtid-mode=on --enforce-gtid-consistency --log-slave-updates --slave_parallel_workers=10 --max_binlog_size=50000
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_mts.test b/mysql-test/suite/rpl/t/rpl_gtid_mts.test
new file mode 100644
index 0000000..978331c
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_gtid_mts.test
@@ -0,0 +1,7 @@
+-- source include/have_gtid.inc
+-- source include/master-slave.inc
+
+-- connection master
+-- source extra/rpl_tests/rpl_parallel_load.test
+
+-- source include/rpl_end.inc

Suggested fix:
diff --git a/sql/binlog.cc b/sql/binlog.cc
index c2cf854..9685eb1 100644
--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -6938,11 +6938,18 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit,
     mysql_mutex_lock(&LOCK_log);
     int error= rotate(false, &check_purge);
     mysql_mutex_unlock(&LOCK_log);
-
-    if (!error && check_purge)
+    if (error)
+    {
+      sql_print_error("Failure during 'rotate' in ordered_commit");
+      DBUG_RETURN(error);
+    }
+    if (check_purge)
       purge();
-    else
-      thd->commit_error= THD::CE_COMMIT_ERROR;
   }
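
To make the intent of the suggested fix easier to follow, here is a minimal standalone sketch of the two control flows after rotate() returns. It is a model, not server code: the Outcome struct, the CommitError enum, and the functions before_patch() and after_patch() are hypothetical names invented for illustration, and the sketch only mirrors the branches visible in the diff above. Whether the commit_error flag is what actually stops the slave applier is an assumption of this sketch and is not confirmed in this report.

```cpp
// Hypothetical stand-in for the commit_error state set in the diff;
// this is NOT the server's real THD type.
enum CommitError { CE_NONE, CE_COMMIT_ERROR };

// Hypothetical record of what one pass through the post-rotate code did.
struct Outcome {
  bool purged;               // purge() would have been called
  CommitError commit_error;  // value left in thd->commit_error
  int returned_error;        // value passed to DBUG_RETURN, 0 if none
};

// Control flow before the patch: the else branch pairs with the combined
// condition, so it also runs when rotate() succeeded (error == 0) but
// check_purge is false.
Outcome before_patch(int error, bool check_purge) {
  Outcome out{false, CE_NONE, 0};
  if (!error && check_purge)
    out.purged = true;
  else
    out.commit_error = CE_COMMIT_ERROR;
  return out;
}

// Control flow with the suggested patch: a rotate() failure is returned
// immediately, and a successful rotate never sets the commit error flag.
Outcome after_patch(int error, bool check_purge) {
  Outcome out{false, CE_NONE, 0};
  if (error) {
    out.returned_error = error;
    return out;
  }
  if (check_purge)
    out.purged = true;
  return out;
}
```

In this model, before_patch(0, false) leaves commit_error set to CE_COMMIT_ERROR even though rotate() reported no error, while after_patch(0, false) leaves it at CE_NONE. A failing rotate is surfaced through the return value instead.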
[11 Jun 2013 18:09] Andrei Elkin
There's no crash really. The slave applier stops as reported. The reason is
being investigated.
[11 Jun 2013 19:30] Santosh Praneeth Banda
Yes. It's not really a crash, but sql_thread stops on every master rotate, and that is a bug.
[13 Jun 2013 14:33] Andrei Elkin
Thanks for reporting this issue.
I am adjusting the synopsis to correspond to the identified reason.

A patch is being tested.
[9 Jul 2013 16:42] Jon Stephens
Fixed in 5.6+. Documented in the 5.6.13 and 5.7.2 changelogs as follows:

      The condition leading to the issue fixed in Bug #16579083 continued to
      raise an error even though the condition itself no longer caused the issue
      to occur.

Closed.
[9 Jul 2013 16:43] Jon Stephens
Thank you for your bug report. The fix for this issue has been committed to the source repository for this product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html